Classification of Carnatic Thumbnails using CNN-RNN Models

Thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science and Engineering by Research

by

Amulya Sri Pulijala 201450827 [email protected]

International Institute of Information Technology
Hyderabad - 500 032
May 2021

Copyright © Amulya Sri Pulijala, 2021
All Rights Reserved

International Institute of Information Technology Hyderabad, India

CERTIFICATE

It is certified that the work contained in this thesis, titled "Classification of Carnatic Thumbnails using CNN-RNN Models" by Amulya Sri Pulijala, has been carried out under my supervision and is not submitted elsewhere for a degree.

Date                                        Adviser: Dr. Suryakanth V Gangashetty

To my Mom and Grandparents

Acknowledgements

om ajñāna-timirāndhasya jñānāñjana-śalākayā
cakṣur unmīlitaṁ yena tasmai śrī-gurave namaḥ

Translation: I offer obeisance unto Śrī Guru, who has opened my eyes, which were blinded by the cataract of ignorance, with the collyrium of knowledge.

I would like to thank my supervisor Dr. Suryakanth V Gangashetty for his support and care throughout my Masters journey. I cannot thank him enough for his unrestricted backing and for bearing with me in spite of the time I took to do every single task. Huge gratitude and respect for his simplicity and timely help. I also admire his patience in guiding me accordingly. I would like to thank Dr. Venkatesh Choppella, who inspired me to pursue my masters. I was fortunate to attend the classes of Prof. Yegnanarayana Bayya, a phenomenal teacher who changed the way I perceive research. I am grateful to Dr. Sakthi Balan and Aditya, whose interactions were always encouraging and fruitful. On a personal note, I would like to thank my peers and friends who made my life easy at each and every step. I would like to extend my heartfelt gratitude to Ramakrishna Sir, CVRS Sastry Sir and Narayan Rao Sir from the Indian Space Research Organisation (ISRO), who constantly encouraged me to pursue research. I would like to thank the management of ISRO for giving me permission to complete my masters. This thesis would not have been possible without the support of my mother, grandparents BSSSG Murthy and Saroja, husband Phani Mahesh and little daughter Vnnela. I extend my heartfelt thanks to my mother-in-law Kameswari for her support. Finally, I would like to thank aunt Prameela Vani, uncle Jayababu and sisters Kavitha and Sahithy for

their continuous support and encouragement. I am also grateful to uncle Phani, aunt Suneetha and brothers Manu and Abhi for cheering me up whenever I felt low. Without their help, pursuing my masters would have remained a dream. This family is a unique gift that I am bestowed with, and I am always grateful for their cooperation in every phase of my life.

At last, I would like to thank the Lord almighty and my spiritual guide who taught me the 'Bhagavad Gita', which is the main reason for what I am today. I bow my head to his compassionate gesture in teaching me the value and purpose of life.

Amulya Sri Pulijala

Abstract

Are repetitive parts representative too in music?

Music signal processing is a sub-branch of signal processing and a promising area these days. Music analysis based on signal processing techniques has paved a new way for the generation and analysis of music. The analysis of music and the recognition of its various swaras and ragas is inherent to human understanding. Music signal processing includes various areas of research such as Synthesis of Music, Transcription, Music Information Retrieval, Raga Classification, Tala Classification, Instrument/Voice Identification, Audio Matching, Source Separation, Tonic Identification, Intonation/melodic/rhythmic analysis, Music Emotion Recognition, etc.

The concept of Raga and Tala is an integral part of Indian classical music. Raga is the melodic component while Tala is the rhythmic component of the music. Hence, the classification and identification of Raga and Tala is a paramount problem in the area of Music Information Retrieval (MIR) systems. Although there are seven basic Talas in Carnatic music, a further subdivision of them gives a total of 175 talas. There are 72 Melakartha ragas and more than a thousand janya ragas. Statistical and machine learning approaches have been proposed in the literature to classify Ragas and Talas. However, they use the complete musical recording for training and testing. As part of this thesis, a novel approach is proposed for the first time in Carnatic music to classify Carnatic music recordings using a repetitive structure called Thumbnails. We propose a parallel CNN-RNN model to classify Ragas and Talas in Carnatic music using 'Thumbnails'.


Keywords: Music Signal Processing, Music Information Retrieval, Carnatic Music, Hindustani Music, Raga Classification, Tala Classification, Melody/Intonation analysis, Source Separation, Audio Thumbnails, SVM, CNN, RNN

Contents


Abstract

1 Introduction to Indian Classical Art and Audio Thumbnailing
  1.1 Indian Art Music Traditions
    1.1.1 Swara
    1.1.2 Raga
    1.1.3 Tala
      1.1.3.1 Tala Schemes in Carnatic Music
      1.1.3.2 Saptha Tala System
    1.1.4 Tonic
    1.1.5 Carnatic Concert
    1.1.6 Classification of Indian Musical Instruments
  1.2 Machine Learning and Neural Networks
    1.2.1 Programming Vs Learning
    1.2.2 Supervised Learning
    1.2.3 Unsupervised Learning
  1.3 Deep Learning
  1.4 Audio Thumbnailing
  1.5 Motivation and Goals
  1.6 Thesis Outline

2 Literature Survey of Music Signal Processing
  2.1 Introduction
  2.2 Areas of Research and Related Work
    2.2.1 Related Work With Respect to Source Separation
    2.2.2 Emotion Recognition
    2.2.3 Raga Classification
    2.2.4 Tala Classification
    2.2.5 Intonation/Rhythmic Analysis
    2.2.6 Tonic Identification
    2.2.7 Music Note Representation
  2.3 Existing methodologies in Audio Thumbnailing


  2.4 Audio Classification Techniques
  2.5 Summary and Conclusions

3 Classification and Computation of Audio Thumbnails
  3.1 Proposed Methodology
    3.1.1 Computation of Self Similarity Matrix
      3.1.1.1 Enhancement Strategies
    3.1.2 Generation of Thumbnails
    3.1.3 Classification Model
  3.2 Summary and Conclusions

4 Results of Classification of Carnatic Thumbnails using CNN-RNN Models
  4.1 Experimental Setup
    4.1.1 Dataset for Raga Classification
    4.1.2 Dataset for Tala Classification
  4.2 Summary and Conclusions

5 Summary and Conclusions
  5.1 Future Work

Related Publications

Bibliography

List of Figures


1.1 Classification of Musical Instruments
1.2 Machine Learning Application adopted from https://www.guru99.com/machine-learning-tutorial.html
1.3 Regression Algorithms
1.4 Unsupervised learning Algorithms

3.1 Sample Self Similarity Matrix
3.2 Chromagram of Song Inta Chala in Adi Talam
3.3 Detailed view of Self Similarity Matrix
3.4 Self Similarity Matrix - Procedure
3.5 Architecture of the neural network classification model

4.1 Self Similarity Matrix before Enhancing and Smoothing for Song Inta Chala in Adi Tala
4.2 Self Similarity Matrix After Enhancing and Smoothing for Song Inta Chala in Adi Tala

List of Tables


1.1 Table illustrating the note of each swara
1.2 Table illustrating twelve swaras

2.1 Comparison between existing Thumbnailing Approaches

4.1 Table Describing Dataset for Raga Classification
4.2 Table Describing Dataset for Raga
4.3 Table describing Dataset for Raga Darbari
4.4 Table describing Dataset for Raga Darbari
4.5 Table describing Dataset for Raga Suruthi
4.6 Table describing Dataset for Raga
4.7 Table describing Dataset for Raga Kuntala Varali
4.8 Table describing Dataset for Raga
4.9 Table describing Dataset for Raga Nattai
4.10 Table describing Dataset for Raga Saurastram
4.11 Table describing Dataset for Raga Nilambari
4.12 Table describing Dataset for Raga
4.13 Table describing Dataset for Raga
4.14 Table describing Dataset for Raga
4.15 Table describing Dataset for Raga
4.16 Table describing Dataset for Raga
4.17 Table describing Dataset for Raga Sri
4.18 Table describing Dataset for Raga Ahiri
4.19 Table describing Dataset for Raga Darbari
4.20 Table describing Dataset for Raga
4.21 Table describing Dataset for Raga Surthi
4.22 Table describing Dataset for Raga Varali
4.23 Table describing Dataset for Raga Sahana
4.24 Table describing Dataset for Raga Nattai
4.25 Table describing Dataset for Raga Kuntalavarali
4.26 Table describing Dataset for Raga Saurastram
4.27 Table describing Dataset for Raga Nilambari


4.28 Table describing Dataset for Raga Vasantha
4.29 Table describing Dataset for Raga Kapi
4.30 Table describing Dataset for Raga Kedaram
4.31 Table describing Dataset for Raga Kanada
4.32 Table describing Dataset for Raga Khamus
4.33 Table describing Dataset for Raga Sri
4.34 Table Describing Dataset for Tala Classification
4.35 Table describing Dataset for Tala Adi
4.36 Table describing Dataset for Tala Rupaka
4.37 Table describing Dataset for Tala Tisra Jati Eka
4.38 Table describing Dataset for Tala Khanda Jati Eka
4.39 Table describing Dataset for Tala Adi
4.40 Table describing Dataset for Tala Rupaka
4.41 Table describing Dataset for Tala TisraEka
4.42 Table describing Dataset for Tala Khanda Eka

Abbreviations

AF - Amplitude Factor

ANN - Artificial Neural Networks

AR - Auto Regression

ASCII - American Standard Code for Information Interchange

CMD - Carnatic Music Dataset

CNN - Convolutional Neural Network

DTW - Dynamic Time Warping

FIR - Finite Impulse Response

FFT - Fast Fourier Transform

GMM - Gaussian Mixture Model

GP - Gaussian Process

GSMM - Gaussian Scaled Mixture Models

HPS - Harmonic-Percussive Separation

ILVS - Inter-Channel Level Vector Sum

k-NN - k Nearest Neighbour

LCSS - Longest Common Segment Set

LSH - Locality Sensitive Hashing


LSTM - Long Short Term Memory

MFCC - Mel Frequency Cepstral Coefficients

MIR - Music Information Retrieval

NMF - Non-negative Matrix Factorization

PCA - Principal Component Analysis

SSM - Self Similarity Matrix

SVM - Support Vector Machine

RELU - Rectified Linear Unit

RNN - Recurrent Neural Network

ZCR - Zero Crossing Rate

Chapter 1

Introduction to Indian Classical Art and Audio Thumbnailing

Music has become an integral part of our lives today. With the digital revolution that has struck the world and with the growth of computational power, browsing and storage have become accessible and effective. Thus, audio signal processing became an emerging area, paving the way for many new areas of research. This thesis addresses the classification of Carnatic music using audio thumbnailing with the help of convolutional neural networks and recurrent neural networks. Audio thumbnailing is introduced into Carnatic music for the first time as part of this thesis. A brief introduction to Indian classical music and the technologies used is provided in this chapter. The organization of this chapter is as follows: Section 1.1 discusses Indian art music and its elements, followed by a brief introduction to machine learning and neural networks in Section 1.2. Section 1.3 deals with deep learning approaches, and a basic introduction to audio thumbnailing is given in Section 1.4. The motivation for this thesis is elaborated in Section 1.5, followed by the thesis outline in Section 1.6.

1.1 Indian Art Music Traditions

In general, Indian music refers to both Carnatic and Hindustani music. Hindustani music,¹ also known as North Indian music, is prominent in the northern regions of India, Nepal,

¹ http://en.wikipedia.org/wiki/Hindustani_classical_music

Afghanistan and Bangladesh, while Carnatic music² is famous in the southern regions of the Indian subcontinent. Indian classical music is described in great detail in the Samaveda, one among the four Vedas [1]. Samaveda, which means the Veda of melodies and chants,³ dates to around 1000 BC. However, Indian music is an amalgamation of various cultures, such as the Persian, Greek and Iranian [2] [3]. The oldest available books with respect to Indian classical music are Bharata's Natya Shastra and Sarngadeva's Sangita Ratnakara [4]. They describe 22 notes (practically, only 16 are useful). The primary aspects of Carnatic music are Swara, Raga, Tala and the tonic.

1.1.1 Swara

Indian classical music has seven swaras, known as the Saptha swaras. They are Shadjamam (Sa), which is the base or tonic frequency, Rishabam (Ri), Gandharam (Ga), Madhyamam (Ma), Panchamam (Pa), the perfect fifth, Dhaivatham (Dha), and Nishadam (Ni). Of these, Sa and Pa do not admit variations and are called achala (fixed) swaras [4]. The other five (Ri, Ga, Ma, Dha and Ni) have variations, indicated by the numbers 1, 2 and 3, corresponding to a total of 16 notes. Among these 16 notes, Ri3, Ga1, Dha3 and Ni1 are hardly used. The tonal quality of each note of the seven swaras is often associated with the call of a specific animal, as explained in Table 1.1. There is a mythological origin of the swaras, which says that Lord Shiva addressed a congregation, and the differences that existed in His tone became the seven swaras of music. Shiva first addressed the audience at the center, and then the audience to the immediate left and right. The center tone became the basic note, or the sadjamam (sa), while the tone to the immediate left became ni and that to the right became ri. The Lord then addressed the far left and right; these became the notes dha, ga and ma. Thus the seven swaras Sa, Re, Ga, Ma, Pa, Dha and Ni were formed [5].

² http://en.wikipedia.org/wiki/Carnatic_music
³ https://en.wikipedia.org/wiki/Samaveda

Table 1.1 Table illustrating the note of each swara

Name of the Swara | Notation | Associated sound
Sadjamam          | Sa       | Cry of the peacock
Rishabham         | Ri       | Lowing of the bull
Gandhara          | Ga       | Bleating of a goat
Madhyamam         | Ma       | Call of the heron
Panchamam         | Pa       | Call of the cuckoo
Dhaivata          | Dha      | Neighing of the horse
Nishada           | Ni       | Trumpeting of the elephant

Although there are only seven swaras (known as the Saptha Swaras), the Indian system has five extra notes. Manipulation of these gives rise to another concept of classical music, called the Raga. These twelve swaras are explained in Table 1.2.

1.1.2 Raga

One of the most important elements of Carnatic music is the raga. It can be described as the spine of Indian classical music. The following is a definition of 'raga': "Ranjayati iti Raaga". Translation: that which pleases the ear is described as Raga. Matanga, in his epic treatise Brihaddeshi, defines raga as that which colors the mind of the good through a specific swara and varna (meaning color) or through a type of dhvani (meaning sound). Ragas are broadly classified as Melakartha and Janya ragas, or as sampoorna and asampoorna ragas. A sampoorna raga includes all seven swaras in both the aarohana and the avarohana sequences. Any raga which does not consist of all seven swaras is asampoorna. All Melakartha ragas are sampoorna ragas, while Janya ragas may be either sampoorna or asampoorna. Another characteristic of Melakartha ragas is that both the aarohana and avarohana sequences are required to contain

Table 1.2 Table illustrating twelve swaras

Name of the Swara | Notation
Sadjamam          | Sa
Komal Rishabham   | R1
Teevra Rishabham  | R2
Komal Gandhara    | Ga1
Teevra Gandhara   | Ga2
Komal Madhyamam   | Ma1
Teevra Madhyamam  | Ma2
Panchamam         | Pa
Komal Dhaivata    | Dha1
Teevra Dhaivata   | Dha2
Komal Nishada     | Ni1
Teevra Nishada    | Ni2

exactly one note from each swara. In the aarohana sequence, the pitch of the next swara must always be higher than that of the previous swara. Janya ragas, however, can contain more than, fewer than, or exactly seven notes, and the aarohana and avarohana of Janya ragas can have repetitions [4]. Chapter 2 explains various raga classification mechanisms described in the literature.

1.1.3 Tala

Tala is nothing but the strong tradition of rhythm. In Carnatic music, we have many talas, namely Adi Talam, Rupaka Talam, etc. The combination of the first letters of Tandava and Lasya gives us the word tala: Tandava is the dance of Lord Shiva and Lasya the dance of his consort Shakti. Thus, tala is a combination of energy and grace [6]. Tala has no reference in the earliest system of music, popularly referred to as "Samagana". The Natyasastra of Bharata and the Sangitaratnakara of Sarngadeva, the oldest existing sources mentioned above, are the main sources of information on Tala. Tala is characterized by 10 features, hence called the Tala Dasapranas (Dasa: 10, Prana: essence) [7]. The following is a brief description of these Dasapranas:

• Anga: Part or limb

• Jati: Type or kind. It describes variations in the Anga (Laghu)

• Kriya: Action

• Kaala: Duration or measurement of time

• Graha: Describes where the song commences, which may not be at the beginning of the tala

• Marga: Path. Describes the duration of the kriya/action; in other words, how the tala is performed in various different songs

• Kala: Denotes the number of matras into which a kriya is subdivided

• Laya: Time gap between two consecutive kriyas. It sets the tempo

• Yati: Rhythmic pattern in a composition with reference to the anga

• Prasthara: Detailed elaboration of a rhythmic pattern

1.1.3.1 Tala Schemes in Carnatic Music

Carnatic music has various schemes of Talas. The ancient 108 anga Talas (which include the 5 Margi and 103 other Talas), the 72 Melakartha Tala system (which was designed to fit the 72 Melakartha Raga classification), and the Suladi Sapta Tala system are some of the classification schemes.

1.1.3.2 Saptha Tala System

The Saptha Tala system became very famous during the Purandaradasa period. As explained earlier, a Tala is a cyclic repetition of a given rhythmic pattern. An Avartanam is a complete cycle of a Tala, which consists of Aksharas that are further divided into Mathras. The seven talas are as follows: Dhruva, Matya, Rupaka, Jhampa, Triputa, Ata and Eka talam. These seven talas are subdivided based on the change in the Tala due to a change in the five Jaathis (the Jaathi of a Tala is the number of beats that a laghu takes). The five Jaathis are as follows: Tisra, Chatusra, Khanda, Misra and Sankeerna. Thus, we get a total of 35 Talas after dividing on the basis of Jaathi. These 35 Talas allow further subdivision based on the five Gathis/Nadais (Gathi means speed); the five Gathis are the same as the Jaathis above. Finally, after the Jaathi and Gathi subdivision of the seven Talas, we get a total of 175 talas in Carnatic music, as the short computation below makes explicit. Three elements, namely the Jaathi name, the Tala name and the Gathi name, are required to describe a Tala [8].
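Written out as a worked calculation, the subdivision described above is simply:

```latex
\underbrace{7}_{\text{basic talas}} \times \underbrace{5}_{\text{jaathis}} = 35,
\qquad
35 \times \underbrace{5}_{\text{gathis}} = 175 \text{ talas.}
```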

1.1.4 Tonic

Tonic/Sruthi is a fundamental concept of Indian classical music, and in fact of any tonal music across the world [9] [10]. It is the pitch chosen by the performer, which is constant throughout the concert and acts as the reference frequency. The swara Sa in the octave is considered the Tonic/Sruthi. Generally, instruments like the Tambura are played to establish the tonic. Accompanying instruments also tune to the specific tonic of the performer. Scientifically, it is nothing but a linear shift on the time-frequency graph [11]. Tonic identification can be done

either by the tuning of the drone (tambura) or from melody characteristics. With this introduction to the basic elements of Carnatic music, the elements of a Carnatic concert and the instruments used in a concert are explained in detail in the consecutive sections. Chapter 2 describes various tonic identification mechanisms in the literature.

1.1.5 Carnatic Concert

In any Carnatic music concert, the emphasis is always on the vocal music/singer. From the perspective of the listener, a Carnatic music concert is more like storytelling, or a conversation between artist and audience. There is no definite structure; however, performers conventionally begin with a Varnam. The Varnam is generally regarded as a warm-up work, which sets the pace and mood of the concert. This is usually followed by a Kriti, with or without raga alapanam. The performer then renders the main musical work of the concert in an elaborate and detailed fashion. Then comes the Tani Avartanam, where the percussionists rhythmically embellish. Generally, most artistes sing a Ragam Tanam as the secondary major piece after the main piece. This is followed by a variety of songs, like Slokas. The concert concludes with the mangalam, which is a prayer for peace and prosperity.

1.1.6 Classification of Indian Musical Instruments

Erich Moritz von Hornbostel and Curt Sachs devised a system for the classification of musical instruments which is used by organologists (people who study musical instruments) worldwide [12] [13]. According to the Hornbostel-Sachs system, Indian musical instruments can be classified as,

• Chordophones (String Instruments)

• Aerophones (Wind Instruments)

• Membranophones (drums-percussion Instruments)

• Idiophones (non-drum percussion Instruments)

Chordophones primarily produce their sounds by means of the vibration of a string or strings stretched between points; examples include the violin, etc. Aerophones, or wind instruments, are the kind of instruments in which the air itself acts as the primary vibrating medium for the production of sound; examples include the flute, Harmonium, Ottu, etc. The term percussion instruments refers to the method of playing the instrument, namely striking it either with the fingers, the hand, or sticks. This class of instruments is divided into two kinds, namely Membranophones and Idiophones. Idiophones are instruments whose own substance vibrates to produce sound. Membranophones produce sound by the vibration of a stretched membrane; examples include the Mrudangam, Khanjira, Chanda, etc. These categories are further classified, and each of them is explained in Figure 1.1.

Figure 1.1 Classification of Musical Instruments

1.2 Machine Learning and Neural Networks

As part of this section, we go through basic machine learning concepts and focus on the techniques involved in Support Vector Machines (SVM) and Convolutional Neural Networks (CNN) for classification. We also emphasize supervised classification and its relevance to Convolutional Neural Networks.

1.2.1 Programming Vs Learning

In the classical way of programming, a set of rules is defined by the programmer to perform the task; the computer follows those rules and executes the task. Learning is different from this classical methodology of programming. Learning can be informally defined as repetitively improving performance on a task based on either experience or examples [14]. Here, programming is data driven: the machine learns by itself from examples.

Broadly, we can categorize machine learning algorithms as supervised or unsupervised. Supervised algorithms use labeled data, while unsupervised algorithms use unlabeled data. The categorization of machine learning is explained in detail in Figure 1.2.

1.2.2 Supervised Learning

Supervised learning is categorized based on its output as either regression or classification. Regression produces continuous output, while classification produces discrete output. Regression algorithms are further divided into various categories, as explained in Figure 1.3.

1.2.3 Unsupervised Learning

Unsupervised learning uses unlabelled data as mentioned above. Unsupervised learning is explained in Figure 1.4.

Figure 1.2 Machine Learning Application, adopted from https://www.guru99.com/machine-learning-tutorial.html

1.3 Deep Learning

Deep learning is a branch of machine learning which has the capability of learning from unlabeled and unstructured data. It replicates the neurons of the brain, and has input, output and hidden layers. The word 'deep' indicates that there are more than two layers of neurons, with each layer representing a layer of knowledge. The neuron is the basic unit of a neural network model. Artificial Neural Network (ANN) models consist of interconnected neurons, and the interconnection between any two neurons has a weight associated with it. Neural networks can be categorized as feed-forward neural networks, Recurrent Neural Networks (RNN) or Convolutional Neural Networks (CNN). A Recurrent Neural Network, as the name suggests, is a multilayered neural network which includes loops. In other

words, it is a generalization of feed-forward neural networks which includes internal memory. A Convolutional Neural Network is also a multi-layered network, used to extract complex features [14]. Chapter 2 explains various ANN approaches used for classification in the literature.

Figure 1.3 Regression Algorithms

1.4 Audio Thumbnailing

With the growing prevalence of large databases of multimedia content, techniques for facilitating fast browsing of such databases, or of the results of a database search, are becoming increasingly significant. This is especially true of modern multimedia search and retrieval systems, where the user must be able to review the returned selections quickly to determine their relevance to the original search. In order to improve the efficiency of browsing, one must consider not just the cost of delivery, in bandwidth for example, but also the time required to audition the selections.

Figure 1.4 Unsupervised learning Algorithms

In view of the wide variety of media that one may wish to browse, strategies that facilitate such browsing must be media-dependent. For music, there should be a mechanism by which we can represent a small, representative portion of every musical recording for quick access and retrieval. We call these small, representative portions 'Thumbnails' [15] [16].

The goal of audio thumbnailing is that, given a musical recording, we should be able to find the most representative part of the musical piece, which can serve as a 'preview' of the whole recording [17]. Chapter 2 explains various audio thumbnailing methodologies in detail.

1.5 Motivation and Goals

With the advancement of Information Technology, the way we generate, browse, store and listen to music has changed. Music Information Retrieval has seen significant advancements in the last couple of decades. However, the major focus of research in the field of Music Information Retrieval has been on Western music. The solutions and methodologies proposed for

Western music do not work for Indian music. To be precise, the problems that exist are different for different musical backgrounds. Hence, there is a need to identify research problems and solve them for the various musical traditions [18]. Audio thumbnailing has been applied to Western music; however, the methodology has not yet been introduced into Carnatic music. As part of this thesis, we present our work on the generation of audio thumbnails in the context of Indian classical music for the first time. We focus our attention on the classification of Ragas and Talas in Carnatic music using these audio thumbnails. A thumbnail is often described as a repetitive and representative part of a musical piece. Our main objectives are as follows:

• Devise an approach to generate thumbnails for Indian classical music songs by using culture-specific constraints

• Generate thumbnails for various ragas and talas of Indian Carnatic songs and build a database of these thumbnails

• Build a classification model using SVM and CNN-RNN to classify these ragas

• Evaluate the proposed model against existing classification approaches, which use the entire song for training.

1.6 Thesis Outline

In Chapter 2, we give an overview of the existing areas of research in music signal processing. Existing methodologies in these respective areas are explained in detail, and the existing methodologies in audio thumbnailing are discussed elaborately. A brief literature review of audio classification methodologies is also given.

In Chapter 3, we provide a detailed description of the proposed methodology. The formal computation of the self-similarity matrix and of audio thumbnails is shown in detail. The architecture of the proposed CNN-RNN model is also explained.

In Chapter 4, the implementation of our methodology along with the results is discussed in detail. The algorithm is presented along with the datasets.

In Chapter 5, we provide the details of contributions through the current thesis and discuss the scope for future work.

Chapter 2

Literature Survey of Music Signal Processing

2.1 Introduction

Music signal processing is a sub-branch of signal processing and a promising area these days. Music analysis based on signal processing techniques has paved a new way for the generation and analysis of music. The analysis of music and the recognition of various swaras and ragas is inherent to human understanding. Music signal processing includes various areas of research such as Synthesis of Music, Transcription, Music Information Retrieval, Raga Classification, Tala Classification, Instrument/Voice Identification, Audio Matching, Source Separation, Tonic Identification, Intonation/melodic/rhythmic analysis, Music Emotion Recognition, etc. In this chapter, we present an overview of music signal processing and the existing approaches in these areas in great detail. This thesis focuses mainly on Indian Carnatic music.

The organization of this chapter is as follows: in Section 2.2 we give detailed information about the existing areas of research in music signal processing, followed by a brief introduction to the existing methodologies in audio thumbnailing in Section 2.3. We then present the audio classification techniques from the literature in Section 2.4. Finally, we summarize and conclude on the areas of music signal processing in Section 2.5.

2.2 Areas of Research and Related Work

This section gives an overview of various analysis methodologies involved in music signal processing. The following are the broad areas that can be classified from the literature.

• Source Separation

• Tonic Identification

• Intonation/Rhythmic analysis

• Ontologies

• Automatic Music Stretching Resistance

• Music Emotion Recognition

• Music Note Representation

• Raga Classification/Identification

• Tala Classification/Identification

Source separation, or audio source separation, is the act of separating or isolating the sound sources. There are many applications of source separation, like voice separation, instrumental separation, karaoke extraction, etc. The literature describes various methodologies of source separation; each of them is explained briefly in the consecutive paragraphs.

2.2.1 Related Work With Respect to Source Separation

• Source Separation Based on Harmonic Structure: The method consists of three stages: a) estimating the harmonic structure of each source in every frame based on iteration over the mixed spectral peaks, b) clustering the estimated harmonics into the signals they belong to using pitch and formant information, and c) synthesizing the music source in the time domain. This method has the advantage of solving the octave overlapping problem [19].

• Bayesian Harmonic Model: It estimates multiple fundamental frequencies in the frequency domain and combines the Bayesian harmonic model to estimate the other parameters of the music signals. In the next stage, the algorithm performs separation using all estimated parameters [20].

• Coherent Modulation Spectral Filtering [21]

• The ILVS (Inter-channel Level Vector Sum) concept clusters common signals (such as background music) from each channel. The advantage of this method is that, with lower complexity, it separates the audio signal from multiple channels [22].

• Auto regression models [23]

• Gaussian Scaled Mixtures (GSMM), Autoregressive and Amplitude Factor (AF): three strategies are used, namely Gaussian Scaled Mixture Models (GSMM), Autoregressive (AR) models and the Amplitude Factor (AF) [24].

• Non-negative matrix factorization (NMF) [25] [26] [27] [28]

• Binary Time Frequency Annotation [29]

• Mask optimization in RNN: This method is used to assess the impact of reducing the binary masks in a deep neural network by averaging time and frequency bins, so that the computational cost can be reduced. The ideal soft mask is then compared against masks produced by a range of averaging levels [30].

• Wiener filter modification: An FIR filter suitable for distilling the sound of a part played by a particular instrument from a music ensemble. The filter separates a monaural music signal into individual tones of every pitch with a particular timbre, the combination of which provides the melodies played by an instrument. The Wiener filter is combined with a window and an evaluation method based on the current statistics of the mixture in the frequency domain [31] [32] [33].

• Spectrum Inversion [34]

• Fourier Transform [35] [36]

• Principal Component Analysis (PCA) [37] [38]

• Repeating musical structure [39] [40] [41]

2.2.2 Emotion Recognition

Emotion Recognition is another major area of Music Signal Processing that has become popular in recent times. The paper [42] describes a novel method for estimating the emotions elicited by a piece of music from its acoustic signals. The focus of recent emotion recognition work is on multi-stage regression; after training, however, the aggregation happens in a predefined way and cannot be adapted to acoustic signals with different musical properties. [42] proposes a method that adapts the aggregation by taking new acoustic signal inputs into account.

Another important way of carrying out emotion recognition is a system for detecting emotion in music based on a deep Gaussian process (GP) [43] [44]. The system has two stages, namely feature extraction and classification. In the first stage, rhythm, dynamics, timbre, pitch and tonality are considered as features. A given musical piece is segmented into frames, and these five features are extracted for each frame. Next, statistical parameters such as the mean and standard deviation are calculated to generate a 38-dimensional feature vector. In the second stage, a deep GP is used to classify emotions among 9 different classes.

2.2.3 Raga Classification

Classification and identification of ragas is a problem that has been addressed by several works of research in the recent past. Chordia [45] used pitch class profiles and bigrams of pitches to classify ragas. The dataset used in the proposed system consists of 72 minutes of monophonic instrumental data in 17 ragas played by a single artist. The Harmonic-Percussive Separation (HPS) algorithm is used to extract the pitch. Note onsets are detected by observing sudden changes in phase and amplitude in the signal spectrum. Then, the pitch classes and the bigrams are calculated. It is shown that bigrams are useful in discriminating ragas with the same scale. The proposed model uses several classifiers combined with dimensionality reduction techniques. Using just the pitch class, the system achieves an accuracy of 75%; using only the bigrams of pitches, the accuracy is 82%.

Sridhar and Geetha used an approach [46] to identify the raga based on the individual swaras identified in the music recording. The frequency components extracted from the recording are matched with a database containing the swara sequences for each raga. This proposed method used chromagram and swara histogram features extracted from the dataset [47]. Another approach to raga classification [48] uses Locality Sensitive Hashing (LSH), which finds and matches similar data points hashed into various buckets. Another approach [49] uses the Longest Common Segment Set (LCSS). Yet another approach [50] used a non-linear SVM to classify ragas.

Katte and Tiple proposed a method which used the 'pakad' to identify the raga of a musical piece using n-gram matching [51], while [52] uses phrase-based raga identification.

2.2.4 Tala Classification

Tala classification has not been explored by many in the area of Music Information Retrieval, and the problem is unique to the genres of the Indian subcontinent. Existing researchers have used several statistical methodologies, all using the complete song.

Alex and his team [53] worked on tala classification with three different types of talas using a 2-dimensional CNN, which gave an accuracy of 92%. The Inception V3 CNN architecture gave an accuracy of 50%, followed by kNN with k set to 3, which gave an accuracy of 42%. Although statistical methods proved to give better accuracy, they lack robustness and generalizing ability [53]. [54] aimed at estimating the tala or akshara period using the self-similarity matrix. [55] compared beat detection, the sound energy algorithm and the frequency-selected sound algorithm in order to classify talas. A Deep Neural Network with group delay was used by [56] for onset detection of mrudangam strokes. [57] used various data-driven approaches to generate rhythm/tala. Gaussian models were used by [58] to classify Talas and Ragas. However, all these methodologies use the complete musical recording for training purposes. Hence, the feature set and the time required for training and testing are significantly larger.

2.2.5 Intonation/Rhythmic Analysis

Intonation can be defined as a characteristic of a Raga. [59] proposed two approaches based on the parametrization of pitch-value distributions: performance pitch histograms, and context-based swara distributions obtained by categorizing pitch contours based on the melodic context, and evaluated both approaches on a larger dataset of Carnatic music. [60] proposed obtaining intonation from the pitch track. The procedure is as follows: first, extract pitch contours from n selected voice segments; then, obtain a histogram normalized by the tonic frequency, from which each prominent peak is automatically labelled and parametrized. [61] analysed intonation with respect to the Thodi raga.

Work by [57] presented a data-driven approach for generating rhythmic analysis in Carnatic music. [57] used arithmetic partitions, where each partition consists of a combination of stroke sequences. [62] added knowledge constraints to this data-driven approach; classification is performed using Gaussian Mixture Models. [63] generated rhythm using n-gram Markov chains. [56] used separation-driven onset detection methods for detecting the stroke locations of percussion instruments.

2.2.6 Tonic Identification

Tonic can be described as the base frequency of the singer. All the instruments used in the concert are adjusted to the tonic frequency of the singer. Normalizing the music recording with the tonic frequency is the first step in Carnatic music analysis. [11] described a methodology for tonic pitch identification in two stages. In the first stage, which is applicable to both vocal and instrumental music, a multi-pitch analysis of the audio signal is performed to identify the tonic pitch class. In the second stage, the octave in which the tonic of the singer lies is estimated; this stage is needed only for vocal performances. [64] used group delay to identify the tonic frequency. [65] proposed a two-stage process, wherein in the first phase, tonic and raga are determined independently using features extracted from the pitch histogram; in the second phase, raga and tonic are updated iteratively using the derived note information. [66] used multi-pitch analysis for the same purpose. [67] uses strategies like template matching, the concert method and segmented histograms to evaluate the tonic frequency.

2.2.7 Music Note Representation

[68] formed a Unicode-based music notation representation language, as opposed to the many ASCII representations [69] [70]. [71] performs segmentation of Carnatic music and then finds the pitch associated with each segment. [72] attempted identification of swaras using a statistical t-test, while the authors of [73] proposed swara identification using stochastic models.

2.3 Existing methodologies in Audio Thumbnailing

As mentioned in Chapter 1, a thumbnail acts as a preview of the musical recording, giving the listener a first impression of the audio [17]. Thus, audio thumbnails help us to browse, navigate and find the required musical piece easily.

Audio thumbnailing focuses on extracting the most representative part of an audio segment. In one method, thumbnails are generated based on the chorus section of the song [74]. However, since the chorus can often be absent from a song, this method is not always viable. In another method to extract song thumbnails [75], MFCC and spectral contrast features are used to determine repeated segments. Segments two seconds long are clustered to determine the occurrence frequency of each segment. The energy and position of segments are also considered. Although such short segments are highly repetitive in nature, they do not cover an appreciable length of the music recording, as a representative structure should. Hence, they do not make audio thumbnails with much practical use, and merging such neighbouring salient segments does not necessarily make good thumbnails.

The method suggested in [76] extracts music thumbnails based on song structure analysis, where the pitch and FFT coefficients of audio frames are computed and repeated segments are detected. Based on the result of repetition detection, the structure of the song is estimated. However, the method was only tested on 26 Beatles songs that have clear recurrent structures and leading vocals. Another approach to song structure analysis segments a song into intro, verse, chorus, bridge, instrumental and ending. Good results were obtained when tested on a small dataset; however, the computational complexity involved may make it difficult to work in real time or to deal with a large database [77]. Results published in [78] also describe generating audio thumbnails using audio object localization with neural networks and clustering algorithms. Table 2.1 presents a detailed comparison between the various audio thumbnailing approaches in the literature.

Table 2.1: Comparison between existing Thumbnailing Approaches

Author | Task | Preprocessing | Features | Postprocessing | Approach | Distance | Clustering | Thumbnail Definition | Heuristic
Logan, Chu | Music summarization | Fixed-length segments | MFCC | - | state | - | Hierarchical/HMM | Most frequent part | First half; longest segment
Tzanetakis, Cook | Audio thumbnailing | - | Low-level | - | state | - | Novelty | Essential elements | Concatenation of segments near the boundaries
Bartsch, Wakefield | Audio thumbnailing | Beat tracking | MFCC/Chroma | Moving average filter | sequence | - | - | Representative portion | First 3/4
Cooper, Foote | Music summarization | - | MFCC | Novelty detection | state | Cosine | SVM | Most often repeated segments | Concatenate segments from top 2 SV clusters
Goto | Chorus detection | - | Chroma | Moving average filter | sequence | Cosine | - | Chorus is the most representative section | 7.7 to 40 sec
Lu, Wang, Zhang | Repeating pattern discovery | - | CQT | Threshold/vocal detection | sequence | Euclidean | - | Structure analysis | -
Levy, Sandler, Casey | Audio thumbnailing | Beat tracking | Pitch | PCA | state | Euclidean | HMM and k-means | Most representative segment | Most frequently repeated (highest energy)
Peeters | Music structure analysis | - | MFCC, Chroma | PCA | sequence | Euclidean | - | Structure analysis | -
Muller, Grosche | Audio thumbnailing | - | Chroma | Smoothing/threshold | - | Cosine | DTW | Segment that best represents the recording | Segment that best explains the repetitive structure

2.4 Audio Classification Techniques

As we have seen in Chapter 1, Music Information Retrieval became an emerging area and grabbed the attention of researchers in industry. Classification is the most fundamental problem in Music Information Retrieval (MIR). When it comes to Carnatic music, the most common elements are Raga, Tala and Swara; thus, Raga and Tala classification becomes a significant problem to solve. As part of this section, we review the existing audio classification methodologies in the literature.

Support Vector Machines (SVM), Gaussian Mixture Models (GMM) [79] and k-Nearest Neighbour (k-NN) [80] are the most commonly used classifiers [81]; SVM and k-NN in particular are generally used in audio classification.

Another important classifier is the convolutional neural network (CNN), a multi-layered neural network that takes convolutions over the given input signal [82]. The literature [83] shows that the combination of CNN and RNN is powerful in the classification of audio signals; Chapter 3 explains the methodology used in this thesis. [84] used a convolutional neural network for audio classification of one million 10-second excerpts and showed it to be effective.

[85] proposed a technique which first learns an overcomplete dictionary that can be used to sparsely decompose log-scaled spectrograms. It then trains an efficient encoder which quickly maps new inputs to approximations of their sparse representations using the learned dictionary. This avoids the expensive iterative procedures usually required to compute sparse codes. These sparse codes are then used as inputs to a linear Support Vector Machine (SVM), giving an accuracy of 84.2%.

2.5 Summary and Conclusions

In this chapter, we presented a detailed overview of the existing literature and methodologies in the area of music signal processing. However, the techniques mentioned above are not the end of music signal processing; there are many open challenges and exciting problems. From the literature, we observe that there is wide scope to explore in each of the following areas.

• Tala classification for all 175 talas in Carnatic music is an unsolved problem.

• Audio matching, version identification, composer identification, album detection, automatic generation of ragas, identification of the song name, etc. are some of the challenges that exist in the area of music signal processing.

• Raga and Tala classification is performed using the complete song, due to which the feature set becomes too large.

In the next chapter, we discuss the computation of audio thumbnails and the architecture of the CNN-RNN network. We also formally define the generation of Self-Similarity Matrices.

Chapter 3

Classification and Computation of Audio Thumbnails

In this chapter, the focus is on the classification of Carnatic songs using thumbnails. The concept of Raga and Tala is an integral part of Indian classical music. As explained earlier, Raga is the melodic component while Tala is the rhythmic component of the music. Hence, the classification and identification of ragas and talas is a paramount problem in the area of Music Information Retrieval (MIR) systems. Many statistical and machine learning approaches have been proposed in the literature to classify them. However, they use the complete musical recording for training and testing. As part of this thesis, a novel approach is proposed for the first time in Carnatic music to classify Ragas/Talas using a repetitive structure called Thumbnails. The organization of this chapter is as follows: Section 3.1 contains a detailed explanation of the proposed methodology, which includes the generation of thumbnails and the architecture of the neural network. Section 3.2 presents the summary and conclusions.

3.1 Proposed Methodology

We propose a method of classification using audio thumbnails. As mentioned in Chapter 1, thumbnails are the repetitive segments of the musical recording. The proposed algorithm is presented in detail below. Before generating thumbnails, the musical piece is normalized with respect to the tonic frequency of the singer. Tonic identification is performed in two stages as in [86]. We perform multi-pitch analysis of the given audio signal in order to identify the tonic pitch class. The advantage

of multi-pitch analysis is that it captures the drone sound which constantly runs in the background. In the next stage, we estimate the octave in which the tonic of the singer lies and then analyse the predominant melody. The audio signal is then normalized with the help of the tonic frequency before computing chroma features.
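As a small illustration of this normalization step, the sketch below rotates the chroma representation so that the tonic pitch class maps to the first bin. It assumes the tonic pitch class has already been estimated by the two-stage procedure above (the estimation itself is not shown), it uses librosa for the chroma computation, and the function name and parameters are our own illustrative choices, not part of the thesis implementation:

```python
import librosa
import numpy as np

def tonic_normalized_chroma(audio_path, tonic_pitch_class):
    """Rotate 12-bin chroma so that the tonic (Sa) falls in bin 0.

    tonic_pitch_class: the tonic expressed as a pitch class in 0..11
    (0 = C), assumed to be estimated beforehand by the two-stage method.
    """
    y, sr = librosa.load(audio_path)
    # 12-dimensional chroma vectors from the short-time Fourier transform
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    # Circular shift of the pitch-class axis: recordings sung at different
    # sruthis become comparable, since Sa is always bin 0 afterwards.
    return np.roll(chroma, -tonic_pitch_class, axis=0)
```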

3.1.1 Computation of Self Similarity Matrix

We use self-similarity matrices (SSM) to generate thumbnails. The self similarity matrix is an important feature of any time series, as it captures repetitions in the form of path-like structures, while homogeneous portions give block-like regions. These repetitions are captured, and the most repetitive path is found and named the thumbnail.

As mentioned earlier, the Self Similarity matrix is generally used to capture the structural properties of an audio recording [87]. Consider a feature space F equipped with a similarity measure s : F × F → R.

Given any feature sequence Y = (y1, y2, ..., yN) with elements in F, the Self Similarity Matrix S contains the values obtained by comparing all elements with each other. Hence, the Self Similarity matrix S ∈ R^(N×N) has N² entries and is defined as:

S(n, m) := s(yn, ym)    (3.1)

We refer to S(n, m) as a score. Self similarity matrices have an important property: S(n, n) = 1 for all n ∈ [1 : N]. Hence, the matrix has larger values along its diagonal. Repetitive patterns result in path-like structures, while homogeneous portions appear as block-like structures. Figure 3.1 shows a sample Self Similarity Matrix derived using chroma-based features. Chroma features are an array of 12-dimensional vectors derived from the short-time Fourier Transform of the musical recording, showing the pitch distribution over time; Figure 3.2 shows a sample chromagram. Now, let us formally define the concept of a path in a sequence. As mentioned above, for a given feature sequence:

Y = (y1, y2, ..., yN) (3.2)

Figure 3.1 Sample Self Similarity Matrix

and its self-similarity matrix S, there exists a segment

α = [a : b] ⊆ [1 : N]    (3.3)

where a and b are the start and end points, respectively [17].

Now, we will define the length of α as follows:

|α| = b − a + 1    (3.4)

Figure 3.3 shows a path-like structure in a Self Similarity matrix in detail. For every path P in the SSM, we have two projections, namely φ1 and φ2, as shown. We will consider φ2(P) as α and call the other projected segment the induced segment [17]. We will define the score σ(P) as:

σ(P) = Σ_{l=1}^{L} S(nl, ml)    (3.5)

In short, this score gives us a measure of the relation between α and the induced segment. Similar paths are then grouped together using a clustering mechanism.

Figure 3.2 Chromagram of Song Inta Chala in Adi Talam

Briefly, the computation of the Self Similarity matrix can be summarized as follows (a minimal code sketch is given after the list):

• Derive a feature sequence from the given musical recording.

• From the derived feature sequence, form a self similarity matrix.

• Compute the score for all the paths derived out of Self similarity matrix.

• Find out similar segments using the score defined above and form clusters.
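The first two steps of this list can be sketched in a few lines, assuming chroma features and cosine similarity as in the definitions above (the clustering of paths is not shown):

```python
import librosa
import numpy as np

def compute_ssm(audio_path, hop_length=2048):
    """Compute a chroma-based self-similarity matrix (steps 1-2 above)."""
    y, sr = librosa.load(audio_path)
    # Feature sequence Y = (y_1, ..., y_N): one 12-dim chroma vector per frame
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop_length)
    # Normalize each frame so that the inner product equals the cosine
    # similarity s(y_n, y_m); the diagonal then satisfies S(n, n) = 1.
    chroma = chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)
    return chroma.T @ chroma  # S in R^{N x N}, S(n, m) = s(y_n, y_m)
```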

Once the SSM is formed, it is enhanced using various enhancement methodologies. As part of this thesis, we use thresholding and smoothing. Figure 3.4 explains the steps in detail.

3.1.1.1 Enhancement Strategies

There are many enhancement strategies like transposition invariance, diagonal smoothing, thresholding etc. However, as part of this thesis, we used smoothing and thresholding.

Figure 3.3 Detailed view of Self Similarity Matrix

A very simple diagonal filtering works in most cases. A convolutional filter averaging along the diagonal in the forward as well as the backward direction enhances the path structures while suppressing block structures. Formally, we define the smoothing procedure as follows [17]:

Let S be the SSM of size N × N, and let L be a length parameter.

The smoothed SSM S_L is then given by:

S_L(n, m) := (1/L) Σ_{l=0}^{L−1} S(n + l, m + l),  for n, m ∈ [1 : N − L + 1]    (3.6)

S_L(n, m) is obtained by averaging the similarity values of two subsequences of length L. A higher L suppresses more detail and is lossier than a lower choice of L. The smoothing parameter is tuned to L = 50 in accordance with the observation made above. SSMs before and after smoothing are shown in Chapter 4.
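A minimal sketch of Equation 3.6, averaging in the forward direction only, is given below; the backward-direction average described above can be obtained by running the same routine on the matrix with both axes reversed (`ssm[::-1, ::-1]`) and reversing the result back:

```python
import numpy as np

def smooth_ssm(ssm, L=50):
    """Forward diagonal smoothing of an SSM (Equation 3.6)."""
    N = ssm.shape[0]
    M = N - L + 1  # valid index range [1 : N - L + 1] from Equation 3.6
    smoothed = np.zeros((M, M))
    # S_L(n, m) = (1/L) * sum_{l=0}^{L-1} S(n + l, m + l)
    for l in range(L):
        smoothed += ssm[l:l + M, l:l + M]
    return smoothed / L
```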

Figure 3.4 Self Similarity Matrix - Procedure

Out of all the thresholding mechanisms, global thresholding is the simplest to apply.

Let S be the SSM, S_T the SSM after thresholding, and τ the thresholding parameter:

S_T(n, m) := S(n, m) if S(n, m) ≥ τ, and 0 otherwise    (3.7)

Relative thresholding: thresholding out the top 20% of the values and scaling them linearly gives us the enhanced path structures, which are used to find the fitness measure for thumbnailing.
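Both variants can be sketched in a few lines. The quantile-based relative rule below is our reading of "thresholding out the top 20% of the values"; the linear rescaling follows the description above:

```python
import numpy as np

def threshold_ssm(ssm, tau=None, keep_fraction=0.2):
    """Global (Equation 3.7) or relative thresholding of an SSM."""
    if tau is None:
        # Relative rule: keep only the top `keep_fraction` of all values
        tau = np.quantile(ssm, 1.0 - keep_fraction)
    out = np.where(ssm >= tau, ssm, 0.0)
    # Linear rescaling of the surviving values to [0, 1]
    vmax = out.max()
    if vmax > tau:
        mask = out > 0
        out[mask] = (out[mask] - tau) / (vmax - tau)
    return out
```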

3.1.2 Generation of Thumbnails

As explained earlier, we assign a measure, formally known as the fitness measure, that gives a fitness value to every segment. This fitness score provides two important details: how well the given segment describes the other segments, and how much of the entire audio piece is covered by these related segments. We then define the thumbnail as the segment with maximum fitness.

For this, we extend the notion of a path to a path family. The concept of a path family captures the relation between α and all other segments in the musical recording [17]. We first define a segment family as:

A := {α1, α2, ..., αK}    (3.8)

We will define Coverage of A as:

γ(A) := Σ_{k=1}^{K} |αk|    (3.9)

Path family is defined as follows:

P := {P1, P2, ..., PK}    (3.10)

Using Equation 3.5, we will now define the score σ(P) of the path family as:

σ(P) = Σ_{k=1}^{K} σ(Pk)    (3.11)

Now, we describe an algorithm to compute the optimal path family for a segment using Dynamic Time Warping (DTW). Let us consider two sequences X =

(x1, x2, ..., xN) and Y = (y1, y2, ..., yM).

Dynamic Time Warping must return an optimal path which aligns the sequences X and Y. For this, we use score-maximizing paths as part of this thesis.

Let us consider paths over a segment α = [a : b] with M := |α|. We take the sub-matrix S^α from the SSM S. Then, we compute the score matrix D ∈ R^(N×(M+1)), which is defined as follows:

D(n, m) := S^α(n, m) + max{D(i, j) | (i, j) ∈ Φ(n, m)}    (3.12)

where Φ(n, m) is the set of predecessors, which contains the cells that come before (n, m) in a path family. This is formally defined as:

Φ(n, m) = {(n − i, m − j) | (i, j) ∈ Σ} ∩ ([1 : N] × [1 : M])    (3.13)

where Σ is the set of admissible step sizes.
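A simplified sketch of the recursion in Equation 3.12 is given below. It assumes the common DTW step sizes Σ = {(1, 1), (2, 1), (1, 2)}, and it omits the extra column of D that the full path-family algorithm of [17] uses to jump between paths; it illustrates the cumulative-score idea only, not the complete method:

```python
import numpy as np

SIGMA = [(1, 1), (2, 1), (1, 2)]  # assumed step sizes (i, j) in Sigma

def accumulated_score(ssm_alpha):
    """Cumulative score matrix D of Equation 3.12 over a sub-matrix S^alpha."""
    N, M = ssm_alpha.shape
    D = np.full((N, M), -np.inf)
    D[0, 0] = ssm_alpha[0, 0]
    for n in range(N):
        for m in range(M):
            if n == 0 and m == 0:
                continue
            # Predecessors Phi(n, m) = {(n - i, m - j) | (i, j) in Sigma}
            preds = [D[n - i, m - j] for (i, j) in SIGMA
                     if n - i >= 0 and m - j >= 0]
            if preds:
                D[n, m] = ssm_alpha[n, m] + max(preds)
    return D
```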

Now, let us define the optimal path family P*, which maximizes the score over all path families. Formally, it is as follows:

P* := argmax_P σ(P)    (3.14)

Normalized score σ(α) is defined as:

σ(α) := (σ(P*) − |α|) / (Σ_{k=1}^{K} Lk)    (3.15)

where Lk denotes the length of path Pk. Normalized coverage γ(α) is defined as:

γ(α) := (γ(A*) − |α|) / N    (3.16)

The fitness φ(α) of every segment α in the musical recording is given as:

φ(α) := 2 · σ(α) · γ(α) / (σ(α) + γ(α))    (3.17)

The audio thumbnail is the segment with the maximum fitness score:

α* := argmax_α φ(α)    (3.18)
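Given the quantities of Equations 3.15-3.17, the fitness of a candidate segment reduces to a harmonic mean, as in the small sketch below (the inputs are assumed to come from the optimal path family computed above; the function and its parameter names are illustrative):

```python
def fitness(score_opt, coverage_opt, alpha_len, path_lengths, N):
    """Fitness of a segment alpha (Equations 3.15-3.17).

    score_opt    -- sigma(P*), score of the optimal path family
    coverage_opt -- gamma(A*), coverage of the induced segment family
    alpha_len    -- |alpha|, length of the candidate segment
    path_lengths -- [L_1, ..., L_K], lengths of the paths in P*
    N            -- number of feature frames in the recording
    """
    sigma_bar = (score_opt - alpha_len) / sum(path_lengths)  # Eq. 3.15
    gamma_bar = (coverage_opt - alpha_len) / N               # Eq. 3.16
    if sigma_bar + gamma_bar <= 0:
        return 0.0
    # Harmonic mean of normalized score and coverage (Eq. 3.17)
    return 2 * sigma_bar * gamma_bar / (sigma_bar + gamma_bar)
```

The thumbnail is then the segment that maximizes this value over all candidate segments, as in Equation 3.18.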

3.1.3 Classification Model

Once the thumbnail is selected, the next step is to build the classification model. In this regard, we propose to use the audio thumbnail to extract the feature vector for classification, which consists of the chroma vector, Mel-spectrogram features and the spectral contrast features. In addition, we propose a parallel CNN-RNN model inspired by [88]. The intuition behind this architecture is to train the classifier with characteristic properties learned individually by the convolution layers and the recurrent layers. For instance, convolutional neural networks are effective at capturing spatial relationships in the data, while the LSTM takes care of the temporal nature of the feature vectors by learning sequence patterns. Figure 3.5 shows the architecture of the proposed methodology; an illustrative sketch follows.
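The exact layer configuration is given in Figure 3.5; the Keras sketch below is only an illustrative realization of the parallel idea. The filter counts, LSTM units, frame count and class count are placeholder values of our own choosing, and the feature stack (chroma, mel-spectrogram, spectral contrast) is computed via librosa:

```python
import numpy as np
import librosa
from tensorflow.keras import layers, Model

N_FRAMES = 128          # assumed fixed number of frames per thumbnail
N_FEATS = 12 + 64 + 7   # chroma + mel bands + spectral-contrast rows
NUM_CLASSES = 17        # placeholder: number of raga (or tala) classes

def thumbnail_features(y, sr):
    """Stack chroma, mel-spectrogram and spectral-contrast features."""
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)                # (12, T)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))     # (64, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)       # (7, T)
    return np.vstack([chroma, mel, contrast])[:, :N_FRAMES]        # (83, T')

inp = layers.Input(shape=(N_FEATS, N_FRAMES, 1))

# CNN branch: local time-frequency (spatial) patterns
c = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
c = layers.MaxPooling2D((2, 2))(c)
c = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(c)
c = layers.GlobalAveragePooling2D()(c)

# RNN branch: treats the frames as a sequence, for temporal structure
r = layers.Reshape((N_FEATS, N_FRAMES))(inp)  # drop the channel axis
r = layers.Permute((2, 1))(r)                 # (N_FRAMES, N_FEATS), time-major
r = layers.LSTM(64)(r)

# The two representations are concatenated and classified jointly
out = layers.Dense(NUM_CLASSES, activation="softmax")(
    layers.concatenate([c, r]))
model = Model(inp, out)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```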

3.2 Summary and Conclusions

In this chapter, we discussed the generation of Self Similarity Matrices and their properties with a sample example. The methodology to enhance the SSM was explained, along with formally defining a musical recording as a feature sequence. We explained the generation of the SSM and of the thumbnail with sample figures. The computation of the fitness score, the coverage score and the thumbnail was formally described. The architecture of the proposed CNN-RNN model was discussed in detail.

In the next chapter, we will discuss the results of the classification of Carnatic thumbnails using the proposed model.

Figure 3.5 Architecture of the neural network classification model

Chapter 4

Results of Classification of Carnatic Thumbnails using CNN-RNN Models

As mentioned earlier, Raga and Tala are the basic elements of Carnatic music. Raga classification and Tala classification have been approached by researchers in many different ways; Chapter 2 explains them in detail. However, all those methods use the complete song for training and classification. Here, for the first time in Carnatic music, we use the concept of audio thumbnailing for the classification of Raga and Tala.

Before generating thumbnails, the audio song has to be normalized with respect to the tonic frequency of the singer. Tonic identification is a well-studied problem, and the relevant literature is reviewed in detail in Chapter 2.

The algorithm for tonic identification and audio thumbnailing is implemented in Python and is outlined below in Algorithm 1; each of its functions is explained in detail. Before generating thumbnails, the musical piece is normalized with respect to the tonic frequency of the singer. The tonic, or Sruthi, is a fundamental concept of Indian classical music. It is the pitch chosen by the performer, which remains constant throughout the concert and acts as the reference frequency [11]. The swara Sa in the octave is taken as the tonic. Generally, instruments such as the Tambura are played to establish the tonic, and accompanying instruments tune to the specific tonic of the performer. Scientifically, a change of tonic is nothing but a linear shift on the time-frequency graph. Tonic identification can be done either from the tuning of the drone (Tambura) or from melody characteristics. Here, it is performed in two stages, as in [86]. First, we perform a multi-pitch analysis of the given audio signal in order to identify the tonic pitch class; the advantage of multi-pitch analysis is that it picks up the drone sound that runs constantly in the background. In the second stage, we estimate the octave in which the tonic of the singer lies by analysing the predominant melody. The audio signal is then normalized with the help of the tonic frequency before computing chroma features.
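A heavily simplified stand-in for this tonic estimation step is sketched below: it tracks a single predominant pitch with librosa's pYIN and takes the most frequent pitch class as the tonic estimate, whereas the method of [86] uses a true multi-pitch analysis of the drone. It is meant only to make the normalization step concrete.

import numpy as np
import librosa

def estimate_tonic_class(y, sr):
    f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz('C2'),
                                 fmax=librosa.note_to_hz('C6'), sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]
    # Fold frequencies into one octave (12 pitch-class bins above C).
    cents = 1200 * np.log2(f0 / librosa.note_to_hz('C1'))
    pitch_class = np.round(cents / 100).astype(int) % 12
    return np.bincount(pitch_class, minlength=12).argmax()  # 0 = C, 1 = C#, ...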

Compute SSM: The chroma vector representation of the song is used to compute the self-similarity matrix. This procedure is formally explained in Chapter 3.

Enhance SSM: Due to significant distortions caused by variations in parameters such as dynamics, timbre, modulation and articulation, it is difficult to automatically extract important structural elements, such as the paths of high similarity, from the self-similarity matrix [13]. In order to enhance the repetitive structures in the self-similarity matrix, the paths parallel to the diagonal are smoothed with convolutional filters, and the irrelevant noisy structures in the matrix are suppressed by thresholding and scaling. Figures 4.1 and 4.2 show the self-similarity matrix of the same musical piece before and after applying these enhancement strategies.

Fitness Score: A dynamic programming algorithm introduced in [13] is used to compute the fitness score for an audio segment. The steps involved are summarized below:

• The cumulative score matrix D is computed over a subsection of the self-similarity matrix along the path-like structures. The score computed for a music segment gives its path score, which, intuitively, measures the extent to which the audio segment in question repeats throughout the music recording.

• The coverage score is computed as the cumulative length of the path-like structures in the subsection of the self-similarity matrix. It accounts for how much of the total length of the musical recording the segment and its repetitions cover.

• As the optimal thumbnail should be characterized by both its repetitive nature and its coverage, the two scores computed above are combined to define the fitness measure of the audio segment.

Algorithm 1: Audio Thumbnailing Algorithm

Input: chroma vectors of the music recording chroma, thresholding parameter thresh, smoothing parameter L
Output: audio segment with maximum fitness

Function Main(chroma, thresh, L):
    ssm ← ComputeSSM(chroma)
    ssm_enh ← EnhanceSSM(ssm, thresh, L)
    for all segments [i : j] with 0 ≤ i ≤ j ≤ |audio| do   (audio is the time series of the music recording)
        φ(i, j) ← FitnessScore(ssm_enh, i, j)
    return the segment [i : j] with maximum fitness score φ(i, j)
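A direct, brute-force Python rendering of this search is sketched below; fitness_score is a hypothetical helper standing for the computation of equations 3.12-3.17, and no attempt is made to reproduce the optimizations of the actual implementation.

def best_thumbnail(ssm_enh, min_len=1):
    # Evaluate the fitness of every candidate segment [i : j] on the
    # enhanced SSM and keep the segment with maximum fitness.
    n = ssm_enh.shape[0]
    best_seg, best_fit = None, float("-inf")
    for i in range(n):
        for j in range(i + min_len, n):
            fit = fitness_score(ssm_enh, i, j)  # hypothetical helper
            if fit > best_fit:
                best_seg, best_fit = (i, j), fit
    return best_seg, best_fit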

These functions are implemented using the librosa library [89] in Python. A thumbnail is then selected after computation of the fitness score. As mentioned above, an SVM and a CNN-RNN are trained for classification. The SVM parameters are selected and tuned by an extensive grid search [90]. A six-dimensional feature set is used, consisting of the short-time Fourier transform, Mel-frequency cepstrum, chroma, Mel-spectrogram, spectral contrast and tonnetz features. The Support Vector Machine classifier is optimized by three-fold cross-validation using GridSearchCV in the sklearn library [91]. The hyperparameters [92] [93] are evaluated and tuned for accuracy; the best parameters found on the dataset with a radial basis kernel are C = 10 and gamma = 1e-8.
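The SVM tuning described above can be reproduced along the following lines; the exact parameter grid is an assumption on our part, chosen so that it brackets the reported optimum (C = 10, gamma = 1e-8).

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [0.1, 1, 10, 100],
              "gamma": [1e-8, 1e-6, 1e-4, 1e-2]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)  # three-fold CV
# search.fit(X_train, y_train)  # X_train: feature vectors, y_train: raga labels
# print(search.best_params_)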

Our CNN-RNN structure is inspired by [88]. In this architecture, the classifier is trained with characteristics learned from both the convolutional and the recurrent layers.

Figure 4.1 Self-Similarity Matrix before Enhancing and Smoothing for the Song Inta Chala in Adi Tala

The intuition behind this parallel approach is that convolutional layers capture spatial relationships among features well, while recurrent layers are effective at capturing temporal characteristics. The CNN has four convolutional layers interleaved with two pooling operations, followed by a dense layer; the RNN is implemented with an LSTM layer, and training uses a batch size of 128. The CNN-RNN is implemented in Python using the Keras library [94].
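A sketch of such a parallel CNN-RNN in Keras is given below. The filter counts, kernel sizes and LSTM width are illustrative assumptions; only the overall layout (four convolutions with two poolings and a dense layer in one branch, an LSTM in the other, concatenated before the softmax) follows the description above.

from tensorflow.keras import layers, models

def build_parallel_cnn_rnn(n_features, n_frames, n_classes):
    inp = layers.Input(shape=(n_features, n_frames, 1))

    # CNN branch: four convolution layers interleaved with two poolings.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)

    # RNN branch: treat the frames as a sequence of feature vectors.
    r = layers.Permute((2, 1, 3))(inp)             # -> (n_frames, n_features, 1)
    r = layers.Reshape((n_frames, n_features))(r)  # -> (n_frames, n_features)
    r = layers.LSTM(64)(r)

    out = layers.concatenate([x, r])
    out = layers.Dense(n_classes, activation="softmax")(out)
    return models.Model(inp, out)

# model = build_parallel_cnn_rnn(n_features=147, n_frames=130, n_classes=16)
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(X, y, batch_size=128, ...)  # batch size 128, as in the text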

We have classified both ragas and talas in Carnatic music using audio thumbnails.

4.1 Experimental Setup

4.1.1 Dataset for Raga Classification

The dataset of Carnatic music recordings has been taken from the Carnatic Music Dataset (CMD) of the project Dunya and, additionally, scraped from Carnatic recordings on YouTube.

Figure 4.2 Self-Similarity Matrix after Enhancing and Smoothing for the Song Inta Chala in Adi Tala

The combination of these two sources gives us a balanced mix of studio recordings and live performances. Dunya provides a collection of music corpora and related software tools, developed as part of the CompMusic project with the aim of studying particular music traditions [95]. Table 4.1 illustrates the division into training and test sets, along with the accuracy for each raga. Multiple thumbnails were extracted from each audio sample for training, one per 3-minute piece; for example, if a recording is 9 minutes long, 3 thumbnails are extracted. Complete songs were used for testing.

Training Dataset for each raga is given below:

Table 4.1: Table describing Dataset for Raga Classification

Name of the Raga  No. of thumbnails  No. of audio samples used for testing  Accuracy (%)

Ahiri 14 7 92.2

Darbar 28 13 87.4

Darbari Kannada 12 6 76.2

Suruti 18 6 76.2

Varali 22 14 89.3

Sahana 21 13 74.7

Nattai 20 11 66.20

Kuntalavarali 12 8 78.5

Sowrastram 18 10 83.2

Nilambari 14 8 78.5

Vasantha 21 13 77.1

Kapi 47 28 85.0

Kedaram 13 7 59.8

Kanada 41 22 68.8

Khamas 23 13 67.5

Sri 22 13 84.6

Table 4.2: Table describing Dataset for Raga Ahiri

Sl. No. Song Name Artist Name Duration Thumbnails

1 Kusumakara Vimana Rudham DK Jayaraman (M) 3.58 1

2 Mukhi bale Pani Unnikrishnan (M) 6.46 2

3 Etula Kaapaduduvo Gayathri venkataraghavan (F) 9.29 3

4 Manasi Dussahamayo NJ Nandini (F) 9.18 3

5 Maayamma Jayaraman (M) 9.36 3

6 Ninnukori varnam MS Subbalakshmi (F) 6.3 2

Table 4.3: Table describing Dataset for Raga Darbari

Sl. No. Song Name Artist Name Duration Thumbnails
1 Paripalayamam Kodantapanai Trichur Brothers (M) 12.4 4
2 Ramalobhamela Nanurakshmibchu ML Vasanthakumari (F) 10.23 3
3 Smaramanasa pathmanabacharanam RK Srikantan (M) 6.06 2
4 Adiya Patham Nityasree Mahadevan (F) 0.27 3
5 Raghavendra guru Sanjay Subramanyam 12.36 4
6 Sree Venugopala Koteswarayyar (M) 7.31 2
7 Dhaari Theliyaka M Balamurali krishna (M) 6.24 2
8 Nayanana Neevunamidha DK Jayaraman (M) 6.26 2
9 Nayaganai nindra R Vedavalli 6.02 2
10 Vezha Mugatharase Ambhujam Krishna (M) 6.51 2
11 Hare Krsna (M) 7.11 2

Table 4.4: Table describing Dataset for Raga Darbari Kan- nada

Sl. No. Song Name Artist Name Duration Thumbnails
1 Sukha Vazhvu Sumitra vasudev (F) 11.27 3
2 Karunichutakuidu Neyveli Santhana Gopalan (M) 4.17 1
3 Daridaapuleka Pandita Rama (M) 9.17 2
4 Venkata Shaila S Soumya (F) 15.38 3
5 Eti Yochanlu Hyderabad brothers (M) 4.27 4
6 Sankaracharyam TM Krishna (M) 6.28 2
7 Vagaladibodhanala Karunasaaya Ajoy Chakraborthy (M) 6.48 2
8 Viriboni Malladi brothers (M) 9.14 3
9 Gana Murte S Mahati (F) 5.48 2
10 Sri Matrbhutam Vidushi Sahasrabuddhe (F) 6.25 2
11 Durmaracaradhamu Ramnad Krishnan (M) 6.51 2
12 Govardhana Giridhari A Kanyakumari (F) 4.49 1
13 Govinda Alarmelmagai Manala Va Tiruvenkatanatha Sanjay Subramanyam (M) 12.29 4
14 Haritumharo MS Subbalakshmi (F) 3.09 1

Table 4.5: Table describing Dataset for Raga Suruthi

Sl. No. Song Name Artist Name Duration Thumbnails
1 Karunichutakuidu Neyveli Santhana Gopalan (M) 9.48 3
2 Daridaapuleka Pandita Rama (M) 9.17 3
3 Venkata Shaila S Soumya (F) 15.38 5
4 Eti Yochanlu Hyderabad brothers (M) 14.27 4
5 Sankaracharyam TM Krishna (M) 9.28 3

Table 4.6: Table describing Dataset for Raga Varali

Sl. No. Song Name Artist Name Duration Thumbnails
1 Aazhi Mazhi Ariyakudi Ramanuja Iyengar (M) 6.51 2
2 Idhigo Bhadradri M Balamurali Krishna (M) 4.19 2
3 Viruttam Gayathri Girish (M) 11.28 3
4 Lambodaraya N kiran (M) 4.58 2
5 Kamakshi E Gayathri (F) 6.3 2
6 Guru (M) 6.53 2
7 Vanjaksha Krishna TM (M) 5.42 1
8 Ille Vaikunta Ramakrishna Murthy (M) 12.26 4
9 Ka Va va Brothers (M) 8.48 2
10 Intha podayya Ravi (M) 6.12 2

Table 4.7: Table describing Dataset for Raga Kuntala Varali

Sl. No. Song Name Artist Name Duration Thumbnails
1 Arkali Shoozhulagil Nityashree Madhavan (F) 6.58 2
2 Kadanakuthoohalam Prince Rama Varma (M) 6.06 2
3 Undhan Chevigal TN Seshgopalan (M) 7.16 2
4 Sarade Saraswadi Sumitra Nitin (F) 6.14 2
5 Bhogindra Shayinam Hyderabad brothers (M) 8.01 2
6 Thillana Amrutha (F) 7.05 2

Table 4.8: Table describing Dataset for Raga Sahana

Sl. No. Song Name Artist Name Duration Thumbnails
1 Kavae Kanyakumari Abhishek raghuram (M) 9.26 3
2 Karunimpa Sudha Raghunanthan (F) 9.08 3
3 Chittam Irangada Sikkil Gurucharan (M) 11.16 3
4 Shanti Neelava Vendum MS Subbalakshmi (F) 7.41 2
5 Engo Printhavaram TM Krishna (M) 3.41 1
6 Sahana AR Rahman (M) 3.4 1
7 Varnam Dr L Subrahamaniam (M) 10.48 3
8 Sri Kamalambikaya MD Ramanathan (M) 7.53 2
9 Vasudha Vedavalli (F) 9.53 3

Table 4.9: Table describing Dataset for Raga Nattai

Sl. No. Song Name Artist Name Duration Thumbnails
1 Ninne bhajana Abhishek raghuram (M) 11.08 3
2 Paahi Saure Dr saraswaty (F) 8.11 2
3 Jaya devaki kishora (F) 9.2 3

4 Veda Mathe Veda Vinute TM Krishna (M) 16.46 2
5 Pahi Nikhila janani Sudha Raghunathan (F) 16.06 2
6 Kambeera Sanjay Subramanyam (M) 7.01 2
7 Namo Namo Raghukulanayaka MS Subbalakshmi (F) 6.37 2
8 Sidhi arul siva sakti balagane Trichur V Ramachandran (M) 6.31 2
9 Kamrimukhavarada Priya sisters (F) 7.17 2

Table 4.10: Table describing Dataset for Raga Saurastram

Sl. No. Song Name Artist Name Duration Thumbnails
1 Aadhuvum solluval Mrs Geetha raja (F) 5.51 1
2 Ponniah pillais ranganathude Priya sisters (F) 9.4 3
3 Varalkshmim bajare Sangeetha swaminathan (M) 9.26 3
4 Ninne nammi Bombay Jayshri (F) 10.48 3
5 Sevimpare Sri Ganapathini Trichur Brothers (M) 9.02 3
6 Suryamurthe Namostute Abhishek Raghuram (M) 8.45 2
7 Saakra Varnam Prince Rama Varama 5.24 1
8 Meluko Mangalampalli B Krishna (M) 5.02 1
9 Navagruha kriti M D Ramanathan (M) 4.51 1

Table 4.11: Table describing Dataset for Raga Nilambari

Sl. No. Song Name Artist Name Duration Thumbnails
1 Banagaru Murali M Balamurali Krishna 5.13 1
2 Etuvanti Vaade Malladi Brothers (M) 10.48 3
3 Rama Raghava ML Vasantah Kumari (F) 10.35 3
4 Tholliyunu MS Subbalakshmi (F) 5.22 1

5 Thalatu Maharajapuram S Ramachandran (M) 6.56 2
6 Shringara Lahari (F) 8.13 2
7 Uyyala loogavayya Amritha (F) 7.22 2

Table 4.12: Table describing Dataset for Raga Vasantha

Sl. No. Song Name Artist Name Duration Thumbnails
1 Maa dayai MS Subbalakshmi (F) 6.01 2
2 Rama Ram Mani Krishnaswamy (M) 7.51 2
3 Brugadambikaye Maharajapuram Santanam (M) 4.01 1
4 Raara Seetha Rama Maharajapuram Viswantha Iyer (M) 6.07 2
5 Brukathambigaye Sanjay Subramanyam (M) 7.23 2
6 Nadha Jyothi Muthu Swamy Madhurai Somasundaram (M) 9.06 3
7 Slokam Kodulega Geetha Raja (F) 8.46 2
8 Parama purush Ramani (F) 6.48 2

Table 4.13: Table describing Dataset for Raga Kapi

Sl. No. Song Name Artist Name Duration Thumbnails
1 Vizzhikku tunai Prasanna Venkataraman (M) 6.02 2
2 Thinnala Sheik Chinna Moulana (M) 8.27 2
3 Sodanai Chumai Nityasree Mahadevan (F) 6.12 2
4 Varuvaano NC Vasanthakokilam (F) 7.21 2
5 Aparam Naja Jaya Mala (F) 8.31 2
6 Sarasamadhana (M) 14.03 4
7 Rama Pahi megha syam pahi M Balamurali Krishna (M) 6.58 2
8 Nee poi azhaithuvadi ML Vasanthakumari (F) 6.58 2

9 Vaddheni Malladi Brothers (M) 7.45 2
10 Nandagopala MD Ramanathan (M) 8.09 2
11 Khelati Mama Kumaresh (M) 9.31 3
12 Nee Samagaaanapriya Prince Rama Varma (M) 6.04 2
13 Parulanna Maata Darmapuri Subbarayar (M) 6.22 2
14 Jo Achutanantha MS Subbalakshmi (F) 6.18 2
15 Maadhavame Narayan (M) 4.42 1
16 Punkuyil kuvum Bombay Jayshri (F) 6.16 2
17 Sarasamulaade Vaidyanantha Bhagavathar (M) 12.34 4
18 Rattiname Ariyakudi Ramanuja Iyengar (M) 3.06 1
19 Puththam Pudhu vasantham Anantha Lakshmi Sadagopan (F) 4.28 1
20 Nee mattume en Nenjil Nirkirai Ambujam Krishna (F) 7.15 2
21 Kanna va manivanna va Sriranjani (F) 9.45 3
22 Kurai Ondrum Illali Sooryagayathri (F) 4.55 1
23 Charanamule Nammiti M Balamurali krishna (M) 6.2 2

Table 4.14: Table describing Dataset for Raga Kedaram

Sl. No. Song Name Artist Name Duration Thumbnails
1 Kauninchutakuidu Sumitra Vasudev (F) 6.26 2
2 Daridaapuleka Sudha Raghunathan (F) 6.41 2
3 Venkata Shaila Abhishek Raghuram (M) 13.49 4
4 Eti yochanlu Hyderabad Brothers (M) 6.07 2
5 Viriboni MS Subbalakshmi (F) 12.1 4
6 Gana murte Anoopsankar (M) 17.26 5
7 Sayan Kale Amruta Venkatesh (F) 6.58 2
8 Maaya Mohamu Sri Krishna (M) 9.04 3
9 Paratpara KV Narayanaswami (M) 6.24 2

10 Mari Mari Vachuna Nedunuri Krishnamurthi (M) 8.48 2
11 Phanipatisai (F) 7.11 2
12 Durmaga caradamula Bombay sisters (F) 13.06 4
13 Sri Matrbhutam Semmangudi Srinivas Iyer (M) 14.4 4

Table 4.15: Table describing Dataset for Raga Kanada

Sl. No. Song Name Artist Name Duration Thumbnails
1 Paada Poojeta Maadiro Rajkumar Bharati (M) 10.24 3
2 Intakante kavalena Ramnad Krishnan (M) 12.31 4
3 Puulinvaari pasuam Yesudas KV 5.35 1
4 Nadhilayam Karaikudi r mani (M) 4.3 1
5 Serithavachambugan KV Narayanaswami (M) 6.5 2
6 Sri Mathrubootham TM Krishna (M) 9.06 3
7 Javali Priayvadhanna (F) 10.26 3
8 8th Kriti Nalini Ramprasad (F) 8.43 2
9 Enna solli azhaithal Ambujam krishna (F) 5.27 1
10 Chaumathe upacharamu S Gayathri (F) 7.23 2
11 Thillana Dwaram Venkataswamy (M) 7.35 2
12 Veera Hanumanthe Hyderabad brothers (M) 4.02 1
13 Sukhi evaro Jaishankar (M) 6.1 2
14 Raagam Thanam Pallavi & Gayathri (F) 16.05 5
15 Thirukedeswara Santhanam (M) 5.42 1
16 Varnam Lalgudi Jayaraman (M) 6.24 2
17 Nera Nammiti Varnam Maharajapuram Santanam (M) 5.2 1
18 Alaipayuthe MS Subbalakshmi (F) 8.4 2
19 Bahajare Bhajamanasa MS Gopalakrishnan 8.2 2

20 Mahaganapathe Bombay sisters (F) 6.55 2
21 Ni kanta MD Ramnathan (M) 5.32 1
22 Mamava sada Manjunatha M Mysore (M) 4.28 1
23 Jaya mangala Nagaraj M (M) 3.3 1
24 Vaani pondu Bombay Jayastri (F) 5.05 1
25 Alaaai paayude Anayampatti dhandapani (M) 4.4 1

Table 4.16: Table describing Dataset for Raga Khamas

Sl. No. Song Name Artist Name Duration Thumbnails
1 Kadaikannaparvai Gayathri Venkataraghavan (F) 12.34 4
2 Rama jogi S Gayathri (F) 4.53 1
3 Jayati jayati GN Balasubramaniam (M) 4.06 1
4 Santhana Gopala Sanjay Subramanyam (M) 12.16 4
5 Sri swaminadheya (M) 4.52 1
6 Sujana Jeevana M Balamurali krishna (M) 6.09 2
7 Thillana Khamas ML Vasantha kumari (F) 4.28 1
8 Madhanga Mohana Bombay jaysri (F) 6.19 2
9 Apadooraku Chittibabu (M) 3.18 1
10 Dolayam Bombay Sisters (F) 6.58 2
11 Evala Nannu (F) 8.53 2
12 Sitapathe Krishna (M) 6.13 2

Table 4.17: Table describing Dataset for Raga Sri

Sl. No. Song Name Artist Name Duration Thumbnails
1 Bagyada Lakshmi baramma (M) 6.37 2

2 Vanajasana Vinuta Brothers (M) 8.05 2
3 Mayanai Akkarai Sisters (F) 3.49 1
4 Mangalam Arul Dr. Seerkali Govindarajan (M) 5.23 1
5 Karuna judu ninnu Sanjay Subramanyam (M) 5.53 1
6 Reena Madadritha Prince Rama Varma (M) 5.08 1
7 Karuna Cheyvan Enthu P Jayachandran (M) 6.42 2
8 Bhavayami nanda Mannaragudi Rajagopalan (M) 6.18 2
9 Sri Vishvanatham bhaje (F) 12.55 4
10 Sri muladhara chakra vinayaka Krishna TM (M) 6.21 2
11 Sri Abhayamba DK Jayaraman (M) 14.14 4

Testing Dataset:

Table 4.18: Table describing Dataset for Raga Ahiri

Sl. No. Song Name Artist Name Duration
1 Deenaraksha Sanjay Subramanyan (M) 4.4
2 Mayamma Yani Kunnankundi M Balamuralikrishna (M) 7.41
3 Panimathi KS Chaitra (F) 4.39
4 Challare Ramachandrunipai Hyderabad Brothers (M) 4.21
5 Sompaina Manasutho Priya Sisters (F) 6.27
6 Adaya sri Raghuvara Saketharaman s (M) 9.5
7 Sri Kamalamba Jayati Ranjani & Gayathri (F) 11.36

Table 4.19: Table describing Dataset for Raga Darbari

Sl. No. Song Name Artist Name Duration
1 Chalamela jesuvura Prasanna Venkataraman (M) 8.2

2 Ramabhirama MS Gopalakrishnan (M) 5.43
3 Ne vedikkani Srivatsan (M) 4.26
4 Thyagaraja Dhanyam Nedunuri (M) 5.27
5 Halasyanandham Rajeswari Satish (F) 6.28
6 Aparathamulama piyadukovayya Ashwath narayanan (M) 10.3
7 Yocana Kamalalochana TM Seshagopalan (M) 4.51
8 Entundi Vedalithivo TM Krishna (M) 6.8
9 Ela Theliyalero Abhishek Raghuram (M) 6.12
10 Naradhaguruswami ikanina RK Srikantan (M) 4.58
11 Mundhuvenuka Niruprakkalathodai Ramakrishnan Murthy (M) 3.01
12 Ramabhirama Ramaneeyarama Ranjani & Gayathri (F) 5.6
13 Nityaroopa Evaripandithyamemi TV Sankaranarayan (M) 4.54

Table 4.20: Table describing Dataset for Raga Darbari Kanada

Sl. No. Song Name Artist Name Duration
1 Maya tita swaroopini Bombay Jayashri (F) 6.09
2 Alapana MD Ramanathan (M) 8.14
3 Dorakuna Ramakrishnan Murthy (M) 6.28
4 Smara Janaka Subhacharitha Amruta Venkatesh (F) 10.36
5 Margazhi Thingal Sumitra Vasudev (F) 6.38
6 Ni pada mule Pt (M) 3.49

Table 4.21: Table describing Dataset for Raga Surthi

Sl. No. Song Name Artist Name Duration
1 Ento Premato Nityasree Mahadevan (F) 8.12
2 Angaraka Ashrayamyaham Sanjay Subramanyam (M) 3.42
3 Kosalendraya Sikki Gurucharan (M) 2.12
4 Kelati Mam Hradaye Mangalam Semmangudi Srinivasa Iyer (M) 3.18
5 Kannallavo swami Mahati (F) 4.16
6 Devadi Devanukku Jaya mangalam S Sundar (M) 3.48

Table 4.22: Table describing Dataset for Raga Varali

Sl. No. Song Name Artist Name (M/F) Duration
1 Kanakana Ruchira Maharajapuram Santanam (M) 12.19
2 Eti Janmamiti Ranjani Gayathri (F) 11.49
3 Marakathamani MS Subbalakshmi (F) 2.19
4 Mamava Meenakshi (M) 4.05
5 Seshachala S Gayathri (F) 6.37
6 Karunajudavamma Gayathri Girish (F) 4.28
7 Bangaru Kamakshi Bombay Jayashri (F) 6.48
8 Mamava Padmanabha Amruta Venkatesh (F) 13.5
9 Intha prodayye Sri Nanditha ravi (F) 4.14
10 Valapu Rama Ravi (F) 7.52
11 Ka vaa vaa Aruna Sriram (F) 8.32
12 Valayunniha Prince Rama Varma (M) 6.4
13 Kannare Kande achyutana M Balamurali Krishna (M) 5.38
14 Ne pagoda Ariyakudi Ramanuja Iyengar 4.36

Table 4.23: Table describing Dataset for Raga Sahana

Sl. No. Song Name Artist Name Duration
1 Raghupate rama rakshasa bhima (M) 2.15
2 Ee Vasudha Sanjay Subramanyam (M) 5.43
3 Giripai nelakona Amruta Venkatesh (F) 3.48
4 dEhi tavapada bhaktim MD Ramanathan (M) 6.24
5 oorake kalguna Jayaraman (M) 8.4
6 Emana Dichevo Semmangudi Srinivasa Iyer (M) 4.29
7 Vandanamu raghunandana M Balamurali krishna (M) 5.37
8 Rama ika nannu brovara Bombay Jayashri (F) 9.39
9 Manamu kavalannu TN Seshagopalan (M) 14.42
10 Sri kamalambikaya DK Jayaraman (M) 3.01
11 Inkevarunnaaru nannu R Vedavalli (F) 5.13
12 Sri Vatapi Ganapathiye Maharajapuram Santhanam (M) 7.15
13 Kavave Kanyakumari Abhishek Raghuram (M) 6.33

Table 4.24: Table describing Dataset for Raga Nattai

Sl. No. Song Name Artist Name Duration
1 Pahimam Sri raja Maharajapuram S Ramachandran (M) 7.02
2 Sarasijanaabha murare Gayathri Venkataraghavan (F) 8.44
3 Mahaganapathim Amruta Venkatesh (F) 4.35
4 Jaya jaya jaya janaki kantha Sangeetha Swaminathan 9.2
5 Jagadananda Lalgudi Jayaraman 5.27
6 Siva Thrayam KV Narayanaswami (M) 6.27
7 Shivatrayamahaganapatim MD Ramanathan (M) 5.47
8 Pavanatmajagachha Amrita Murali (F) 6.01
9 Sri Gajananana M Balamurali Krishna (M) 3.42

10 Sri rajadhiraja Maharajapuram S Srinivasan (M) 3.48
11 Ninne Bhajana Abhishek raghuram (M) 10.4

Table 4.25: Table describing Dataset for Raga Kuntalavarali

Sl. No. Song Name Artist Name Duration
1 Bhogindra Shayinam Mysore Brothers (M) 5.25
2 Kanvilum kalabhyam KV Narayanaswami (M) 3.35
3 Ozhukkam Uyirinum Sanjay Subramanyam (M) 7.37
4 Thillana Balamurali Krishna (M) 4.56
5 Sarasarasamare U Shrinivas (M) 7.3
6 Shivara namavenre Malladi brothers (M) 5.4
7 Sharavanabhava Mysore T Chowdiah (M) 6.06
8 Tungatarange gange Sadasiva Brahmendra (M) 4.27

Table 4.26: Table describing Dataset for Raga Saurastram

Sl. No. Song Name Artist Name Duration
1 Mangalam kosalendraya Ranjana Gayathri (F) 2.12
2 Pavamana MS Subbalakshmi (F) 6.35
3 Sri Ganapathini Maharajapuram Santanam (M) 4.28
4 Ninu jhuchi Sanjay Subramanyam (M) 3.58
5 Meluko dayanidhi Balamurali Krishna (M) 8.12
6 Sharanu siddi vinayaka U Srinivas (M) 3.56
7 Jaya jaya swamin Unnikrishnan (M) 4.48
8 Sarasijanabha nin charana Shankaran Namboodri (M) 3.44

9 Navagruha Kriti Suryamurthe Dr S Ramanathan (M) 6.02
10 Aadhuvum solluval Mrs Geetha Raja (F) 3.45

Table 4.27: Table describing Dataset for Raga Nilambari

Sl. No. Song Name Artist Name Duration
1 Ennagamanasu K V Narayanaswami (M) 4.21
2 Madhava mamava deva TM Krishna (M) 4.03
3 Amba nilambari Vasundara rajagopal (M) 10.42
4 60th melakartha Neetimati S Balachandar (M) 6.16
5 Ennakavi Anayampatii Dhandapani S (M) 5.09
6 Shringara Lahari Bombay Jayashri (F) 3.02
7 Maninoopura Dhari Bombay Sisters (F) 3.45
8 Uyyala Loogavayya M Balamurali krishna (M) 6.02

Table 4.28: Table describing Dataset for Raga Vasantha

Sl. No. Song Name Artist Name Duration
1 Seethamma maayyamma Gayathri Girish 4.52
2 Hari hara putram Gayatri venkataraghavan 9.45
3 Ramachandram S Balachandar 2.33
4 Vade Venkatadri Bombay Sisters 5.06
5 Anadamrutha Bombay Jayashri 3.45
6 Vasantha Jaya mala 5.13
7 Ninnu Kori Balamurali Krishna 4.57
8 Thaaiya supradiku ML Vasantha Kumari 5.53
9 Sri Kamakshi Lakshmi Rangarajan (F) 6.58

10 Komma Thana Balamurali krishna (M) 6.11
11 Nannu brova Prince Rama Varma (M) 6.5
12 Thaaiya Supradiku ML Vasantha kumari (F) 10.4
13 Sri Kamakshi Lakshmi Rangarajan (F) 5.58

Table 4.29: Table describing Dataset for Raga Kapi

Sl. No. Song Name Artist Name Duration
1 jagadodarana Mysore Brothers (M) 7.01
2 Inka soukhyamani Nisha P Rajagopal (F) 14.28
3 Janakiramana Nedunuri (M) 3.54
4 Pazhani Nindra Sudha Raghunathan (F) 3.02
5 Intasaukya Aneesh Vidyashankar (M) 10.08
6 kalilo hari smrana Sanjay Subramanyam (M) 15.03
7 kodandamanita kara TM Rangachari (M) 1.53
8 Bandadella barali Gayatri Venkatesan (F) 4.58
9 Mee Valla Gunadosha Memi KV Narayanaswami (M) 6.33
10 Venkatachalapate Ninu Nammitit TM Tyagarajan (M) 3.14
11 Viharamanasa Rame Prince Rama Varma (M) 8.18
12 Smarasi Pura MK Sankaran Namboothiri (M) 4.44
13 Sree Madhavamanu Gayathri Girish (F) 5.42
14 Enna Thavam Seydanai Yashoda Chinmaya Sisters (F) 3.24
15 Bhaja Maadhavam Anisham RK Srikantan (M) 4.48
16 Maya Gopabala Amruta Venkatesh (F) 5.36
17 Upaakhayanam Sudha Raghunathan (F) 2.16
18 Chinnanchiru Kiliye Aishwarya and S Saundarya (F) 6.06
19 Kurai Ondrum Illai MS Subbalakshmi (F) 4.19

20 Karthikeyanai Mahesh Raghavan (F) 5.31
21 Aravinda Padamalar S Gayathri (F) 8.32
22 Charanamule Nammitit Malladi brothers (M) 6.2
23 Kanaka Simham TM Krishna (M) 4.37
24 Kanna va mannivanna va Sriranjani Santhanagopalan (F) 7.56
25 Nee Mattume En nenjil nirkirai Ambujam Krishna 5.02
26 Puththam pudhu vasantham Anantha lakshmi Sadagopan 2.24
27 Rattiname Ariyakudi Ramanuja Iyengar 3.06
28 Sarasamulaade Chembai Vaidyanatha Bhagavatha 12.34

Table 4.30: Table describing Dataset for Raga Kedaram

Sl. No. Song Name Artist Name Duration
1 Maya tita swaroopini Unnikrishnan (M) 6.09
2 Alapana MD Ramanathan (M) 4.58
3 Dorakuna Sanjay Subramanyam (M) 3.48
4 Smarana Janaka Subhacharitha Prince Rama Varma (M) 2.56
5 Margazhi Thingal Sumitra Vasudev (F) 8.16
6 Ni pada mule Gayathri Venkataraghavan (F) 6.46
7 Sukha vazhvu Trivandrum sisters (F) 4.32

Table 4.31: Table describing Dataset for Raga Kanada

Sl. No. Song Name Artist Name Duration
1 Mariyemi Anantha lakshmi Sadagopan 5.46
2 Alai paayude Anayampatti dhandapani S 2.36
3 Neranammi Aruna Sairam (F) 1.48

4 Sri Narada Ariyakudi Ramanuja Iyengar (M) 3.46
5 Sri Matrubhutam Bombay Jayashri (F) 4.52
6 Alaipayute Bombay Sisters (F) 3.48
7 Jaya Mangala Nagaraj M (M) 4.59
8 Mamava sada Manjunath M Mysore (M) 6.57
9 Yenna Solli Lalitha Krishnan (F) 6.09
10 Brihadeesvara M Balamurali Krishna (M) 10.06
11 Charumati Malladi Brothers (M) 9.56
12 Ni kanta MD Ramanathan (M) 4.03
13 Paramukha ML Vasantha Kumari (F) 2.54
14 Na ninna dhyana M Balamurali krishna (M) 3.03
15 Mahaganapathe MS Gopalakrishnan (M) 11.36
16 Sukiewarao Mysore brothers (M) 3.07
17 Bahajare bhajamanasa MS Subbalakshmi (F) 13.38
18 Alaipayuthe Amrutha venkatesh (F) 14.58
19 Varnam Lalgudi Jayaraman (M) 10.32
20 Thirkedeswara Santhanam (M) 5.36
21 Raagam Thanam Pallavi Ranjani Gayathri (F) 8.24
22 Sukhi Evaro Jaishankar (M) 1.28

Table 4.32: Table describing Dataset for Raga Khamas

Sl. No. Song Name Artist Name Duration
1 Tiru kanden Sikkil Gurucharan (M) 6.3
2 Sarasadala Nayana Chittibabu (M) 4.28
3 Idadu padam Semmangudi Srinivasa Iyer (M) 5.42

4 Hare Krishna Dr Pantula Rama (M) 4.36
5 Sitapathe TM Krishna (M) 5.2
6 Brochevaarevarura S Soumya (F) 16.13
7 Karpuram Abhishek Raghuram (M) 14.25
8 Naaruma Karupuram Sanjay Subramanyam (M) 5.19
9 Payum oi nee yenakku Sangeetha swaminathan (M) 5.36
10 Evala nannu Aruna Sairam 4.32
11 Sonnadoravani Ariyakudi Ramanuja Iyengar (M) 7.04
12 Dolayam Bombay Sisters (F) 2.46
13 Apadooraku PR Varma (M) 3.53

Table 4.33: Table describing Dataset for Raga Sri

Sl. No. Song Name Artist Name Duration
1 Sami Ninne Koriyunnanu N Ramani (F) 4.42
2 Endaro Mahanu Ariyakudi Ramanuja Iyengar (M) 4.1
3 Sri M Balamurali Krishna (M) 5.35
4 Vande Vasudevam MS Subbalakshmi (F) 9.3
5 Varnam Saminine Hyderabad Brothers (M) 5.08
6 Sri abhayamba DK Jayaraman (M) 9.04
7 Entharo Gayathri (F) 12.49
8 Sri varalakshmi namastubyam Bombay Jayashri (F) 8.05
9 Nama kusuma MD Ramanathan (M) 10.36
10 Sami Ninnekori Sreevalsan J Menon (M) 9.04
11 Yuktamu Gadu Amrita Murali (F) 7.1
12 Sri Muladhara Chakra Vinayaka TM Krishna (M) 3.01
13 Sri Vishvanatham Bhaje Sudha Raghunathan (F) 4.05

Raga identification is often a rather difficult task, considering that two ragas derived from the same parent raga have similar structure and may sound indistinguishable. The performance of our model shows a considerable dip in accuracy due to the presence of such hard-to-distinguish raga pairs. Examples of such raga pairs include:

• Kedaram and Nilambari - Janya ragas derived from Shankarabharanam.

• Kanada and Darbari Kanada - both belong to a group of Carnatic-derived ragas that share a similar minor-scale phraseology.

The literature on audio thumbnailing is surveyed in Chapter 2. To compare the proposed methodology with existing ones, we considered two algorithms, [15] and [75]. The first uses chroma features but was designed for Western music; the second uses MFCC features. Keeping the database and the classification model constant, thumbnails generated with the first approach gave an average Raga classification accuracy of about 71.42%, while the second gave 49.83%; the thumbnails produced by the second approach also tend to fall at the beginning of the song. This implicitly reconfirms the importance of chroma-based features for music structure analysis. As a baseline, an experiment was performed by taking a random 20-second audio piece from each musical recording and classifying the raga from it with the same classification model; the resulting average accuracy is 64.58%. A set of 20 best examples of Raga/Tala classification using thumbnails from the proposed methodology is saved at the following location: https://drive.google.com/drive/folders/1qPjT0htl8LPNSBx4P3rZ5x3V8mZDSwiy?usp=sharing

4.1.2 Dataset for Tala Classification

The publicly available COMPMUSIC database was used for our studies [95]. A total of 140 songs were used, of which 100 formed the training set and 40 the test set. Five Talas were chosen, as listed after Table 4.34:

Table 4.34: Table describing Dataset for Tala Classification

Name of the Tala  No. of beats  Training and Testing set  Average duration of songs (min)

Adi 8 (20,8) 07.20

Rupaka 6 (20,8) 08.10

Tisra Jati Eka 3 (20,8) 09.05

Khanda Jati Eka 5 (20,8) 08.20

Adi talam (cycle of 8 beats), Rupaka talam (cycle of 6 beats), Chaturasra Jati Eka talam (cycle of 4 beats), Tisra Jati Eka talam (cycle of 3 beats), and Khanda Jati Eka talam (cycle of 5 beats). These Talas were selected for their distinct numbers of beats per cycle. Table 4.34 illustrates the division of the dataset for training and testing.

The CNN-RNN classification model showed significantly higher accuracy than the standard SVM classifier: the CNN-RNN produces a test accuracy of 86.08%, whereas the SVM classifier gives an accuracy of 65.58%.

Table 4.35: Table describing Dataset for Tala Adi

Sl. No. Song Name Artist Name Duration Thumbnails
1 Ni pada mule Sumitra Vasudev (F) 17.11 5
2 Paratpara Malladi Brothers (M) 9.5 3
3 Balagopala Sangeeta Swaminathan (M) 9.48 3
4 Gana nayaka Maharajapuram S Ramachandran (M) 6.52 2
5 Vazhi Maraittirukkude TM Krishna (M) 5.06 1
6 Dwaitamu sukhama Semmanagudi Srinivasa Iyer (M) 4.12 1
7 Chalame Sanjay subramanyam (M) 6.35 2

8 Thanam Pallavi and GN Subramaniam (M) 3.13 1
9 Tamarasadala Netri mitri Unnikrishnan (M) 4.09 1
10 Vatapi Kunnakudi M B Krishna (M) 5.35 1

Table 4.36: Table describing Dataset for Tala Rupaka

Sl. No. Song Name Artist Name Duration Thumbnails
1 Durgamarcaradham Malladi brothers (M) 8.38 2
2 KV Narayanaswami (M) 6.24 2
3 Pahimam Sri Raja Maharajapuram S Ramachandran (M) 7.05 2
4 Sri guruna palitosmi Sumitra vasudev (F) 6.42 2
5 Sobhillu sapthaswara Chandrasekhar (M) 8.21 2
6 Gopalaka pahimam Gayathri venkataraghavan (F) 6.21 2
7 Durmarga chara Krishnan TN 9.52 3
8 Ninnuvina Chittbabu (M) 6.14 2
9 Keechu keechu Hyderabad brothers (M) 9.38 3

Table 4.37: Table describing Dataset for Tala Tisra Jati Eka

Sl. No. Song Name Artist Name Duration Thumbnails
1 Soundara rajam Ranjani Guruprasad (F) 7.23 2
2 Shri subramanya namaste MD Ramanathan (M) 15.18 5
3 Sri vidya rajapolam R Vedavalli (F) 4.05 1
4 Villinai Sumitra Vasudev (F) 9.47 3
5 Alankaram Sanjay Subramanyam (M) 8.02 2
6 Varnam Ranjani Gayathri (F) 9.42 3

7 Namaste sri Nisha P Rajagopal (F) 8.09 2
8 Vidya Sri Nityasree mahadevan (F) 6.36 2

Table 4.38: Table describing Dataset for Tala Khanda Jati Eka

Sl. No. Song Name Artist Name Duration Thumbnails
1 Alankaram Nedunuri (M) 9.42 3
2 Varnam Thiruvayur vaidyanathan (M) 15.28 5
3 Va velava Maharajapuram S Ramachandran (M) 8.21 2
4 Tiru kandenkarpuram narumo Sikkil gurucharan (M) 6.36 2
5 Bhogindra shayinam Mysore brothers (M) 6.25 2
6 Ragam thanam pallavi Unnikrishnan (M) 3.48 1
7 Tanigai Valar Sanjay Subramanyam (M) 5.02 1
8 Anupamagunabhudhi Gayathri girish (F) 6.09 2
9 Marivere MS Gopalakrishnan (M) 5.31 1
10 Jaya jaya jaya janaki kantha Sangetha swaminathan (F) 4.2 1

Testing Dataset:

Table 4.39: Table describing Dataset for Tala Adi

Sl. No. Song Name Artist Name Duration
1 Mahaganapathim Maharajapuram santhanam (M) 4.35
2 Bhayada (M) 4.23
3 Karunimpa idi manchi Amruta venkatesh (F) 8.06
4 Annapoorne Gayathri Girish (F) 5.46
5 Arul Seiya Vendum Ayya Vidwan Lalgudi GJR Krishnan (M) 6.18

6 Nidu charana KV Narayanaswami (M) 4.35
7 Thayi Ezhai DK Jayaraman (M) 3.04
8 Varnam-chalamela Chittibabu (M) 7.39

Table 4.40: Table describing Dataset for Tala Rupaka

Sl. No. Song Name Artist Name Duration
1 Sivakama sundari ML Vasantha kumari (F) 6.26
2 Bhujaga Sayinam Parassala Ponnammal (F) 4.22
3 Pazhani Nindra Sudha Raghunathan (F) 6.47
4 Himadrisuthe Nedunuri Krishna murthy (M) 5.36
5 Biranavara MD Ramanathan (M) 2.46
6 Angaraka ashrayamyaham Semmangudi Srinivasa Iyer (M) 8.36
7 Deva kalayami Vijay siva (M) 4.52
8 Govardhana Girisham Nityasree mahadevan (F) 3.54

Table 4.41: Table describing Dataset for Tala TisraEka

Sl. No. Song Name Artist Name Duration
1 Soundararajam Ranjani Guruprasad (F) 7.3
2 Rajagopalam Vedavalli (F) 13.28
3 Tisra nadai jathi Lakshmi venkatesh (F) 4.56
4 Margazli Sumitra Vasudev (F) 3.02
5 Ni padamule MD Ramanathan (M) 6.34
6 Namaste subramanya Sanjay subramanyam (M) 5.43
7 Manasija koti lavanya Chaitra sairam (F) 7.13
8 Paratpara Malladi brothers (M) 5.49

Table 4.42: Table describing Dataset for Tala Khanda Eka

Sl. No. Song Name Artist Name Duration
1 Karupuram Naarumo Abhishek Raghuram (M) 6.36
2 Govinda Alarmelmagai Manala Va Tiruvenkatanatha Sanjay subramanyam (M) 5.19
3 Anupamagunambudi KV Narayanaswami (M) 7.04
4 Hechariga ra ra MD Ramanathan (M) 4.32
5 Bhogindra shayinam MS Gopalakrishnan (M) 9.25
6 Janaki kantha M R Subramaniam (M) 3.38
7 Gunambudhi S Soumya (F) 5.25
8 Tanigai Valar Sanjay Subramanyam (M) 4.37

4.2 Summary and Conclusions

Using the thumbnail instead of the complete musical recording to extract features for training and testing reduces the feature vector sizes to a significant extent. However, extracting the thumbnail is a computationally expensive task, and for longer recordings it is observed to be time consuming. On scrutinizing the dataset, we also observed that recordings of the same raga are not always performed in the same key. This is characteristic of many renditions in Indian classical music, which do not rely on absolute scales or pitch positions, and it can be argued that it detrimentally affects the performance of the model. This also opens up scope for future work.

In the next chapter, we will summarize the contribution of this thesis and briefly explain the scope for future work.

Chapter 5

Summary and Conclusions

The main aim of this thesis is to classify ragas and talas in Carnatic music using audio thumbnailing. We discussed Indian classical music, the basic elements of classical music, machine learning approaches to classification, the generation of the self-similarity matrix and the computation of audio thumbnails, and explained them formally in detail. The concept of thumbnailing was introduced to Carnatic music for the first time.

The contributions are as follows:

• Brief literature review on music signal processing: A brief introduction to Indian art music and its origin, and a detailed review of existing music signal processing methodologies, were presented. Existing methodologies in audio classification and thumbnailing were elaborated.

• Computation of the self-similarity matrix: SSMs were explained along with their structural properties, and the computation of an SSM was discussed with a sample example. The formal procedure to generate and enhance the self-similarity matrix was elaborated.

• Generation of audio thumbnails: We formally defined a musical recording as a feature sequence, generated the SSM from it, and computed audio thumbnails for Carnatic songs. The computation of the fitness and coverage scores was discussed formally.

• Classification of ragas and talas: A parallel CNN-RNN architecture was proposed and used to classify ragas and talas in Carnatic music, and the model was evaluated on Carnatic music thumbnails.

5.1 Future Work

Tala classification is not widely explored in the area of Music Information Retrieval, and the problem is unique to the genres of the Indian subcontinent. Existing research has applied several statistical methodologies to complete songs. This thesis proposes a novel technique that uses the repetitive structure of a musical recording, namely the audio thumbnail, to perform the classification. Audio thumbnailing in Carnatic music is an unexplored area of research. We would like to extend our work on thumbnails towards mrdangam stroke identification and the classification of all 175 talas in Carnatic music.

The method presents several directions for future work, such as improving the performance of the thumbnailing process. Our work also opens up possibilities of using audio thumbnails in other MIR tasks specific to Indian classical music, such as motif extraction and genre recognition. As observed earlier, the pitches in Indian classical music are relative to the key note rather than absolute. Hence, identification of the key note and transposition of recordings to a single key could be a crucial modification to improve the model proposed in this thesis; a small sketch of this idea follows.
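The transposition itself is trivial once the tonic pitch class is known; the following sketch of the idea is our own and is not part of the thesis implementation.

import numpy as np

def transpose_to_common_key(chroma, tonic_class):
    # Rotate the 12-bin chroma matrix so that the tonic occupies bin 0,
    # making recordings of the same raga comparable across keys.
    return np.roll(chroma, -tonic_class, axis=0)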

There are many open challenges and exciting problems, such as audio thumbnailing, audio matching, version identification, Tala classification over all 175 existing talas, composer identification, album detection, automatic generation of ragas, and identification of the song name. Human perception and recognition of music still go far beyond what machines can understand and extract. We hope these developments continue and progressively lessen the gap between human and machine perception, making music all the more colourful.

List of Publications

1. Amulya Sri Pulijala and Suryakanth V Gangashetty, Tala Classification in Carnatic Music using Audio Thumbnailing, In Proc. International Conference on Signal and Image Processing (SIPRO), United Kingdom, 2020.

2. Amulya Sri Pulijala and Suryakanth V Gangashetty, Music Signal Processing - A Literature Survey, In Proc. Frontiers of Research in Speech and Music, NIT Silchar, 2020.

3. Amulya Sri Pulijala, Aditya, Sakti Balan and Suryakanth V Gangashetty, Raga Classification in Carnatic Music Using Audio Thumbnailing, In Proc. ACM Multimedia 2020, Seattle, United States, 2020. (Submitted)

Bibliography

[1] Rajiv Trivedi. Bharatiya Shastriya Sangeet: Shaastra, Shikshan va Prayog. Sahitya Sangam, New , India, 2008.

[2] Divya Mansingh Kaul. Hindustani and Persio-Arabian Music. Kanishka Publishers, Distributors, first edn., 2007.

[3] . Indian Music. Sangeet Research Academy, 1995.

[4] Manorma Sharma. Musical Heritage of India. APH Publishing, 2007.

[5] Anuradha Mahesh. Origin of Swara, 2017 (accessed June 3, 2020).

[6] Vidya Sankar. Tala Anubhava of Music by Trinity. Journal of Music Academy, 2002.

[7] Upbeat labs. A Primer for Carnatic Talas, 2018 (accessed June 3, 2020).

[8] Mannarkoil J. Balaji. Introduction to Talas and Sapta Tala System, 2008 (accessed June 3, 2020).

[9] Catherine Stevens. Cross-Cultural Studies of Musical Pitch and Time. Acoustical science and technology, 2004.

[10] Mary A Castellano, Jamshed J Bharucha, and Carol L Krumhansl. Tonal Hierarchies in the Music of North India. Journal of Experimental Psychology: General, 1984.

[11] Sankalp Gulati, Justin Salamon, and Xavier Serra. A Two-stage Approach for Tonic Identification in Indian Art Music. In Proceedings of the 2nd CompMusic Workshop; 2012 Jul 12-13; Istanbul, Turkey. Barcelona: Universitat Pompeu Fabra; 2012. p. 119-127.

[12] Margaret J Kartomi. On concepts and classifications of musical instruments. University of Chicago Press Chicago, 1990.

[13] Erich M Von Hornbostel and Curt Sachs. Classification of Musical Instruments: Translated from the Original German by Anthony Baines and Klaus P. Wachsmann. The Galpin Society Journal, 1961.

[14] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT press, 2016.

[15] Meinard Müller, Nanzhu Jiang, and Peter Grosche. A Robust Fitness Measure for Capturing Repetitions in Music Recordings with Applications to Audio Thumbnailing. IEEE Transactions on Audio, Speech, and Language Processing, 2012.

[16] Mark A Bartsch and Gregory H Wakefield. Audio Thumbnailing of Popular Music using Chroma-based Representations. IEEE Transactions on Multimedia.

[17] Meinard Müller. Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, 2015.

[18] Xavier Serra. A Multicultural Approach in Music Information Research. In Klapuri A, Leider C, editors. Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011.

[19] Dongmei Wang and Qinghua Huang. Single Channel Music Source Separation based on Harmonic Structure Estimation. In Proceedings of International Symposium on Circuits and Systems, 2009.

[20] Hui Wang, Ying Wang, Weina Wang, Bing Zhu, and Sai Ma. Single Channel Polyphonic Music Signal Separation based on Bayesian Harmonic Model. In Proceedings of 4th International Congress on Image and Signal Processing, 2011.

[21] Les Atlas and Christian Janssen. Coherent Modulation Spectral Filtering for Single-Channel Music Source Separation. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[22] Taejin Park and Kyeng Ok Kang. Background Music Separation for Multichannel Audio based on Inter-Channel Level Vector Sum. In Proceedings of 18th IEEE International Symposium on Consumer Electronics (ISCE 2014), 2014.

[23] Michael Syskind Pedersen, Jan Larsen, Ulrik Kjems, and Lucas C Parra. Convolutive Blind Source Separation Methods. In Springer handbook of speech processing. 2008.

[24] Raphael Blouet, Guy Rapaport, Cohen, and Cédric Févotte. Evaluation of Several Strategies for Single Sensor Speech/Music Separation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

[25] Alexey Ozerov and Cédric Févotte. Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation. IEEE Transactions on Audio, Speech, and Language Processing, 2009.

[26] Emad M Grais and Hakan Erdogan. Single Channel Speech Music Separation using Nonnegative Matrix Factorization and Spectral Masks. In Proceedings of 17th International Conference on Digital Signal Processing (DSP), 2011.

[27] Tuomas Virtanen. Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria. IEEE Transactions on Audio, Speech, and Language Processing, 2007.

[28] Mikkel N Schmidt and Morten Mørup. Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation. In Proceedings of International Conference on Independent Component Analysis and Signal Separation, 2006.

[29] Semi-supervised NMF with Time-Frequency Annotations for Single-Channel Source Separation. In Proceedings of 13th International Society for Music Information Retrieval.

[30] Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, and Paris Smaragdis. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.

[31] Yukio Fukayama. A Modified Wiener Filter suitable for Separation of Individual Instrumental Sounds in Monaural Music Signals. In Proceedings of IEEE Conference on Norbert Wiener in the 21st Century (21CW), 2014.

[32] Yukio Fukayama. Separation of Individual Instrumental Tones in Monaural Music Signals Applying a Modified Wiener Filter and the Gabor Wavelet Transform. In Proceedings of the ISCIE International Symposium on Stochastic Systems Theory and its Applications, 2019.

[33] Jonathan Le Roux and Emmanuel Vincent. Consistent Wiener Filtering for Audio Source Separation. IEEE signal processing letters, 2012.

[34] David Gunawan and Deep Sen. Music Source Separation Synthesis using Multiple Input Spectrogram Inversion. In Proceedings of IEEE International Workshop on Multimedia Signal Processing, 2009.

[35] Michael Zibulevsky. Blind Separation of more Sources than Mixtures using Sparsity of their Short-Time Fourier Transform.

[36] Pau Bofill and Michael Zibulevsky. Underdetermined Blind Source Separation using Sparse Representations. Signal processing, 2001.

[37] Po-Sen Huang, Scott Deeann Chen, Paris Smaragdis, and Mark Hasegawa-Johnson. Singing-Voice Separation from Monaural Recordings using Robust Principal Component Analysis. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

[38] Yukara Ikemiya, Kazuyoshi Yoshii, and Katsutoshi Itoyama. Singing Voice Analysis and Editing based on Mutually Dependent F0 Estimation and Source Separation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2015.

[39] Antoine Liutkus, Zafar Rafii, Roland Badeau, Bryan Pardo, and Gaël Richard. Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

[40] Zafar Rafii and Bryan Pardo. Music/Voice Separation Using the Similarity Matrix. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2012.

[41] Zafar Rafii and Bryan Pardo. Repeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation. IEEE transactions on audio, speech, and language processing, 2012.

[42] Yuan-Pin Lin, Chi-Hong Wang, Tzyy-Ping Jung, Tien-Lin Wu, Shyh-Kang Jeng, Jeng- Ren Duann, and Jyh-Horng Chen. EEG-based Emotion Recognition in Music Listening. IEEE Transactions on Biomedical Engineering, 2010.

[43] Konstantin Markov and Tomoko Matsui. Music Genre and Emotion Recognition using Gaussian Processes. IEEE Access, 2014.

[44] Satoru Fukayama and Masataka Goto. Music Emotion Recognition with Adaptive Aggregation of Gaussian Process Regressors. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

[45] Parag Chordia and Alex Rae. Raag Recognition Using Pitch-Class and Pitch-Class Dyad Distributions. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2007.

[46] Rajeswari Sridhar and TV Geetha. Raga Identification of Carnatic Music for Music In- formation Retrieval. International Journal of recent trends in Engineering, 2009.

[47] Pranay Dighe, Harish Karnick, and Bhiksha Raj. Swara Histogram Based Structural Analysis And Identification Of Indian Classical Ragas. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2013.

[48] G Padmasundari and Hema A Murthy. Raga Identification using Locality Sensitive Hash- ing. In Proceedings of Twenty-third National Conference on Communications (NCC), 2017.

[49] Shrey Dutta, Krishnaraj Sekhar PV, and Hema A Murthy. Raga Verification in Carnatic Music Using Longest Common Segment Set. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2015.

[50] Vijay Kumar, Harit Pandya, and CV Jawahar. Identifying Ragas in Indian Music. In Proceedings of 22nd International Conference on Pattern Recognition, 2014.

[51] Trupti Katte. Multiple Techniques for Raga Identification in Indian Classical Music. International Journal of Electronics and Computer Engineering, 2013.

[52] Sankalp Gulati, Joan Serrà, Vignesh Ishwar, Sertan Şentürk, and Xavier Serra. Phrase-based Rāga Recognition using Vector Space Modeling. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.

[53] Alex Kan, Akshay Sankar, Svetak Sundhar, and Anthony Yang. A Comparison of Machine Learning Approaches to Classify Tala. COMP 562, 2017.

[54] Ajay Srinivasamurthy and Xavier Serra. A Supervised Approach to Hierarchical Metrical Cycle Tracking from Audio Music Recordings. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.

[55] KP Nitha and ES Suraj. An Algorithm for Detection of Tala in Carnatic Music for Music Therapy Applications.

[56] Jilt Sebastian and Hema A Murthy. Onset Detection in Composition Items of Carnatic Music. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2017.

[57] Carlos Guedes, Konstantinos Trochidis, and Akshay Anantapadmanabhan. Modeling Carnatic Rhythm Generation: A Data-Driven Approach based on Rhythmic Analysis. In Proceedings of the 15th Sound & Music Computing Conference, 2018.

[58] Rushiraj Heshi, SM Suma, Shashidhar G Koolagudi, Smriti Bhandari, and KS Rao. Rhythm and Timbre Analysis for Carnatic Music Processing. In Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, 2016.

[59] Gopala Krishna Koduri, Vignesh Ishwar, Joan Serrà, and Xavier Serra. Intonation Analysis of Rāgas in Carnatic Music. Journal of New Music Research, 2014.

[60] Gopala Krishna Koduri, Joan Serrà Julià, and Xavier Serra. Characterization of Intonation in Carnatic Music by Parametrizing Pitch Histograms. In Proceedings of the 13th International Society for Music Information Retrieval Conference, 2012.

[61] M Subramanian. Carnatic Ragam Thodi-Pitch Analysis of Notes and Gamakams. Journal of the Akademi, 2007.

[62] Kaustuv Ganguli and Carlos Guedes. An Approach to adding Knowledge Constraints to a Data-Driven Generative Model for Carnatic Rhythm sequence. Trends in Electrical Engineering, 2019.

[63] Konstantinos Trochidis, Carlos Guedes, Akshay Anantapadmanabhan, and Andrija Klaric. CAMeL: Carnatic Percussion Music Generation using n-gram Models. In Proceedings of 13th Sound and Music Computing Conference (SMC), Hamburg, Germany, 2016.

[64] Ashwin Bellur and Hema A Murthy. A Novel Application of Group Delay Function for Identifying Tonic in Carnatic Music. In Proceedings of 21st European Signal Processing Conference (EUSIPCO 2013), 2013.

[65] S Samsekai Manjabhat, Shashidhar G Koolagudi, KS Rao, and Pravin Bhaskar Ramteke. Raga and Tonic Identification in Carnatic Music. Journal of New Music Research, 2017.

[66] Justin Salamon, Sankalp Gulati, and Xavier Serra. A Multipitch Approach to Tonic Identification in Indian Classical Music. In Gouyon F, Herrera P, Martins LG, Müller M. Proceedings of the 13th International Society for Music Information Retrieval Conference; 2012 Oct 8-12; Porto, Portugal. Porto: FEUP Edições; 2012.

[67] Ashwin Bellur, Vignesh Ishwar, Xavier Serra, and Hema A Murthy. A Knowledge based Signal Processing approach to Tonic Identification in Indian Classical Music. In Serra X, Rao P, Murthy H, Bozkurt B, editors. Proceedings of the 2nd CompMusic Workshop, 2012.

[68] Stanly Mammen, Ilango Krishnamurthi, A Jalaja Varma, and G Sujatha. iSargam: Music Notation Representation for Indian Carnatic music. EURASIP Journal on Audio, Speech, and Music Processing, 2016.

[69] Pierfrancesco Bellini and Paolo Nesi. WEDELMUSIC format: An XML Music Notation Format for Emerging Applications. In Proceedings of First International Conference on WEB Delivering of Music. WEDELMUSIC, 2001.

[70] Holger Hoos. Representing Score-Level Music using the GUIDO Music-Notation Format. Computing in Musicology: A Directory of Research, 2001.

[71] Rajeswari Sridhar and TV Geetha. Swara Identification for South Indian Classical Music. In Proceedings of 9th International Conference on Information Technology (ICIT’06), 2006.

[72] TR Prashanth and Radhika Venugopalan. Note Identification in Carnatic Music from Frequency Spectrum. In Proceedings of International Conference on Communications and Signal Processing, 2011.

[73] HG Ranjani, S Arthi, and TV Sreenivas. Carnatic Music Analysis: Shadja, Swara Identification and Raga Verification in Alapana using Stochastic Models. In Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011.

[74] Masataka Goto. SmartMusicKIOSK: Music Listening Station with Chorus-Search Function. In Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology, 2003.

[75] Lie Lu and Hong-Jiang Zhang. Automated Extraction of Music Snippets. In Proceedings of the eleventh ACM international conference on Multimedia, 2003.

[76] Wei Chai and Barry Vercoe. Music Thumbnailing via Structural Analysis. In Proceedings of the eleventh ACM International Conference on Multimedia, 2003.

[77] Namunu C Maddage, Li Haizhou, and Mohan S Kankanhalli. Music Structure Analysis Statistics for Popular Songs. Recent Advances in Signal Processing, 2009.

[78] Hiroyuki Nawata, Noriyoshi Kamado, Hiroshi Saruwatari, and Kiyohiro Shikano. Automatic Musical Thumbnailing based on Audio Object Localization and its Evaluation. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[79] Richard O Duda, Peter E Hart, and David G Stork. Pattern Classification. John Wiley & Sons, 2012.

[80] Bernhard E Boser, Isabelle M Guyon, and Vladimir N Vapnik. A Training Algorithm for Optimal Margin Classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 1992.

[81] Thomas Cover and Peter Hart. Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, 1967.

[82] Honglak Lee, Peter Pham, Yan Largman, and Andrew Y Ng. Unsupervised Feature Learning for Audio Classification using Convolutional Deep Belief Networks. In Advances in Neural Information Processing Systems, 2009.

[83] Emre Cakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, and Tuomas Virtanen. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017.

[84] Shawn Hershey, Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke, Aren Jansen, R Channing Moore, Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, et al. CNN Architectures for Large-scale Audio Classification. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.

[85] Mikael Henaff, Kevin Jarrett, Koray Kavukcuoglu, and Yann LeCun. Unsupervised Learning of Sparse Features for Scalable Audio Classification. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2011.

[86] Sankalp Gulati, Ashwin Bellur, Justin Salamon, Ranjani HG, Vignesh Ishwar, Hema A Murthy, and Xavier Serra. Automatic Tonic Identification in Indian Art Music: Approaches and Evaluation. Journal of New Music Research, 2014.

[87] Meinard Müller and Michael Clausen. Transposition-Invariant Self-Similarity Matrices. In Proceedings of International Society for Music Information Retrieval (ISMIR), 2007.

[88] Lin Feng, Shenlan Liu, and Jianing Yao. Music Genre Classification with Paralleling Recurrent Convolutional Neural Network. arXiv preprint arXiv:1712.08370, 2017.

[89] Brian McFee, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference, 2015.

[90] Álvaro Barbero Jiménez, Jorge López Lázaro, and José R Dorronsoro. Finding Optimal Model Parameters by Discrete Grid Search. In Innovations in Hybrid Intelligent Systems, 2007.

[91] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine Learning in Python. The Journal of Machine Learning Research, 2011.

[92] Frauke Friedrichs and Christian Igel. Evolutionary Tuning of Multiple SVM Parameters. Neurocomputing, 2005.

[93] Petre Lameski, Eftim Zdravevski, Riste Mingov, and Andrea Kulakov. SVM Parameter Tuning with Grid Search and its Impact on Reduction of Model Over-fitting. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing, 2015.

[94] Antonio Gulli and Sujit Pal. Deep Learning with Keras. 2017.

[95] Alastair Porter, Mohamed Sordo, and Xavier Serra. Dunya: A System for Browsing Audio Music Collections Exploiting Cultural Context. In Britto A, Gouyon F, Dixon S. Proceedings of 14th International Society for Music Information Retrieval Conference (ISMIR), 2013.
