International Journal of Electrical Engineering and Technology (IJEET)
Volume 12, Issue 6, June 2021, pp. 251-258, Article ID: IJEET_12_06_024
Available online at https://iaeme.com/Home/issue/IJEET?Volume=12&Issue=6
ISSN Print: 0976-6545 and ISSN Online: 0976-6553
DOI: 10.34218/IJEET.12.6.2021.024

© IAEME Publication Scopus Indexed

A NORTH INDIAN RAGA RECOGNITION USING ENSEMBLE CLASSIFIER

Anagha A. Bidkar, Research Scholar, Department of Electronics and Telecommunication, Vishwakarma Institute of Information Technology, and Pune Institute of Computer Technology, SPPU - Savitribai Phule Pune University, Pune, Maharashtra, India

Rajkumar S. Deshpande Department of Electronics and Telecommunication, JSPM’s Imperial College of Engineering, SPPU- Savitribai Phule Pune University, Pune, Maharashtra, India

Yogesh H. Dandawate Department of Electronics and Telecommunication, Vishwakarma Institute of Information Technology, SPPU- Savitribai Phule Pune University, Pune, Maharashtra, India

ABSTRACT
Raga is an ancient art form. Western and Indian music differ in the sequence of musical notes present in a melodic segment. Raga recognition in Indian classical music has been an exciting area of music information retrieval; it can be useful for building music libraries, searching raga-related music, and music education systems. Recognizing a raga using machine learning algorithms is a very complex task. This paper aims to find a suitable classifier for a dataset of instrumental music spanning 12 ragas. The music database has audio files of 4 different musical instruments. For this dataset, the ensemble bagged tree classifier performs best for raga recognition, achieving an accuracy of 96.32%. The paper compares this result with an ensemble subspace KNN model, which gives an accuracy of 95.83%. From the derived results, it is observed that ensemble classifiers work well with the variants of MFCC features extracted from our North Indian Raga Dataset.

Key words: North Indian Raga, Audio Feature Extraction, Mel Frequency Cepstral Coefficients (MFCC), Ensemble Bagged Tree, Ensemble Subspace KNN

Cite this Article: Anagha A. Bidkar, Rajkumar S. Deshpande, Yogesh H. Dandawate, A North Indian Raga Recognition using Ensemble Classifier, International Journal of Electrical Engineering and Technology (IJEET), 12(6), 2021, pp. 251-258. https://iaeme.com/Home/issue/IJEET?Volume=12&Issue=6


1. INTRODUCTION
Content-based music information retrieval is an active research area, and developing a recognition system for Indian classical music is a challenging task. Indian classical music has two basic forms: North Indian, or Hindustani, and South Indian, or Carnatic. This paper presents work on the recognition of North Indian raga music played on Indian musical instruments such as the sitar, sarod, santoor, and flute.

Indian music is based on the concept of the raga. A raga has a specific combination of note sequences described in the music literature. In concert, the performer can improvise the raga segments according to his mood while keeping the rules of the raga in mind; this has a spiritual impact on both the performer and the listener. Because of this improvisatory nature, it is a challenging task to analyse these music signals and make a machine learn and recognize the raga. Besides its sequence of notes, every raga is characterized by several attributes: the aroh, or ascending sequence of notes; the avroh, or descending sequence of notes; the vaadi, the most prominent and frequently repeated note; the samvadi, the second most prominent repeated note; and the pakad, a specific set of notes that identifies the raga. Two performances of the same raga, whether by one artist or by different artists, are never identical; there are always variations. The study of musical ragas is carried out using music theory by extracting acoustic features from musical raga segments. The approach to recognizing ragas in Indian classical music is presented here using two classifiers, and the results are promising.

The rest of the paper is organized as follows. Section 2 highlights the literature review. The proposed method is explained in Section 3. Experimental results are presented in Section 4. Finally, the conclusion and future scope are discussed in Section 5.

2. LITERATURE REVIEW
This section discusses various approaches and methods for identifying ragas in Indian classical music. Bhat et al. [1] compared classification models for 15 ragas of Carnatic instrumental audio and achieved a 97% accuracy rate. As features, the researchers used the spectral centroid, spectral bandwidth, spectral roll-off, chroma features, and Mel Frequency Cepstral Coefficients (MFCC). Classifiers such as an artificial neural network, XGBoost, a convolutional neural network, and a bidirectional long short-term memory network process the features. The team noted that the machine learning classifiers outperform the deep learning models on this task, and that the work can be expanded to include Hindustani classical music and a larger number of ragas. Kumar et al. [2] used a 120-hour Carnatic music dataset; a time-delayed melody surface extracts tonal melody features, which are analyzed with a k-nearest neighbour classifier under several distance measures. K. Praveen Kumar et al. [3] assigned a group label to the 72 Melakarta Carnatic ragas and performed classification on the class-labeled data using clustering. The J48 decision tree and the PART (partial decision tree) rule-based classifier correctly classified the groups with 94.4% accuracy, the multilayer perceptron gives 93.5% accuracy, the JRip algorithm (Repeated Incremental Pruning to Produce Error Reduction) demonstrated 91.6% accuracy, the Naïve Bayes and Random Forest algorithms each reached 90.2%, and the k-nearest neighbour classifier gives 83.3%. Anand [4] conducted experiments on Carnatic CompMusic datasets containing five and eleven ragas to develop a convolutional neural network (CNN) capable of learning the distinguishing characteristics of a raga from the predominant pitch values of a song. The model's accuracy was 96.7% and 85.6%, respectively; tested on allied ragas, it was 54% accurate. Sarkar et al. [5] experimented with 23 different ragas. The dataset contains 1648 raga clips in total, each 45 seconds long: 1190 clips from instrumental audio and 458 clips from vocal performances. For each audio signal, a pitch-based swara (note) profile is created, which yields a histogram of the dominant swaras as well as the energy distribution of the swaras. The SVM classifier achieved 84.79% accuracy on the instrumental dataset and 70.52% on the vocal dataset; classification errors can be reduced when domain knowledge is used. Anoop et al. [6] analyzed spectrograms of flute samples for 32 ragas and gained 95% accuracy through phrase matching. Anitha and Gunavathi [7] selected musical features extracted from MIRtoolbox using Neutrosophic Cognitive Maps (NCMs); classification of the 72 Carnatic Melakarta ragas is attempted with a Gaussian-kernel Support Vector Machine (SVM) and achieves 96% accuracy. Alekh [8] used the GTraagDB database, which contains 127 samples from 31 different ragas. Raga pitch movements and tonic extraction were performed, and a neural network classifier with the Bhattacharyya distance is used for raga recognition and tonic estimation, with a kernel-density pitch distribution at 5-cent granularity. For tonic estimation, the minimum error rate at 15-cent precision was 4.92%; the same configuration produced the lowest error rate, 8.5%, for raga estimation. Ranjani and Sreenivas [9] attempt to identify a Carnatic raga from its octave-folded prescriptive notations. They limit the notations to seven notes and map the finer note-position information, and a dictionary-based approach captures the statistics of the repetitive note patterns in the raga notation. The proposed stochastic models of repetitive note patterns were obtained from raga notations of known compositions and achieved a 96% accuracy.

Most previous approaches to raga identification have relied heavily on explicitly designed features that capture the various characteristics of a raga; raga detection is then accomplished by computing raga similarity with a distance measure or by applying a classifier to those features. Much work has been done on Carnatic raga recognition, but Hindustani, or North Indian, raga recognition is comparatively less explored. In this paper, we attempt to recognize North Indian classical ragas by reducing the size of the feature set and finding an appropriate classifier to identify the raga.

3. PROPOSED METHOD
Fig. 1 shows the proposed methodology. The generated database is divided into training and testing sets. Features are extracted after preprocessing of the audio, an ensemble classifier model is trained on the extracted features after data analysis, and the raga output is generated. Details are given in the subsequent sections.

[Block diagram: 20-sec .wav training set and 20-sec .wav testing set → Pre-processing → Feature Extraction → Ensemble Classifier Model → Raga Output]

Figure 1 Proposed Method


3.1. Database Generation
The dataset of Indian music is prepared in consultation with music expert Mr. Deepak Desai. The database consists of 20-second audio wave segments for 12 ragas played on 4 musical instruments: sitar, sarod, santoor, and flute. Details of the dataset are given in Table 1. The database covers all variations of the specific note patterns that indicate each raga.

Table 1 Details of Database

Raga \ Instrument    Sitar   Sarod   Santoor   Flute   Total
Ahir Bhairav           56      70       64       89     279
Bageshree              72      56       87      149     364
Bhairav                59      73       56       59     247
Bihag                  73     123      109      124     429
Bhimpalas              71     143       63       81     358
Lalit                  83      56       57      117     313
Madhuwanti             64     106       56       68     294
Malkauns               56      60      100      100     316
Pooriya Kalyan         53      76       64       80     273
Sarang                 60      58       54       66     238
Todi                   77      87       51       40     255
Yaman                  73     184       81      111     449
Total                 797    1092      842     1084    3815

3.2. Data Pre-Processing
The audio files have a standard 44.1 kHz sampling frequency. Each stereo audio signal is converted to a mono signal and then down-sampled to 11025 Hz. A pre-emphasis filter with coefficients [0.99, 1] suppresses the low-frequency components, enhancing the signal for feature extraction. The preprocessing chain is shown in Fig. 2.

[Block diagram: Music audio file → Stereo-to-mono conversion → Down-sampled signal → Pre-emphasis filter → Processed signal for feature extraction]

Figure 2 Preprocessing of Audio data
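As a minimal sketch, this preprocessing chain could be implemented as below, assuming Python with librosa (the paper does not name its tooling; the file name is hypothetical, and reading the stated [0.99, 1] filter as a first-order pre-emphasis with coefficient 0.99 is our assumption):

```python
import librosa

def preprocess(path, target_sr=11025):
    # librosa.load converts stereo to mono and resamples to target_sr in one step
    y, sr = librosa.load(path, sr=target_sr, mono=True)
    # First-order pre-emphasis y[n] = x[n] - 0.99 * x[n-1], which suppresses
    # low frequencies relative to the highs (coefficient 0.99 is our reading
    # of the paper's "[0.99, 1]" filter)
    y = librosa.effects.preemphasis(y, coef=0.99)
    return y, sr

# Usage with a hypothetical file name:
# y, sr = preprocess("yaman_sitar_001.wav")
```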

3.3. Feature Extraction
The techniques mentioned in the literature survey identify a raga by transcribing the individual notes or estimating the tonic pitch of each audio segment and then analyzing the note-sequence patterns. Instead of note transcription, traditional features are extracted here: pitch, spectral centroid, kurtosis, mean of MFCC, mean of MFCC-delta, and mean of MFCC-delta-delta. Pitch is the fundamental frequency of the audio segment. The centroid feature represents the centre of gravity of the spectrum. Kurtosis measures the impulsiveness of the signal as it varies with frequency. Mel Frequency Cepstral Coefficients (MFCC) are extracted by windowing the signal, computing the Discrete Fourier Transform (DFT) coefficients of each window, taking the log of the DFT magnitude, warping the frequencies with a Mel-scale filter bank, and applying the inverse Discrete Cosine Transform (DCT). For each raga wave file, the extracted MFCC is a matrix of size 5 × 13. Since pitch, centroid, and kurtosis are each single values, the MFCC matrix needs to be compressed. The MFCC-delta feature is obtained by taking the difference of consecutive MFCC coefficients, and MFCC-delta-delta by taking the difference of consecutive MFCC-delta coefficients; both have the same size as the MFCC matrix. The mean over each matrix is computed, compressing it to a single value. A feature vector of size 6 is thus formed: [pitch, centroid, kurtosis, mean of MFCC, mean of MFCC-delta, mean of MFCC-delta-delta]. The extracted features are sparse and nonlinear, so z-score normalization is applied; a z-score measures the distance of a data point from the mean in terms of standard deviations, bringing all features onto a comparable scale. The data is split into training and testing sets: the classifier is trained on 3000 files and the system is tested on 815 files, so approximately 78% of the data is used for training the model and the remainder for testing it.
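A hedged illustration of assembling this six-element feature vector is given below, assuming librosa and SciPy; the paper does not specify its frame sizes, pitch estimator, or kurtosis variant, so the YIN tracker, the pitch search range, and the waveform kurtosis used here are assumptions:

```python
import numpy as np
import librosa
from scipy.stats import kurtosis

def extract_features(y, sr):
    # Fundamental frequency via the YIN estimator (assumed), averaged over frames
    f0 = librosa.yin(y, fmin=65, fmax=1000, sr=sr)
    pitch = float(np.nanmean(f0))
    # Spectral centroid: centre of gravity of the spectrum, frame-averaged
    centroid = float(librosa.feature.spectral_centroid(y=y, sr=sr).mean())
    # Kurtosis of the waveform as an impulsiveness measure (assumed variant)
    kurt = float(kurtosis(y))
    # 13 MFCCs plus first- and second-order deltas, each collapsed to one mean
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)             # MFCC-delta
    d2 = librosa.feature.delta(mfcc, order=2)    # MFCC-delta-delta
    return np.array([pitch, centroid, kurt,
                     mfcc.mean(), d1.mean(), d2.mean()])

# z-score normalization across all segments (rows of the feature matrix X):
# X = (X - X.mean(axis=0)) / X.std(axis=0)
```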

3.4. Ensemble Classifier Model
An ensemble classifier [10] combines multiple classifiers and can improve the prediction of the output. The ensemble bagged tree and ensemble subspace k-nearest neighbour (KNN) classifiers perform best for the recognition of ragas here. Bagging draws a random sample subset of the training data for each learner, and these subsets are fed into base classifiers of the same type. The bagging classifier's final decision is the class chosen by the majority of the base classifiers, with every vote counting equally. Ensemble bagged tree approaches integrate numerous decision trees to improve prediction performance over a single decision tree. The ensemble model's core idea is that a group of weak learners joins forces to generate a powerful learner.
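A minimal sketch of the two ensembles, assuming a scikit-learn implementation (the paper does not name its toolchain; the ensemble sizes and subspace fraction below are illustrative, not the authors' settings):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Ensemble bagged trees: each tree fits a bootstrap sample of the training
# rows, and prediction is a majority vote over the trees
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=30)

# Ensemble subspace KNN: bootstrap=False with max_features < 1.0 yields the
# random subspace method, so each KNN learner sees a random feature subset
subspace_knn = BaggingClassifier(KNeighborsClassifier(), n_estimators=30,
                                 bootstrap=False, max_features=0.5)

# bagged_trees.fit(X_train, y_train)
# y_pred = bagged_trees.predict(X_test)
```

Resampling features instead of rows is what distinguishes the subspace ensemble from plain bagging: each KNN learner computes distances in a different low-dimensional projection of the 6-element feature vector.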

4. RESULTS AND DISCUSSION
Experimentation is performed on the training and testing datasets using both classifiers. Performance is evaluated in terms of accuracy, precision, recall, and F1-score. Confusion matrices describing the performance of the Ensemble Bagged Tree classifier and the Ensemble Subspace KNN classifier are shown in Fig. 3 and Fig. 4, respectively.

Figure 3 Confusion matrix of Ensemble Bagged Tree classifier
Figure 4 Confusion matrix of Ensemble Subspace KNN classifier

The confusion matrix parameters relate the predicted class output to the actual class output. Details are given in the following matrix.

                       It is the predicted class    Not the predicted class
It is the true class    Tp: true positive             Fn: false negative
Not the true class      Fp: false positive            Tn: true negative


A false positive (Fp) is a result indicating that a certain condition has been met when it has not. A false negative (Fn) is a result indicating that a condition has not been met when in fact it has. True positives (Tp) are relevant items that have been correctly detected. True negatives (Tn) are irrelevant items that have been correctly identified as such. Accuracy refers to how near a measurement is to the actual (true) value; here it is the percentage of cases for which the classifier correctly predicts the class, and it can be computed with equation (1). Results are tabulated in Table 2; the per-raga classifier accuracy is around 99.5% across all test samples.

\[ \text{Classifier Accuracy} = \frac{T_p + T_n}{\text{Total number of samples}} \qquad (1) \]

Table 2 Accuracy of Raga from Confusion Matrix

Raga Class   Raga Name        Accuracy, Ensemble        Accuracy, Ensemble
Number                        Bagged Tree classifier    Subspace KNN classifier
1            Ahir Bhairav     99.63 %                   99.51 %
2            Bageshree        99.39 %                   99.51 %
3            Bhairav          99.51 %                   99.63 %
4            Bihag            98.65 %                   98.77 %
5            Bhimpalas        100 %                     99.75 %
6            Lalit            99.63 %                   99.51 %
7            Madhuwanti       99.75 %                   99.75 %
8            Malkauns         100 %                     99.88 %
9            Pooriya Kalyan   99.87 %                   99.75 %
10           Sarang           100 %                     99.88 %
11           Todi             99.88 %                   99.88 %
12           Yaman            99.63 %                   99.14 %

Precision is the number of true positives divided by the sum of true positives and false positives. Precision is a metric of how exact a classifier is: higher precision implies fewer false positives, whereas lower precision implies a greater number of false positives. Equation (2) is used to calculate the precision value.

\[ \text{Precision} = \frac{T_p}{T_p + F_p} \qquad (2) \]

Recall is calculated by dividing the number of true positives by the sum of true positives and false negatives. Recall measures a classifier's completeness, or sensitivity: lower recall implies more false negatives, whereas higher recall implies fewer false negatives. Equation (3) can be used to calculate the recall value.

\[ \text{Recall} = \frac{T_p}{T_p + F_n} \qquad (3) \]

Precision and recall can be combined into a single metric known as the F1-score, which is the harmonic mean of precision and recall. Equation (4) can be used to calculate the F1-score.

\[ \text{F1\_score} = \frac{2 \cdot (\text{Precision} \cdot \text{Recall})}{\text{Precision} + \text{Recall}} \qquad (4) \]
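To make the one-vs-rest counts behind equations (1)-(4) concrete for a 12-raga confusion matrix, here is a small NumPy sketch (an illustration, not the authors' code):

```python
import numpy as np

def per_class_metrics(cm, k):
    """cm: 12x12 confusion matrix (rows = true raga, cols = predicted raga);
    k: index of the raga class being scored one-vs-rest."""
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp    # predicted as raga k but actually another raga
    fn = cm[k, :].sum() - tp    # raga k predicted as some other raga
    tn = cm.sum() - tp - fp - fn
    accuracy  = (tp + tn) / cm.sum()                      # Eq. (1)
    precision = tp / (tp + fp)                            # Eq. (2)
    recall    = tp / (tp + fn)                            # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall)    # Eq. (4)
    return accuracy, precision, recall, f1
```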


Table 3 shows the accuracy achieved for all 12 ragas in terms of precision, recall, and F1-score; Fig. 5 shows the same comparison graphically. For every raga, a given classifier's precision, recall, and F1-score coincide (its false positives and false negatives are balanced), indicating that both models are trained correctly for our database. Both classifiers produce the same result for Bhairav raga. Ensemble subspace KNN outperforms the ensemble bagged tree for Bageshree, Bihag, Lalit, and Madhuwanti ragas, while the ensemble bagged tree performs better for Ahir Bhairav, Bhimpalas, Malkauns, Pooriya Kalyan, Sarang, Todi, and Yaman ragas.

Table 3 Percentage Accuracy Comparison Chart

                 Ensemble Bagged Tree Classifier     Ensemble Subspace KNN Classifier
Raga name        Precision   Recall   F1_Score       Precision   Recall   F1_Score
Ahir Bhairav       96.83      96.83    96.83           96.77      96.77    96.77
Bageshree          94.20      94.20    94.20           96.67      96.67    96.67
Bhairav           100        100      100             100        100      100
Bihag              89.01      89.01    89.01           90         90       90
Bhimpalas         100        100      100              97.75      97.75    97.75
Lalit              94.29      94.29    94.29           95.59      95.59    95.59
Madhuwanti         94.52      94.52    94.52           95.78      95.78    95.78
Malkauns          100        100      100              98.59      98.59    98.59
Pooriya Kalyan     98.41      98.41    98.41           95.31      95.31    95.31
Sarang             95.83      95.83    95.83           93.88      93.88    93.88
Todi               98.18      98.18    98.18           96.43      96.43    96.43
Yaman              97.47      97.47    97.47           94.94      94.94    94.94

Figure 5 Comparative plot of precision, recall, and F1-score accuracy

The overall performance of the classifiers is shown in Table 4. The ensemble bagged tree classifier achieves 96.32% accuracy, while the ensemble subspace KNN achieves 95.83% accuracy.

Table 4 Overall Performance of Classifier

                              Ensemble bagged tree    Ensemble subspace KNN
Classification accuracy        96.32 %                 95.83 %
Correct rate                   0.9632                  0.9583
Error rate                     0.0368                  0.0417
Sensitivity                    0.9839                  0.9677
Specificity                    0.9973                  0.9973
Positive predictive value      0.9683                  0.9677
Negative predictive value      0.9987                  0.9973
Positive likelihood            370.4274                364.3548
Negative likelihood            0.0162                  0.0323
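The Table 4 statistics follow from pooled true/false positive and negative counts; a sketch of the standard definitions is given below (the exact pooling scheme used in the paper is an assumption):

```python
# Summary statistics of Table 4 from pooled one-vs-rest counts (assumed pooling)
def summary_statistics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)               # overall recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                       # positive predictive value
    npv = tn / (tn + fn)                       # negative predictive value
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    return sensitivity, specificity, ppv, npv, lr_pos, lr_neg
```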


5. CONCLUSION
This paper attempts to identify ragas in North Indian classical music. It proposes a recognition method that employs pitch, centroid, kurtosis, and the means of MFCC variants as features, extracted from a dataset of 12 ragas played on 4 musical instruments. The ensemble classifier model was found to be highly suitable for raga class recognition without instrument classification or tonic identification. The proposed system was tested using the ensemble bagged tree and ensemble subspace KNN classification algorithms, with promising accuracies of 96.32% and 95.83%, respectively. The precision, recall, and F1-score performance parameters are found to be the same for both ensemble models, indicating that the models are well trained. The research can be expanded to investigate parameters of Indian classical music that have yet to be explored, and deep learning models may be developed in the future.

ACKNOWLEDGEMENTS We would like to thank Vid. Deepak Desai (Sitarist) for his assistance in creating the database.

REFERENCES

[1] Bhat A, Krishna AV, Acharya S. Analytical Comparison of Classification Models for Raga Identification in Carnatic Classical Instrumental Polyphonic Audio. SN Computer Science. 2020 Nov;1(6):1-9.

[2] Kumar MS, Devi MS. Raga recognition using machine learning. Science, Technology and Development. 2020 Sep;9(9):646-650.

[3] Kumar KP, Subbarao P, Mandhala VN, Banerjee D. Classification of 72 Melakartha ragas using PAM clustering method: Carnatic music. International Journal of Engineering and Advanced Technology (IJEAT). 2019 Apr;8(4):1864-1867.

[4] Anand A. Raga Identification Using Convolutional Neural Network. In 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP) 2019 Feb 25 (pp. 1-6). IEEE.

[5] Sarkar R, Naskar SK, Saha SK. Raga identification from Hindustani classical music signal using compositional properties. Computing and Visualization in Science. 2019 Dec;22(1):15-26.

[6] Anoop MN, Deepak TS, Shreekanth T. An approach for analysis and identification of Raga of Flute Music using Spectrogram. In 2017 International Conference on Trends in Electronics and Informatics (ICEI) 2017 May 11 (pp. 261-266). IEEE.

[7] Anitha R, Gunavathi K. NCM-Based Raga Classification using musical features. International Journal of Fuzzy Systems. 2017 Oct;19(5):1603-16.

[8] Alekh S. Automatic Raga Recognition in Hindustani Classical Music. arXiv preprint arXiv:1708.02322. 2017 Aug 7.

[9] Ranjani HG, Sreenivas TV. Raga identification using repetitive note patterns from prescriptive notations of Carnatic music. arXiv e-prints. 2017 Nov.

[10] Tian Y, Feng Y. RaSE: Random Subspace Ensemble Classification. Journal of Machine Learning Research. 2021;22(45).
