International Journal of Applied Engineering Research ISSN 0973-4562 Volume 12, Number 24 (2017) pp. 15011-15017
© Research India Publications. http://www.ripublication.com

A Novel Hybrid Approach for Retrieval of the Music Information

Varsha N. Degaonkar Research Scholar, Department of Electronics and Telecommunication, JSPM’s Rajarshi Shahu College of Engineering, Pune, Savitribai Phule Pune University, Maharashtra, India. Orcid Id: 0000-0002-7048-1626

Anju V. Kulkarni Professor, Department of Electronics and Telecommunication, Dr. D. Y. Patil Institute of Technology, Pimpri, Pune, Savitribai Phule Pune University, Maharashtra, India. Orcid Id: 0000-0002-3160-0450

Abstract

The performance of existing search engines for retrieval of images is facing challenges, resulting in inappropriate noisy data rather than the accurate information searched for. The reason for this is that the data retrieval methodology is mostly based on information in text form input by the user. In certain areas, human computation can give better results than machines. In the proposed work, two approaches are presented. In the first approach, Unassisted and Assisted Crowd Sourcing techniques are implemented to extract attributes for the classical music, by involving users (players) in the activity. In the second approach, signal processing is used to automatically extract relevant features from classical music. The Mel Frequency Cepstral Coefficient (MFCC) is used for feature learning, which generates primary level features from the music audio input. To extract high-level features related to the target class and to enhance the primary level features, feature enhancement is done. During the learning process, the audio tags are automatically predicted by different classifiers. With the domain knowledge of classical music, human computation gives better results. The experimental results show that the Human-Machine hybrid approach, taking machines in the loop, gives a performance improvement compared to the conventional method.

Keywords: Crowd Sourcing techniques, Mel Frequency Cepstral Coefficient (MFCC), Independent Component Analysis (ICA), Feed Forward Neural Network.

INTRODUCTION
Retrieval of the music file plays a vital role in developing automatic indexing and retrieval of files from a database. These applications save users from time-consuming searches through the large number of digital audio files available today. Sound samples that are similar to the dataset samples will further improve the search. The content analysis of audio files has many real-life applications like automatic music annotation, music analysis, music synthesis, etc.

Most of the existing human computation systems operate without any machine contribution. With domain knowledge, human computation can give the best results if machines are taken in a loop. In recent years, the use of smart mobile devices has increased; there is a huge amount of multimedia data generated and uploaded to the web every day. This data, such as music, field sounds, broadcast news, and television shows, contains sounds from a wide variety of sources. The need for analyzing these sounds is increasing, e.g. for automatic audio tagging [1, 2], audio segmentation [3], Automatic Music Genre Recognition [4] and audio context classification [5, 6, 7]. Due to the technology and customer need, there have been some applications for audio processing in different scenarios, such as urban monitoring, surveillance, health care, music retrieval, and customer video.

Existing retrieval methods rely heavily on textual information for data retrieval, which demonstrates its effectiveness and efficiency in large-scale data search. But the performance of such retrieval methods is facing challenges and results in much inappropriate noisy data. This happens due to the limited illustration of the semantic data and the large diversity in content and context of the given data. However, multi-attribute queries are considered to improve the performance of retrieval methods.

Researchers [8] have analyzed and classified instrumental music according to its spectral and temporal features. For extraction of the features, they have taken the spectrum, chromagram, centroid, lower energy, roll-off, and histogram. They have worked on four ragas. For classification, they have used the KNN classifier and the SVM classifier.

Scale-independent identification [9] has been done using chromagram patterns and Swara based features. They have used GMM based Hidden Markov Models over a collection of features consisting of chromagram patterns, Mel-cepstrum

coefficients [10] and timbre features [11] on a dataset of 4 ragas, including Darbari and Sohini.

Researchers have made great progress in developing systems that can recognize an individual object category [12]. They have developed computer vision algorithms that go beyond naming and infer the properties or attributes of objects. The capacity to infer attributes allows describing, comparing, and more easily categorizing objects. Importantly, when faced with a new kind of object, one can still say something about it (e.g., "soothing song") even though one cannot identify it. Also, it is possible to say what is unusual about a particular object and to learn to recognize objects from a description alone [13].

The word attribute refers to an intrinsic property of a concept (e.g., a quiet ambient music of the flute) [14]. There has been a recent movement towards using an intermediary layer of human-understandable attributes for classification. Instead of learning a classifier to map audio file features to classes, one can map these features first to a set of semantic attributes and then map the predicted attributes to the class with the most analogous set of attributes. This method is scalable, since many objects in the world can be briefly described using only a small number of semantic attributes; mapping the low-level features to this proficient code can allow instances to be classified into new classes for which no training example is available.

Major contributions of this research work include:

1. For human computation, Assisted and Unassisted Crowd Sourcing techniques are implemented to extract attributes for the music database. For this, people with domain knowledge are assigned the task and the attributes are collected.
2. To boost the performance of the system and to get higher-level features, signal processing is done on the primary level features, which enhances the features and further improves the system.
3. Human computation requires specialized knowledge, and running an activity where the average player may not have a keen knowledge of classical music is especially challenging. So a combined approach is used, involving both humans and machines, which can help overcome humans' knowledge limitation in music classification.
4. The feasibility and performance of the proposed scheme have been validated on all major databases with around 270 Classical Music audio files.

DATABASE
The audio database (classical songs) contains 270 clips of approximately 4 minutes each, in .mp3 audio format. All clips have been manually extracted from different websites. The database contains 9 ragas, namely Adana, Ahir Bhairav, Basant, Bhairavi, Darbari Kanada, Hansdhwani, Maru Bihag, Vibhas and Yaman. For training, 10 audio clips of each raga are used.

FEATURE EXTRACTION METHODS

Knowledge Generation Using Assisted and Unassisted Crowd Sourcing (Human Intelligence) Techniques
In this type, a new mechanism for classifying an entity (a classical music file) is developed. Music classification is a task well suited for the inquiry to identify the correct type of audio files. In this section, the design of a new technique is described, which is implemented to extract attributes for the classical music file.

For feature extraction using the Human Intelligence Technique, the following module is formed:

• User registration page: a user can register through this page by entering a username, password, and mail-Id.
• User login: this page is for logging in the user.
• Admin login: the admin login is used by administrators only.
  o The administrator can update the admin profile and store details like first name, last name, city, mail-Id and profile photo.
  o The administrator can view all the users.
  o The administrator can control all the game settings and can upload classical music files, view all classical music files, add/remove/edit classical music file categories, add/remove/edit attribute categories, view all reports and view player history.
  o While uploading the classical music files, the administrator has to enter the appropriate music file category and write the predefined tags in the database for a particular classical music file category.
• A selection of classical music files: there are some considerations when selecting the set of classical music files to present to players in each round of the game. The system decides randomly whether to draw a particular classical music file, based on its selection in the previous activity. Classical music files which were not selected at all in the previous activity are given priority in the next game. This variation allows for varying levels of difficulty in the game and prevents players from being able to guess the answer by anticipating the composition of the classical music files.
• The database creation using the human intelligence technique:


• A matrix M is prepared to hold the pre-defined set of classical music files and all the possible attributes.
• Ei: the pre-defined set of classical music files as the rows of the matrix.
• Ai: the possible attributes as the columns of the matrix.
• Cells are filled with count values.

Assisted Crowd Sourcing
In the Assisted Crowd Sourcing, the activity provides the classical music tune with some predefined attribute categories. Players have to select the correct attribute for the given music tune and select the next option. The player can also skip the tune. Players can submit their selections at the end of the activity. At the end of each round, the activity provides feedback to the player, including the round score.

Table 1: Total attributes with unique and overlapped attributes for all music audio clips of ragas

Classical Music Raga   Overlapped   Unique       Total        % Unique
                       Attributes   Attributes   Attributes   Attributes collected
Adana                  16           92           108          85
Ahir Bhairav           16           84           100          84
Basant                 28           92           120          77
Bhairavi               24           56           80           70
Darbari                24           72           96           75
Hansdhwani             24           84           108          78
Maru Bihag             24           120          144          83
Vibhas                 24           56           80           70
Yaman                  52           68           120          57
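As a minimal sketch, the count matrix M described above can be kept as a nested mapping from music file (row Ei) to attribute (column Ai) to count; the class and method names here are ours for illustration, not from the paper:

```python
# Minimal sketch of the matrix M: rows Ei (music files) x columns Ai
# (attributes), with each cell holding a count value.
class AttributeMatrix:
    def __init__(self, files, attributes):
        # Initialise every cell (Ei, Ai) with a zero count.
        self.counts = {f: {a: 0 for a in attributes} for f in files}

    def record(self, music_file, attribute):
        # A player assigned `attribute` to `music_file`: bump the cell count.
        self.counts[music_file][attribute] += 1

    def cell(self, music_file, attribute):
        return self.counts[music_file][attribute]

m = AttributeMatrix(["clip1", "clip2"], ["soothing", "devotional"])
m.record("clip1", "soothing")
m.record("clip1", "soothing")
```

In the Assisted setting the attribute set is predefined, so the columns are fixed up front as shown here; the Unassisted setting described below instead grows the attribute set as players type new attributes.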

Unassisted Crowd Sourcing
In the Unassisted Crowd Sourcing, the activity provides the classical music file with some attribute categories. Players have to listen to the audio file and write the correct attribute for the given classical music file. Players can add more than one attribute, separated by commas, and submit their selections at the end of the activity. At the end of each round, the activity provides feedback to the player, including the round score.

A total of ten classical music files are launched one after another in this activity. All the written attributes for a particular classical music file are stored in the database, and the score is displayed. While appending the attributes to the database, the following steps are carried out:

• A basic level of preprocessing is performed on all the attributes by converting them to lower case, deleting spaces, and removing punctuation marks.
• If the attribute is new to the database, it is appended and its count value is set to '1'.
• If the attribute is already present in the database, it is not appended again; instead its count value is incremented by '1'.

The result of this activity is given in Table 1, which shows the total attributes with unique and overlapped attributes for all music audio clips of the ragas, and the percentage of unique attributes collected.

SIGNAL PROCESSING FOR FEATURE EXTRACTION
In feature extraction, the task is to compute a compact numerical representation of the various classical ragas that can be used to characterize a music segment. The main challenge in developing a system is to design descriptive features for an exact application. Once these features are extracted, various machine learning techniques, which are independent of the specific application, can be used. To extract the first-level features, Mel-Frequency Cepstral Coefficients are calculated.

Mel-Frequency Cepstral Coefficients
Mel-frequency Cepstral coefficients are non-parametric representations of the audio signal. As MFCC models the human auditory perception system, in this work MFCC is used to extract primary level features. The relation between the perceived frequency in Mels and the frequency in Hz is as follows:

F_mel = 2595 * log10(1 + F_Hz / 700)    (1)

where F_mel is the frequency in Mels and F_Hz is the frequency in Hz.

The steps used in [15] are followed to calculate the MFCC coefficients, using the following formulas:

S[j] = sum_{i=0}^{F/2-1} W_j(i) * X(i),   j = 1, ..., M    (2)

where W_j is the critical-band filter, and

C[n] = sum_{j=1}^{M} log(S[j]) * cos[(j - 0.5) * n * pi / M],   n = 1, ..., L    (3)
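For illustration, Eq. (1) and its inverse can be checked numerically; this is a minimal sketch (the function names are ours), and the full filterbank and DCT pipeline of Eqs. (2)-(3) is only hinted at by the filter-centre helper:

```python
import math

def hz_to_mel(f_hz):
    # Eq. (1): F_mel = 2595 * log10(1 + F_Hz / 700)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(f_mel):
    # Inverse of Eq. (1), used when placing filterbank centre frequencies.
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)

def mel_filter_centres(num_filters, sample_rate):
    # Centre frequencies of M triangular critical-band filters (cf. W_j in
    # Eq. (2)), spaced uniformly on the mel scale from 0 Hz to sample_rate/2.
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (j + 1)) for j in range(num_filters)]
```

Note how the mel scale compresses high frequencies: 1000 Hz maps to roughly 1000 mel by construction, while 8000 Hz maps to only about 2840 mel, so the filters cluster densely at low frequencies, mimicking human hearing.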

A total of 13 MFCC coefficients is calculated for each frame. Then the average of the 13 coefficients over all frames is taken, giving 13 MFCC coefficients for each music file. From the


MFCC calculation, it is observed that the amplitude of the first three MFCC coefficients describes the Raga feature in the music signal.

Feature Enhancement
After the feature extraction, the main purpose of the feature enhancement is both to reduce the dimensionality and to provide a more meaningful data set from which to classify. Independent Component Analysis is used for enhancing the features.

Independent Component Analysis (ICA)
In this work, the input feature vector is linearly transformed to obtain statistically independent components. The goal is to express the music input data by a linear generative model in which the stochastic sources (si) are as mutually independent as possible. First, the data are centered on the origin by subtracting the mean of the input data [16]:

X = x - E{x}    (4)

The data are then whitened: a matrix V is applied so that the components of z are uncorrelated and have unit variance [16]:

z = Vx, so that E{zz^T} = I    (5)

As ICA gives statistically independent components, ICA enhances the primary level features and improves the results.

CLASSIFIERS

K-Nearest Neighbor (KNN)
This method is used for pattern recognition and classification. Feature vectors of both the test and the training music files of the ragas are taken and compared. The Euclidean distance between the test vector and each trained vector of the music files of the ragas is calculated. The test segment is assigned the category most common among its K nearest neighbors. In this work, KNN classification is done by the following equation:

C* = argmax_c sum_{i=1}^{K} 1(c, f_i(x))    (6)

where c is the Raga class label, i.e. the class to identify, f_i(x) is the class label of the ith neighbor of x, and 1(c, f_i(x)) equals 1 when c = f_i(x) and 0 otherwise.

As KNN memorizes the K training observations which most closely resemble the new instance and assigns it to the most common class among them, it results in better classification efficiency.

Neural Networks (NN)
To classify the different Ragas, a Back Propagation Feed-Forward multilayered network is used. The features extracted by the different techniques are applied to the input layer neurons, and the output layer neurons correspond to the different classes of Ragas. The back propagation algorithm uses supervised learning. For training, the following steps are used in this work [17]:

• Each neuron of the input layer receives a feature x_i (i = 1 to n) from the feature extraction methods and applies it to the hidden layer neurons.
• The input of hidden layer neuron j is calculated by:

  z_inj = v_0j + sum_{i=1}^{n} x_i * v_ij    (7)

• By applying the activation function, the output of the hidden layer is calculated by:

  z_j = f(z_inj)    (8)

• The input of output layer neuron k is:

  y_ink = w_0k + sum_{j=1}^{p} z_j * w_jk    (9)

• By applying the activation function, the output of the output layer is calculated by:

  y_k = f(y_ink)    (10)

• To reach the targeted class, the error term is calculated and the weights are updated accordingly in the training phase:

  δ_k = (t_k - y_k) f'(y_ink),  Δw_jk = α δ_k z_j,  Δw_0k = α δ_k    (11)

  w_jk(new) = w_jk(old) + Δw_jk,  w_0k(new) = w_0k(old) + Δw_0k,  v_ij(new) = v_ij(old) + Δv_ij    (12)

• The training stops when the target output is achieved according to the different Raga classes.

For testing, the following steps are used [17]:

• The weights are taken from the training algorithm.
• The input of hidden layer neuron j is calculated by:

  z_inj = v_0j + sum_{i=1}^{n} x_i * v_ij    (13)

• By applying the activation function, the output of the hidden layer is calculated by:

  z_j = f(z_inj)    (14)

• Now compute the output of the output layer unit, for k = 1 to m:


  y_ink = w_0k + sum_{j=1}^{p} z_j * w_jk    (15)

  y_k = f(y_ink)    (16)

Here, the sigmoid activation function is used to calculate the output. The ANN (Back Propagation Feed-Forward multilayered network) can be trained at one time for all classes of the classical music signal, to get a better classification.

RESULTS AND DISCUSSION

Results with the K-Nearest Neighbor (KNN) classifier
Two methods are evaluated: a) MFCC and KNN: only MFCC is used to extract the basic features of the classical ragas, and the KNN classifier is used to classify the different ragas. b) MFCC, ICA and KNN: MFCC is used to extract the basic features of the classical ragas, then ICA is used to enhance these features, and the KNN classifier is used to classify the different ragas. The results are as shown in Table 2.

Table 2: Percentage efficiency of MFCC, ICA and KNN

                   % Efficiency
Raga               MFCC+KNN   MFCC+ICA+KNN
Adana              29         55
Ahir Bhairav       57         60
Basant             62         70
Bhairavi           70         75
Darbari Kanada     86         90
Hansdhwani         86         86
Maru Bihag         62         70
Vibhas             71         71
Yaman              86         90

Table 2 shows that when the features are enhanced by ICA, the percentage efficiency is increased. As ICA gives a set of statistically independent components, enhancement of the primary level features is done, which improves the results.

Results with the Feedforward Neural Network classifier
Two methods are evaluated: a) MFCC and Feedforward Neural Network: MFCC is used to extract the basic or primary level features of the classical ragas, and a Feedforward Neural Network classifier is used to classify the different ragas. b) MFCC, ICA and Feedforward Neural Network: MFCC is used to extract the basic features of the classical ragas, then ICA is used to enhance these features, and the Feedforward Neural Network classifier is used to classify the different ragas. The results are as shown in Table 3.

Table 3: Percentage efficiency of MFCC, ICA and NN-FeedForward network

                   % Efficiency
Raga               MFCC+NN    MFCC+ICA+NN
Adana              71         86
Ahir Bhairav       57         57
Basant             100        71
Bhairavi           100        100
Darbari Kanada     71         86
Hansdhwani         43         60
Maru Bihag         100        100
Vibhas             71         100
Yaman              60         57

Table 3 shows that when the features are enhanced by ICA, the percentage efficiency is increased. As ICA gives a set of statistically independent components, enhancement of the primary level features is done, which improves the results.

In the proposed work, primary level feature extraction is done by MFCC, and by further enhancing these features, meaningful data is maintained. The average percentage efficiencies from Table 2 and Table 3 are compared in Figure 1.

Figure 1: Average efficiency of all the methods (average % efficiency for MFCC+KNN, MFCC+ICA+KNN, MFCC+NN-FeedForward and MFCC+ICA+NN-FeedForward)

As proposed in this work, Figure 1 shows that when the primary level features are enhanced by ICA and the Feed Forward Neural Network classifier is used, the efficiency is increased.
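For concreteness, the KNN decision rule of Eq. (6), the majority class among the K nearest training vectors under Euclidean distance, can be sketched as follows (a minimal illustration with made-up feature vectors, not the authors' implementation):

```python
import math
from collections import Counter

def knn_classify(train, test_vec, k=3):
    # `train` is a list of (feature_vector, raga_label) pairs.
    # Euclidean distance between the test vector and a training vector.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort training samples by distance to the test vector, keep the K nearest.
    neighbours = sorted(train, key=lambda pair: dist(pair[0], test_vec))[:k]
    # Eq. (6): pick the class label most common among the K nearest neighbours.
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [([0.1, 0.2], "Yaman"), ([0.15, 0.22], "Yaman"), ([0.9, 0.8], "Adana")]
print(knn_classify(train, [0.12, 0.21], k=3))  # prints "Yaman"
```

Here the feature vectors would be the (ICA-enhanced) MFCC vectors of the music files; the two-dimensional vectors above are placeholders.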
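The forward pass of Eqs. (7)-(10)/(13)-(16) and one back-propagation update of Eq. (11)-(12) can be sketched as below. This is a simplified illustration with our own names: only the output-layer weights w, w0 are updated, whereas Eq. (12) also updates the hidden-layer weights v via back-propagated deltas.

```python
import math

def sigmoid(x):
    # The sigmoid activation function f used in Eqs. (8), (10), (14), (16).
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, v, v0, w, w0):
    # Eqs. (7)-(8): hidden inputs z_inj = v_0j + sum_i x_i v_ij, outputs z_j.
    z = [sigmoid(v0[j] + sum(x[i] * v[i][j] for i in range(len(x))))
         for j in range(len(v0))]
    # Eqs. (9)-(10): output inputs y_ink = w_0k + sum_j z_j w_jk, outputs y_k.
    y = [sigmoid(w0[k] + sum(z[j] * w[j][k] for j in range(len(z))))
         for k in range(len(w0))]
    return z, y

def train_step(x, t, v, v0, w, w0, alpha=0.5):
    # One update of the output-layer weights, Eqs. (11)-(12):
    # delta_k = (t_k - y_k) f'(y_ink), where f'(y_ink) = y_k (1 - y_k)
    # for the sigmoid; then w_jk(new) = w_jk(old) + alpha * delta_k * z_j.
    z, y = forward(x, v, v0, w, w0)
    delta = [(t[k] - y[k]) * y[k] * (1.0 - y[k]) for k in range(len(y))]
    for k in range(len(y)):
        w0[k] += alpha * delta[k]
        for j in range(len(z)):
            w[j][k] += alpha * delta[k] * z[j]
    # Squared error, which the training loop drives toward zero.
    return sum((t[k] - y[k]) ** 2 for k in range(len(y)))
```

Testing (Eqs. (13)-(16)) is just `forward` with the trained weights; in the paper the inputs x would be the 13 (ICA-enhanced) MFCC coefficients and the outputs the Raga classes.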


Results with Human computation in a loop
As proposed, a Human-Machine hybrid approach is taken by keeping machines in the loop. Figure 2 shows that when the human computation (Assisted and Unassisted crowd sourcing) is taken in the loop with the machine-based methods, i.e. the signal processing approach, the results are further improved. The results show that human domain knowledge gives an improved performance of the system when machines are taken in the loop.

Figure 2: Average efficiency when human computation is in the loop (signal processing alone vs. human computation in the loop, for each of the four methods)

CONCLUSION
In the proposed work, Human Computation is done by two activities, Assisted and Unassisted Crowd Sourcing. Players having domain knowledge of classical music played the activity. The second approach used is machine based, where feature extraction is done by MFCC, feature enhancement is done by ICA, and classification is done by the Feed Forward Neural Network and KNN.

When the primary level features are enhanced by ICA and the Feed Forward Neural Network classifier is used, the best average efficiency is achieved. One possible explanation is that at the first level, instead of going for fewer features, exhaustive features are calculated so that the minute details of all Ragas are captured. As ICA gives a set of statistically independent components, the results are improved. KNN calculates the minimum distance over all the sample vectors, whereas the Feed Forward Neural Network is trained for all the music files at one time. As proposed, when the human computation is taken in a loop, the best results are achieved.

REFERENCES

[1] Yogesh H. Dandawate, Prabha Kumari and Anagha Bidkar, "Indian Instrumental Music: Raga Analysis and Classification", 1st International Conference on Next Generation Computing Technologies (NGCT-2015), pp. 4-5, 2015.
[2] Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, and Mark D. Plumbley, "Fully Deep Neural Networks Incorporating Unsupervised Feature Learning for Audio Tagging", arXiv:1607.03681v1, 2016.
[3] Pranay Dighe, Parul Agrawal, Harish Karnick, Siddartha Thota, Bhiksha Raj, "Scale Independent Raga Identification Using Chromagram Patterns And Swara Based Features", 2013.
[4] Marcin Serwach, Bartłomiej Stasiak, "GA-based Parameterization and Feature Selection for Automatic Music Genre Recognition", IEEE Proceedings of CPEE, 2016.
[5] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals", IEEE Transactions on Speech and Audio Processing, vol. 10, pp. 293-302, 2002.
[6] C. N. Silla Jr., A. L. Koerich, and C. A. Kaestner, "A machine learning approach to automatic music genre classification", Journal of the Brazilian Computer Society, vol. 14, pp. 7-18, 2008.
[7] Pedro Girao Antunes, David Martins de Matos, Ricardo Ribeiro, Isabel Trancoso, "Automatic Fado Music Classification", arXiv:1406.4447v1 [cs.SD], 2014.
[8] A. C. Bickerstaffe and E. Makalic, "MML classification of music genres", in AI 2003: Advances in Artificial Intelligence, Springer Berlin Heidelberg, 2003.
[9] Chordia, P., Senturk, S., "Joint recognition of raag and tonic in north Indian music", Computer Music Journal, 37(3), pp. 82-98, 2013.
[10] Kartik Mahto, Abhilash Hotta, Sandeep Singh Solanki, Soubhik Chakraborty, "A Study on Artist Similarity Using Projection Pursuit and MFCC", International Conference on Computing for Sustainable Global Development (INDIACom), pp. 647-653, 2014.
[11] Längkvist, M., Karlsson, L., Loutfi, A., "A review of unsupervised feature learning and deep learning for time-series modeling", Pattern Recognition Letters, pp. 11-24, 2014.
[12] A. Farhadi, I. Endres, and D. Hoiem, "Attribute-centric recognition for cross-category generalization", CVPR, 2010.


[13] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their attributes", CVPR, 2009.
[14] M. Poesio and A. Almuhareb, "Extracting concept descriptions from the web: the importance of attributes and values", Conference on Ontology Learning and Population: Bridging the Gap between Text and Knowledge, 2008.
[15] Varsha Degaonkar, Shaila Apte, "Emotion modeling from speech signal based on wavelet packet transform", International Journal of Speech Technology, Volume 16, Number 1, 2013.
[16] Patrik Hoyer, "Independent component analysis applied to feature extraction from color and stereo images", Network: Computation in Neural Systems, 2000.
[17] Smita, Varsha Degaonkar, "Defect Identification for Simple Fleshy Fruits by Statistical Image Feature Detection", Advances in Intelligent Systems and Computing, Springer, 2017.
