Research on Recognition and Classification of Folk Music Based on Feature Extraction Algorithm
Total Page:16
File Type:pdf, Size:1020Kb
https://doi.org/10.31449/inf.v44i4.3388 Informatica 44 (2020) 521–525 521 Research on Recognition and Classification of Folk Music Based on Feature Extraction Algorithm Xi Wang Henan Polytechnic, Zhengzhou, Henan 450046, China E-mail: [email protected] Keywords: folk music, feature extraction, music classification, support vector machine Received: December 8, 2020 In this study, the feature extraction algorithm for folk music was analyzed. The features of folk music were extracted in aspects of time domain and frequency domain. Then, a support vector machine (SVM) was selected to identify and classify folk music. It was found that the performance of SVM was the best when 휎2 was 26 and C was 4; the recognition rate of using only one feature was inferior to that of using all features; the highest recognition rate of SVM was 92.76%; compared with back propagation neural network (BPNN) and decision tree classification method, SVM had a higher recognition rate. The experimental results show the effectiveness of SVM, which can be applied in practice. Povzetek: V tem študentskem članku je predstavljena klasifikacija glasbe s pomočjo metod umetne inteligence. 1 Introduction As an art form, music can express people's thoughts, on its recognition and classification. Therefore, this study feelings, and life style and has a role in promoting people’s took folk music as the research subject, carried out feature emotion and spirit. With the improvement of human living extraction in aspects of time domain and frequency standards, music has become more and more popular. domain, established a feature database, and then identified With the development of science and technology, more and classified folk music with a support vector machine and more people have tended to enjoy music through the (SVM), and verified the reliability of the method through Internet. Therefore, finding out music which users want to experiments. The present study contributes to the listen to from a massive amount of music has become realization of the automatic classification of folk music more and more important, and the recognition and and the improvement of retrieval efficiency. classification of music have attracted more and more extensive attention. Huang et al. [1] improved the hidden 2 Folk music and feature extraction Markov model (HMM) using an artificial neural network (ANN). The application of the improved HMM in practical music classification found that HMM had a fast 2.1 Folk music calculation speed but a poor classification performance, Folk music includes instrumental music, songs, opera, etc. ANN had a good classification performance but a high Musical instruments play a very important role in folk computational complexity. The combination of them music, which can be divided into four categories, as shown could improve the recognition rate of HMM by 4% - 5% in Table 1. while maintaining the same calculation speed as HMM. Abidin et al. [2] recognized a Turkish music data set, Wind Xiao, Suona, Lusheng, Xun, pan SymbTr, with ten machine learning algorithms, and found instruments flute, etc. that the performance of the algorithms was between 82% Plucked stringed Chinese lute, moon lute, Guqin, and 88%. Rao et al. [3] studied chord recognition. Pitch instrument kayagum, Zheng, Konghou, etc. Class Profile features were extracted from raw audio and Percussion Collected bronze bells, wooden recognized by spare representation. Through the instruments fish, bronze drum, long drum, experiment on MIREX09, it was found that the method gong, etc. had robustness to Gaussian white noise. Iloga et al. [4] String Erhu, Xiqin, horse head string studied the genre classification of music, designed a instruments instrument, Leiqin, etc. sequential pattern mining method, and carried out Table 1: Folk music instruments. experiments on GTZAN. They found that the accuracy of the method was 91.6%, which was more than 7% higher Musical instruments can be solo or ensemble, and than the existing classifiers. Chinese folk music refers to different combinations of musical instruments will form the music played by traditional instruments, which has different styles of instrumental music. For example, the high artistry and nationality [5], but there is little research music played by percussion instruments has a strong 522 Informatica 44 (2020) 521–525 X. Wang rhythm and rich timbre; the music performed with string domain features can be obtained by converting the signal instruments has a delicate style and simple and elegant to the frequency domain through Fourier transform. The style; the music played with wind instruments, and string features selected in this study are as follows. instruments tends to be light and lively; the music played (1) Spectrum centroid (SC): it refers to the characteristic with wind instruments and percussion instruments is quantity of the spectrum center of a signal. Fourier joyful and enthusiastic. transform is represented by 퐹(훿), 훿 ∈ (푔, ℎ), and the maximum and minimum values of frequency are 2.2 Music feature extraction represented by 푔 and ℎ respectively. Then SC can be expressed as: To identify and classify folk music, it is necessary to ∑ℎ 훿|퐹(훿)|2 extract the features of folk music. Music is composed of 푆퐶 = 훿=푔 ℎ 2 many monosyllables. In psychology, sound includes the ∑훿=푔|퐹(훿)| following four characteristics: (1) pitch: pitch refers to people’s feeling of the frequency (2) Spectrum energy (SE): it refers to the frequency of sound, determined by the number of vibration of an domain energy of the signal, which can be expressed object; as: (2) sound duration: sound duration refers to the duration ℎ 1 of a note, which is determined by the duration of the 푆퐸 = √ ∑|퐹(훿)|2 vibration; ℎ − 푔 훿=푔 (3) sound intensity: sound intensity refers to the loudness that people feel, which is determined by the vibration (3) Mel frequency cepstrum coefficient (MFCC) [6]: it amplitude; refers to the cepstrum characteristics at Mel frequency, (4) timbre: timbre refers to people’s perception of sound which has 13 dimensions. Suppose that the frequency quality, which is determined by the material, of the music signal is f , then its Mel frequency is: structure,. and shape of the sound body. 푓 푓 = 2595 × 푙표푔 (1 + ). In the recognition of folk music, timbre is the main 푚푒푙 10 700 feature because music is played by different instruments. Timbre is a short-term feature, which can be extracted from the following three aspects. 3 Support vector machine-based classification algorithm 2.2.1 Time-domain characteristics SVM is a machine learning method [7], which has Time-domain characteristics aim at the characteristics of significant advantages in a small sample and nonlinear the audio signal waveform. The time-domain features field and has been successfully applied in many fields, selected in this study are as follows. such as speech recognition [8] and image classification (1) Short-time average energy (STE): it is used for [9]. 푑 reflecting the change of music signal amplitude. It Suppose that in Euclidean space 푅 , the training refers to the average energy of the signal in the short- sample is {(푥1, 푦1), (푥2, 푦2), . , (푥푁, 푦푁)} (푦 ∈ {+1, −1}), term audio window. For a short-time frame with a the linear discriminant function is 푔(푥) = 푤푥 + 푏, and window length of 푁, suppose that the signal value of the classification plane equation is 푤푥 + 푏 = 0, where 푤 the 푛 -th sampling point is 푥(푛), the window function refers to the hyperplane normal vector, and 푏 refers to the is represented by 푤(푛 − 푚). For the 푚 -th frame, its offset. To separate the samples correctly, the problem can STE can be expressed as: be expressed as: 1 2 1 2 푚푖푛 ‖푤‖ 퐸(푚) = ∑(푥(푛)푤(푛 − 푚)) 2 푁 푦 [(푤푥 ) + 푏] − 1 ≥ 0 푚 푖 푖 (2) Zero crossing rate (ZCR): it refers to the number of In the case of inseparable linearity, relaxation variable times a signal waveform passes through the zero point 휆 and penalty factor 퐶 are introduced. Then the above in a frame. For the 푚 -th frame, its ZCR can be equation is transformed into: 푁 expressed as: 1 푚푖푛 ‖푤‖2 + 퐶 ∑ 휆 1 2 푖 푍퐶푅(푚) = ∑|푠푔푛[푥(푛)] − 푠푔푛[푥(푛 − 1)]|푤(푛 − 푚) 푖=1 2 푚 푦푖[(푤푥푖) + 푏] ≥ 1 − 휆푖 The Lagrange function is introduced to solve the where 푠푔푛 stands for the sign function, above equation. Lagrange coefficient is set as 푎푖, then the 1, 푥(푛) ≥ 0 optimal classification function is: 푠푔푛 = { 0, 푥(푛) < 0 푁 푓(푥) = 푠푔푛 {∑ 푎 푦 푘(푥 , 푥) + 푏} 2.2.2 Frequency-domain characteristics 푖 푖 푖 푖=1 Audio contains a lot of information, which needs to be obtained in the frequency domain analysis. The frequency Research on Recognition and Classification of... Informatica 44 (2020) ) 521–525 523 For any unclassified sample 푥 , the result of 4.2 Experimental results classification can be obtained by calculating 푓(푥) . Firstly, two parameters of SVM need to be determined. 푘(푥푖, 푥푗) represents the kernel function. In SVM, the Two hundred of samples were selected. and determine the commonly used ones are: value of parameters through the cross test, as shown in (1) linear kernel function: 퐾(푥푖, 푥푗) = 푥푖 ⋅ 푥푗 Tables 3 and 4. ; (2) polynomial kernel function: 퐾(푥푖, 푥푗) = [(푥푖 ⋅ 푥푗) + 푑 퐶 Recognition rate/% 1] , where 푑 is an adjustable parameter; 2-1 90.11 2 ‖푥 −푥 ‖ 2 91.23 (3) RBF kernel function: 퐾(푥 , 푥 ) = 푒푥푝 (− 푖 푗 ) , 푖 푗 휎 22 93.87 23 92.18 where 휎 is an adjustable parameter. In SVM, the RBF kernel function is the most 24 92.09 commonly used and has the best performance; therefore, 25 91.63 this study uses RBF kernel function.