Factors Affecting Automatic Genre Classification: An Investigation Incorporating Non-Western Musical Forms
Noris Mohd Norowi, Shyamala Doraisamy, Rahmita Wirza
Faculty of Computer Science and Information Technology
University Putra Malaysia
43400, Selangor, MALAYSIA
{noris,shyamala,rahmita}@fsktm.upm.edu.my

ABSTRACT

The number of studies investigating automated genre classification is growing, following the increasing amount of digital audio data available. The underlying techniques for automated genre classification generally include feature extraction and classification. In this study, MARSYAS was used to extract audio features and the suite of tools available in WEKA was used for classification. This study investigates the factors affecting automated genre classification. Most studies in this area work with Western genres; here, traditional Malay music is incorporated as well. Eight genres were introduced: Dikir Barat, Etnik Sabah, Inang, Joget, Keroncong, Tumbuk Kalang, Wayang Kulit, and Zapin. A total of 417 tracks from various audio compact discs were collected and used as the dataset. Results show that various factors, such as the musical features extracted, the classifiers employed, the size of the dataset, excerpt length, excerpt location and test set parameters, affect classification results.

Keywords: Genre Classification, Feature Extraction, Music Information Retrieval, Traditional Malay Music

1 INTRODUCTION

Improvements in audio compression, along with increasing amounts of processing power, hard disk capacity and network bandwidth, have resulted in an increasingly large number of music files. Easier distribution of digital music through peer-to-peer file sharing has also made possible the creation of large personal digital music collections, typically containing thousands of popular songs. Users can now store large personal collections of digital music. These growing collections of digital audio data need to be classified, sorted, organized and retrieved in order to be of any value at all.

At present, metadata such as the filename, date created, etc., are widely used, but this approach is very labour-intensive, costly and time-consuming. A system that allows classification of music based on its audio characteristics is therefore highly sought after.

Musical genre is used universally as common metadata for describing musical content. Genre hierarchies are widely used to structure the large collections of music available on the Web. Musical genres are labels created and used by humans for categorizing and describing the vast universe of music [1]. Humans possess the ability to recognize and analyse sound immediately based on instrumentation, rhythm and general tone. Furthermore, humans are able to draw connections to other songs that have a similar sound and feel. These commonalities make it possible for humans to classify music into different genres.

An automatic genre classification system allows huge numbers of archived music files to be structured and organized automatically. The system should also be able to analyse, and possibly extract, the implicit knowledge in these musical files, turning the structure underlying the audio information into a comprehensible form. The analysis includes evaluation and comparison of the feature sets that attempt to represent the musical content. The general framework for automatic genre classification includes feature extraction and classification. Various studies have been carried out toward this development [1,2,3,4].

Wold et al. [2] extensively discuss audio classification, search and retrieval. Many audio features were analysed and compared, in particular rhythm, pitch, duration, loudness and instrument identification. Their work is not limited to the classification of music, but also includes classification of speech, laughter, gender, animal sounds and sound effects.

Tzanetakis [1] also found that although there has been significant work on developing features for speech recognition and music-speech discrimination, relatively little work has been done on developing features designed specifically for music signals. He identifies three specific feature sets for musical content: timbral texture, rhythmic structure and pitch content. This is similar to the conclusions of Aucouturier [3]. Features are not the only basis of a genre classification system. Kaminskyj [4] shows that the classifier, or machine learning algorithm, is just as crucial; in that work a Gaussian Mixture Model was used. These approaches work conceivably well on Western musical forms. However, it is very likely that the representation used for one genre is poor at describing other genres. Thus, a whole new set of features may be required in order to cater for other musical forms.

Additionally, the classifier, test set parameters and dataset pre-processing issues are also taken into consideration in this study, in order to obtain a realistic picture of their effects on classification. This paper investigates the factors affecting classification results and expands the customary scope of work on Western musical forms to non-Western musical forms, i.e. traditional Malay musical forms. The remainder of this paper is organized as follows. The effect of combining different feature sets on classification is discussed in Section 2, followed by an analysis of a direct comparison between two classifiers in Section 3. Section 4 describes the experimental framework of this study. Results are presented in Section 5. Finally, conclusions and future work are presented in Section 6.

2 FEATURE SETS

In music genre recognition, one challenge is the ability to differentiate between musical styles, and feature extraction is central to this. Feature extraction is the stage where knowledge of music, psychoacoustics, signal processing and many other fields comes into play. It is the process by which a segment of audio is characterized as a compact numerical representation. Audio data are large to store, and processing them can take a lot of space and time. This is why features for audio and speech analysis algorithms are typically computed on a frame basis; doing so reduces the amount of data that needs to be processed.

Numerous audio features have been identified in various studies, and each individual feature, or combination of features, is best suited to classifying a particular audio class, be it music versus speech, noise versus music, male versus female voices, or musical genre. However, it is important to find the right features: any classification algorithm will always return some kind of result, but a poor feature representation will only yield results that do not reflect the real nature of the underlying data. The extracted features are then used in the next step, when standard machine learning techniques are applied for classification.

The work of Tzanetakis [1] and Aucouturier [3] proposes that for automatic musical genre classification there are three prescriptive sets of features that should be examined thoroughly: timbral related features, rhythm related features and pitch related features.

2.1 Timbral Related

In order to exploit these features, the per-frame values are typically aggregated into a single value for the entire song, although the timbral features are sometimes referred to as global timbre.
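As a concrete illustration of this frame-then-aggregate scheme, the sketch below splits a mono signal into short overlapping frames, computes a simple per-frame quantity (root-mean-square energy, used purely as a stand-in feature), and summarizes the frame values into one song-level mean and variance. This is only a minimal NumPy sketch of the general idea described above; the frame sizes and the helper names (`frame_signal`, `song_level_features`) are illustrative assumptions and do not reproduce the actual MARSYAS feature extraction used in this study.

```python
import numpy as np

def frame_signal(x, frame_size=512, hop_size=256):
    """Split a mono signal into overlapping frames (illustrative helper)."""
    n_frames = 1 + max(0, (len(x) - frame_size) // hop_size)
    return np.stack([x[i * hop_size : i * hop_size + frame_size]
                     for i in range(n_frames)])

def rms_energy(frame):
    """Root-mean-square energy of one frame (a stand-in per-frame feature)."""
    return float(np.sqrt(np.mean(frame ** 2)))

def song_level_features(x):
    """Summarize per-frame values into one compact vector for a whole excerpt."""
    frames = frame_signal(x)
    rms = np.array([rms_energy(f) for f in frames])
    # Aggregating frame-level values (here: mean and variance) is what turns
    # a long audio signal into a short feature vector suitable for a classifier.
    return np.array([rms.mean(), rms.var()])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(22050)  # synthetic signal as a stand-in
    print(song_level_features(one_second))
```

In the study itself this role is played by MARSYAS rather than hand-written code; the sketch only mirrors the frame-based computation and song-level aggregation described above. The individual timbral features are defined in the following subsections.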
2.1.1 Spectral Centroid

This is the centre of gravity of the spectral distribution within a frame. The centroid measures spectral shape; higher centroid values indicate that more of the energy lies at higher frequencies.

2.1.2 Spectral Roll-Off

The roll-off is another measure of spectral shape. It is the frequency below which a given percentage (usually 85%) of the power spectrum resides.

2.1.3 Spectral Flux

This feature measures the frame-to-frame spectral difference; in short, it captures changes in spectral shape. It is defined as the squared difference between the normalized magnitude spectra of successive frames.

2.1.4 Time Domain Zero Crossings

This is the number of time-domain zero crossings within a frame. A zero crossing occurs when successive samples in a digital signal have different signs. For simple signals, the zero crossing rate is directly related to the fundamental frequency. This feature can also be used as a measure of the noisiness of a signal.

2.1.5 Mel-Frequency Cepstral Coefficients

MFCCs are based on the spectral information of a sound, but are modelled to capture the perceptually relevant parts of the auditory spectrum: the magnitude spectrum is mapped onto the mel scale, the logarithm is taken, and an inverse Fourier transform (in practice a discrete cosine transform) is applied. Typically, thirteen coefficients have been found useful for speech representation. For genre classification, however, only the first five coefficients appear to be important in terms of performance.
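To make these definitions concrete, the following sketch computes the four frame-level quantities of Sections 2.1.1 to 2.1.4 with NumPy. It is a simplified reading of the definitions above, not the MARSYAS implementation used in the study: the window choice, normalization and the exact roll-off convention are assumptions made for illustration.

```python
import numpy as np

def magnitude_spectrum(frame):
    """Magnitude spectrum of a Hann-windowed frame (positive frequencies only)."""
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame))))

def spectral_centroid(mag, sr):
    """Centre of gravity of the spectral distribution (Section 2.1.1)."""
    freqs = np.fft.rfftfreq(2 * (len(mag) - 1), d=1.0 / sr)
    return float(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))

def spectral_rolloff(mag, sr, pct=0.85):
    """Frequency below which `pct` of the spectral power resides (Section 2.1.2)."""
    freqs = np.fft.rfftfreq(2 * (len(mag) - 1), d=1.0 / sr)
    cumulative = np.cumsum(mag ** 2)
    idx = int(np.searchsorted(cumulative, pct * cumulative[-1]))
    return float(freqs[min(idx, len(freqs) - 1)])

def spectral_flux(mag, prev_mag):
    """Squared difference between successive normalized spectra (Section 2.1.3)."""
    a = mag / (np.sum(mag) + 1e-12)
    b = prev_mag / (np.sum(prev_mag) + 1e-12)
    return float(np.sum((a - b) ** 2))

def zero_crossings(frame):
    """Number of sign changes within a time-domain frame (Section 2.1.4)."""
    return int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))

if __name__ == "__main__":
    sr, frame_size, hop = 22050, 512, 256
    rng = np.random.default_rng(1)
    x = rng.standard_normal(sr)              # one second of noise as a stand-in
    prev = None
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = x[start:start + frame_size]
        mag = magnitude_spectrum(frame)
        features = (spectral_centroid(mag, sr),
                    spectral_rolloff(mag, sr),
                    spectral_flux(mag, prev) if prev is not None else 0.0,
                    zero_crossings(frame))
        prev = mag
    print(features)  # features of the last frame, just to show the output shape
```

MFCCs (Section 2.1.5) additionally require a mel filter bank and a discrete cosine transform, so in practice they are usually taken from an existing toolkit (MARSYAS in this study; librosa's `librosa.feature.mfcc` is a common alternative, assuming that package is available) rather than re-implemented from scratch.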
2.2 Rhythm Related

Besides the timbral related features, another useful feature set is the rhythm related features. The beat and rhythmic structure of a song is a good genre indicator. In Tzanetakis [1], a beat histogram is built from the autocorrelation function of the signal. The beat histogram provides an outline of the strength and complexity of the beat in the music. This is an especially