Musical Genre Classification of MPEG-4 TwinVQ Audio Data

Michihiro Kobayakawa Mamoru Hoshi Tokyo Metropolitan College of Industrial Technology Graduate School of Information Systems, Email: [email protected] The University of Electro-Communications

Abstract— We proposed a musical feature based on LSP (Line The rest of this paper is organized as follows. Section II Spectrum Pair) parameter directly extracted from the bitstream and Section III briefly describe the TwinVQ audio compression in the MPEG-4 TwinVQ audio data. Our key idea is to extract and a framework for musical genre classification, respectively. the musical features by using information stored in the bitstream without decoding to audio signals. In this paper, we propose Section IV proposes new musical features using LPC cepstrum two musical features for musical genre classification of MPEG-4 and LPC coefficient computed from LSP parameter in the TwinVQ audio data. bitstream of the TwinVQ audio data and the Discrete For extracting musical features, we focus on LPC (Linear Transform (DWT). Section V and Section VI describe exper- Predictive Coding) cepstrum and LPC coefficient computed from iments on 2, 196 compressed data and discusses the perfor- LSP parameter in the bitstream of TwinVQ audio data by inverse mances of the musical features for musical genre classification, operations of encoding steps. The musical features based on LPC cepstrum and on LPC coefficient are computed by Discrete respectively. Finally, Section VII concludes this paper. (DWT). For musical genre classification, we use the Discriminant Analysis (DA) as a classifier. We II. TWINVQ AUDIO COMPRESSION experimented on 2, 196 TwinVQ audio data collected from 10 musical genres and evaluated the performance of the musical This section briefly describes the architecture of MPEG- features. 4 TwinVQ audio compression. The details of TwinVQ From the experiments, we got the correct ratios 79.7% audio compression can be found in [8] or the URL and 84.1% for LPC coefficient-based musical feature and LPC http://www.twinvq.org/. A sequence of original signals of a cepstrum-based musical feature, respectively. And we compared piece of music is divided into frames. The MDCT (Modified the performance of two musical features. The experiments showed that LPC cepstrum-based musical feature had good Discrete Cosine Transform) is then applied to the signals of performance for musical genre classification in the compressed each frame (= 23.2ms). For the MDCT coefficients of each domain of MPEG-4 TwinVQ audio compression. frame, five analyses, namely, LPC analysis, Pitch analysis, Barkscale envelope analysis, Power analysis and Weighted I. INTRODUCTION VQ analysis are carried out in this order (Figure 1). In LPC analysis (Figure 2), TwinVQ encoder computes autocorrelation Audio compression techniques are widely used. We can coefficients, LPC cepstrum, LPC coefficients, LSP (Line Spec- obtain compressed music data through music distribution trum Pair) parameters, and LSP spectrum in this order, then the systems that contain hundreds of thousands of the compressed LSP parameters are stored in the bitstream. The LSP spectrum music data. As a large amount of compressed music data is (1024 dimensional) are used for describing the power envelope generated, stored and distributed continuously, methods for of MDCT coefficients of each frame. From LSP parameters in effectively managing compressed music data are required ( the bitstream, we can obtain LPC coefficient (20 dimensional) [2], [3], [10]). and LPC cepstrum (40 dimensional) by inverse operations of In addition to a compression function, functions for describ- encoding step in this order (Figure 3). ing the contents of compressed music data are required in order to effectively use the compressed data. To retrieve a III. FRAMEWORK piece of music, conventional music retrieval systems usually use keywords such as the title, artist, musical genre, etc. It This section describes a framework for musical genre classi- is difficult to describe musical genre of a piece of music. fication of the MPEG-4 TwinVQ audio data using a sequence Several researchers proposed a content-based musical genre of LPC cepstrum or LPC coefficients computed from the LSP classification from audio signal [4]–[7], [9], [11], [12]. It parameters stored in the bitstream of TwinVQ audio data. is becoming increasingly important to develop methods for The system for musical genre classification consists of music genre classification. The recent results of music genre four components : 1) classification space generator, 2) user classification are found in the web page of MIREX [1]. interface, 3) query generator, 4) genre assignment. In this paper, we propose two new musical features for Here, we describe the classification space; content-based musical genre classification in the compressed G1: The system manager labels with a musical genre for domain of MPEG-4 TwinVQ audio [8]. each audio data.

978-1-61284-350-6/11/$26.00 ©2011 IEEE input signal MDCT / / / interleave bitstream - LSP pitch Bark power

LPC LPC pitch Bark scale power weighted analysis spectrum analysis envelope analysis VQ prediction VQ LSP parameter 20 LSP VQ perceptual model

LPC Coefficient 20 inverse operations of encoding step Bark scale flattened bitstream power LSP pitch envelope MDCT coefficient

LPC Cepstrum 40

LPC spectrum pitch decoding Bark scale power decoding decode decoding envelope decoding output signal inverce interleave Fig. 3. Process for computing LPC coefficient and LPC cepstrum. IMDCT * + * *

Fig. 1. Architecture of MPEG-4 TwinVQ audio compression. IV. MUSICAL FEATURE This section describes two musical features for musical encoder genre classification of MPEG-4 TwinVQ audio data. Input MDCT coeffcient 1024 The musical features are based on the sequences of LPC

auto correration 20 cepstrum and of LPC coefficient computed from LSP parame- ter of the bitstream, and on multi-resolution analysis by DWT. LPC coefficient 20 We define LPC cepstrum vector cn and LPC coefficient LPC cepstrum 20 vector αn of the n-th frame by 1/2 LPC cepstrum 20 T cn = (cn,1, . . . , cn,d, . . . , cn,40) , (1) LPC coefficient 20 bitstream vector LSP parameter 20 LSP parameter 20 quantizatin T α α 1, . . . , α , . . . , α 20 , LPC spectrum 1024 n = ( n, n,d n, ) (2)

where cn,d and αn,d are the d-th LPC cepstrum and LPC Fig. 2. Encoding Step of LPC analysis. coefficient of the n-th frame, respectively, and T denotes the transposition. We consider the sequences Ωc and Ωα consisting of N frames (vectors), c Ω = c1 c2 . . . cn . . . cN , (3) G2: The system computes LPC cepstrum or LPC coef- ficient from LSP parameters in the bitstream and α calculates the musical feature vector by using the Ω = α1 α2 . . . αn . . . αN . (4) DWT. Here, we describe a method of extracting musical feature G3: To generate classification space, the system applies from the sequence. The procedures of feature extraction from the discriminant analysis to the set of musical feature c α the sequences Ω and Ω are the same. vectors and maps them into the discriminant space. We represent a sequence Ω of the D-dimensional vectors G4: The system calculates the centroid of the musical and of length N as the D × N matrix feature vectors with the same musical genre. We call the centroid “the representative point” of the genre. v1,1 v2,1 vn,1 vN,1  . . . .  . . . . The steps of assigning a musical genre to compressed music   Ω =  v1,d v2,d . . . vn,d . . . vN,d  . (5) data are as follows:    . . . .   . . . .  A1: A user inputs compressed audio data to the system   using user interface,  v1,D v2,D vn,D vN,D  c α A2: The system computes LPC cepstrum or LPC co- In the case of Ω (Ω ), the (n, d) element vn,d of Ω is the c α efficient from LSP parameters in the bitstream of (n, d) element cn,d (αn,d) of Ω (Ω ) and D = 40(20). the MPEG-4 TwinVQ audio data and calculates the The d-th row of the matrix Ω expressed as musical feature vector. ω v1 , . . . , v , A3: The system maps it into the classification space, finds d = ,d N,d (6) the representative point nearest from the mapped is called “the d-th sequence of length N”. points, and returns the musical genre of the repre- Then, by using DWT with the base Haar, we decompose sentative point. the d-th sequence ωd and then obtain the detailed wavelet N TABLE I coefficients wd,l,t (l = 1, . . . , L; t = 1, . . . , b 2l c ) up to GENRE AND THE NUMBER OF COMPRESSED DATA. level L, where L denotes the maximum decomposition level of DWT. genre # of compressed data We calculate the mean spectrum pd,l of level l (l = baroque 223 1, . . . , L) defined by the average of the power of the detailed bossanova 119 176 wavelet coefficients wd,l,t of level l dance hiphop 196 2 (wd,l,t) jazz 431 pd,l = P N , march 91 2l oldies 435 rock 200 and the standard deviation σd,l tango 220 N waltz 99 v 2l total 2, 196 u 1 2 2 σd,l = u (w − pd,l) . u N X d,l,i t 2l i=1

We then define the mean spectrum vector pd,L by the mean [Classifier]: We used the Discriminant Analysis (DA) as a spectrum up to level L as classifier. The DA is a simple and widely used standard statical classifier. pd,L ≡ (pd,1, pd,2, . . . , pd,L), d = 1, . . . , D, B. Musical Genre Classification and the standard deviation vector up to level L as We applied the MPEG-4 TwinVQ encoder to each signal σd,L ≡ (σd,1, σd,2, . . . , σd,L). of piece of music and obtained 2, 196 compressed music data, then computed the musical feature vectors for each We define the vector f δ,L of the compressed music data up compressed music data. to order δ and wavelet decomposition level L as The musical genre classification was done as follows. Step1: As training data we randomly selected 50 percent of f δ,L ≡ (p1,L, , . . . , pδ,L, σ1,L, . . . , σδ,L), the compressed music data of the same genre and used the 1 ≤ δ ≤ D, 1 ≤ L ≤ 10. rest as the test data (50%). Step2: The system executed discriminant analysis to the δ, L and call it “the ( ) musical feature vector”. training data, and obtained the discriminant space and the δ, L We can obtain the ( ) musical feature vectors from the centroid vector (representative point) for each musical genre. Ωc Ωα sequences and , and call them “LPC cepstrum-based Step3: To classify the test data, the system maps the musical (δc, L) musical feature vector” (1 ≤ δc ≤ 40) and “LPC feature vector of the test data into the discriminant space and coefficient-based (δα, L) musical feature vector” (1 ≤ δα ≤ then compute the distance between the test musical feature 20), and use each of them for genre classification. vector and centroid points (representative points), and returns V. EXPERIMENTS the genre of the nearest centroid as the musical genre of the test data. This section evaluates the performance of LPC cepstrum- For each (δ, L) musical feature, the Step1 ∼ Step3 above δ , L based ( c ) musical feature vector and LPC coefficient-based were repeated three times and the average accuracy of classi- (δα, L) musical feature vector. fication was obtained. We computed LPC cepstrum (40 dimensional) and LPC [LPC coefficient-based musical feature vector]: We exam- coefficient (20 dimensional) from the LSP parameter stored in ined how the number δα of LPC coefficients and the wavelet TwinVQ compressed data and computed the LPC cepstrum- decomposition level L affect on the performance of musical based (δc, L) musical feature vector and the LPC coefficient- genre classification. based (δα, L) musical feature vector. Table II shows the average correct ratio of f δα,L for δα = 1 L δ A. Data set and Classifier ∼ 20, = 1 ∼ 10. The average correct ratios for 7 ≤ α ≤ 20 and 4 ≤ L ≤ 10 were greater than or equal to 70.0%. The [Data set]: We experimented on 2, 196 compressed music maximum ratio was 79.7% (for f 20,5). data of 10 musical genres such as “baroque”, “bossanova”, [LPC cepstrum-based musical feature vector]: Table III “dance”, “hiphop”, “jazz”, “march”, “oldies”, “rock”, “tango” shows the average correct ratio of f δc,L for δc = 1 ∼ 40, and “waltz”. L = 1 ∼ 10. The average correct ratios for 6 ≤ δc ≤ 36 We collected the pieces of music from collected works of and 2 ≤ L ≤ 10 were greater than or equal to 70.0%. The “baroque”, “bossanova”, “dance”, “oldies” and “tango”, and maximum ratio was 84.1% (for f 26,3). from several albums of march and waltz. In addition, we selected the pieces of music of hiphop and rock from our VI. DISCUSSION library consisting of 20, 000 pieces of music. Table I shows This section discusses the performance of LPC coefficient- the number of pieces of music of each musical genre. based (20, 5) musical feature and of LPC cepstrum-based TABLE II

AVERAGE CORRECT RATIO OF THE (δα, L) MUSICAL FEATURE VECTOR f δα,L FOR TEST DATA (%). wavelet decomposition level L δα 1 2 3 4 5 6 7 8 9 10 1 38.5 49.4 51.2 51.2 53.2 54.7 55.0 56.6 59.0 59.2 2 47.3 58.5 61.6 61.9 62.6 62.4 63.6 65.2 66.7 66.5 3 52.0 61.8 64.8 64.8 65.7 65.7 65.7 68.3 69.8 69.9 4 57.1 65.5 67.9 68.2 69.4 69.1 69.2 71.8 72.2 72.3 5 58.9 67.0 68.8 69.7 69.9 70.1 70.0 72.6 73.2 73.3 6 59.4 67.1 68.6 69.7 70.2 71.1 71.4 73.6 74.3 74.4 7 60.9 68.5 69.6 70.7 71.6 71.9 72.5 74.2 74.7 74.7 8 61.6 69.1 70.6 71.4 72.1 72.3 72.9 74.7 75.3 75.5 9 61.6 69.0 69.8 70.7 71.9 72.2 72.8 74.0 75.0 74.7 10 63.2 69.6 71.1 71.7 73.2 73.2 73.1 74.5 75.3 74.6 11 64.1 69.7 71.2 72.0 72.7 73.4 73.4 75.2 75.6 75.3 12 65.3 69.9 71.5 71.9 72.4 73.6 73.8 75.1 75.8 75.7 13 66.0 70.1 71.5 72.5 73.6 74.2 74.3 75.4 75.8 75.6 14 66.5 71.0 72.2 72.6 73.4 73.7 73.4 75.0 75.9 75.4 15 66.9 72.0 72.9 73.3 74.7 74.8 74.7 75.8 76.2 76.2 16 67.2 73.0 74.5 75.1 75.1 75.7 75.4 76.4 77.3 77.7 17 68.2 73.4 74.6 75.4 75.9 76.1 76.4 76.9 77.2 77.5 18 69.7 75.5 76.4 77.7 77.6 77.1 77.4 77.0 77.8 77.8 19 70.5 76.3 78.3 78.6 79.1 78.5 78.5 78.1 78.0 77.3 20 70.7 76.9 77.9 78.4 79.7 78.9 78.6 78.2 78.5 77.2

(26, 3) musical feature. We look into the rows in Table IV and Table V. The Firstly, we show the performance of LPC coefficient-based element #r of the right-most column is the number of genres (20, 5) musical feature in the form of a confusion matrix on into which some pieces of genre r were wrongly classified. percentage (Table IV). The rows and the columns correspond The smaller the number is, the better the genre classification to the actual genre and to the classified genre, respectively. performance is. The numbers for LPC cepstrum-based musical The main diagonal elements show the correct ratios. And the feature were greater than those for LPC coefficient-based mu- off-diagonal element (r, c) (r 6= c) shows the ratio of the sic feature, except for bossanova music. The minimum (best) number of test data of genre r which were wrongly classified number and maximum (worst) number for LPC cepstrum- as genre c to the number of test data of genre r, and is called based musical feature were less than those for LPC coefficient- “the misclassification ratio”. For example, the element of row based musical feature. The averages were 5.4 and 4.3 for LPC “dance”, column “bossa” with value 6.82 means that 6.82% of coefficient-based musical feature and for LPC cepstrum-based the dance music was wrongly classified as bossanova music. music feature, respectively. All the diagonal elements in Table IV were greater than We look into the columns in Table IV and Table V. For or equal to 60.00, except for dance (53.41%). Dance music example, in Table V, the baroque column shows that the set had the minimum (worst) correct ratio 53.41. Dance music of pieces of music classified into baroque genre consisted was misclassified into eight genres such as rock (15.91%), of six genres, i.e., baroque, bossanova, dance, hiphop, tango hiphop(7.95%), oldies(7.95%), bossa(6.82%), jazz(3.41%) and waltz. In other words, some pieces of music in genres, and march (2.27%), and tango, waltz (1.69%). bossanova, dance, hiphop, tango or waltz were wrongly clas- Secondly, we show the performance of LPC cepstrum-based sified into baroque genre. (26, 3) musical feature (Table V). All diagonal elements in The element c of the bottom row shows the ratio Rc Table V were greater than or equal to 60.00. The correct ratios (percentage) of the number of pieces of music of genre c for six genres: baroque, hiphpo, jazz, march, oldies and rock to the total number of pieces of music classified into genre were greater than or equal to 81.82. Bossanova music had c. For example, the bottom of the baroque column in Table the minimum (worst) correct ratio 64.41. Bossanova music IV shows that the 90.97% pieces of music classified into was misclassified into six genres such as oldies (18.64%), baroque music were actually baroque music, and the 9.03% jazz(10.17%) and baroque, dance, hiphop, tango (1.69%). pieces of music classified into baroque music were actually Thirdly, we compare the performance of LPC coefficient- not baroque music. We can say that the ratio Rc for genre c based music feature with that of LPC cepstrum-based musical shows the reliability of the assignment of genre c to a piece feature by using Table IV and Table V. We compare the main of music. The larger the ratio is, the higher the reliability of diagonal elements, i.e., the correct ratios. The correct ratios for genre classification for music data is. From Table IV (Table LPC cepstrum-based (26, 3) music feature were greater than V), we see that the maximum ratio and the minimum ratio those for LPC coefficient-based (20, 5) music feature, except for LPC cepstrum-based musical feature (LPC coefficient- for baroque, bossanova and rock. based musical feature) were 95.58 at rock column (91.09 at TABLE III

AVERAGE CORRECT RATIO OF THE (δc, L) MUSICAL FEATURE VECTOR f δc,L FOR TEST DATA (%). wavelet decomposition level L δc 1 2 3 4 5 6 7 8 9 10 1 38.5 49.4 51.2 51.2 53.2 54.7 55.0 56.6 59.0 59.2 2 48.9 57.2 59.7 60.9 62.4 62.6 64.0 67.8 68.0 69.0 3 54.5 62.0 65.1 65.8 66.9 68.2 68.7 72.8 73.1 73.8 4 55.7 64.0 65.8 67.9 69.4 70.4 71.3 74.9 76.1 76.1 5 58.3 67.2 70.2 71.5 73.1 73.7 75.2 78.6 78.9 78.7 6 61.5 70.1 73.0 74.5 75.8 76.9 77.6 80.8 80.7 80.6 7 63.3 71.5 74.5 76.6 77.6 78.2 78.9 81.8 81.1 81.3 8 64.5 72.5 75.7 77.0 77.7 78.9 79.6 82.0 81.5 81.6 9 66.3 73.6 76.9 78.0 79.0 80.0 80.3 82.6 81.9 81.7 10 66.1 73.7 77.5 77.8 79.1 79.6 80.5 82.1 81.6 81.4 11 67.4 74.9 78.7 79.0 80.1 81.2 80.8 82.8 81.9 81.6 12 68.7 75.9 79.1 79.8 81.0 81.6 81.4 82.6 81.9 81.5 13 69.4 76.5 79.4 79.6 81.2 81.3 81.3 82.7 81.3 80.8 14 70.4 78.2 79.8 79.7 80.9 81.0 81.5 82.4 81.4 81.1 15 71.0 78.2 79.7 79.7 81.0 81.0 81.0 82.3 81.3 81.0 16 72.0 78.6 80.2 80.8 82.2 81.3 81.3 82.3 81.9 81.1 17 72.9 79.0 80.4 81.0 82.2 81.4 81.5 82.2 81.6 80.8 18 73.2 79.4 81.2 81.9 82.6 82.0 82.1 82.2 81.6 80.6 19 73.4 80.0 81.9 81.8 82.6 81.8 81.9 82.4 81.5 80.6 20 74.5 80.5 82.4 82.2 82.9 82.3 82.8 82.5 81.4 81.0 21 75.1 81.1 82.8 82.6 83.3 82.7 82.6 82.9 81.6 80.7 22 76.1 81.4 83.3 82.9 83.3 83.1 82.7 83.3 81.6 80.6 23 75.9 81.7 83.7 83.0 83.3 82.6 82.1 82.5 81.5 80.1 24 76.2 81.8 83.7 83.1 83.4 82.4 82.0 82.5 81.3 79.6 25 76.2 82.1 83.9 83.0 83.1 82.9 82.3 81.9 80.9 79.7 26 76.1 82.2 84.1 83.0 83.0 82.3 81.9 81.5 80.7 78.0 27 75.6 82.4 83.7 83.0 83.2 82.2 81.5 81.6 79.6 77.7 28 75.7 82.3 83.6 82.8 82.7 82.2 81.7 80.7 79.0 76.8 29 75.4 82.3 83.5 83.1 83.1 81.8 80.9 80.7 78.8 76.7 30 75.4 82.7 83.9 83.0 83.3 81.9 81.3 80.5 78.3 76.9 31 75.4 82.7 83.8 82.5 83.0 81.5 81.1 80.2 78.2 76.2 32 75.7 82.7 84.1 82.8 83.2 81.5 80.5 79.7 77.4 75.4 33 75.4 82.4 83.7 82.9 82.7 81.0 80.4 78.9 77.0 74.8 34 75.3 82.6 83.8 83.1 82.7 80.8 80.0 78.6 76.6 73.3 35 75.9 82.5 83.6 82.5 82.3 80.5 78.9 77.9 75.7 72.0 36 76.3 82.7 83.1 82.6 82.4 80.6 78.8 77.9 75.7 72.0 37 76.1 82.5 83.0 82.5 82.3 80.7 78.8 77.3 74.7 70.1 38 76.3 82.6 83.0 82.5 82.0 80.0 78.3 76.5 74.9 69.4 39 77.1 82.4 82.5 82.2 81.5 79.9 77.8 75.8 73.3 68.4 40 77.0 82.3 82.5 82.6 81.1 79.6 77.4 75.2 72.4 67.3

hiphop) and 67.08 at bossanova column (53.13 at bossanova), from the bitstream of the MPEG-4 TwinVQ audio data by respectively. The average ratios were 83.52 and 79.14 for LPC inverse operation of encoding step. We applied the Discrete cepstrum-based music feature and for LPC coefficient-based Wavelet Transform to a sequence of the LCP cepstrum or LPC music feature, respectively. coefficient and obtained the (δ, L) musical feature vectors. The discussions above show that the LCP cepstrum-based To evaluate the performance of the musical features, we ex- (26, 3) musical feature vector had a better genre classification perimented musical genre classification on 2, 196 compressed performance than the LCP coefficient-based (20, 5) musical music data of 10 musical genres. We got the maximum feature vector. correct ratios 79.7% and 84.1% for LPC coefficient-based Finally, we can say that the musical genre classification by (20, 5) musical feature vector and for LPC cepstrum-based using the LCP cepstrum-based (26, 3) musical feature vector (26, 3) musical feature vector, respectively. Although the used had good performance. classifier (DA) was very simple, the proposed music features had very good performance for musical genre classification. VII. CONCLUSION From the experiments, we can say that LPC cepstrum-based (26, 3) musical feature vector had a good performance of This paper proposed new musical features for musical genre musical genre classification. classification of MPEG-4 TwinVQ audio data. The key idea is to use LPC cepstrum or LPC coefficient directly computed In this paper, we use LPC cepstrum and the LCP coefficient TABLE IV

THE ACCURACY OF GENRE CLASSIFICATION USING LPC COEFFICIENT-BASED (20, 5) MUSICAL FEATURE VECTOR f 20,5 FOR THE TEST DATA (%)

baroque bossa dance hiphop jazz march oldies rock tango waltz #r baroque 84.55 0.00 0.00 0.00 0.91 1.82 1.82 0.00 7.27 3.64 5 bossa 5.08 67.80 0.00 1.69 10.17 0.00 11.86 0.00 3.39 0.00 5 dance 0.00 6.82 53.41 7.95 3.41 2.27 7.95 15.91 1.14 1.14 8 hiphop 0.00 1.00 3.00 92.00 1.00 0.00 2.00 0.00 1.00 0.00 5 jazz 0.00 8.84 0.47 0.00 83.72 0.00 1.86 0.00 4.65 0.47 5 march 2.22 0.00 0.00 0.00 2.22 82.22 0.00 0.00 4.44 8.89 4 oldies 0.88 3.08 0.00 0.44 7.49 0.44 82.82 0.44 3.96 0.44 8 rock 0.00 0.00 3.00 0.00 1.00 0.00 1.00 94.00 1.00 0.00 4 tango 0.00 3.00 0.00 0.00 11.00 0.00 8.00 1.00 73.00 4.00 5 waltz 10.20 0.00 0.00 0.00 16.33 2.04 2.04 0.00 4.08 65.31 5 ratio Rc 89.50 53.13 87.02 91.09 78.68 86.25 85.14 85.45 66.97 68.30 TABLE V

THE ACCURACY OF GENRE CLASSIFICATION USING THE LPC CEPSTRUM-BASED (26, 3) MUSICAL FEATURE VECTOR f 26,3 FOR THE TEST DATA (%)

baroque bossa dance hiphop jazz march oldies rock tango waltz #r baroque 81.82 0.00 0.00 0.00 0.91 0.00 2.73 0.00 8.18 6.36 4 bossa 1.69 64.41 1.69 1.69 10.17 0.00 18.64 0.00 1.69 0.00 6 dance 2.27 4.55 69.32 11.36 1.14 0.00 7.95 3.41 0.00 0.00 6 hiphop 1.00 0.00 3.00 94.00 0.00 0.00 1.00 1.00 0.00 0.00 4 jazz 0.00 3.26 0.00 0.00 92.09 0.00 3.26 0.00 0.93 0.47 4 march 0.00 0.00 0.00 0.00 4.44 95.56 0.00 0.00 0.00 0.00 1 oldies 0.00 2.20 2.20 0.44 7.93 0.88 84.14 0.00 1.76 0.44 7 rock 0.00 0.00 4.00 0.00 0.00 0.00 3.00 93.00 0.00 0.00 2 tango 1.00 1.00 0.00 0.00 7.00 0.00 11.00 0.00 79.00 1.00 5 waltz 8.16 4.08 0.00 0.00 0.00 12.24 0.00 0.00 4.08 71.43 4 ratio Rc 90.97 67.08 82.55 88.69 85.12 84.48 81.19 95.88 81.33 77.93

computed from LSP parameters, because they can be calcu- [7] M. Mandel and D. P. W. Ellis, “LABROSA’S Audio Mu- lated from LSP parameters in the bitstream without decoding sic Similarity and Classification Submissions,” http://www.music- ir.org/mirex/2007/abs/ AI CC GC MC AS mandel.pdf, 2007. and are fundamental information of audio signals. [8] T. Moriya, N. Iwakami, K. Ikeda, and S. Miki, “Extension and The contribution of this paper is to present new musical Complexity of TwinVQ Audio Coder,” Proceedings of International features to classify MPEG-4 TwinVQ compressed data into Conference on Acoustics, Speech and Signal Processing (ICASSP1996), pages 1029–1032, 1996. musical genre without decoding to audio signals. [9] N. M. Norowi , S. Doraisamy and R. Wirza, “Factors Affecting Automatic Genre classification: An Investigation Incorporating Non- Western Musical Forms,” Proceedings of International Symposium on REFERENCES Music Information Retrieval (ISMIR2005), pages 13–20, 2005. [10] G. Tzanetakis and P. Cook, “Sound Analysis using MPEG Compressed [1] MIREX2009 Audio Genre Classification Audio,” Proceedings of IEEE International Conference Acoustic, (Mixed Set) Results, http://www.music- Speech, and Signal Processing (ICASSP2000), pages 761–764, 2000. ir.org/mirex/wiki/2009:Audio Genre Classification (Mixed Set) Results. [11] G. Tzanetakis and P. Cook, “Music genre classification of audio signals,” [2] M. Nakanishi, M. Kobayakawa, M. Hoshi and T. Ohmori, “A Method IEEE Transaction on Speech and Audio Processing, Vol. 10, No. 5, pages for Extracting a Music Unit to Phrase Music Data in the Compressed 293–302, 2002. Domain of TwinVQ Audio Compression,” Proceedings of 2005 IEEE [12] G. Tzanetakis, “MARSYS Submissions to MIREX207,” International Conference on Multimedia & Expo, A. 5-2, 2005. http://www.music-ir.org/mirex/2007/abs/ AI CC GC MC AS tzanetakis, [3] M. Kobayakawa, M. Hoshi and K. Onishi, “A Method for Retrieving 2007. music data with different bit rates using MPEG-4 TwinVQ Audio Compression,” Proceedings of the ACM Multimedia 2005, pages 459- 462, 2005. [4] T. Li, M. Ogihara and Q. Li, “A comparative study on content-based music genre classification,” Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR2003), pages. 282–289, 2003. [5] T. Lidy and A. Runber, “Evaluation of Feature Extractions and Psycho-Acoustic Transformations for Music Genre Classification,” Pro- ceedings of International Symposium on Music Information Retrieval (ISMIR2005), pages 34–41, 2005. [6] T. Lidy, A. Runber, A. Pertusa, and J. M. Inesta, “Combining Audio and Symbolic Descriptors for Music Classification from Audio,” http://www.music-ir.org/mirex/2007/abs/AI CC GC MC AS lidy.pdf, 2007.