Tenth IEEE International Symposium on Multimedia

A Music Retrieval Method based on Distribution of Feature Segments

Kazuhisa Ono†, Yu Suzuki‡, Kyoji Kawagoe‡

† Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, 525-8577, Japan, [email protected]

‡ College of Information Science and Technology, Ritsumeikan University, Kusatsu, Shiga, 525-8577, Japan, {yusuzuki, kawagoe}@is.ritsumei.ac.jp

Abstract

In this paper, we propose a music retrieval method based on the distribution of features in music. In common music retrieval methods, if several features are similar between the query and a retrieval target, the retrieval system reports that the query is similar to that target. The problem is that the remaining features are ignored: if the other features of the query and the retrieval target are quite different, the two pieces should be treated as different types of music. Therefore, we calculate the importance of each feature in the music, compare the importance of features between the query and the retrieval target, and thus retrieve music without ignoring the importance of any feature. Our experimental evaluation confirms that the proposed system is more accurate than a baseline method.

1. Introduction

For some time now, people have been able to hear many types of music in passive ways, for example, as background music in shops, on TV programs, and in movies. From these music pieces, a listener can become interested in the music and obtain information about it, such as the song title and the artist's name. Moreover, the listener may want to listen to other songs that are similar to the music they have heard. When retrieving music, a listener usually uses music information such as the song title, or music data itself, as a query[4]. Nevertheless, the impression the listener has of the music may be vague and uncertain, so the listener cannot formulate a specific query from it. Therefore, the listener needs a music retrieval system that accepts music data as the query.

In common content-based music retrieval algorithms[2, 9], the retrieval system extracts acoustic features from each music piece and compares the features of the query music piece with those of the retrieval target music piece. When two music pieces have several similar features, they are treated as similar types of music, and the importance of the other features is ignored. The problem with this approach is that, when the other features are quite different, these pieces should be treated as different types of music; likewise, when the other features are not very different, the pieces should be treated as similar types of music.

To solve this problem, the music retrieval system should avoid ignoring the other features. Therefore, in this paper, we calculate the importance of each feature over all features in a retrieval target music set. The importance of a feature means how much the feature contributes to representing the music. We assume that the more often a feature appears in the music, the more important the feature is in that music.

This idea is quite similar to one used in textual information retrieval. A text retrieval system likewise should not ignore any terms when retrieving texts. To calculate the importance of terms, a text retrieval system uses the vector retrieval model and the Term Frequency/Inverse Document Frequency (TF-IDF) algorithm[1]: the system calculates how often each term appears in a text, and from this it derives the importance of each term in the retrieval target text set.

Based on this algorithm, we apply the vector retrieval model and TF-IDF to a music information retrieval system and calculate the importance of the features. In our approach, we divide the music into meaningful segments and use TF-IDF over the frequencies of these segments, so that we can calculate the importance of each segment. We then compare the importance of segments between the query music piece and the retrieval target music piece, and we can retrieve music without ignoring the importance of any segment. Using our proposed method, the system can find intuitively similar music among the retrieval target music.


2. Related Work

Several methods of content-based music retrieval exist. Some divide the music data into segments and generate music signatures from the segments[3, 7]; others generate music signatures from pitch sequences in the music data. These retrieval techniques mainly deal with pitch information and do not attach importance to other information. There are also methods that utilize N-grams for music retrieval. Doraisamy[5] extracts monophonic pitch sequences from polyphonic music, constructs musical words from the sequences, and indexes the musical words with N-grams. The similarity to our method is that musical structures are considered the way text structures are. However, because of the N-grams, their retrieval system treats the musical words as having a fixed length and does not consider the length of the feature segments. Downie[6] also uses N-grams to divide the music into segments of fixed length and then applies TF-IDF to the segments. The similarity to our method is in utilizing a text retrieval method as a music retrieval method; however, in their method every segment has the same length. There is also a method that divides music pieces into frames and extracts features by merging the frames[10]. This is similar to our method in that it deals with similarity between frames, but their system does not classify the features.

Another study[8] considers the structure of music to be the structure of text based on phonetics. Their system divides the music into flexible segments and treats each segment as a term in a text. This is similar to our method in that it considers flexible segments to be feature segments. However, their system treats all segments extracted from the music as terms and describes the music by sequentially connecting all the segments.

3. Music Retrieval based on Distribution of Feature Segments

The features that music expresses are not constant throughout a piece; they change with the playing position. We therefore divide the music into meaningful segments, which we define as feature segments. These feature segments characterize the music; in other words, they can be treated like terms in a text. In a text retrieval method, when two texts share a term and the term appears frequently in both, the texts can be treated as relevant. However, the other terms must not be ignored; they should also be considered when judging whether two texts are relevant. To solve this problem, common text retrieval systems use TF-IDF, which takes the frequency of each term in the texts into account. Using these frequencies, a text retrieval system can consider to what extent each term is important in the texts and calculate the importance of each term. In the same way, we use TF-IDF to calculate the importance of each feature segment in the music. Using TF-IDF, we can consider the importance of the feature segments and prevent the similarity of any feature segment from being ignored.

Let us consider an example of a matching failure in a common music retrieval system. We give $mus_a$ as a query to a music retrieval system, and $mus_b$ is included in the retrieval target music. $mus_a$ and $mus_b$ have a similar feature segment, and that feature segment is frequent. However, the other feature segments in these music pieces are different, and the distributions of the feature segments are not similar. The common music retrieval system can still judge these pieces to be similar, because it ignores several feature segments. As a result, the system does not consider whether the distributions of feature segments are similar, and it cannot judge whether the distributions of melodies are similar.

In contrast, consider the same retrieval case with our proposed method. We calculate a distribution of the feature segments for each music piece, considering not only the importance of the frequent feature segment but also the importance of the other feature segments. We can then determine that the frequencies of the feature segments in $mus_a$ and in $mus_b$ are different. Consequently, we can judge that the distributions of their melodies are not similar.

3.1. Overview of Our Approach

Figure 1 shows an overview of our music retrieval system. The system compares the frequencies of the feature segments in a query with the frequencies of the feature segments in the retrieval target music by utilizing a vector retrieval model. Through this comparison, the system retrieves music based on the distribution of feature segments in the music. To this end, we extract the frequencies of the feature segments from each retrieval target music piece in a preprocessing step, and we construct a feature segment database and a retrieval target music database. The feature segment database contains the feature segments extracted from the retrieval target music; in other words, we use it just as a word dictionary is used with text. The retrieval target music database contains the frequencies of each feature segment in the retrieval target music.


[Figure 1. Overview of our music retrieval system.]

A user gives polyphonic music data, such as WAVE format data, to the system as a query. In our system, we assume that the user gives the whole music data to the system, not just a part of it. In Step 1, the system extracts the feature segments from the query. In Step 2, the system calculates the frequency of each feature segment in the query. In Step 3, the system obtains the frequency of each feature segment in the retrieval target music from the retrieval target music database. In Step 4, the system calculates the similarity between the query and the retrieval target music by using the frequencies of the feature segments in the music data. Lastly, the system returns the similarity result to the user.
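These four steps can be summarized in code. Below is a minimal Python sketch of Steps 2–4, assuming Step 1 (Section 3.2) has already produced the list of feature-segment IDs occurring in the query; the names `retrieve`, `cosine`, and `target_music_db` are illustrative, not part of the original system.

```python
from collections import Counter
from math import sqrt

def cosine(u, v):
    """Cosine measure between two sparse frequency vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query_segments, target_music_db):
    """Steps 2-4: count feature-segment frequencies in the query and
    rank every retrieval target piece by cosine similarity.
    `query_segments` lists the feature-segment IDs found in the query;
    `target_music_db` maps a music ID to its stored segment frequencies."""
    query_freq = Counter(query_segments)                       # Step 2
    scores = {mid: cosine(query_freq, freqs)                   # Steps 3-4
              for mid, freqs in target_music_db.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with hypothetical segment IDs "a".."d" as in Figure 1.
db = {"music_A": {"a": 4, "b": 1, "c": 2}, "music_B": {"c": 3, "d": 5}}
print(retrieve(["a", "a", "c", "b"], db))
```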

3.2. Extracting Feature Segments

The feature segments in a piece of music do not all have the same length. For example, we can assume that music contains both short and long phrases. Under this assumption, if we fix the length of every feature segment, we will miss several phrases. Therefore, to deal with flexible feature segments, we should extract the feature segments from the music while considering the length of each one.

However, polyphonic data is time series data. When we deal with flexible lengths of terms in text, we can analyze the morphology based on a word dictionary; we can also divide the text into minimum units such as letters. Time series data, on the other hand, is sequential, and no minimum unit exists in it. We cannot use a dictionary of feature segments, because we cannot define a minimum unit. Therefore, dividing time series data into meaningful segments and extracting flexible feature segments from the music is difficult.

To accomplish this task, we regard the music as text written in an unknown language. As a simple way of interpreting such text, an interpreter extracts similar string patterns from many strings and defines each pattern as a term. In our proposed method, we extract feature segments from the music based on this approach to text. In short, we assume that the sequential time series data in the music corresponds to strings in text. We extract segments that have similar feature values from the songs, and we define these similar segments as feature segments. Under this assumption, we can extract flexible feature segments from the music.

Figure 2 depicts the notion of similar segments in our proposal. In the figure, we extract similar segments from $mus_a$ and $mus_b$, and we define the similar segments as feature segments. Furthermore, our proposal covers cases where a feature segment includes other feature segments, such as feature segment $c$, and cases where a feature segment overlaps other feature segments, such as feature segment $d$. We are not restricted to minimum units of the time series data when extracting similar segments from the music. Hence, we can extract flexible feature segments from the music while preserving the sequential order of the time series data.

[Figure 2. Representation of feature segments using similar segments.]

3.2.1. Acoustic Features. In this paper, we use two values based on a filter-bank approach as feature values for the music: the frequency distribution $(f_1, f_2, \dots, f_n)$ and the average amplitude $\bar{a}$. The first value, the frequency distribution, is obtained by calculating the Fast Fourier Transform (FFT) of a similar segment: we divide the frequency domain into $n$ sections and take the sum of the amplitudes in each section. Using the frequency distribution, we can detect differences in timbral texture.


To calculate the frequency distribution, we define $f_i$ for the $i$-th ($i = 1, 2, \dots, n$) section as follows:

$$f_i = \sum_{j=(i-1)N+1}^{iN} a_j \qquad (1)$$

where $a_j$ is the amplitude at the $j$-th frequency point in the frequency domain, and $N$ is the number of frequency points in a section.

The second value is the average amplitude, the mean amplitude over the sampling points of a similar segment. Using the average amplitude, we can detect differences in impression based on the strength of the sound. We calculate the average amplitude $\bar{a}$ as follows:

$$\bar{a} = \frac{\sum_{k=1}^{L} a_k}{L} \qquad (2)$$

where $a_k$ is the amplitude at the $k$-th ($k = 1, 2, \dots, L$) sampling point of the similar segment, and $L$ is the length of the similar segment.
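As an illustration of Equations (1) and (2), the two feature values can be computed with NumPy as follows. This is a sketch under stated assumptions — a mono signal array and a hand-chosen number of sections $n$; the paper does not give its filter-bank parameters, so the helper names and defaults here are ours.

```python
import numpy as np

def frequency_distribution(segment, n):
    """Eq. (1): split the FFT amplitude spectrum of a similar segment
    into n sections of N frequency points and sum the amplitudes."""
    amp = np.abs(np.fft.rfft(segment))   # amplitude at each frequency point
    N = len(amp) // n                    # frequency points per section
    return np.array([amp[(i - 1) * N : i * N].sum() for i in range(1, n + 1)])

def average_amplitude(segment):
    """Eq. (2): mean amplitude over the L sampling points, reading a_k
    as the magnitude of the signal at each sampling point (our reading)."""
    return np.abs(segment).mean()

# Toy usage: a 25 ms segment of a 440 Hz tone at a 44,100 Hz sampling rate.
t = np.arange(int(0.025 * 44100)) / 44100.0
segment = np.sin(2 * np.pi * 440 * t)
print(frequency_distribution(segment, n=8))
print(average_amplitude(segment))
```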

3.2.2. Feature Segments by Connecting Frames. We need the value of $L$ to calculate the feature values described in Section 3.2.1. However, we cannot obtain $L$ until we have extracted the similar segments from the music, so we cannot calculate an FFT value for them directly. To solve this problem, we divide the music into fixed-length frames, and we connect the frames that are similar between one piece of music and another. We define the connected frames as similar segments.

To extract the similar segments, we first compute the feature values of each frame in each music piece. Next, for each frame we generate a vector whose elements are the feature values, and we calculate the vector similarity using the cosine measure. If the vector similarity is over the threshold $\theta$, we perform the same calculation for the following frames. While calculating the vector similarities in turn, once a vector similarity falls under $\theta$ we stop, and we define the span from the start frame to the frame preceding the end frame as a similar segment. If, on the other hand, the vector similarity is under $\theta$ at the first calculation, we restart the calculation from the following frame.

In our proposal, we set 25 ms as the frame length and 10 ms as the frame shift, because many studies on speech recognition use these values.
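The frame-connection procedure can be sketched as follows, assuming the per-frame feature vectors of two pieces have already been computed. This is a simplified illustration: it tries every pair of start frames and extends a run while the frame-wise cosine similarity stays at or above $\theta$, which mirrors — but does not reproduce exactly — the stopping rules described above.

```python
import numpy as np

def cosine(u, v):
    # cosine measure between two frame feature vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def similar_segments(frames_a, frames_b, theta):
    """Connect runs of frames of piece A that stay similar (cosine >= theta)
    to consecutive frames of piece B, trying every pair of start frames.
    Returns (start_a, start_b, length) triples with length >= 1.
    Sub-runs of a longer run are also reported; a real implementation
    would deduplicate them."""
    found = []
    for i0 in range(len(frames_a)):
        for j0 in range(len(frames_b)):
            length = 0
            while (i0 + length < len(frames_a) and j0 + length < len(frames_b)
                   and cosine(frames_a[i0 + length], frames_b[j0 + length]) >= theta):
                length += 1        # extend the run while frames stay similar
            if length:
                found.append((i0, j0, length))
    return found

# Toy usage with random 3-dimensional frame features.
rng = np.random.default_rng(0)
a, b = rng.random((40, 3)), rng.random((40, 3))
print(len(similar_segments(a, b, theta=0.99)))
```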

3.2.3. Classification for Feature Segments. The feature values of a feature segment consist of the average amplitude and the frequency distribution. Nevertheless, these values vary from case to case, and many patterns of values exist. Therefore, the same pattern rarely appears twice in the music, and counting the frequencies of feature segments is difficult. To solve this problem, we define feature segments that have similar value patterns to be the same feature segment, and we classify the feature segments in order to reduce the number of distinct kinds of feature segments. We compare feature segments that have the same length $L$. To compare two feature segments, we calculate the similarity of each frame in the segments. If all the calculated frame similarities are over the threshold, we define the two as the same feature segment and merge them; if the similarity of at least one calculated frame is under the threshold, we define them as different feature segments. We use $\theta$ as the threshold, in the same way as in Section 3.2.2.
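A possible sketch of this merging rule follows: two equal-length feature segments (lists of per-frame feature vectors) are identified when every frame-wise similarity is over $\theta$, and segments are grouped greedily into classes. The greedy grouping order is our assumption; the paper does not specify how the merged classes are formed.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def same_segment(seg_x, seg_y, theta):
    """Segments of the same length are merged only if every
    frame-wise similarity is over the threshold (Section 3.2.3)."""
    return len(seg_x) == len(seg_y) and all(
        cosine(fx, fy) >= theta for fx, fy in zip(seg_x, seg_y))

def classify_segments(segments, theta):
    """Greedily assign each segment to the first class whose
    representative it matches; otherwise open a new class."""
    classes, labels = [], []
    for seg in segments:
        for cid, rep in enumerate(classes):
            if same_segment(seg, rep, theta):
                labels.append(cid)
                break
        else:
            classes.append(seg)
            labels.append(len(classes) - 1)
    return labels
```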

3.3. Calculating Similarity Using a Vector Model

We utilize the TF-IDF algorithm of the vector retrieval model in text retrieval for our music retrieval method. The frequencies of terms in a text characterize the theme of the text; similarly, the frequencies of feature segments in music characterize the music. Therefore, we can retrieve music based on the distribution of feature segments by using their frequencies in the music. In this paper, TF refers to the frequency of a feature segment in a piece of music, and IDF refers to the inverse of the number of music pieces in which a feature segment appears.

We calculate the TF-IDF value to determine the weight of each feature segment in each piece of music. We express the weighting as follows:

$$s_{il} = tf_{il} \cdot \log\frac{r}{p_i} \qquad (3)$$

where the feature segments $w_1, w_2, \dots, w_m$ are extracted from the retrieval target music $S_1, S_2, \dots, S_r$. Here, $tf_{il}$ is the frequency of the $i$-th ($i = 1, 2, \dots, m$) feature segment $w_i$ in the $l$-th ($l = 1, 2, \dots, r$) music piece $S_l$, and $p_i$ is the number of music pieces that contain $w_i$. The term $s_{il}$ is the weight of $w_i$ in $S_l$. We treat the frequencies of the $w_i$ in the query as the weights $q_i$ of the feature segments in the query.

Lastly, we generate feature vectors whose elements are these weights, and we calculate the similarity between feature vectors. In this paper, we use the cosine measure as the feature vector similarity:

$$\cos(s_l, q) = \frac{\sum_{i=1}^{m} s_{il}\, q_i}{\sqrt{\sum_{i=1}^{m} s_{il}^2}\;\sqrt{\sum_{i=1}^{m} q_i^2}} \qquad (4)$$

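Equations (3) and (4) translate directly into code. The sketch below assumes the frequencies are collected in an $r \times m$ matrix `tf` (rows: music pieces, columns: feature segments) and that the query weights $q_i$ are its raw frequencies, as described above; the function names are illustrative.

```python
import numpy as np

def tfidf_weights(tf):
    """Eq. (3): s_il = tf_il * log(r / p_i) for an (r x m) frequency
    matrix tf; p_i is the number of pieces containing segment w_i."""
    r = tf.shape[0]
    p = np.count_nonzero(tf, axis=0)     # pieces containing each segment
    idf = np.log(r / np.maximum(p, 1))   # guard against p_i = 0
    return tf * idf                      # broadcast over the rows

def rank(tf, q):
    """Eq. (4): cosine between each weighted piece vector and the query."""
    s = tfidf_weights(tf)
    num = s @ q
    den = np.linalg.norm(s, axis=1) * np.linalg.norm(q)
    return num / np.maximum(den, 1e-12)

# Toy usage: three pieces, four feature segments.
tf = np.array([[4, 1, 0, 2], [0, 3, 5, 0], [1, 0, 2, 2]], dtype=float)
q = np.array([2.0, 1.0, 0.0, 1.0])       # query frequencies as weights
print(rank(tf, q))
```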

The cosine measure compares the importance of the feature segments in one piece of music with their importance in another. In this way, we can calculate retrieval status values based on the distribution of feature segments in the music.

4. Experimental Evaluation

4.1. Experimental Setup

In this section, we describe the experiment and the evaluation of our proposed method. We used 100 retrieval target music pieces consisting of various types of music, such as classical music, heavy metal, and pop. These pieces are music files ripped from CD collections; the format is monaural WAVE, with a sampling rate of 44,100 Hz and a quantization of 16 bits. We extracted the feature segments from the retrieval target music set using the extraction method explained in Section 3.2. To extract the feature segments, we need to set the threshold θ, so we chose three thresholds based on a preliminary experiment. In the preliminary experiment, we measured how the number of frames extracted from the music changes as the threshold changes: we compared each frame of one music piece with three randomly chosen frames of another piece and calculated the similarity of every combination of frames. The ratio of extracted frames to the total number of combinations stays high over the examined range of thresholds and then gradually decreases; once the threshold becomes large, the ratio drops sharply. Based on this result, we chose three thresholds: 0.99, the smallest value in the examined range; 0.9975; and 0.9985. We then extracted the feature segments from the retrieval target music set at each threshold, inserted the feature segments into the feature segment database, calculated the frequencies of each feature segment in each retrieval target music piece, and inserted the frequencies into the retrieval target music database.

To use the retrieval system, a user first gives WAVE format data to the system as a query. As the query, we used the song "Silent Jealousy" by the group X-JAPAN, which conveys both a calm impression based on classical instruments and a powerful impression based on heavy metal. Second, given the query, the system calculates the frequencies of the feature segments in the query using the feature segment database. Third, the system calculates the similarities between the frequencies in the query and those of each retrieval target music piece. Finally, the system returns the similarities as the retrieval result. To evaluate the proposed method, we needed to define the correct answers for the query; therefore, six people collaborated to create an answer set. As the benchmark, they judged whether the distribution of melodies in a piece was similar to the distribution of melodies in the query in order to determine whether the query was similar to the retrieval target music. The number of relevant pieces was 10 out of the 100 retrieval target music pieces.

In this experiment, we obtained the similarity results for each threshold. From the similarity results and the answer set, we calculated recall and precision for each similarity result, and from these we computed 11-point average precisions. Evaluation with 11-point average precision is commonly used in text retrieval[1]. We then compared the 11-point average precisions of our method with those of a baseline method. As the baseline, we used a music retrieval method that does not consider the frequencies of the feature segments: it takes the maximum similarity value between a feature segment of the query and the retrieval target music as the retrieval status value between the query and that target.

From this comparison, we confirmed that our method retrieves music based on the distribution of melodies in the music more accurately than the baseline method.
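For reference, 11-point interpolated average precision can be computed as in the following sketch; this is the standard text-retrieval definition[1], not code from the paper: precision is interpolated at the recall levels 0.0, 0.1, ..., 1.0 and averaged.

```python
def eleven_point_average_precision(ranked_ids, relevant):
    """Interpolated 11-point average precision of one ranked result list."""
    relevant = set(relevant)
    recalls, precisions, hits = [], [], 0
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant:
            hits += 1
            recalls.append(hits / len(relevant))
            precisions.append(hits / rank)
    points = []
    for level in (i / 10 for i in range(11)):
        # interpolated precision: best precision at any recall >= level
        attained = [p for rec, p in zip(recalls, precisions) if rec >= level]
        points.append(max(attained) if attained else 0.0)
    return sum(points) / 11

# Toy usage: 10 relevant pieces among 100 targets, as in this experiment.
print(eleven_point_average_precision(
    ranked_ids=list(range(100)), relevant=set(range(0, 100, 10))))
```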

4.2. Experimental Results

Figure 3 plots the precision-recall curves of our proposed method and the baseline method. The 11-point average precision is roughly 0.2 for each of the three thresholds of our method, against roughly 0.1 for the baseline method. The highest 11-point average precision of our proposed method is therefore higher than that of the baseline method, demonstrating that our proposed system is more accurate.

[Figure 3. Precision-recall curves of the proposed method (θ = 0.99, 0.9975, 0.9985) and the baseline method; precision is plotted against recall.]

Our retrieval result includes the song "Kurenai", also by X-JAPAN, at a high rank; it combines calm impressions based on classical instruments with powerful impressions based on heavy metal. Our retrieval result also ranks the song "Karn Evil 9" by the group Emerson, Lake & Palmer highly; it likewise combines calm impressions based on classical instruments with powerful impressions based on rock music.


On the other hand, the result of the baseline method shows not only "Kurenai" and "Karn Evil 9" at a high rank, but also "La Campanella" and "Riverdance", which are mainly played on violins. One reason for this result is that the baseline system judges only the melodies based on piano or violins in the query to be similar to the melodies in the retrieval target music. Hence, we conclude that our proposed system can retrieve music based on the distribution of melodies in the music, in contrast to the baseline system, which retrieves music based on only part of the melodies.

However, our retrieval result also ranked pop music highly. The reason is that our proposed system could not clearly distinguish the feature segments in the query from those in the retrieval target music. When we distinguish feature segments based on classical instruments from those based on heavy metal, the feature values of the feature segments of each melody are clearly different. On the other hand, when we distinguish several feature segments within one melody, the gaps between their feature values are not as clear as the gaps between melodies. Therefore, we cannot clearly distinguish the feature segments within a melody.

5. Conclusion

In this paper, we presented a music retrieval method based on the distribution of feature segments in music, designed to retrieve music based on the distribution of its melodies. Specifically, to calculate the distribution of feature segments, we applied TF-IDF, which is used in text retrieval. Using TF-IDF, we can consider the importance of the feature segments in the music and calculate their distribution.

We plan three studies for future work. The first is to consider how to structure the feature segments, together with more suitable feature values. We currently treat feature segments in the music the way terms are treated in text, obtained only by connecting frames. Furthermore, we classify the feature segments by comparing their frames, so segments can be placed in a cluster that is wrong from the standpoint of human perception. We therefore plan to consider how to structure the feature segments in the music, and also to consider more suitable feature values for our method.

Second, we plan to consider the positions at which the feature segments appear in the music. We did not take the appearance positions of the feature segments into account, yet the impression a listener has of the music differs depending on these positions. We therefore plan to consider transition relations between feature segments in the music.

The third is to increase the number of retrieval target pieces and the number of query patterns, and to verify the retrieval results for these various query patterns.

Acknowledgment

This work is partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan, under Grants-in-Aid for Scientific Research #20500104 and #20700101.

References

[1] R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[2] M. Clausen and F. Kurth. A Unified Approach to Content-Based and Fault-Tolerant Music Recognition. IEEE Transactions on Multimedia, 6(5):717-731, 2004.
[3] B. Cui, H. V. Jagadish, B. C. Ooi, and K.-L. Tan. Compacting Music Signatures for Efficient Music Retrieval. In Proc. of the 11th EDBT, pages 229-240, 2008.
[4] B. Cui, L. Liu, C. Pu, J. Shen, and K.-L. Tan. QueST: Querying Music Databases by Acoustic and Textual Features. In Proc. of ACM MM'07, pages 1055-1064, 2007.
[5] S. Doraisamy and S. Rüger. A Polyphonic Music Retrieval System Using N-Grams. In Proc. of ISMIR 2004, pages 204-209, 2004.
[6] S. Downie and M. Nelson. Evaluation of a Simple and Effective Music Information Retrieval Method. In Proc. of ACM SIGIR 2000, pages 73-80, 2000.
[7] C. Francu and C. G. Nevill-Manning. Distance Metrics and Indexing Strategies for a Digital Library of Popular Music. In Proc. of IEEE ICME 2000, pages 889-892, 2000.
[8] J. Reed and C.-H. Lee. A Study on Music Genre Classification Based on Universal Acoustic Models. In Proc. of ISMIR 2006, pages 89-94, 2006.
[9] E. Ukkonen, K. Lemström, and V. Mäkinen. Geometric Algorithms for Transposition Invariant Content-Based Music Retrieval. In Proc. of ISMIR 2003, pages 193-199, 2003.
[10] Y. Yu, J. S. Downie, and K. Joe. An Evaluation of Feature Extraction for Query-by-Content Audio Information Retrieval. In Proc. of the 9th IEEE ISMW, pages 297-302, 2007.

