A Music Retrieval Method Based on Distribution of Feature Segments
Tenth IEEE International Symposium on Multimedia
978-0-7695-3454-1/08 $25.00 © 2008 IEEE, DOI 10.1109/ISM.2008.93

Kazuhisa Ono†, Yu Suzuki‡, Kyoji Kawagoe‡
† Graduate School of Science and Engineering, Ritsumeikan University, Kusatsu, Shiga, 525-8577, Japan, [email protected]
‡ College of Information Science and Technology, Ritsumeikan University, Kusatsu, Shiga, 525-8577, Japan, {yusuzuki, kawagoe}@is.ritsumei.ac.jp

Abstract

In this paper, we propose a music retrieval method based on the distributions of features in the music. In common music retrieval methods, if several features are similar between the query and the retrieval target, the retrieval system judges that the query is similar to the retrieval target. However, a problem is that the remaining features in the music are ignored. If the other features in the query and the retrieval target are quite different, the query and the retrieval target should be treated as different types of music. Therefore, we calculate the importance of each feature in the music. Then, we compare the importance of features between the query and the retrieval target, so that we can retrieve music without ignoring the importance of the other features. In our experimental evaluation, we confirm that our proposed system has better accuracy than the baseline method.

1. Introduction

For some time now, people have been able to hear a number of types of music in passive ways, for example, as background music in shops, on TV programs, and in movies. From these music pieces, the listener can become interested in the music, and can obtain information on it such as the song title and the artist's name. Moreover, the listener may want to listen to other songs that are similar to the music they have heard. When a listener retrieves music, the listener usually uses music information such as the song title, or music data, as a query[4]. Nevertheless, the impression that the listener has of the music may be vague and uncertain, and the listener cannot generate a specific query from it. Therefore, the listener needs a music retrieval system that uses music data as the query.

In common content-based music retrieval algorithms[2, 9], the music retrieval system extracts acoustic features from the music piece, and compares the features between the query music piece and the retrieval target music piece. When two music pieces have several similar features, these music pieces are treated as similar types of music, and the importance of the other features of the music is ignored. However, the problem with this approach is that, when the other features of the music are quite different, these music pieces should be treated as different types of music. Similarly, when the other features are not very different, these pieces should be treated as similar types of music.

To solve this problem, the music retrieval system should avoid ignoring the other features. Therefore, in this paper, we calculate the importance of each feature over all features in a retrieval target music set. The importance of a feature means how much the feature contributes to representing the music. We assume that the more often a feature appears in the music, the more important the feature is in the music.

This idea is quite similar to the idea in textual information retrieval techniques. In textual information retrieval, the text retrieval system also should not ignore terms when retrieving texts. To calculate the importance of the terms, the text retrieval system uses the vector retrieval model and the Term Frequency/Inverse Document Frequency (TF-IDF) algorithm[1]. When the text retrieval system uses the TF-IDF algorithm, the system calculates how often each term appears in the text. Then, the system can calculate the importance of each term in a retrieval target text set.

Based on this algorithm, we apply the vector retrieval model and TF-IDF to a music information retrieval system, and we calculate the importance of the features. In our approach, we divide the music into meaningful segments, and use TF-IDF to calculate the frequencies of the segments in the music. In this way, we can calculate the importance of each segment. Then, we compare the importance of segments between the query music piece and the retrieval target music piece, so that we can retrieve music without ignoring the importance of the other segments. Using our proposed method, the system can find intuitively similar music in the retrieval target music.

2. Related Work

Several methods of content-based music retrieval exist. Some methods are music retrieval systems that divide the music data into segments, and generate music signatures from the segments[3, 7]. There are also methods that generate music signatures from pitch sequences in the music data. In these methods, the retrieval techniques mainly deal with pitch information, and do not attach importance to other information.

There are also methods that utilize N-grams for music retrieval. Doraisamy[5] extracts monophonic pitch sequences from polyphonic music and constructs musical words using the sequences. Then, their retrieval system utilizes N-grams to index the musical words. The similarity with our method is that it considers musical structures the way text structures are considered. However, their retrieval system treats the musical words as having a fixed length because of the N-grams, and their system does not consider the length of the feature segments. Downie[6] also uses N-grams to divide the music into segments with a fixed length, and then applies TF-IDF to the segments. The similarity with our method is in utilizing a text retrieval method as a music retrieval method. However, in their method, each segment is the same length.

On the other hand, there is a method that divides music pieces into frames, and extracts features by merging the frames from the music pieces[10]. This is similar to our method in that it deals with similarity between the frames. However, their system does not classify the features.

Another study[8] considers the structures of music to be structures of text based on phonetics. Their system divides the music into flexible segments, and treats each segment as a term in the text. This is similar to our method in that it considers the flexible segments to be feature segments. However, their system treats all segments that are extracted from the music as terms, and describes the music by sequentially connecting all the segments.

3. Music Retrieval based on Distribution of Feature Segments

The features that the music represents are not constant throughout the music; they change with the playback position. Therefore, we divide the music into meaningful segments. In this paper, we define the meaningful segments as feature segments. These feature segments characterize the music.

[...] this problem, common text retrieval systems use TF-IDF. In TF-IDF, the frequencies of each term in the texts are considered. Using the frequencies of each term, common text retrieval systems can consider to what extent each term is important in the texts, and calculate the importance of each term. In a similar way, we use TF-IDF to calculate the importance of each feature segment in the music. Using TF-IDF, we can consider the importance of the feature segments, and prevent the similarity of the other feature segments from being ignored.

Let us consider an example of a matching failure in a common music retrieval system. We give one music piece as a query to a music retrieval system, and a second music piece is included in the retrieval target music. The two pieces share a similar feature segment, and that feature segment is frequent. However, the other feature segments in these music pieces are different, and the distributions of these feature segments are not similar. In retrieving music, the common music retrieval system can judge that these music pieces are similar, because it ignores the other feature segments. As a result, the system does not consider whether these distributions of feature segments are similar or not. Moreover, the system cannot judge whether these distributions of melodies are similar or not.

In contrast, we show another example using our proposed method. We consider the same retrieval case as that described in the previous paragraph. We calculate a distribution of the feature segments for each music piece. We consider not only the importance of the frequent feature segment, but also the importance of the other feature segments. Then, we can determine that the frequencies of the feature segments in the query piece and in the target piece are different. Consequently, we can judge that these distributions of melodies are not similar.

3.1. Overview of Our Approach

Figure 1 shows an overview of our music retrieval system. Our system compares frequencies of the feature segments in a query with frequencies of the feature segments in the retrieval target music by utilizing a vector retrieval model. In this comparison, our system retrieves music based on the distribution of feature segments in the music.

To retrieve music, we extract frequencies of the feature segments from each retrieval target music piece by pre-processing, and we construct a feature segment database and a retrieval target music database.
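
As an illustration only, the comparison of feature-segment distributions described above can be sketched in Python. The sketch assumes that each music piece has already been reduced to a sequence of feature-segment labels. The label strings, the function names, and the smoothed IDF formula are assumptions made for this example and are not the weighting defined in this paper.

```python
import math
from collections import Counter


def segment_weights(labels, n_pieces, doc_freq):
    """TF-IDF weight of each segment label in one piece (hypothetical helper).

    The smoothed IDF, log((1 + N) / (1 + df)) + 1, is an assumption of this
    sketch; it keeps segments that occur in every piece from vanishing.
    """
    counts = Counter(labels)
    total = len(labels)
    return {
        seg: (count / total) * (math.log((1 + n_pieces) / (1 + doc_freq[seg])) + 1.0)
        for seg, count in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors stored as dicts."""
    dot = sum(w * v.get(seg, 0.0) for seg, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank(query_labels, target_pieces):
    """Rank target pieces by the similarity of their segment distributions."""
    n_pieces = len(target_pieces)
    doc_freq = Counter()
    for labels in target_pieces.values():
        doc_freq.update(set(labels))  # count each segment once per piece
    query_vec = segment_weights(query_labels, n_pieces, doc_freq)
    scores = {
        name: cosine(query_vec, segment_weights(labels, n_pieces, doc_freq))
        for name, labels in target_pieces.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Both targets share the frequent segment "a" with the query, but only
# target_1 also shares the less frequent segments "b" and "c".
targets = {
    "target_1": ["a", "a", "a", "b", "c", "c"],
    "target_2": ["a", "a", "a", "a", "d", "e"],
}
for name, score in rank(["a", "a", "a", "a", "b", "c"], targets):
    print(name, round(score, 3))
```

In this toy case, target_2 matches the query on the frequent segment alone, yet it ranks below target_1 because its remaining segments differ. This mirrors the matching-failure example: a comparison over the whole distribution separates the two targets, while a check of the frequent segment alone would not.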