
D3.4.1 Similarity Report

Abstract: The goal of Work Package 3 is to take the features and metadata provided by Work Package 2 and provide the technology needed for the intelligent structuring, presentation, and use of large music collections. This deliverable is about audio and web-based similarity measures, novelty detection (which proves to be a useful tool to combine with similarity), and first outcomes of applying mid-level WP2 descriptors from D2.1.1 and preliminary versions of D2.1.2. We present improvements of the similarity measures presented in D3.1.1 and D3.2.1. The outcomes will serve as the foundation for D3.5.1, D3.6.2, and the prototypes, in particular for the recommender and the organizer.

Version 1.0
Date: May 2005
Editor: E. Pampalk
Contributors: A. Flexer, E. Pampalk, M. Schedl, J. Bello, and C. Harte
Reviewers: G. Widmer and P. Herrera

Contents

1 Introduction

2 Audio-Based Similarity
  2.1 Data
  2.2 Hidden Markov Models for Spectral Similarity
      2.2.1 Methods
      2.2.2 Results
      2.2.3 Comparing log-likelihoods directly
      2.2.4 Genre Classification
      2.2.5 Discussion
  2.3 Spectral Similarity Combined with Complementary Information
      2.3.1 Spectral Similarity
      2.3.2 Fluctuation Patterns
      2.3.3 Combination
      2.3.4 Genre Classification
      2.3.5 Conclusions
  2.4 Summary & Recommendations

3 Web-Based Similarity
  3.1 Web Mining by Co-occurrence Analysis
  3.2 Experiments and Evaluation
      3.2.1 Intra-/Intergroup-Similarities
      3.2.2 Classification with k-Nearest Neighbors
  3.3 Conclusions & Recommendations

4 Novelty Detection and Similarity
  4.1 Data
  4.2 Methods
      4.2.1 Music Similarity
      4.2.2 Algorithms for novelty detection
  4.3 Results
  4.4 Discussion

5 Chroma-Complexity Similarity
  5.1 Chromagram Calculation
  5.2 Chromagram Tuning
  5.3 Chromagram Processing
  5.4 Chroma Complexity
      5.4.1 ChromaVisu Tool
  5.5 Results
      5.5.1 (Dave Brubeck Quartet)
      5.5.2 Classic Orchestra
      5.5.3 Classic Piano
      5.5.4
      5.5.5 Hip Hop
      5.5.6 Pop
  5.6 Discussion & Conclusions

6 Conclusions and Future Work

1. Introduction

Overall goal of Workpackage 3: The goal of Workpackage 3 (WP3) is to take the features and meta-data provided by Workpackage 2 (WP2) and provide the technology needed for the intelligent structuring, presentation, and use (query processing and retrieval) of large music collections. This general goal can be broken down into two major task groups: the automatic structuring and organisation of large collections of digital music, and intelligent music retrieval in such structured "music spaces".

Role of Similarity within SIMAC: Similarity measures are a key technology in SIMAC. They are the foundation of the deliverables D3.5.1 (music collection structuring and navigation module) and D3.6.2 (module for retrieval by similarity and semantic descriptors). They enable core functionalities of the organizer and recommender prototypes. Without similarity, functions such as playlist generation, organization and visualization, hierarchical structuring, retrieval, and recommendations cannot be implemented. In fact, the importance of similarity measures goes beyond their role in the prototypes. A similarity measure can be licensed as is, and can easily find its way into online music stores or mobile audio players. Thus, it is highly recommended to continue the development and improvement of similarity measures throughout the SIMAC project beyond D3.4.1.

Definition of Similarity in D3.4.1: There are many aspects of similarity (timbre, harmony, etc.), and there are different sources from which these can be computed (audio, web pages, lyrics, etc.). Most of all, similarity is a perception which depends on the listener's point of view and context. Within SIMAC any important dimension of perceived similarity is useful. However, the main parts of this deliverable define similarity as the concept which pieces within a genre (or subgenre) have in common. As already pointed out in D3.1.1 and D3.2.1, the reason for this is that it allows highly efficient (i.e. fast and cheap) evaluations of the similarity measures (since genre labels for artists are readily available).

Evaluation Procedures: In this deliverable we primarily use nearest neighbor classifiers (and genre classification) to evaluate the similarity measures. The idea is that pieces within the same genre should be very close to each other. In addition, we use inter- and intra-group distances as described in Section 3.2.1. These are particularly useful in understanding how well each group is modeled by the similarity measure.


In Section 2.2.3 we compute the log-likelihood of the self-similarity within a song and use it to evaluate similarity measures. In particular, the first half of each song is compared to the second half. The idea is that a good similarity measure should recognize these to be highly similar. In Section 4.3 we use receiver operating characteristic (ROC) curves to measure the tradeoff between sensitivity and specificity. Throughout this deliverable (and in particular in Chapter 5) we use illustrations to demonstrate characteristics of the similarity measures.
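To make the nearest-neighbor evaluation concrete, the following sketch shows how a leave-one-out nearest-neighbor genre classification accuracy can be computed from a precomputed distance matrix. It is purely illustrative (the experiments in this deliverable were run with the MA Toolbox for Matlab), and the function and variable names are ours, not part of the deliverable.

    import numpy as np

    def leave_one_out_accuracy(D, genres):
        """D: (n, n) symmetric distance matrix; genres: length-n list of labels.
        Each piece is classified with the genre of its nearest neighbor
        (excluding itself); the accuracy is the fraction classified correctly."""
        n = len(genres)
        correct = 0
        for i in range(n):
            d = D[i].astype(float).copy()
            d[i] = np.inf                      # exclude the piece itself
            nearest = int(np.argmin(d))
            correct += (genres[nearest] == genres[i])
        return correct / n

    # Example with a toy 3x3 distance matrix (hypothetical values):
    D = np.array([[0.0, 0.2, 0.9],
                  [0.2, 0.0, 0.8],
                  [0.9, 0.8, 0.0]])
    print(leave_one_out_accuracy(D, ["jazz", "jazz", "pop"]))  # -> 0.666...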

Specific Context of D3.4.1 in SIMAC: D3.4.1 is built on the code and ideas from WP2 (i.e. D2.1.1 and preliminary versions of D2.1.2). D3.4.1 uses findings of D3.1.1 and, most of all, of D3.2.1. In particular, a large part of D3.2.1 covers the similarity measures (including their implementation) used in this deliverable. The recommendations of D3.4.1 are the foundation for D3.5.1, D3.6.2, and the organizer and recommender prototypes.

Relationship to D3.1.1 and D3.2.1: In these previous deliverables of WP3 we presented a literature review of similarity measures, implementations (MA toolbox), and extensive evaluations thereof based on genre classification. In this deliverable we present improvements of these results and recommendations for the implementations of the prototypes. Topics covered in detail in these previous deliverables are only repeated if they are necessary to define the context. Thus, this deliverable is not self-contained but rather an add-on to D3.1.1 and D3.2.1.

Outcomes of D3.4.1: There are five main outcomes of this deliverable. The following five chapters are structured accordingly.

A. Recommendations for audio-based similarity: We report on findings using HMMs and on combinations of different similarity measures. We show that a combination of different approaches can improve genre classification performance by 14% on average (measured on four different collections).

B. Recommendations for web-based similarity: We report a simpler alternative to the approach presented in D3.2.1 and D3.1.1, based on co-occurrences and the number of pages retrieved by Google. We show that, depending on the size of the music collection, different approaches are preferable.

C. We demonstrate how novelty detection can be combined with similarity measures to improve the performance. We show that using simple techniques, genre classification or playlist generation can be improved.

D. We report first results using outcomes from D2.1.1 and D2.1.2 for similarity. In particular, we present a general approach to using musical complexity for similarity computations, applied here to chroma patterns. Our preliminary results demonstrate possible applications of the mid-level descriptors developed in WP2.

E. We give an outlook on topics to pursue in the remainder of the SIMAC project and the necessary next steps.

2. Audio-Based Similarity

Advantages and Limitations of Audio-Based Similarity: Audio-based similarity is cheap and fast. Computing the similarity of two pieces can be done within seconds. The similarity can be computed between artists, songs, or even below the song level (e.g. between segments of different pieces). The main limitation is the quality of the computed similarity. For example, the presence and the expressiveness of a singing voice, or instruments such as electric guitars, are not modeled appropriately. In general, the meaning (or the message) of a piece of music (including, for example, emotions) will, as far as we can foresee, remain incomprehensible to the computer. Furthermore, the audio signal does not contain cultural information. Both the history and the social context of the piece of music are not accessible. In fact, as we will discuss in the next chapter, these are better extracted through analysis of content on the Internet.

The remainder of this chapter is organized as follows:

A. Description of the data sets which we used for evaluation, one of which was used as the training set for the ISMIR'05 genre classification contest.

B. Results on modeling temporal information for spectral similarity. Our results show that temporal information improves the performance of the similarity measure within a song. However, this improvement does not appear to be significant when measured in a genre classification task.

C. Results on combining different approaches. In particular, we combine the spectral similarity (which we have shown to outperform other approaches in D3.1.1 in various tasks) with information gathered from fluctuation patterns. On average (using the four collections) the improvement is about 14% for genre classification.

D. We summarize our findings and make recommendations for the prototypes.

2.1 Data

For our experiments we use four music collections with a total of almost 6000 pieces. Details are given in Tables 2.1 and 2.2. For the evaluation (especially to avoid overfitting) it is important that the collections are structured differently and have different types of content.


                           Genres   Artists   Tracks    Artists/Genre    Tracks/Genre
                                                          Min     Max     Min     Max
In-House Small (DB-S)        16        66       100        2       7       4       8
In-House Large (DB-L)        22       103      2522        3       6      45     259
Magnatune Small (DB-MS)       6       128       729        5      40      26     320
Magnatune Large (DB-ML)      10       147      3248        2      40      22    1277

Table 2.1: Statistics of the four collections.

DB-S   alternative, , classic orchestra, classic piano, dance, , happy sound, hard pop, , mystera, pop, , rock, rock & roll, romantic dinner, talk
DB-L   a cappella, , blues, , celtic, , DnB, , electronic, euro-dance, folk-rock, German hip hop, hard core rap, heavy metal/thrash, Italian, jazz, , melodic metal, punk, , trance, trance2
DB-MS  classical, electronic, jazz/blues, metal/punk, pop/rock, world
DB-ML  ambient, classical, electronic, jazz, metal, , pop, punk, rock, world

Table 2.2: Genres for each collection.

DB-S

The smallest collection consists of 100 pieces. We have previously used it in [26]. However, we removed all classes consisting of one artist only. The categories are not strictly genres (e.g. one of them is romantic dinner music). Furthermore, the collection also includes one non-music category, namely speech (German cabaret). This collection has a very good (i.e. low) ratio of tracks per artist. However, due to its small size the results need to be treated with caution.

DB-L

The second largest collection has mainly been organized according to genre/artist/album. Thus, all pieces from an artist (and album) are assigned to the same genre, which is a questionable but common practice. Only two pieces overlap between DB-L and DB-S, namely Take Five and Rondo by the Dave Brubeck Quartet. The genres are user defined and inconsistent. In particular, there are two different definitions of trance. Furthermore, there are overlaps, for example, jazz and jazz guitar, heavy metal and death metal, etc.

DB-MS

This collection is a subset of DB-ML which has been used as the training set for the ISMIR 2004 genre classification contest. The music originates from Magnatune (http://www.magnatune.com) and is available via Creative Commons. UPF/MTG arranged with Magnatune a free use for research purposes. Although we have a larger set from the same source, we use this subset to compare our results to those of ISMIR'04. The genre labels are given on the Magnatune website. The collection is very unbalanced: most pieces belong to the genre classical, and a large number of pieces in world sound like classical music. Some of the original Magnatune classes were merged by UPF/MTG due to ambiguities and the small number of tracks in some of the genres.

DB-ML

This is the largest set in our experiments. DB-MS is a subset of this collection. The genres are also very unbalanced. The number of artists is not much higher than in DB-MS. The number of tracks per artist is very high. The genres which were merged for the ISMIR contest are separated.

2.2 Hidden Markov Models for Spectral Similarity

This section deals with modeling temporal aspects to improve spectral similarity. The work presented in this section has been submitted to a conference [12]. As shown in D3.1.1, the following approach to music similarity based on spectral similarity, pioneered by [20] and [1] (and later refined in [2]), outperformed all other alternatives. In the following we will refer to it as AP. For a given music collection of S songs, each belonging to one of G music genres, it consists of the following basic steps (a minimal code sketch follows the list):

• for each song, divide the raw data into overlapping frames of short duration (around 25ms)

• compute Mel Frequency Cepstrum Coefficients (MFCCs) for each frame (up to 20)

• train a Gaussian Mixture Model (GMM, number of mixtures up to 50) for each of the songs

• compute a similarity matrix between all songs using the likelihood of a song given a GMM

• based on the genre information, do k-nearest neighbor classification using the similarity matrix
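A minimal sketch of these steps is given below. It is illustrative only: the deliverable's implementation uses the MA Toolbox and Netlab for Matlab, whereas librosa and scikit-learn are assumed here, and the symmetrized log-likelihood used as similarity is a simplification of the cluster-model comparison detailed in Section 2.3.1.

    import numpy as np
    import librosa                               # assumed, for MFCC extraction
    from sklearn.mixture import GaussianMixture  # assumed, for the GMMs

    def song_model(path, n_mfcc=20, n_mix=30):
        """Fit a GMM to the MFCC frames of one song (steps 1-3)."""
        y, sr = librosa.load(path, sr=22050, mono=True)
        frames = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                      n_fft=512, hop_length=256).T  # (n_frames, n_mfcc)
        gmm = GaussianMixture(n_components=n_mix, covariance_type='diag').fit(frames)
        return gmm, frames

    def similarity(a, b):
        """Symmetrized average log-likelihood of each song under the other's
        model (step 4); higher values mean more similar."""
        (gmm_a, frames_a), (gmm_b, frames_b) = a, b
        return 0.5 * (gmm_a.score(frames_b) + gmm_b.score(frames_a))

    def predict_genre(test_song, train_songs, train_genres):
        """Step 5: the genre of the most similar training song is the prediction."""
        scores = [similarity(test_song, s) for s in train_songs]
        return train_genres[int(np.argmax(scores))]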

The last step of genre classification can be seen as a form of evaluation. Since usually no ground truth with respect to music similarity exists, each song is labeled as belonging to a genre using e.g. music expert advice. High genre classification results indicate good similarity measures.

This approach based on GMMs disregards the temporal order of the frames, i.e. to the algorithm it makes no difference whether the frames in a song are ordered in time or whether this order is completely reversed or scrambled. Research on the perception of musical timbre of single musical instruments clearly shows that temporal aspects of the audio signals play a crucial role (see e.g. [15]). Aspects like spectral fluctuation, attack or decay of an event cannot be modelled without respecting the temporal order of the audio signals. A natural way to incorporate temporal context into the above described framework is the usage of Hidden Markov Models (HMMs) instead of GMMs. HMMs trained on MFCCs have already been used for music summarization ([19; 3; 30]) and genre classification [2], but with rather limited success.

For the experiments reported in this section we use the DB-MS collection. We divide the raw audio data into overlapping frames of short duration and use Mel Frequency Cepstrum Coefficients (MFCCs) to represent the spectrum of each frame. The frame size for the computation of MFCCs in our experiments was 23.2ms (512 samples), with a hop size of 11.6ms (256 samples) for the overlap of frames. Although improved results have been reported with numbers of MFCCs of up to 20 [2], we used only the first 8 MFCCs for all our experiments to limit the computational burden. In order to allow modeling of a bigger temporal context we also used so-called texture windows [36]: we computed means and variances of the MFCCs across the following numbers of frames and used them as alternative input to the models: 22 frames, hop size 11 (510.4ms, 255.2ms); 10 frames, hop size 5 (232ms, 116ms); 10 frames, hop size 2 (232ms, 46.4ms). This means that if a texture window is used, after preprocessing a single data point x_t is a 16-dimensional vector (8 mean MFCCs plus 8 variances across MFCCs) instead of an 8-dimensional vector if no texture window is used.
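The texture-window preprocessing can be sketched as follows (illustrative Python; the frame counts and hop sizes follow the text, the function name is ours):

    import numpy as np

    def texture_windows(mfcc, size=22, hop=11):
        """mfcc: (n_frames, 8) array of MFCC frames.
        Returns one 16-dimensional vector (8 means + 8 variances) per texture
        window of `size` frames, advanced by `hop` frames (e.g. 22/11, 10/5, 10/2)."""
        out = []
        for start in range(0, len(mfcc) - size + 1, hop):
            window = mfcc[start:start + size]
            out.append(np.concatenate([window.mean(axis=0), window.var(axis=0)]))
        return np.array(out)   # shape: (n_windows, 16)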

2.2.1 Methods

A Gaussian Mixture Model (GMM) models the density of the input data by a mixture model of the form

    p^{GMM}(x) = \sum_{m=1}^{M} P_m \, \mathcal{N}(x; \mu_m, U_m)    (2.1)

where P_m is the mixture coefficient for the m-th mixture, \mathcal{N} is the normal density, and µ_m and U_m are the mean vector and covariance matrix of the m-th mixture. The log-likelihood function is given by

    L^{GMM} = \frac{1}{T} \sum_{t=1}^{T} \log p^{GMM}(x_t)    (2.2)

for a data set containing T data points. This function is maximized both with respect to the mixing coefficients P_m and with respect to the parameters of the Gaussian basis functions using Expectation-Maximization (see e.g. [8]). Hidden Markov Models (HMMs) [32] allow analysis of non-stationary multivariate time series by modeling both the probability density functions of locally stationary multivariate data and the transition probabilities between these stable states. If the probability density functions are modelled with mixtures of Gaussians, HMMs can be seen as GMMs plus transition probabilities. An HMM can be characterized as having a finite number N of states Q:

    Q = \{q_1, q_2, \ldots, q_N\}    (2.3)

A new state q_j is entered based upon a transition probability distribution A which depends on the previous state (the Markovian property):

    A = \{a_{ij}\}, \quad a_{ij} = P(q_j(t) \mid q_i(t-1))    (2.4)

where t = 1, ..., T is a time index with T being the length of the observation sequence. After each transition an observation output symbol is produced according to a probability distribution B which depends on the current state. Although the classical HMM uses a set of discrete symbols as observation output, [32] already discuss the extension to continuous observation symbols. We use a Gaussian Observation Hidden Markov Model (GOHMM) where the observation symbol probability distribution for state j is given by a mixture of Gaussians:

    B = \{b_j(x)\}, \quad b_j(x) = p_j^{GMM}(x)    (2.5)

where p_j^{GMM}(x) is the density as defined for a mixture of Gaussians in Equ. 2.1. The Expectation-Maximization (EM) algorithm is used to train the GOHMM, thereby estimating the parameter sets A and B. The log-likelihood function is given by

    L^{HMM} = \frac{1}{T} \sum_{t=1}^{T} \left[ \log(b_{q_t}(x_t)) + \log(a_{q_{t-1}, q_t}) \right]    (2.6)

for an observation sequence of length t = 1, ..., T, with q_1, ..., q_T being the most likely state sequence and q_0 a start state. The forward algorithm is used to identify the most likely state sequences corresponding to a particular time series and enables the computation of the log-likelihoods. Full details of the algorithms can be found in [32].
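For illustration, a GOHMM with this kind of topology can be trained and scored along the following lines. The hmmlearn library is an assumption made only for this sketch (the experiments themselves were run in Matlab); its score() returns the total log-likelihood, so we divide by the number of frames to obtain a per-frame value in the spirit of Equ. 2.6.

    from hmmlearn.hmm import GMMHMM   # assumed library, not used in the deliverable

    def train_gohmm(frames, n_states=3, n_mix=3):
        """frames: (T, d) MFCC (or texture-window) vectors of one song.
        Fully connected (ergodic) HMM whose emission densities are
        mixtures of Gaussians with diagonal covariances."""
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type='diag', n_iter=20)
        model.fit(frames)
        return model

    def per_frame_loglik(model, frames):
        """Approximates L^HMM of Equ. 2.6: log-likelihood normalized by T."""
        return model.score(frames) / len(frames)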

[Figure: duration probability density p(d) on the y-axis (0 to 0.01) against duration d on the x-axis (0 to 120 sec), with four curves labeled 1-4.]

Figure 2.1: Duration probability densities p(d) (y-axis) for durations d (x-axis) in seconds for different combinations of window and hop sizes: line (1) win 23.2ms, hop 11.6ms, line (2) win 232ms, hop 46.4ms, line (3) win 232ms, hop 116ms, line (4) win 510.4ms, hop 255.2ms.

It is informative to have a closer look at how the transition probabilities influence the state sequence characteristics. The inherent duration probability density p_i(d) associated with state q_i, with self-transition coefficient a_{ii}, is of the form

    p_i(d) = (a_{ii})^{d-1} (1 - a_{ii})    (2.7)

This is the probability of d consecutive observations in state q_i, i.e. the duration probability of staying d times in one of the locally stationary states modeled with a mixture of Gaussians. As [31] noted, this exponential state duration density is not optimal for a lot of physical signals. The duration of a single data point in our case is dependent on the window length win of the frame used for computing the MFCCs or the size of the texture window, as well as the hop size hop. The length l of staying in the same state, expressed in msec, is then:

    l = (d - 1) \cdot hop + win    (2.8)

with hop and win given in msec. Fig. 2.1 gives duration probability densities for all different combinations of hop and win used for preprocessing, with a_{ii} set to 0.99 (which is a reasonable choice for audio data). One can see that whereas for hop = 11.6 and win = 23.2 the duration probability at five seconds is already almost zero, there is still a small probability for durations up to 120 seconds for hop = 255.2 and win = 510.4. Our choice of different frame sizes and texture windows seems to guarantee a range of different duration probabilities. The shorter the state durations in HMMs are, the more often the state sequence will switch from state to state and the less clear the boundaries between the mixtures of Gaussians of the individual states will be. Therefore, with shorter state durations the HMMs will be more akin to GMMs in their modeling behavior.

An important open issue is the model topology of the HMM. Looking again at the work by [32] on speech analysis, we can see that the standard model for isolated word recognition is a left-to-right HMM. No transitions are allowed to states whose indices are lower than the current state, i.e. as time increases the state index increases. This has been found to account well for the modeling of words, which rarely have repeating vowels or sounds. For songs, a fully connected so-called ergodic HMM seems to be more suitable for modeling than the constrained left-to-right model. After all, repeating patterns seem to be an integral part of music. Therefore it makes sense to allow states to be entered more than once and hence use ergodic HMMs.

There is a small number of papers describing applications of HMMs to the modeling of some form of spectral similarity. [19] compare HMMs and static clustering for music summarization. Fully ergodic HMMs with five to twelve states of single Gaussians are trained on the first 13 MFCCs (computed from 25.6ms overlapping windows). Key phrases are chosen based on state frequencies and evaluated in a user study. Clustering performs best and HMMs do not even surpass the performance of a random algorithm. [3] use fully ergodic three-state HMMs with single Gaussians per state trained on the first ten MFCCs (computed from 30ms overlapping windows) for segmentation of songs into chorus, verse, etc. The authors found little improvement over using static k-means clustering for the problem. The same approach is used as part of a bigger system for audio thumbnailing in [4]. [30] also compare HMMs and k-means clustering for music audio summary generation. The authors report achieving smoother state jumps using HMMs. [2] report genre classification experiments using HMMs with numbers of states ranging from 3 to 30, where the states are mixtures of four Gaussians. For their genre classification task the best HMM is the one with 12 states. Its performance is slightly worse than that of a GMM with a mixture of 50. The authors do not give any detail about the topology of the HMM, i.e. whether it is a fully ergodic one or one with left-to-right topology. It is also unclear whether they use full covariance matrices for the mixtures of Gaussians. From the graph in their paper (Figure 6) it is evident that HMMs with numbers of states ranging from 4 to 25 perform at a very comparable level in terms of genre classification accuracy. HMMs have also been used successfully for audio fingerprinting (see e.g. [5]). There, HMMs with tailor-made topologies trained on MFCCs are used to fully represent each detail of a song in a huge database. The emphasis is on exact identification of a specific song and not on generalization to songs with similar characteristics.
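Before turning to the results, the state-duration densities of Equations 2.7 and 2.8 can be made concrete with a few lines of code; the parameter values follow the text (a_ii = 0.99 and the window/hop combinations of Figure 2.1), everything else is our own illustration.

    import numpy as np

    def duration_density(d, a_ii=0.99):
        """Equ. 2.7: probability of staying exactly d consecutive frames in a state."""
        return a_ii ** (d - 1) * (1.0 - a_ii)

    def frames_for_duration(ms, hop, win):
        """Invert Equ. 2.8: number of frames d needed to stay `ms` milliseconds."""
        return int(np.ceil((ms - win) / hop)) + 1

    # Density at a 5-second state duration for two preprocessing settings of Fig. 2.1:
    for win, hop in [(23.2, 11.6), (510.4, 255.2)]:
        d = frames_for_duration(5000, hop, win)
        print(win, hop, d, duration_density(d))
    # The short frames need d ~ 430, where p(d) has decayed to ~1e-4, while the
    # 510.4/255.2 texture windows need only d ~ 19, where p(d) is still ~8e-3;
    # this reproduces the spread of curves described for Figure 2.1.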

2.2.2 Results

Table 2.3: Overview of all types of models used and results achieved: index of model nr, model type model, number of states states, size of mixture mix, window size win, hop size hop, texture window tex, degrees of freedom df, mean log-likelihood likeli, number of HMM based log-likelihoods bigger than GMM based log-likelihoods H > G, z-statistic z, mean accuracy acc, standard deviation stddev, t-statistic t.

nr  model  states  mix    win     hop   tex    df   likeli   H > G       z     acc   stddev      t
 1  HMM        10    1   23.2    11.6    n    180   -31.10      22   -24.43  74.20     5.43  -0.26
 2  GMM         -   10   23.2    11.6    n     80   -29.89                    76.54     3.64
 3  HMM         3    3   23.2    11.6    n     81   -29.26     698    24.76  77.08     4.73   0.36
 4  GMM         -    9   23.2    11.6    n     72   -29.91                    73.38     5.00
 5  HMM         6    5   23.2    11.6    n    276   -28.95     706    25.46  78.18     4.59   0.00
 6  GMM         -   30   23.2    11.6    n    240   -29.93                    78.19     3.32
 7  HMM         3    3  510.4   255.2    y    153   -29.31     692    24.26  74.20     4.85  -0.05
 8  GMM         -    9  510.4   255.2    y    144   -29.92                    74.62     3.67
 9  HMM         3    3  232.0   116.0    y    153   -29.30     690    24.11  76.67     2.22   0.08
10  GMM         -    9  232.0   116.0    y    144   -29.90                    76.26     3.13
11  HMM         3    3  232.0    46.4    y    153   -29.34     677    23.13  73.79     4.81  -0.04
12  GMM         -    9  232.0    46.4    y    144   -29.89                    74.20     3.27

For our experiments with GMMs and HMMs we used the following parameters (abbreviations correspond to those used in Table 2.3):

• preprocessing: we used combinations of window (win) and hop sizes (hop) and texture windows (tex set to yes ('y') or no ('n'))

• topology: 3, 6, and 10 state ergodic (fully connected) HMMs with mixtures of 1, 3, or 5 Gaussians per state, and GMMs with mixtures of 9, 10, or 30 Gaussians (see states and mix in Table 2.3 for the combinations used); Gaussians use diagonal covariance matrices for both HMMs and GMMs

• computation of similarity: similarity is computed using Equ. 2.6 for HMMs and Equ. 2.2 for GMMs

The combinations of parameters states, mix, win, hop and tex used for this study yielded twelve different model classes: six types of HMMs and six types of GMMs. We made sure to employ comparable types of GMMs and HMMs by having comparable degrees of freedom for pairs of model classes: HMM (states 10, mix 1) vs. GMM (mix 10), HMM (states 3, mix 3) vs. GMM (mix 9), HMM (states 6, mix 5) vs. GMM (mix 30). The degrees of freedom (number of free parameters) for HMMs and GMMs are

    df^{GMM} = mix \times \dim(x)    (2.9)

    df^{HMM} = states \times mix \times \dim(x) + states^2    (2.10)

with dim(x) being the dimensionality of the input vectors. Column df in Table 2.3 gives the degrees of freedom for all types of models. With the first column nr indexing the different models, odd numbered models are always HMMs and the next even numbered model is always the associated GMM. The difference in degrees of freedom between two associated types of GMMs and HMMs is always the number of transition probabilities (states^2).
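A few lines suffice to reproduce the df column of Table 2.3 (illustrative only; dim(x) is 8 without and 16 with texture windows):

    def df_gmm(mix, dim):
        return mix * dim                         # Equ. 2.9

    def df_hmm(states, mix, dim):
        return states * mix * dim + states ** 2  # Equ. 2.10

    # A few entries of Table 2.3:
    assert df_hmm(10, 1, 8) == 180   # model 1
    assert df_gmm(10, 8) == 80       # model 2
    assert df_gmm(30, 8) == 240      # model 6
    assert df_hmm(3, 3, 16) == 153   # model 7 (texture window, 16-dimensional input)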

2.2.3 Comparing log-likelihoods directly

The first line of experiments compares goodness-of-fit criteria (log-likelihoods) between songs and models in order to explore which type of model best describes the data. Out-of-sample log-likelihoods were computed in the following way:

• train HMMs and GMMs for each of the twelve model types for each of the songs in the training set, using only the first half of each song

• use the second half of each song to compute log-likelihoods L^{HMM} and L^{GMM}

This yielded S = 729 log-likelihoods for each of the twelve model types. Average log-likelihoods per model type are given in column likeli in Table 2.3. Since the absolute values of log-likelihoods very much depend on the type of songs used, it is much more informative to compare log-likelihoods on a song-by-song basis. In Fig. 2.2, histogram plots of the differences of log-likelihoods L_i - L_{i+1} between associated model types are shown:

    L_i - L_{i+1} = L^{HMM(i)} - L^{GMM(i+1)}    (2.11)

with HMM(i) being an HMM of model type index nr = i and GMM(i+1) being the associated GMM of model type index nr = i + 1, for i = 1, 3, 5, 7, 9, 11.

The differences L_i - L_{i+1} are computed for all the S = 729 songs before doing the histogram plots. As can be seen in Fig. 2.2, except for one histogram plot the majority of HMM models show a better goodness-of-fit of the data than their associated GMMs (i.e. their log-likelihoods are higher for most of the songs). The only exception is the comparison of model types 1 and 2 (HMM (states 10, mix 1) vs. GMM (mix 10)), which is interesting because in this case the HMMs have the biggest advantage in terms of degrees of freedom (180 vs. 80) over the GMMs of all the comparisons. This is due to the fact that this type of HMM model has the highest number of states with states = 10. But it also has only a single Gaussian per state to model the probability density functions. Experiments on isolated word recognition in speech analysis [32] have shown that small sizes of the mixtures of Gaussians used in HMMs do not catch the full detail of the emission probabilities, which often are not Gaussian at all. Mixtures of five Gaussians with diagonal covariances per state have been found to be a good choice.

[Figure: six histogram panels, one per pair of associated models (Lik_1 - Lik_2, Lik_3 - Lik_4, Lik_5 - Lik_6, Lik_7 - Lik_8, Lik_9 - Lik_10, Lik_11 - Lik_12); x-axis: difference in log-likelihood, y-axis: number of songs.]

Figure 2.2: Histogram plots of differences in log-likelihood between associated models.

Finding a correct statistical test for comparing likelihoods of so-called non-nested models is far from trivial (see e.g. [23] or [14]). HMMs and GMMs are non-nested models because one is not just a subset of the other, as would e.g. be the case with a mixture of five Gaussians compared to a mixture of six Gaussians. What makes the models non-nested is the fact that it is not clear how to weigh the parameter of a transition probability a_{ij} against, say, a mean µ_m of a Gaussian. Nevertheless, it is correct to compare the log-likelihoods since we use out-of-sample estimates, which automatically punish overfitting due to excessive free parameters. It is just the distribution characteristics of the log-likelihoods which are hard to describe. Therefore we resorted to the distribution-free sign test, which relies only on the rank of results (see e.g. [34]). Let C_I be the score under condition I and C_{II} the score under condition II; the null hypothesis tested by the sign test is

    H_0: \; p(C_I > C_{II}) = p(C_I < C_{II}) = \tfrac{1}{2}    (2.12)

If c is the number of cases in which C_I > C_{II} and the number of matched pairs N is greater than 25, then the sampling distribution is the normal distribution with

    z = \frac{c - \tfrac{1}{2} N}{\tfrac{1}{2} \sqrt{N}}    (2.13)

Column H > G in Table 2.3 gives the count c of HMM-based log-likelihoods being bigger than GMM-based log-likelihoods for all pairs of associated model types. Column z gives the corresponding z-values obtained using Equ. 2.13. All z-values are highly significant at the 99% error level since all |z| > z_{99} = 2.58. Therefore HMMs always describe the data better than their associated GMMs, with the exception of the comparison of model types 1 and 2 (HMM (states 10, mix 1) vs. GMM (mix 10)). To counter the argument that the superior performance of the HMMs is due to their extra degrees of freedom (i.e. the number of transition probabilities, see column df in Table 2.3), we also compared the smallest type of HMM (model nr 3: HMM (states 3, mix 3), df = 81) with the biggest type of GMM (model nr 6: GMM (mix 30), df = 240). This comparison yielded a count c (H > G) of 635 and a z-value of z = 20.14 > z_{99} = 2.58, again highly significant. We conclude that it is not the sheer number of degrees of freedom in the models but the quality of the free parameters which decides which type of model better fits the data. After all, the degrees of freedom of the HMMs in our last comparison are outnumbered three times by those of the GMMs.
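The z-statistic of Equ. 2.13 is easily reproduced in a few lines (a sketch; small deviations from the values in Table 2.3 can occur, e.g. if ties were handled differently):

    import numpy as np

    def sign_test_z(c, n):
        """Equ. 2.13: z-statistic of the sign test for c 'wins' out of n matched
        pairs (normal approximation, valid for n > 25)."""
        return (c - 0.5 * n) / (0.5 * np.sqrt(n))

    print(round(sign_test_z(692, 729), 2))  # models 7 vs. 8: 24.26, as in Table 2.3
    print(round(sign_test_z(635, 729), 2))  # HMM (3,3) vs. GMM (30): ~20, cf. z = 20.14
    # Both values far exceed z_99 = 2.58, i.e. they are significant at the 99% level.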

2.2.4 Genre Classification

The second line of experiments compares genre classification results. In a 10-fold cross validation we did the following:

• train HMMs and GMMs for each of the twelve model types for each of the songs in the training set (the nine training folds), this time using the complete songs

• for each of the model types, compute a similarity matrix between all songs using the log-likelihood of a song given an HMM or a GMM (L^{HMM} and L^{GMM})

• based on the genre information, do one-nearest neighbor classification for all songs in the test fold using the similarity matrices

Average accuracies and standard deviations across the ten folds of the cross validation are given in columns acc and stddev in Table 2.3. Looking at the results one can see that the achieved accuracies range from around 73% to around 78%, with standard deviations of up to 5%. We compared accuracy results of associated model types in a series of paired t-tests (model nr 1 vs. nr 2, ..., nr 11 vs. nr 12). The resulting t-values are given in column t in Table 2.3. None of the t-values is significant at the 99% error level since all |t| < t_{(99, df=9)} = 3.25 (the same holds true at the 95% error level). Even the biggest difference in accuracy (between model type nr 4, GMM (mix 9), acc = 73.38, and model type nr 6, GMM (mix 30), acc = 78.19) is not significant (t = 0.43).

2.2.5 Discussion

There are two main results: (i) HMMs describe the spectral similarity of songs better than the standard technique of GMMs. Comparison of log-likelihoods clearly shows that HMMs allow for a better fit of the data. This holds not only when looking at competing models with comparable numbers of degrees of freedom, but also for GMMs with numbers of parameters that are much larger than those of the HMMs. The only outlier in this respect is model type 1 (HMM (states 10, mix 1)). But as discussed in the previous section, this is probably due to the poor choice of single Gaussians for modeling the emission probabilities. (ii) HMMs perform at the same level as GMMs when used for spectral-similarity-based genre classification. There is no significant gain in terms of classification accuracy. Genre classification is of course a rather indirect way of measuring differences between alternative similarity models. The human error in classifying some of the songs gives rise to a certain percentage of misclassification already. Inter-rater reliability between a number of music experts is far from perfect for genre classification.

Although we believe this is the most comprehensive study on using HMMs for spectral similarity of songs so far, there is of course a lot still to be done. Two possible routes for further improvements come to mind: the topology of the HMMs and the handling of the state duration. Choosing a topology for an HMM is still more of an art than a science (see e.g. [10] for a discussion). Our limited set of examined combinations of numbers of states and sizes of mixtures could be extended. One should however notice that too large numbers for these parameters quickly lead to numerical problems due to insufficient training data. We also have not yet tried out left-to-right models. With our choice of different frame sizes and texture windows we tried to explore a range of different state duration densities. There are of course a number of alternative and possibly more principled ways of doing this. The usage of so-called explicit state duration modeling could be explored: a duration parameter d per HMM state is added.

Upon entering a state q_i, a duration d_i is chosen according to a state duration density p(d_i). Formulas are given in [32]. Another idea is to use an array of n states with identical self-transition probabilities where it is enforced to pass each state at least once. This gives rise to more flexible so-called Erlang duration density distributions (see [10]). An altogether different approach of representing the dynamical nature of audio signals is the computation of dynamic features, by substituting the MFCCs with features that already code some temporal information (e.g. autocorrelation or reflection coefficients). Examples can be found in [32]. Some of these ideas might be able to further improve the modeling of songs by HMMs, but it is not clear whether this would also help the genre classification performance.

2.3 Spectral Similarity Combined with Complementary Information

In this section we demonstrate how the performance of the AP spectral similarity can be improved. In particular, we combine it with complementary information taken from fluctuation patterns (which describe loudness fluctuations over time) and two new descriptors derived from them. The work presented in this section has been submitted to a conference [27]. To evaluate the results we use the four music collections described previously. Compared to the winning algorithm of the ISMIR'04 genre classification contest, our findings show improvements of up to 41% (12 percentage points) on one of the collections, while the results on the contest training set (using the same evaluation procedure as in the contest) increased by merely 2 percentage points. One of our main observations is that evaluating on only one music collection (instead of several with different structures and contents) can lead to overfitting. Another observation is the need to distinguish between artist identification and genre classification. Furthermore, our findings confirm those of Aucouturier and Pachet [2], who suggest the existence of a glass ceiling which cannot be surpassed without taking higher level cognitive processing into account.

2.3.1 Spectral Similarity

We use the same spectral similarity as described in the previous section on HMMs. We used the implementations in the MA Toolbox [26] and the Netlab Toolbox (http://www.ncrg.aston.ac.uk/netlab) for Matlab. From the 22050Hz mono audio signals, two minutes from the center of each piece are used for further analysis. The signal is chopped into frames with a length of 512 samples (about 23ms) with 50% overlap. The average energy of each frame's spectrum is subtracted. The 40 Mel frequency bands (in the range of 20Hz to 16kHz) are represented by the first 20 MFCCs. For clustering we use a Gaussian Mixture Model with 30 clusters, trained using expectation maximization (after k-means initialization). The cluster model similarity is computed with Monte Carlo sampling and a sample size of 2000. The classifier in the experiments described below computes the distances of each piece in the test set to all pieces in the training set. The genre of the closest neighbor in the training set is used as the prediction (nearest neighbor classifier).
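The cluster-model comparison by Monte Carlo sampling can be sketched as follows. The exact distance formulation is not spelled out in this section; the sketch uses one common variant of the Aucouturier-Pachet distance and scikit-learn's GaussianMixture as a stand-in for the Matlab implementation, so it should be read as illustrative rather than as the MA Toolbox code.

    from sklearn.mixture import GaussianMixture  # assumed stand-in for the Matlab GMMs

    def cluster_model_distance(gmm_a, gmm_b, n_samples=2000):
        """Monte Carlo comparison of two cluster models (e.g. 30-component GMMs).
        Draw n_samples points from each model and compare how well each model
        explains its own samples versus the other model's samples.  This is one
        common formulation of the distance; smaller values mean more similar."""
        sa, _ = gmm_a.sample(n_samples)
        sb, _ = gmm_b.sample(n_samples)
        return (gmm_a.score(sa) + gmm_b.score(sb)
                - gmm_a.score(sb) - gmm_b.score(sa))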

2.3.2 Fluctuation Patterns

Fluctuation Patterns (FPs) describe loudness fluctuations in 20 frequency bands [25; 29]. They describe characteristics of the audio signal which are not described by the spectral similarity measure. First, the audio signal is cut into 6-second sequences. We use the center 2 minutes from each piece of music and cut it into non-overlapping sequences. For each of these sequences a psychoacoustic spectrogram, namely the Sonogram, is computed. For the loudness curve in each frequency band an FFT is applied to describe the amplitude modulation of the loudness. From the FPs we extract two new descriptors. The first one, which we call Focus, describes how distinctive the fluctuations at specific frequencies are. The second one, which we call Gravity, is related to the overall perceived tempo.

Sone

Each 6-second sequence is cut into overlapping frames with a length of 46ms. For each frame the FFT is computed. The frequency bins are weighted according to a model of the outer and middle-ear to emphasize frequencies around 3-4kHz and suppress very low or high frequencies. The FFT frequency bins are grouped into frequency bands according to the critical-band rate scale with the unit Bark [40]. A model for spectral masking is applied to smooth the spectrum. Finally, the loudness is computed with a non-linear function. We normalize the loudness of each piece such that the peak loudness is 1.


Fluctuation Patterns

Given a 6-second Sonogram, we compute the amplitude modulation of the loudness in each of the 20 frequency bands using an FFT. The amplitude modulation coefficients are weighted based on the psychoacoustic model of the fluctuation strength [11]. This modulation has different effects on our hearing sensation depending on the frequency. The sensation of "fluctuation strength" is most intense around 4Hz and gradually decreases up to a modulation frequency of 15Hz. The FPs analyze modulations up to 10Hz. To emphasize certain patterns, a gradient filter (over the modulation frequencies) and a Gaussian filter (over the frequency bands and the modulation frequencies) are applied. Finally, for each piece the median of all FPs representing a 6-second sequence is computed. This final FP is a matrix with 20 rows (frequency bands) and 60 columns (modulation frequencies). Two pieces are compared by interpreting their FP matrices as 1200-dimensional vectors and computing the Euclidean distance. An implementation of the FPs is available in the MA Toolbox [26]. Figure 2.3 shows some examples of FPs. The vertical lines indicate reoccurring periodic beats. The song Spider by Flex, which is a typical example of the genre eurodance, has the strongest vertical lines.
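A simplified sketch of the FP computation is given below. The psychoacoustic fluctuation-strength weighting and the gradient/Gaussian filters are only indicated by comments, since the full model is implemented in the MA Toolbox; the shapes (20 bands x 60 modulation bins, 6-second segments) follow the text, and the mapping of FFT bins to modulation frequencies depends on the assumed Sonogram frame rate.

    import numpy as np

    def fluctuation_pattern(sonogram_segments, n_mod_bins=60):
        """sonogram_segments: list of (20, n_frames) loudness arrays (all of equal
        length), one per 6-second sequence.  Returns the song's FP as a (20, 60)
        matrix."""
        fps = []
        for seg in sonogram_segments:
            # amplitude modulation of the loudness curve in each frequency band;
            # the first 60 non-DC bins are taken as the modulation frequencies
            mod = np.abs(np.fft.rfft(seg, axis=1))[:, 1:n_mod_bins + 1]
            # here the psychoacoustic fluctuation-strength weighting and the
            # gradient/Gaussian filters of the full model would be applied
            fps.append(mod)
        return np.median(np.array(fps), axis=0)   # median over all 6-second FPs

    def fp_distance(fp_a, fp_b):
        """Two pieces are compared as 1200-dimensional vectors (Euclidean distance)."""
        return np.linalg.norm(fp_a.ravel() - fp_b.ravel())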

Focus

The Focus (FP.F) describes the distribution of the energy in the FP. In particular, FP.F is low if the energy is focused in small regions of the FP, and high if the energy is spread out over the whole FP. FP.F is computed as the mean value of all values in the FP matrix, after normalizing the FP such that the maximum value equals 1. The distance between two pieces of music is computed as the absolute difference between their FP.F values. Figure 2.3 shows five example histograms of the values in the FPs and the mean thereof (as a vertical line). Black Jesus by Everlast (belonging to the genre alternative) has the highest FP.F value (0.42). The song has a strong focus on guitar chords and vocals, while the drums are hardly noticeable. The song Spider by Flex (belonging to eurodance) has the lowest FP.F value (0.16). Most of the song's energy is in the strong periodic beats. Figure 2.4 shows the distribution of FP.F over different genres. The values have a large deviation and the overlap between quite different genres is significant. Electronic has the lowest values while punk/metal has the highest. The amount of overlap is an important factor for the quality of the descriptor. As we will see later, in the optimal combination of all similarity sources, FP.F has the smallest contribution.
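The Focus descriptor itself is only a few lines; a sketch following the description above (function names are ours):

    import numpy as np

    def fp_focus(fp):
        """FP.F: mean of all FP values after scaling the maximum to 1.
        Low values = energy concentrated in small regions, high values = spread out."""
        fp = fp / fp.max()
        return fp.mean()

    def fp_focus_distance(fp_a, fp_b):
        """Distance between two pieces: absolute difference of their FP.F values."""
        return abs(fp_focus(fp_a) - fp_focus(fp_b))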

[Figure 2.3 shows, for five example songs, the cluster model (CM), the FP, the FP.F histogram, and the FP.G curve. FP.F/FP.G values: Take Five (Brubeck et al.) 0.28/-6.4, Spider (Flex) 0.16/-5.0, Surfin' USA (Beach Boys) 0.23/-6.4, Crazy (Spears) 0.32/-5.9, Black Jesus (Everlast) 0.42/-5.8.]

Figure 2.3: Visualization of the features. On the y-axis of the cluster model (CM) is the loudness (dB-SPL), on the x-axis are the Mel frequency bands. The plots show the 30 centers and their variances on top of each other. On the y-axis of the FP are the Bark frequency bands, the x-axis is the modulation frequency (in the range from 0-10Hz). The y-axis on the FP.F histogram plots are the counts, on the x-axis are the values of the FP (from 0 to 1). The y-axis of the FP.G is the sum of values per FP column, the x-axis is the modulation frequency (from 0-10Hz).

Gravity

The Gravity (FP.G) describes the center of gravity (CoG) of the FP on the modulation frequency axis. Given 60 modulation frequency-bins (linearly spaced in the range of 0-10Hz) the CoG usually lies between the 20th and the 30th bin, and is computed as

    \mathrm{CoG} = \frac{\sum_j \sum_i j \, \mathrm{FP}_{ij}}{\sum_{ij} \mathrm{FP}_{ij}}    (2.14)

where FP is a 20 x 60 matrix, i is the index of the frequency band, and j the index of the modulation frequency. We compute FP.G by subtracting the theoretical mean of the fluctuation model (which is around the 31st bin) from the CoG. Low values indicate that the piece might be perceived as slow. However, FP.G is not intended to model the perception of tempo. Effects such as vibrato or tremolo are also reflected in the FP. The distance between two pieces of music is computed as the absolute difference between their FP.G values. Figure 2.3 shows the sum of the values in the FP over the frequency bands (i.e. the sum over the rows in the FP matrix) and the CoGs marked with a vertical line. Spider by Flex has the highest value (-5.0), while the lowest value (-6.4) is computed for Take Five by the Dave Brubeck Quartet and Surfin' USA by the Beach Boys. Figure 2.4 shows the distribution of FP.G over different genres. The values have a smaller deviation compared to FP.F and there is less overlap between different genres. Classical and a cappella have the lowest values, while electronic, metal, and punk have the highest values.

[Figure 2.4: two panels of boxplots, (a) DB-MS (classical, electronic, jazz/blues, metal/punk, pop/rock, world) and (b) DB-L (a cappella, death metal, electronic, jazz, jazz guitar, punk), each showing the distribution of FP.F (left) and FP.G (right) per genre.]

Figure 2.4: Boxplots showing the distribution of the descriptors per genre on two music collections. A description of the collections can be found in Section 2.1. The boxes have lines at the lower quartile, median, and upper quartile values. The whiskers show the extent of the rest of the data (the maximum length is 1.5 of the inter-quartile range). Data beyond the ends of the whiskers are marked with plus-signs.
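A sketch of the Gravity computation (Equ. 2.14) follows; the code is illustrative (the actual implementation is in the MA Toolbox), and the offset of the theoretical mean is kept as a parameter since the text only locates it around the 31st bin.

    import numpy as np

    def fp_gravity(fp, theoretical_mean=31.0):
        """FP.G: center of gravity of the FP along the modulation-frequency axis
        (Equ. 2.14), minus the theoretical mean of the fluctuation model."""
        j = np.arange(1, fp.shape[1] + 1)          # modulation-frequency bin indices
        cog = (j * fp.sum(axis=0)).sum() / fp.sum()
        return cog - theoretical_mean

    def fp_gravity_distance(fp_a, fp_b):
        """Distance: absolute difference of the FP.G values of two pieces."""
        return abs(fp_gravity(fp_a) - fp_gravity(fp_b))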

2.3.3 Combination

To combine the distance matrices obtained with the four approaches described above, we use a linear combination similar to the idea used for the aligned Self-Organizing Maps (SOMs) [28]. Before combining the distances, we normalize the four distances such that, for each of them, the standard deviation of all pairwise distances within a music collection equals 1. In contrast to the aligned-SOMs we do not rely on the user to set the optimum weights for the linear combination; instead, we automatically optimize the weights for genre classification.
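A sketch of the combination step; the normalization and the weighted sum follow the description above, and the example weights are taken from the top-ranked combination in Table 2.4 (the weights are otherwise chosen by the grid search of Section 2.3.4).

    import numpy as np

    def normalize(D):
        """Scale a distance matrix so that the standard deviation of all pairwise
        distances within the collection equals 1."""
        return D / D[np.triu_indices_from(D, k=1)].std()

    def combine(distances, weights):
        """Linear combination of normalized distance matrices, e.g.
        combine({'AP': D_ap, 'FP': D_fp, 'FP.F': D_f, 'FP.G': D_g},
                {'AP': 0.65, 'FP': 0.15, 'FP.F': 0.05, 'FP.G': 0.15})."""
        return sum(weights[name] * normalize(D) for name, D in distances.items())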

2.3.4 Genre Classification

We evaluate the genre classification performance on four music collections to find the optimum weights for the combination of the different similarity sources. We use a nearest neighbor classifier and leave-one-out cross validation for the evaluation. The accuracies are computed as the ratio of correctly classified tracks to the total number of tracks (without normalizing the accuracies with respect to the different class probabilities). Genre classification is not the best choice to evaluate the performance of a similarity measure. However, unlike listening tests, it is very fast and cheap.

In contrast to the ISMIR 2004 genre contest we apply an artist filter. In particular, we ensure that all pieces of an artist are either in the training set or in the test set. Otherwise we would be measuring artist identification performance, since all pieces of an artist are in the same genre (in all of the collections we use). The resulting performance is significantly worse. For example, on the ISMIR 2004 genre classification training set (using the same algorithm we submitted last year) we get 79% accuracy without the artist filter and only 64% with the artist filter. The difference is even bigger on a large in-house collection where (using the same algorithm) we get 71% without the artist filter and only 27% with the filter. In the results described below we always use an artist filter if not stated otherwise.

In the remainder of this section, first the four music collections we use are described. Second, results using only one similarity source are presented. Third, pairwise combinations with spectral similarity (AP) are evaluated. Fourth, all four sources are combined. Finally, the performance on all collections is evaluated to avoid overfitting.
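The artist filter amounts to excluding, for every test piece, all candidates by the same artist when searching for the nearest neighbor. A sketch of the leave-one-out evaluation with this filter (illustrative code, names are ours):

    import numpy as np

    def loo_accuracy_with_artist_filter(D, genres, artists):
        """Leave-one-out nearest-neighbor genre classification where all pieces by
        the test piece's artist are removed from the candidates.
        D: (n, n) float distance matrix; genres, artists: length-n label lists."""
        n, correct = len(genres), 0
        for i in range(n):
            d = D[i].astype(float).copy()
            same_artist = [j for j in range(n) if artists[j] == artists[i]]
            d[same_artist] = np.inf            # also excludes the piece itself
            correct += (genres[int(np.argmin(d))] == genres[i])
        return correct / n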

Individual Performance

The performances using one similarity source are given in Figure 2.5 in the first (only spectral similarity, AP) and last columns (only the respective similarity source). AP clearly performs best, followed by FP. The performance of FP.F is extremely poor on DB-S while it is equal to FP.G on DB-L. For DB-MS without the artist filter we obtain 79% using only AP (this is the same performance also obtained on the ISMIR’04 genre contest test set, which indicates that there was no overfitting on the data). Using only FP we obtain 66% accuracy which is very close to the 67% Kris West’s submission achieved. The accuracy for FP.F is 30% and 43% for FP.G. Always guessing that a piece is classical gives 44% accuracy. Thus, the performance of FP.F is significantly below the random guessing baseline. 2. Audio-Based Similarity 24

          0   10   20   30   40   50   60   70   80   90  100
FP       29   30   32   33   30   27   26   25   23   18   17
FP.F     29   28   28   25   20   19   17   17   14    6    1
FP.G     29   31   35   36   37   35   31   29   25   21   15

(a) DB-S

          0   10   20   30   40   50   60   70   80   90  100
FP       27   30   30   29   30   30   29   28   26   25   23
FP.F     27   27   27   25   24   23   23   22   20   18    8
FP.G     27   30   29   28   27   26   26   25   24   22    8

(b) DB-L

          0   10   20   30   40   50   60   70   80   90  100
FP       64   63   64   65   65   65   63   63   62   61   58
FP.F     64   66   64   63   63   61   59   58   58   54   28
FP.G     64   64   64   64   63   61   61   61   60   57   42

(c) DB-MS

          0   10   20   30   40   50   60   70   80   90  100
FP       56   57   57   58   58   57   56   55   55   52   49
FP.F     56   56   56   54   54   53   53   52   52   50   25
FP.G     56   57   56   56   55   54   54   54   53   52   32

(d) DB-ML

Figure 2.5: Results for combining AP with one of the other sources. All values are given in percent. The values on the x-axis are the mixing coefficients. For example, the fourth column in the second row is the accuracy for combining 70% AP with 30% of FP.F.

Combining Two

The results for combining AP with one of the other sources are given in Figure 2.5. The main findings are that combining AP with FP or FP.G performs better than combining AP with FP.F (except for 10% FP.F and 90% AP on DB-MS). For all collections a combination can be found which improves the performance. However, the improvements on the Magnatune collection are marginal. The smooth changes of the accuracy with respect to the mixing coefficient are an indicator that the approach is relatively robust (within each collection).

         100   95   90   85   80   75   70   65   60   55   50
AP        29   30   33   34   39   38   38   39   39   41   41
FP        41   41   38   39   39   36   35   35   32   31   27
FP.F      39   39   41   41   41   38   36   35   29   21   19
FP.G      35   36   37   39   40   41   41   41   41   37   35
           0    5   10   15   20   25   30   35   40   45   50

(a) DB-S

         100   95   90   85   80   75   70   65   60   55   50
AP        27   30   31   32   32   32   32   32   31   32   31
FP        30   32   32   32   32   31   31   32   31   31   30
FP.F      31   32   32   32   31   31   30   29   28   26   23
FP.G      32   32   32   32   31   30   29   29   29   28   26
           0    5   10   15   20   25   30   35   40   45   50

(b) DB-L

         100   95   90   85   80   75   70   65   60   55   50
AP        64   67   68   67   67   67   67   67   67   67   67
FP        68   67   67   67   67   67   66   67   67   65   65
FP.F      66   68   67   67   66   66   65   65   64   64   61
FP.G      67   68   67   67   67   66   65   65   65   64   61
           0    5   10   15   20   25   30   35   40   45   50

(c) DB-MS

         100   95   90   85   80   75   70   65   60   55   50
AP        56   57   57   58   58   58   58   58   58   58   57
FP        57   58   58   58   58   58   58   58   58   57   57
FP.F      58   58   58   58   57   57   56   56   55   55   53
FP.G      58   58   58   58   58   57   57   57   56   56   54
           0    5   10   15   20   25   30   35   40   45   50

(d) DB-ML

Figure 2.6: Results for combining all similarity sources. A total of 270 combinations are summarized in each table. All values are given in percent. The mixing coefficients for AP (the first row) are given above the table, for all other rows below. For each entry in the table of all possible combinations the highest accuracy is given. For example, the second row, third column depicts the highest accuracy obtained from all possible combinations with 10% FP. The not specified 90% can have any combination of mixing coefficients, e.g. 90% AP, or 80% AP and 10% FP.G etc. 2. Audio-Based Similarity 26

            Weights                  Classification Accuracy
Rank     AP   FP  FP.F  FP.G      DB-S  DB-L  DB-MS  DB-ML     Score
  1      65   15    5    15        38    32    67     58        1.14
  2      65   10   10    15        38    31    67     57        1.14
  3      70   10    5    15        38    31    67     58        1.14
  4      55   20    5    20        39    31    65     57        1.14
  5      60   15   10    15        38    31    66     57        1.14
  6      60   15    5    20        39    31    66     57        1.13
  7      75   10    5    10        37    31    67     58        1.13
  8      75    5    5    15        38    31    66     58        1.13
  9      65   10    5    20        38    30    66     58        1.13
 10      55    5   10    30        41    29    65     56        1.13
248     100    0    0     0        29    27    64     56        1.00
270      50    0   50     0        19    23    61     53        0.85

Table 2.4: Overall performance on all collections. The weights (AP, FP, FP.F, FP.G) are the mixing coefficients in percent. The classification accuracies (DB-S, DB-L, DB-MS, DB-ML) are rounded and given in percent.

Combining All

Figure 2.6 shows the accuracies obtained when all similarity sources are combined. There are a total of 270 possible combinations, using a step size of 5 percentage points and limiting AP to a mixing coefficient between 100-50% and the other sources to 0-50%. Analogously to the previous results, FP.F has the weakest performance and the improvements for the Magnatune collection are not very exciting. As in Figure 2.5, the smooth changes of the accuracy with respect to the mixing coefficient are an indicator for the robustness of the approach (within each collection). Without the artist filter the combinations on DB-MS reach a maximum of 81% (compared to 79% using only AP). It is clearly noticeable that the results on the collections are quite different. For example, for DB-S using as little AP as possible (highest values around 45-50%) and a lot of FP.G (highest values around 25-40%) gives the best results. On the other hand, for the DB-MS collection the best results are obtained using 90% AP and only 5% FP.G. These deviations indicate overfitting, thus we analyze the performances across collections in the next section.

Overall Performance

To study overfitting we compute the relative performance gain compared to the AP baseline (i.e. using only AP). We compute the score (which we want to maximize) as the average of these gains over the four collections. The results are given in Table 2.4.

[Figure: five panels (Average, DB-S, DB-L, DB-MS, DB-ML) showing the relative performance of each combination (y-axis) against its rank by score (x-axis, 1 to 270).]

Figure 2.7: Individual relative performance ranked (x-axis) by score (y-axis).

The worst combination (using 50% AP and 50% FP.F) yields a score of 0.85. (That is, on average, the accuracy using this combination is 15% lower compared to using 100% AP.) There are a total of 247 combinations which perform better than the AP baseline. Almost all of the 22 combinations that fall below AP have a large contribution of FP.F. The best score is 14% above the baseline. The ranges of the top 10 ranked combinations are 55-75% AP, 5-20% FP, 5-10% FP.F, and 10-30% FP.G.

Without the artist filter, for DB-MS the top three ranked combinations from Table 2.4 have the accuracies 1: 79%, 2: 78%, 3: 79% (the AP baseline is 79%, the best possible combination yields 81%). For the DB-S collection without the artist filter, the AP baseline is 52% and the top three ranked combinations have the accuracies 1: 63%, 2: 61%, 3: 62% (the best possible score achieved through combination is 64%). This is another indication that genre classification and artist identification are not the same type of problem. Thus, it is necessary to ensure that all pieces from an artist (if all pieces from an artist belong to the same genre) are either in the training or the test set.

Figure 2.7 shows the relative performance of all combinations ranked by their score. As can be seen, there are significant deviations. In several cases a combination performs well on one collection and poorly on another. This indicates that there is a large potential for overfitting if the necessary precautions are not taken (such as using several different music collections). However, another observation is that although there is a high variance, the performance stays above the baseline for most of the combinations and there is a common trend. Truly reliable results would require further testing on additional collections.

2.3.5 Conclusions

In this section we have presented an approach to improve audio-based music similarity and genre classification. We have combined spectral similarity with three additional information sources based on Fluctuation Patterns. In particular, we have presented two new descriptors and a series of experiments evaluating the combinations. Although we obtained an average performance increase of 14%, our findings confirm the glass ceiling observed in [2]. Preliminary results with a larger number of descriptors indicate that the performance per collection can only be further improved by up to 1-2 percentage points. However, the danger of overfitting is imminent. Our results show that there is a significant difference in the overall performance if pieces from the same artist are in both the test and training sets. We believe this shows the necessity of using an artist filter to evaluate genre classification performance (if all pieces from an artist are assigned to the same genre). Furthermore, the deviations between the collections suggest that it is necessary to use different collections to avoid overfitting. One possible future direction is to focus on developing similarity measures for specific music collections (analogously to developing specialized classifiers able to distinguish only two genres). However, combining audio-based approaches with information from different sources (such as the web), or modeling the cognitive process of music listening, are more likely to help us get beyond the glass ceiling.

2.4 Summary & Recommendations

In this chapter we have followed two paths. The motivation for following the first one is that spectral similarity, as we use it, does not capture many aspects of the audio signal which are very important for the perception of timbre (such as the attack or decay). Although we were able to show that using HMMs allows us to better model a song, we do not recommend their use in the SIMAC prototypes, primarily because of the drastic increase in computation time. Furthermore, in terms of genre classification (which is not the best choice for evaluation) the performance does not improve significantly. However, applying HMMs to model temporal aspects for spectral similarity appears to be an interesting direction for future research.

The second path we followed in this chapter was to combine what we knew works best with other approaches. As a result we have found a combination which significantly improves the results on some of the collections we used for evaluation. We recommend the use of this combination, as described in detail above, for D3.5.1 and the prototypes. The implementation of the fluctuation patterns and the spectral similarity is available in the MA toolbox for Matlab.

3. Web-Based Similarity

In this chapter we propose an alternative, which we have published in [33], to the web-based similarity measure described in detail in D3.2.1. The similarity measure operates on artist names based on the results of Google queries. Co-occurrences of artist names on web pages are analyzed to measure how often two artists are mentioned together on the same web page. We estimate conditional probabilities using the extracted page counts. These conditional probabilities give a similarity measure which is evaluated using a data set containing 224 artists from 14 genres. For evaluation, we use two different methods, intra-/intergroup-similarities and k-Nearest Neighbors classification. Furthermore, a confidence filter and combinations of the results gained from three different query settings are tested. It is shown that these enhancements can raise the performance of the web-based similarity measure. Comparing the results to those of similar approaches shows that our approach, though being quite simple, performs well and can be used as a similarity measure that incorporates “social knowledge”.

The approach is similar to the approach presented in [38]. The main difference is that we calculate the complete distance matrix. This offers additional information since we can also predict which artists are not similar. Such information is necessary, for example, when it comes to creating playlists that incorporate a broad variety of different music styles. Moreover, in [38], artists are extracted from “Listmania!”, which uses the database of the Amazon web shop. The number of artists in this database is obviously smaller than the number of artist-related web pages indexed by Google. For example, most local artists or artists without a record deal are not contained. Thus, the approach of [38] cannot be used for such artists.

A shortcoming of the co-occurrence approach is that creating a complete distance matrix has quadratic computational complexity in the number of artists. Despite this fact, the approach is quite fast for small- and medium-sized collections with some hundreds of artists since it is very simple and does not rely on extracting and weighting hundreds of thousands of words like the tf·idf approach of [18]. Moreover, using heuristics could reduce the computational complexity.


3.1 Web Mining by Co-occurrence Analysis

Since our similarity measure is based on artist co-occurrences, we need to count how often artist names are mentioned together on the same web page. To obtain these page counts, the search engine Google was used. Google has been chosen for the experiments because it is the most popular search engine at the moment. Furthermore, investigations of different search engines showed that Google yields the best results for musical web crawling [18].

Given a list of artist names, we use Google to estimate the number of web pages containing each artist and each pair of artists. Since we are not interested in the content of the found web pages, but only in their number, the search is restricted to display only the top-ranked page. In fact, the only information we use is the page count that is returned by Google. This raises performance and limits web traffic. The outcome of this procedure is a symmetric matrix C, where element c_{ij} gives the number of web pages containing the artist with index i together with the one indexed by j. The values of the diagonal elements c_{ii} show the total number of web pages containing artist i.

Based on the page count matrix C, we then use relative frequencies to calculate a conditional probability matrix P as follows. Given two events a_i (artist with index i is mentioned on a web page) and a_j (artist with index j is mentioned on a web page), we estimate the conditional probability p_{ij} (the probability for artist j to be found on a web page that is known to contain artist i) as shown in Formula 3.1.

\[ p_{ij} = p(a_i \wedge a_j \mid a_i) = \frac{c_{ij}}{c_{ii}} \qquad (3.1) \]

Obviously, P is not symmetric. Since we need a symmetric similarity function in order to use k-NN, we compute a symmetric equivalent P_s by simply calculating the arithmetic mean of p_{ij} and p_{ji} for every pair of artists i and j.

Addressing the problem of finding only music-related web pages, we used three different query settings.

• “artist1” “artist2” music
• “artist1” “artist2” music review
• allintitle: “artist1” “artist2”

The first one, in the following abbreviated as M, searches only for web pages containing the two artist names as exact phrases and the word “music”. The second one, which has already been used in [37], restricts the search to pages containing the additional terms “music” and “review”. This setting, abbreviated as MR, was used to compare our results to those of [18]. The third setting (allintitle) only takes into consideration web pages containing the two artists in their title. It is the most limiting setting, and the resulting page count matrices are quite sparse. However, our evaluation showed that this setting performs quite well on the k-NN classification task and can be used successfully in combination with M or MR.
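To illustrate how the page counts are turned into the similarity matrix P_s, the following Python sketch builds the matrices C, P, and P_s for a list of artist names using the M query setting. The function page_count is a hypothetical placeholder for whatever mechanism returns Google's page count for a query string; the actual retrieval is not shown here, and the variable names are ours.

import numpy as np

def page_count(query):
    """Hypothetical placeholder: return the number of pages Google
    reports for the given query string (retrieval not shown)."""
    raise NotImplementedError

def cooccurrence_similarity(artists):
    n = len(artists)
    C = np.zeros((n, n))
    # Diagonal: pages containing a single artist plus the word "music".
    for i, a in enumerate(artists):
        C[i, i] = page_count('"%s" music' % a)
    # Off-diagonal: pages containing both artists (the M query setting).
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = C[j, i] = page_count('"%s" "%s" music' % (artists[i], artists[j]))
    # Conditional probabilities p_ij = c_ij / c_ii (Formula 3.1); assumes c_ii > 0.
    P = C / C.diagonal()[:, np.newaxis]
    # Symmetric version: arithmetic mean of p_ij and p_ji.
    Ps = 0.5 * (P + P.T)
    return C, P, Ps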

3.2 Experiments and Evaluation

We conducted our experiments on the data set already used in [18]. It comprises 14 quite general and well-known genres with 16 assigned artists each. A complete list can be found on the Internet¹. Two different evaluation methods were used: ratios between intra- and intergroup-similarities and hold-out experiments using k-NN classification.

3.2.1 Intra-/Intergroup-Similarities

This evaluation method is used to estimate how well the given genres are distinguished by our similarity measure P. For each genre, the ratio between the average intragroup-probability and the average intergroup-probability is calculated. The higher this ratio, the better the differentiation of the respective genre. The average intragroup-probability for a genre g is the probability that two arbitrarily chosen artists a and b from genre g co-occur on a web page that is known to contain either artist a or b. The average intergroup-probability for a genre g is the probability that two arbitrarily chosen artists a (from genre g) and b (from any other genre) co-occur on a web page that is known to contain either artist a or b. Thus, the average intragroup-probability gives the probability that two artists from the same genre co-occur. The average intergroup-probability gives the probability that an artist from genre g co-occurs with an artist not from genre g.

Let A be the set of all artists and A_g the set of artists assigned to genre g. Formally, the average intra- and intergroup-probabilities are given by Equations 3.2 and 3.3, where |A_g| is the cardinality of A_g and A \setminus A_g is the set A without the elements contained in the set A_g.

\[ \mathrm{intra}_g = \frac{\sum_{a_1 \in A_g} \sum_{a_2 \in A_g,\, a_2 \neq a_1} p_{a_1 a_2}}{|A_g|^2 - |A_g|} \qquad (3.2) \]

\[ \mathrm{inter}_g = \frac{\sum_{a_1 \in A_g} \sum_{a_2 \in A \setminus A_g} p_{a_1 a_2}}{|A \setminus A_g| \cdot |A_g|} \qquad (3.3) \]

¹ http://www.cp.jku.at/people/schedl/music/artist_list_224.pdf

Obviously, the ratio intra_g / inter_g should be greater than 1.0 if the similarity measure is to be of any use.
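Both quantities are straightforward to compute from the probability matrix. The Python sketch below does so, given the matrix P from above and a genre label per artist; the function name and the numpy index handling are ours.

import numpy as np

def intra_inter_ratio(P, genres, g):
    """Intra-/intergroup-probabilities and their ratio for genre g
    (Equations 3.2 and 3.3)."""
    genres = np.asarray(genres)
    idx = np.where(genres == g)[0]        # artists in genre g
    rest = np.where(genres != g)[0]       # all other artists
    block = P[np.ix_(idx, idx)]
    # Exclude the diagonal (a2 != a1) from the intragroup average.
    intra = (block.sum() - np.trace(block)) / (len(idx) ** 2 - len(idx))
    inter = P[np.ix_(idx, rest)].sum() / (len(idx) * len(rest))
    return intra, inter, intra / inter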

Results and Discussion

Table 3.1 shows the results of evaluating our co-occurrence approach with this first evaluation method. It can be seen that the allintitle-setting yields the best results, as the average intergroup-similarities are very low. Hence, nearly no artists from different genres occur together in the title of the same web page. Especially for the genres “Jazz” and “Classical”, the results are excellent. However, for “Alternative Rock/Indie” and one other genre the ratios are quite low. This can be explained by the low average intragroup-similarities for these genres. Thus, artists belonging to these genres are seldom mentioned together in titles. Analyzing the page count matrices revealed that the allintitle-setting yields good results if web pages containing artists from the same genre in their title are found. If not, the results are obviously quite bad. This observation motivated us to conduct experiments with confidence filters and combinations of the allintitle-setting with M and MR. These experiments are described in detail in the next section. Moreover, Table 3.1 shows that, aside from “Classical”, “Blues” is distinguished quite well. Also remarkable is the very bad result for “Folk” music in the MR-setting. This may be explained by intersections with other genres, e.g. “Country”.

The approach presented in [39] was tested on the list of artists already used in [18]. The results, which are visualized in Table 3.2, are slightly worse than the results using our approach on the same data set. An explanation for this is that we use an asymmetric similarity measure that, for each pair of artists (artist1 and artist2), incorporates probability estimations for artist1 being mentioned on web pages containing artist2 and for artist2 appearing on web pages of artist1. This additional information is lost when using the normalization method proposed in [39].

In Table 3.3, the evaluation results for the approach of [18], again using exactly the same list of artists, are depicted. To obtain them, the distances between the feature vectors gained from the tf·idf calculations are computed for every pair of artists. This gives a complete similarity matrix. Since most of the query settings used in [18] differ from ours, we can only compare the results of the MR-setting. Taking a closer look at the results shows that tf·idf performs better for eight genres, while our approach performs better for six genres. However, the mean of the ratios is better for our approach because of the high value for the genre “Classical”. A possible explanation is that web pages concerning classical artists often also contain words which are used on pages of other genres’ artists. In contrast, classical artist names seem to be mentioned only together with other artists belonging to the same genre, which is reflected by the very high ratios of our approach for this genre.


Figure 3.1: Accuracies in percent for single and combined similarity measures using 9-NN t15-validation and the confidence filter. The combined results are depicted as dotted lines. It is remarkable that the high values for the allintitle-accuracies come along with up to 18% of unpredictable artists. All other measures (single and combined) leave no data items unpredicted.

3.2.2 Classification with k-Nearest Neighbors

The second set of evaluation experiments was conducted to show how well our similarity measure works for classifying artists into genres. For this purpose, the widely used technique of k-Nearest Neighbors was chosen. This technique simply uses for prediction the k data items that have minimal distance to the item that is to be classified. The most frequent class among these k data items is predicted for the unclassified data item. As for the partitioning of the complete data set into training set and test set, we used different settings, referred to as tx, where x is the number of data items from each genre that are assigned to the training set. In a t15-setting, for example, 15 artists from each genre are used for training and one remains for testing. For measuring the distances between two data items, we use the similarities given by the symmetric probability matrix P_s.
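As a concrete illustration of the tx hold-out procedure, the Python sketch below draws x training artists per genre at random and classifies the remaining artists with 9-NN on the symmetric similarity matrix P_s; repeating this many times and averaging gives accuracies of the kind reported below. The helper names are ours, and the majority vote breaks ties arbitrarily.

import numpy as np
from collections import Counter

def tx_split(genres, x, rng):
    """Boolean masks for a tx training/test split (x artists per genre for training)."""
    genres = np.asarray(genres)
    train = np.zeros(len(genres), dtype=bool)
    for g in np.unique(genres):
        members = np.where(genres == g)[0]
        train[rng.choice(members, size=x, replace=False)] = True
    return train, ~train

def knn_accuracy(Ps, genres, x=8, k=9, seed=0):
    """Single tx run: k-NN genre classification accuracy using similarity matrix Ps."""
    rng = np.random.default_rng(seed)
    genres = np.asarray(genres)
    train, test = tx_split(genres, x, rng)
    train_idx = np.where(train)[0]
    correct = 0
    for i in np.where(test)[0]:
        nearest = train_idx[np.argsort(-Ps[i, train_idx])[:k]]
        predicted = Counter(genres[nearest]).most_common(1)[0][0]
        correct += (predicted == genres[i])
    return correct / test.sum()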


Figure 3.2: Accuracies in percent for different combinations of the three settings (allintitle, M, MR) and different training set sizes. 9-NN classification was used.

We ran all experiments 1,000 times to minimize the influence of statistical outliers on the overall results. The accuracy, in the following used for measuring performance, is defined as the percentage of correctly classified data items over all classified data items in the test set. Since the usage of confidence filters may result in unclassified data items, we introduce the prediction rate, which we define as the percentage of classified data items in the complete test set.

In a first test with setting t8, k-NN with k = 9 performed best, so we simply used 9-NN for classification in the subsequent experiments. It is not surprising that values around 8 perform best in a t8-setting, because in this case the number of data items from the training set that are used for prediction equals the number of data items chosen from each class to represent the class in the training set. The t8-setting without any confidence filter gives accuracies of about 69% for M, about 59% for MR, and about 74% for allintitle. Using setting t15, these results can be improved for M (≈75% using 9-NN) and for allintitle (≈80% using 6-NN). For MR, no remarkable improvement could be achieved.


Figure 3.3: Accuracy plotted against prediction rate for different training set sizes and 9-NN classification. Only the uncombined allintitle-setting was used for this plot.

In the case that no confidence filter is used, as in the first tests described above, a random genre is predicted for the artist to be classified if his/her similarity to all artists in the training set is zero. Due to the sparseness of its similarity matrix, this problem mainly concerns the allintitle-measure. To overcome the problem and benefit from the good performance of the allintitle-measure while also addressing the sparseness of the respective similarity matrix, we tried out some confidence filters to combine the similarity measures that use the three different query settings. The basic idea is to use the allintitle-measure if the confidence in its results is high enough. If not, the M- or MR-measure is used to classify an unknown data item. We experimented with confidence filters using mathematical properties of the distances between the unclassified data item and its nearest neighbors. The best results, however, were achieved with a very simple approach based on counting the number of elements with a probability/similarity of zero in the set of the nearest neighbors. If this number exceeds a given threshold, the respective data item is not classified with the allintitle-measure, but the M- or MR-measure is used instead.

Figure 3.4: Confusion matrix for the averaged results of 1,000 runs using 9-NN t15-validation. The confidence filter was applied to the allintitle-setting. The values are the average accuracies in percent.

Using this method, only artists that co-occur at least with some others in the title of some web pages are classified with allintitle. On the other hand, if not enough information for a certain artist is available in the allintitle-results, MR or M is used instead. These two measures usually give enough information for prediction. Indeed, their prediction rates equal 100% for the data set used for our evaluations. This is also manifested in Figure 3.1, which shows that the accuracies for MR and M are independent of the threshold for the confidence filter.
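The confidence-filtered classification can be summarized in a few lines. The Python sketch below assumes the symmetric similarity matrices for the allintitle and M settings (as computed above), a set of training indices with genre labels, and classifies one test artist; the function name and the majority-vote helper are ours.

import numpy as np
from collections import Counter

def classify_with_confidence_filter(Ps_allintitle, Ps_M, genres, train_idx,
                                    test_i, k=9, threshold=2):
    """9-NN prediction with allintitle similarities; fall back to the M setting
    when more than `threshold` of the k nearest neighbors have zero similarity."""
    sims = Ps_allintitle[test_i, train_idx]
    nearest = np.argsort(-sims)[:k]              # positions within train_idx
    if np.sum(sims[nearest] == 0) > threshold:   # not enough allintitle information
        sims = Ps_M[test_i, train_idx]
        nearest = np.argsort(-sims)[:k]
    labels = np.asarray(genres)[train_idx][nearest]
    return Counter(labels).most_common(1)[0][0]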

Results and Discussion

We already mentioned the classification accuracies of up to 80% for uncombined measures. Since we wanted to analyze to what extent the performance can be improved when using combinations, we conducted t15-validations using either a single measure or combinations of allintitle with MR and M. The results are shown in Figure 3.1. Along the abscissa, the influence of different thresholds for the confidence filter can be seen. The falling accuracies for allintitle with rising threshold values confirm our assumption that the performance of the allintitle-measure depends strongly on the availability of enough information. It is important to note that the uncombined allintitle-measure does not always make a prediction when using the confidence filter, cf. also Figure 3.3. Remarkable are the very high accuracies (fraction of correctly classified artists among classifiable artists) of up to 89.5% for allintitle with a threshold value of 2. However, in this setting, 14% of the artists cannot be classified. Taking a closer look at the MR- and M-settings shows that they reach accuracies of about 54% and 75% respectively and that these results are independent of the threshold for the confidence filter. In fact, MR and M, at least for the used data set, always provide enough information for prediction. Combining the measures by taking allintitle as the primary one and, if no prediction with it is possible, MR or M as fallback also combines the advantages of high accuracies and high prediction rates. Indeed, using the combination allintitle+M gives accuracies of 85% at 100% prediction rate. Since the accuracies for M are much higher than for MR, the combination of allintitle with M yields better results than with MR. Compared to the k-NN results of [18], these accuracies are at least equal although the co-occurrence approach is much simpler than the tf·idf approach. However, the single MR-setting performs quite poorly with our approach. This can be explained by the fact that web pages containing music reviews seldom mention other artists, but usually compare new artists' albums to more recent ones by the same artist.

In addition, we were interested in the number of artists needed to define a genre adequately. For this reason, we ran some experiments using different training set sizes. In Figure 3.2, the results of these experiments for 9-NN classification using the combinations allintitle+M and allintitle+MR are depicted. It was observed that t15 and t8 again provide very high accuracies of up to 85% and 78% respectively. Examining the results of the t4- and t2-settings reveals much lower accuracies. These results are remarkably worse than those of [18] for the same settings (61% for t4 with our approach using 9-NN vs. 76% with the tf·idf approach using 7-NN and the additional search keywords “music genre style”; 35% for t2 with our approach vs. 43% with the tf·idf approach using 7-NN and the same additional keywords). In these two settings, the additional information used by the tf·idf approach seems to be highly valuable. As a final remark on Figure 3.2, we want to point out that the prediction rate for all depicted experiments is 100%.

As already mentioned, the uncombined allintitle-setting using the confidence filter does not always yield a prediction. To analyze the trade-off between accuracy and prediction rate, we plotted these properties for the allintitle-setting in Figure 3.3.

This figure shows that, in general, an increase in accuracy goes along with a decrease in prediction rate. However, an increase in prediction rate accompanied by a slight increase in accuracy, which yields the maximum accuracy values, can be seen at the beginning of each plot. The highest accuracies obtained for the different settings are 89% for t15 (86% prediction rate), 84% for t8 (59% prediction rate), 64% for t4 (34% prediction rate), and 35% for t2 (10% prediction rate). These maximum accuracy values are usually achieved with a threshold of 1 or 2 for the confidence filter. It seems that restricting the number of allowed zero-distance-elements in the set of the nearest neighbors to 0 is counterproductive since it decreases the prediction rate without increasing the accuracy.

Finally, to investigate which genres are likely to be confused with others, we calculated a confusion matrix, cf. Figure 3.4. It can be seen that the genres “Jazz”, “Blues”, “Reggae”, and “Classical” are perfectly distinguished. “Heavy Metal/Hard Rock”, “Electronica”, and “Rock ’n’ Roll” also show very high accuracies of about 95%. For “Country”, “Folk”, “RnB/Soul”, “Punk”, “Rap”, and “Pop”, accuracies between 83% and 89% are achieved. In comparison with the results of [18], where “Pop” achieved only 80%, we reach 88% for this genre. In contrast, our results for the genre “Alternative Rock/Indie” are very bad (≈50%). A more precise analysis reveals that this genre is often confused with “Electronica”, which may be explained by some artists producing music of different styles over time. “Depeche Mode”, for example, was a pioneer of “Synth-Pop” in the 1980s.

3.3 Conclusions & Recommendations

In this section we presented an artist similarity measure based on co-occurrences of artist names on web pages. We used three different query settings (M, MR, and allintitle) to retrieve page counts from the search engine Google. Experiments showed that the allintitle-setting provides high accuracies with k-Nearest Neighbors classification. High prediction rates, however, are achieved with the M-setting. In order to exploit the advantages of both settings, the two measures were combined using a simple threshold-based confidence filter. We showed that this combination gives accuracies of up to 85% at 100% prediction rate (no unclassified artists). These results are at least equal to those presented in [18] when using a sufficient number of training samples from each genre. In [18], however, a much more complex approach, tf·idf, is used. For scenarios with only very few artists available to define a genre, the tf·idf approach performs better due to its extensive use of additional information. In contrast, less information is used in the approach presented in [39]. Our approach differs from that of Zadel and Fujinaga, among other things, in that they use a symmetric similarity measure and a different normalization method. As a result, their approach performs slightly worse than ours.

Further research may focus on the combination of web-based and signal-based data to raise the performance of similarity measures, or to enrich signal-based approaches with cultural metadata from the Internet. Since the data set used for evaluation contains quite general genres and well-known artists, it would be interesting to test our approach on a more specific data set with a more fine-grained genre taxonomy. Finally, heuristics that reduce the computational complexity of our approach should be tested. This would enable us to also process large artist lists.

For the SIMAC prototypes we recommend, depending on the number of artists in the collection, the tf·idf approach if the number of artists is beyond 100, and the simpler co-occurrence approach if it is below. If it is not clear whether the number of artists will increase at a later point in time, preference should be given to tf·idf.
Table 3.1: Results of the evaluation of intra-/intergroup-similarities using our co-occurrence measure. On the left, the results for the queries with the additional keywords music and review are shown. The middle columns show the results for the queries with the additional keyword music. The rightmost columns show the results for the allintitle queries, only taking into account web pages with the artists in their title. For each genre, the average intragroup-probability, the average intergroup-probability, and the ratio between these two probabilities are depicted. The higher the ratio, the better the differentiation of the respective genre.

Table 3.2: Results of the evaluation based on intra-/intergroup-similarities using relatednesses according to [39].

keywords: music, review

genre                       intra avg   inter avg   ratio
Country                     0.118       0.049       2.425
Folk                        0.064       0.043       1.480
Jazz                        0.131       0.048       2.722
Blues                       0.134       0.047       2.875
RnB/Soul                    0.109       0.060       1.812
Heavy Metal/Hard Rock       0.080       0.049       1.618
Alternative Rock/Indie      0.075       0.049       1.521
Punk                        0.098       0.053       1.848
Rap/Hip-Hop                 0.129       0.050       2.545
Electronica                 0.077       0.039       1.985
Reggae                      0.135       0.045       3.025
Rock ’n’ Roll               0.105       0.050       2.099
Pop                         0.081       0.052       1.577
Classical                   0.230       0.025       9.164
mean                                                2.621

Table 3.3: Results of the evaluation based on intra-/intergroup-similarities using the tf·idf approach according to [18].

4. Novelty Detection and Similarity

This chapter presents novelty detection as a tool in MIR to improve the performance of similarity measures. The work presented in this chapter has been submitted to a conference [13]. Novelty detection is the identification of new or unknown data that a machine learning system is not aware of during training (see [22] for a review). It is a fundamental requirement for every good machine learning system to automatically identify data from regions not covered by the training data, since in this case no reasonable decision can be made. In the field of music information retrieval the problem of novelty detection has so far been ignored.

For music information retrieval, the notion of central importance is musical similarity. Proper modeling of similarity enables automatic structuring and organization of large collections of digital music, and intelligent music retrieval in such structured “music spaces”. This can be utilized for numerous different applications: genre classification, playlist generation, music recommendation, etc. What all these different systems lack so far is the ability to decide when a new piece of data is too dissimilar for making a decision. Let us assume, for example, the following user scenario: a user has on her hard drive a collection of songs classified into the three genres ’hip hop’, ’punk’ and ’death metal’; given a new song from a genre not yet covered by the collection (say, a ’reggae’ song), the system should mark this song as ’novel’, therefore needing manual processing, instead of automatically and falsely classifying it into one of the three already existing genres (e.g. ’hip hop’). Another example is the automatic exclusion of songs from playlists because they do not fit the overall flavor of the majority of the list. Novelty detection could also be utilized to recommend new types of music different from a given collection if users are longing for a change.

4.1 Data

For the experiments presented in this chapter we used the DB-ML collection as described in Section 2.1. From the 22050 Hz mono audio signals, two minutes from the center of each song are used for further analysis. We divide the raw audio data into overlapping frames of short duration and use Mel Frequency Cepstrum Coefficients (MFCCs) to represent the spectrum of each frame. The frame size for computation of MFCCs for our

experiments was 23.2ms (512 samples), with a hop-size of 11.6ms (256 samples) for the overlap of frames. The average energy of each frame’s spectrum was subtracted. We used the first 20 MFCCs for all our experiments.
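For readers who want to reproduce this feature extraction step, the following Python sketch computes frame-wise MFCCs with librosa using the same frame and hop sizes. The center-segment extraction is included, but the subtraction of each frame's average spectral energy is omitted, so this is an illustrative sketch rather than the exact WP2 implementation; the function name and the use of librosa are our choices.

import librosa
import numpy as np

def song_mfccs(path, sr=22050, minutes=2.0):
    """Load a song, take `minutes` from its center, and compute 20 MFCCs
    per 23.2 ms frame (512 samples) with an 11.6 ms hop (256 samples)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    half = int(minutes * 60 * sr / 2)
    center = len(y) // 2
    segment = y[max(0, center - half):center + half]
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=20,
                                n_fft=512, hop_length=256)
    return mfcc  # shape: (20, number_of_frames)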

4.2 Methods

4.2.1 Music Similarity

The approach presented in this chapter can be applied to any type of music similarity. For the experiments presented here we use the spectral similarity as described in Section 2.3.1.

4.2.2 Algorithms for novelty detection

Ratio-reject: The first reject rule is based on density information about the training data captured in the similarity matrix. An indication of the local densities can be gained from comparing the distance between a test object X and its nearest neighbor in the training set, NN^{tr}(X), with the distance between this NN^{tr}(X) and its nearest neighbor in the training set, NN^{tr}(NN^{tr}(X)) [35]. The object is regarded as novel if the first distance is much larger than the second distance. Using the following ratio

\[ \rho(X) = \frac{\| d(X, NN^{tr}(X)) \|}{\| d(NN^{tr}(X), NN^{tr}(NN^{tr}(X))) \|} \qquad (4.1) \]

we reject X if:

\[ \rho(X) > \varepsilon[\rho(X^{tr})] + s \cdot \mathrm{std}(\rho(X^{tr})) \qquad (4.2) \]

with ε[ρ(X^{tr})] being the mean of all quotients ρ(X^{tr}) inside the training set and std(ρ(X^{tr})) the corresponding standard deviation (i.e. we assume that the ρ(X^{tr}) have a normal distribution). Parameter s can be used to change the probability threshold for rejection. Setting s = 3 means that we reject a new object X if its ratio ρ(X) is larger than the mean ρ within the training set plus three times the corresponding standard deviation. In this case a new object is rejected because the probability of its distance ratio ρ(X) is less than 1% when compared to the distribution of ρ(X^{tr}). Setting s = 2 rejects objects less probable than 5%, s = 1 less than 32%, etc.

Knn-reject: It is possible to directly use nearest neighbor classification to reject new data with a higher risk of being misclassified [17]: reject X if not:

\[ g(NN_1^{tr}(X)) = g(NN_2^{tr}(X)) = \ldots = g(NN_k^{tr}(X)) \qquad (4.3) \]

with NN_i^{tr}(X) being the i-th nearest neighbor of X in the training set, g() a function which gives the genre information for a song, and i = 1,...,k. A new object X is rejected if the k nearest neighbors do not agree on its classification. This approach will work for novelty detection if new objects X induce high confusion in the classifier. The higher the value for k, the more objects will be rejected.
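A compact way to see how the two reject rules operate on a precomputed distance matrix is sketched below in Python. The code assumes a full training-set distance matrix, a vector of distances from a test song to all training songs, and (for Knn-reject) genre labels; the function names and the numpy formulation are ours, not the exact implementation used for the experiments, and the handling of ties within the training set is deliberately simple.

import numpy as np

def _rho(d_to_train, D_train):
    """Distance ratio of Equation 4.1 for one object, given its distances
    to the training set and the training-set distance matrix."""
    nn = np.argmin(d_to_train)                  # nearest training song
    d1 = d_to_train[nn]
    row = D_train[nn].copy()
    row[nn] = np.inf                            # exclude the song itself
    d2 = row[np.argmin(row)]                    # distance to its own nearest neighbor
    return d1 / d2

def ratio_reject(d_to_train, D_train, s=2.0):
    """Reject if rho exceeds mean + s * std of the ratios inside the training set
    (Equation 4.2)."""
    n = D_train.shape[0]
    rhos = np.empty(n)
    for i in range(n):
        row = D_train[i].copy()
        row[i] = np.inf                         # leave-one-out within the training set
        rhos[i] = _rho(row, D_train)
    threshold = rhos.mean() + s * rhos.std()
    return _rho(d_to_train, D_train) > threshold

def knn_reject(d_to_train, train_genres, k=3):
    """Reject if the k nearest training songs do not agree on the genre
    (Equation 4.3)."""
    nearest = np.argsort(d_to_train)[:k]
    labels = np.asarray(train_genres)[nearest]
    return len(set(labels.tolist())) > 1

For a new song, ratio_reject and knn_reject would be called with the song's distances to the training collection; rejected songs are then flagged as novel instead of being classified.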

4.3 Results

To evaluate the two novelty detection approaches described in Sec. 4.2.2 we use the following approach, shown as pseudo-code in Table 4.1. First we set aside all songs belonging to a genre g as new songs ([new,data] = separate(alldata,g)), which yields data sets new and data (all songs not belonging to genre g). Then we do a ten-fold crossvalidation using data and new: we randomly split data into train and test folds ([train,test] = split(data,c)), with train always consisting of 90% and test of 10% of data. We compute the percentage of new songs which are rejected as being novel (novel_reject(g,c) = novel(new)) and do the same for the test songs (test_reject(g,c) = novel(test)). Last, we compute the accuracy of the nearest neighbor classification on test data that has not been rejected as being novel (accuracy(g,c) = classify(test(not test_reject))). The evaluation procedure gives G × C (22 × 10) matrices of novel_reject, test_reject and accuracy for each parameterization of the novelty detection approaches.

Table 4.1: Outline of Evaluation Procedure

for g = 1 : G
  [new,data] = separate(alldata,g)
  for c = 1 : 10
    [train,test] = split(data,c)
    novel_reject(g,c) = novel(new)
    test_reject(g,c) = novel(test)
    accuracy(g,c) = classify(test(not test_reject))
  end
end

The results for novelty detection based on the Ratio-reject and the Knn-reject rule are given in Figs. 4.1 and 4.2 as Receiver Operating Characteristic (ROC) curves [24]. To obtain an ROC curve, the fraction of false positives (object is not novel but it is rejected, in our case test_reject) is plotted versus the fraction of true positives (object is novel and correctly rejected, in our case novel_reject). An ROC curve shows the tradeoff between how sensitive and how specific a method is. Any increase in sensitivity will be accompanied by a decrease in specificity. If a method becomes more sensitive towards novel objects it will reject more of them, but at the same time it will also become less specific and falsely reject more non-novel objects. Consequently, the closer a curve follows the left-hand border and then the top border of the ROC space, the more accurate the method is. The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the method. We plot the mean test_reject versus the mean novel_reject for falling values of s (Ratio-reject) and growing values of k (Knn-reject). In addition, the mean accuracy for each of the different values of s and k is depicted as a separate curve. All means are computed across all 22 × 10 corresponding values. The accuracy without any rejection due to novelty detection is 70%.


Figure 4.1: Ratio-reject ROC, mean test reject vs. novel reject (circles, solid line) and accuracy (diamonds, broken line) for ’no rejection’, s=5,3,2,1,0.

Ratio-reject: The results for novelty detection based on the Ratio-reject rule are given in Fig. 4.1. With the probability threshold for rejection set to s = 2 (rejection because data is less probable than 5%), the accuracy rises up to 79%, while 19% of the test songs are falsely rejected as being novel (and therefore not classified at all) and 42% of the new songs are rejected correctly. If one is willing to lower the threshold to s = 0 (rejection because data is less probable than 50%), the accuracy is at 92%, with already 49% of the test songs rejected erroneously and 84% of the new songs rejected correctly.


Figure 4.2: Knn-reject ROC, mean test reject vs. novel reject (circles, solid line) and accuracy (diamonds, broken line) for k=1 (no rejection) and k=2,3,4,5,6,7,8,9,10,20.

Knn-reject: The results for novelty detection based on the Knn-reject rule are given in Fig. 4.2. If k is set to 2, the accuracy rises up to 89%, while 35% of the test songs are wrongly rejected as being novel (and therefore not classified at all) and 65% of the new songs are rejected correctly. With k = 3 the accuracy values start to saturate at 95%, with already 49% of the test songs rejected erroneously and 81% of the new songs rejected correctly.

4.4 Discussion

We have presented two approaches to novelty detection, where the first (Ratio-reject) is based directly on the distance matrix and does not, contrary to Knn-reject, need the genre labels. When comparing the two ROC curves given in Figs. 4.1 and 4.2 it can be seen that both approaches work approximately equally well. For example, the performance of the Ratio-reject rule with s = 1 resembles that of the Knn-reject rule with k = 2. The same holds for s = 0 and k = 3. The increase in accuracy is also comparable for both methods. Depending on how much specificity one is willing to sacrifice, the accuracy can be increased from 70% to well above 90%. Looking at both ROC curves, we would say that they indicate quite fair accuracy for both novelty detection methods. When judging genre classification results, it is important to remember that the human error in classifying some of the songs already gives rise to a certain percentage of misclassification. Inter-rater reliability between a number of music experts is usually far from perfect for genre classification. Given that the genres for our data set are user and not expert defined, and therefore even more problematic, it is not surprising that there is a considerable decrease in specificity for both methods.

Of course there is still room for improvement in novelty detection for music similarity. The two presented methods are a first attempt to tackle the problem and could probably be improved themselves. One could change the Knn-reject rule given in Eq. 4.3 by introducing a weighting scheme which puts more emphasis on closer than on distant neighbors. Then there is a whole range of alternative methods which could be explored: probabilistic approaches (see e.g. [7]), Bayesian methods [21] and neural network based techniques (see [22] for an overview).

Finally, we would like to point out that whereas the Knn-reject rule is bound to the genre classification framework, Ratio-reject is not. Knn-reject is probably the method of choice if classification is the main interest. Any algorithm that is able to find a range of nearest neighbors in a database of songs can be used together with the Knn-reject rule. Ratio-reject, on the other hand, has an even wider applicability. It is a general method to detect novel songs given a similarity matrix of songs. Since it does not need genre information, it could be used for anything from playlist generation and music recommendation to music organization and visualization.

5. Chroma-Complexity Similarity

In this chapter we use the chromagram implementation of D2.1.1 developed by Chris Harte at QMUL to compute descriptors for similarity measures which could be useful for playlist generation and related tasks. We briefly review the chromagram implementation based on the constant Q transform and how we use it to compute a measure of chroma complexity. (Note that chroma complexity is closely related to chord complexity.) We discuss possibilities for further development and how the prototypes can benefit from the descriptors developed in WP2. The general approach we apply can be applied to any similar mid-level representation and thus opens the way for further integration of WP2 results in WP3.

The following description of the chromagram calculation and the chromagram tuning has been copied from a paper [6] which was submitted to a conference and will be part of D2.1.2. The remaining work was part of the collaboration between Chris Harte and Juan Bello from QMUL and Elias Pampalk from OFAI visiting QMUL.

5.1 Chromagram Calculation

A standard approach to modeling pitch perception is as a function of two attributes: height and chroma. Height relates to the perceived pitch increase that occurs as the frequency of a sound increases. Chroma, on the other hand, relates to the perceived circularity of pitched sounds from one octave to the other. The musical intuitiveness of chroma makes it an ideal feature representation for note events in music signals. A temporal sequence of chroma vectors results in a time-frequency representation of the signal known as the chromagram.

A common method for chromagram generation is the constant Q transform [9]. It is a spectral analysis where frequency domain channels are not linearly spaced, as in DFT-based analysis, but logarithmically spaced, thus closely resembling the frequency resolution of the human ear. The constant Q transform X_{cq} of a temporal signal x(n) can be calculated as:

\[ X_{cq}(k) = \sum_{n=0}^{N(k)-1} w(n,k)\, x(n)\, e^{-j 2\pi f_k n} \qquad (5.1) \]

where both the analysis window w(n,k) and its length N(k) are functions of the bin position k. The center frequency f_k of the k-th bin is defined according to the frequencies

of the equal-tempered scale such that:

\[ f_k = 2^{k/\beta} f_{min} \qquad (5.2) \]

where β is the number of bins per octave, thus defining the resolution of the analysis, and f_{min} defines the starting point of the analysis in frequency. From the constant Q spectrum X_{cq}, the chroma for a given frame can then be calculated as:

\[ \mathrm{Chroma}(b) = \sum_{m=0}^{M} |X_{cq}(b + m\beta)| \qquad (5.3) \]

where b ∈ [1, β] is the chroma bin number, and M is the total number of octaves in the constant Q spectrum. We downsample the signal to 11025Hz, use β = 36, and the analysis is performed between f_{min} = 98Hz and f_{max} = 5250Hz. The resulting window length and hop size are 8192 and 1024 samples respectively.
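To make the octave folding of Equation 5.3 concrete, the Python sketch below computes a 36-bin chromagram from a constant-Q magnitude spectrogram. Librosa's constant-Q transform is used as a stand-in for the implementation of [9], so the window handling differs in detail from the description above, and the function name is ours.

import numpy as np
import librosa

def chromagram(path, sr=11025, fmin=98.0, fmax=5250.0, bins_per_octave=36, hop=1024):
    """36-bin chromagram obtained by folding a constant-Q spectrum over octaves."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    n_bins = int(np.floor(bins_per_octave * np.log2(fmax / fmin)))
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop, fmin=fmin,
                           n_bins=n_bins, bins_per_octave=bins_per_octave))
    # Fold all octaves onto one: Chroma(b) = sum_m |X_cq(b + m*beta)|  (Eq. 5.3)
    chroma = np.zeros((bins_per_octave, C.shape[1]))
    for b in range(bins_per_octave):
        chroma[b] = C[b::bins_per_octave].sum(axis=0)
    return chroma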

5.2 Chromagram Tuning

Real-world recordings are often not perfectly tuned, and slight differences between the tuning of a piece and the expected position of energy peaks in the chroma representation can have an important influence on the estimation of chords. The 36-bin per octave resolution is intended to clearly map spectral components to a particular semitone regardless of the tuning of the recording. Each note in the octave is mapped to 3 bins in the chroma, such that bias towards a particular bin (i.e. sharpening or flattening of notes in the recording) can be spotted and corrected. To do this we use a simpler version of the tuning algorithm proposed in [16]. The algorithm starts by picking all peaks in the chromagram. The resulting peak positions are quadratically interpolated and mapped to the [1.5, 3.5] range. A histogram is generated from this data, such that skewness in the distribution is indicative of a particular tuning. A corrective factor is calculated from the distribution and applied to the chromagram by means of a circular shift. Finally, the tuned chromagram is low-pass filtered to eliminate sharp edges.

5.3 Chromagram Processing

To emphasize certain patterns in the chromagram and to remove temporal variations we use several filters. (1) We use a Gaussian filter over time. This window is very large and removes variations within 50ms. This helps reduce the impact of, for example, the broad spectrum of sharp attacks. (2) We use a loudness normalization to remove the impact of the changing loudness levels in different sections of the piece. (3) We use gradient filters to emphasize horizontal lines in the chromagram. (4) We smooth the chromagram over the tone scale to a resolution of about one semitone (i.e. 12 bins instead of 36). This smoothing is done circularly, in accordance with the distance between semitones. The results are the chromagrams displayed in the figures of this chapter.
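A rough version of this post-processing chain is sketched below in Python; the gradient filtering step (3) is omitted, the filter widths are placeholders rather than the values used for the figures, and the semitone smoothing simply sums the three bins around each semitone, so this should be read as an approximation of the described pipeline rather than the actual implementation.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def postprocess_chromagram(chroma, sigma_frames=4.0):
    """chroma: array of shape (36, n_frames) as returned by chromagram()."""
    # (1) Gaussian smoothing over time to suppress short-term variations.
    smoothed = gaussian_filter1d(chroma, sigma=sigma_frames, axis=1)
    # (2) Loudness normalization: scale each frame to unit maximum.
    peaks = smoothed.max(axis=0, keepdims=True)
    normalized = smoothed / np.maximum(peaks, 1e-12)
    # (4) Circular smoothing over the tone scale down to semitone resolution
    #     (12 bins instead of 36): sum each bin with its circular neighbors
    #     and keep every third bin.
    circular = (np.roll(normalized, 1, axis=0) + normalized
                + np.roll(normalized, -1, axis=0))
    semitone = circular[::3]
    return semitone  # shape: (12, n_frames)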

5.4 Chroma Complexity

Depending on the number of chords and their similarity, the patterns which appear in a chromagram might be very complex or very simple. To measure this we use clustering. In particular, the chromagram is clustered with k-means, finding groups of similar chroma patterns. The clustering algorithm starts with 8 clusters. If two clusters are very similar, the groups are merged. This is repeated until convergence or until only 2 groups are left. The similarity is measured using a heuristic for the perceptual similarity of two patterns. To avoid getting stuck in local minima, the clustering is repeated several times (with different initializations). (The time resolution of the chromagram is very low, thus the computation time for clustering is negligible.) This is not the optimal choice for several reasons; alternatives include using, for example, the Bayes information criterion, or avoiding quantization in the first place. In the following we give some examples to illustrate the chroma complexity and to show that there are general tendencies per genre which make the approach interesting for tasks such as genre classification or playlist generation.
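The cluster-and-merge procedure can be approximated with a few lines of Python; here the perceptual-similarity heuristic is replaced by a plain Euclidean distance between cluster centroids with an arbitrary merge threshold, and scikit-learn's k-means is used, so the returned complexity values will not match the ones reported below exactly.

import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import pdist

def chroma_complexity(semitone_chroma, max_clusters=8, min_clusters=2,
                      merge_threshold=0.2, n_init=10):
    """Number of distinct chroma patterns (2..8) in a (12, n_frames) chromagram."""
    frames = semitone_chroma.T                  # one 12-dimensional pattern per frame
    k = max_clusters
    while k > min_clusters:
        km = KMeans(n_clusters=k, n_init=n_init).fit(frames)
        # If the two closest centroids are very similar, merge, i.e. retry with k-1.
        if pdist(km.cluster_centers_).min() < merge_threshold:
            k -= 1
        else:
            break
    return k

In the ChromaVisu tool described next, the cluster assignments themselves (which pattern is active when) are visualized in addition to this count.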

5.4.1 ChromaVisu Tool

To study the chromagram patterns we developed a Matlab tool to visualize patterns while listening to the corresponding music. A screenshot is shown in Figure 5.1. The six main components are:

A. The large circle in the upper left is the current chroma-pattern (i.e. the pattern associated with the part of the song just playing).

B. To the right is the mean over all patterns which for many types of music is a useful indicator of the key.

C. To the right are up to eight different chroma patterns which occur frequently. The number above each pattern indicates how often it occurs (percentage). The number of different patterns is determined automatically as described above and is the measure for the chroma complexity.

Figure 5.1: ChromaVisu: a tool to study chroma pattern complexity.

D. Beneath is a fuzzy segmentation (cluster assignment) indicating when which of the eight patterns is active. Each line represents one cluster (with the most frequent cluster in the first row). White means that the cluster is a good match; black is a very poor match. If none of the clusters is a good match, the last line (which does not represent a cluster) is white. Modeling the repetitions and the structure which immediately become apparent is the primary target for further work on chroma complexity.

E. Just below the cluster assignment is a slider which helps track the current position of the song.

F. Below it is the chromagram in a flat representation. The first five rows and the last five rows are repetitions to help recognize patterns on the boundaries.

5.5 Results

In this section we illustrate the chroma complexity for several pieces from different styles. The main characteristic is the number of different chroma patterns. In the current implementation this number is limited to the range 2-8. As can be seen, for some genres this number is relatively high, while others tend to have lower numbers. However, the examples also demonstrate that there are always exceptions; thus chroma complexity by itself is not a suitable similarity measure and needs to be combined with additional information (such as patterns) for applications such as playlist generation. The songs used for the following illustrations are from the DB-S collection (see Section 2.1). We chose typical examples from the categories: jazz, classic piano, classic orchestra, dance, hip hop, and pop. From each piece we analyze the center minute.

5.5.1 Jazz (Dave Brubeck Quartet)

The chroma complexity usually ranges from 7-8. However, there are exceptions such as “Take Five”, shown in Figure 5.3. In the analyzed section from this piece the drummer plays interesting and complex patterns while the other instruments (including the piano) are in a loop. This illustrates the need for additional descriptors. Alternatively, it would also make sense to analyze the whole piece instead of only one minute. In general the patterns in the cluster assignment are rather complex. This suggests that a descriptor based on the texture might be a useful supplement to the number of patterns.

5.5.2 Classic Orchestra

The chroma complexity usually ranges from 6-7. The values are generally high, but not as high as for jazz. This is also reflected in the complexity of the cluster assignment, which seems to be structured more clearly.

5.5.3 Classic Piano

The chroma complexity usually ranges from 7-8. The values are comparable to those of jazz. Some of the patterns in the cluster assignment are as complex as those of the jazz pieces, others are similar to those of classical orchestra.

5.5.4 Dance

The chroma complexity usually ranges from 2-4. The patterns in the cluster assignment are often very simple repetitions. A frequent observation is that the same chroma pattern represents a larger part of the piece, as can be seen in Figure 5.6.

5.5.5 Hip Hop

The chroma complexity for hip hop is usually around 2. The variations in this style of music are based on the lyrics, rhythm, and beats, but seldom on the harmonic structure. In Figure 5.7 the two chroma patterns found are very similar. The main difference is that in the second pattern (which occurs 29% of the time) D is more strongly pronounced. If the system were not limited to finding at least 2 patterns, it is possible that the two would be summarized by one.

5.5.6 Pop

The chroma complexity for pop is usually around 2-4. The patterns in the cluster assignment are often simple repetitions. For example, in Figure 5.8 the cluster assignment reveals that the sequence 1, 2, 3 is repeated frequently.

5.6 Discussion & Conclusions

As shown by the examples above, the distinction between pieces with low complexity and pieces with high complexity is meaningful. Compared to the MFCCs used for spectral similarity, the chromagram is a high-level representation of a song. Using chromagram-based information such as the chroma complexity, we might be able to reach beyond the glass ceiling pointed out in D3.1.1 and D3.2.1.

Despite very promising results, the work presented in this chapter is still preliminary. Next steps include higher level analysis of the chroma patterns and their relationship to each other (e.g. using a Notennetz). As shown with the Take Five example, it is necessary to combine chroma complexity with additional (complementary) similarity measures, including spectral similarity. This combination could be done as demonstrated in Section 2.3.

The prototypes could benefit from similarity measures reflecting higher level musical concepts. These might make the prototypes more interesting for people with specific interests in music. For example, pieces with an exceptionally strong deviation in chroma complexity could be removed from a playlist. Another example would be to organize a music collection according to chroma complexity, for instance by grouping all pieces with a very high complexity.


Figure 5.2: Jazz, Dave Brubeck Quartet, Strange Meadow Lark.


Figure 5.3: Jazz, Dave Brubeck Quartet, Take Five.


Figure 5.4: Classic Orchestra, Ravel Maurice, .


Figure 5.5: Classic Piano, Chopin, Etude, Op 25, No 7.


Figure 5.6: Dance, DJs at Work, Time to Wonder.


Figure 5.7: Hip Hop, Nelly, EII.


Figure 5.8: Pop, Emma Bunton, What took you so long?

6. Conclusions and Future Work

In this deliverable we have made several recommendations for the prototypes, in particular for audio-based similarity and web-based similarity. In both cases we have presented improved versions of the results in D3.1.1 and D3.2.1. The combination of audio- and web-based similarity is ongoing work. First results have been presented in D3.1.1 and D3.2.1. However, detailed analysis will require the compilation of a large database with a large (>> 100) number of artists.

We have demonstrated the use of novelty detection as an interesting tool to combine with similarity measures. For example, a genre classifier can benefit by recognizing if pieces (which need to be classified) are very different from all other pieces in the collection.

Furthermore, we have demonstrated an approach to using mid-level descriptors developed in WP2 for WP3 tasks. Specifically, we have used the chromagram to develop a measure for chroma complexity (which is closely related to chord complexity). Although the results are preliminary, they indicate that the glass ceiling pointed out in D3.1.1 and D3.2.1 might be higher than originally anticipated.

Due to its importance, the development of similarity measures will continue in the SIMAC project. A combined effort between WP2 and WP3 seems promising. Interesting directions for similarity include rhythm patterns, complexity in general, harmonic progression, and musical structure.

Bibliography

[1] J.-J. Aucouturier and F. Pachet. Music similarity measures: What’s the use? In Proceedings of the Third International Conference on Music Information Retrieval (ISMIR 2002), pages 157–163, 2002.

[2] J.-J. Aucouturier and F. Pachet. Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.

[3] J.-J. Aucouturier and M. Sandler. Segmentation of musical signals using hidden markov models. In Proceedings of the Audio Engineering Society 110th Convention, Amsterdam, May 12-15 2001.

[4] J.-J. Aucouturier and M. Sandler. Finding repeating patterns in acoustic musical signals. In Proceedings of the AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.

[5] E. Batlle, J. Masip, and P. Cano. System analysis and performance tuning for broadcast audio fingerprinting. In Proceedings of the 6th International Conference on Digital Audio Effects (DAFX-03), London, UK, September 8-11 2003.

[6] J. Bello and J. Pickens. A robust mid-level representation for harmonic content in music signals. 2005. submitted.

[7] C. Bishop. Novelty detection and neural network validation. In Proceedings of the IEE Conference on Vision and Image Signal Processing, pages 217–222, 1994.

[8] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, 1995.

[9] J.C. Brown. Calculation of a constant q spectral transform. Journal of the Acoustical Society of America, 89(1):425–434, 1991.

[10] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge UK, 1998.

[11] H. Fastl. Fluctuation strength and temporal masking patterns of amplitude-modulated broad-band noise. Hearing Research, 8:59–69, 1982.


[12] A. Flexer, E. Pampalk, and G. Widmer. Hidden markov models for spectral similarity of songs. 2005. submitted.

[13] A. Flexer, E. Pampalk, and G. Widmer. Novelty detection based on spectral simi- larity of songs. 2005. submitted.

[14] R.M. Golden. Statistical tests for comparing possibly misspecified and nonnested models. Journal of Mathematical Psychology, 44:153–170, 2000.

[15] J.M. Grey. Multidimensional perceptual scaling of musical timbres. Journal of the Acoustical Society of America, 61:1270–1277, 1977.

[16] C. A. Harte and M. B. Sandler. Automatic chord identification using a quantised chromagram. In Proceedings of the 118th Convention of the Audio Engineering Society, Barcelona, Spain, May 28-31 2005.

[17] M.E. Hellman. The nearest neighbour classification with a reject option. IEEE Transactions on Systems Science and Cybernetics, 6(3):179–185, 1970.

[18] P. Knees, E. Pampalk, and G. Widmer. Artist Classification with Web-based Data. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR'04), pages 517–524, Barcelona, Spain, October 2004.

[19] B. Logan and S. Chu. Music summarization using key phrases. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, volume II, pages 749–752, 2000.

[20] B. Logan and A. Salomon. A music similarity function based on signal analysis. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'01), Tokyo, Japan, 2001.

[21] D.J.C. MacKay. The evidence framework applied to classification networks. Neural Computation, 4:720–736, 1992.

[22] M. Markou and S. Singh. Novelty detection: a review, part 2: neural network based approaches. Signal Processing, 83(12):2499–2521, 2003.

[23] M. McAleer. The significance of testing empirical non-nested models. Journal of Econometrics, 67:149–171, 1995.

[24] C.E. Metz. Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4):283–298, 1978.

[25] E. Pampalk. Islands of music: Analysis, organization, and visualization of music archives. Master’s thesis, Vienna University of Technology, Department of Software Technology and Interactive Systems, 2001.

[26] E. Pampalk. A Matlab toolbox to compute music similarity from audio. In Proceedings of the Fifth International Conference on Music Information Retrieval (ISMIR'04), Barcelona, Spain, October 10-14 2004.

[27] E. Pampalk, A. Flexer, and G. Widmer. Improvements of audio-based music similarity and genre classification. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2005. Submitted.

[28] E. Pampalk, W. Goebl, and G. Widmer. Visualizing changes in the structure of data for exploratory feature selection. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, August 24-27 2003. ACM.

[29] E. Pampalk, A. Rauber, and D. Merkl. Content-based organization and visualization of music archives. In Proceedings of ACM Multimedia, pages 570–579, Juan les Pins, France, December 1-6 2002. ACM.

[30] G. Peeters, A. La Burthe, and X. Rodet. Toward automatic music audio summary generation from signal analysis. In Proceedings of the Third International Conference on Music Information Retrieval (ISMIR'02), pages 157–163. IRCAM, 2002.

[31] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[32] L. R. Rabiner and B. H. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4–16, 1986.

[33] M. Schedl, P. Knees, and G. Widmer. A web-based approach to assessing artist similarity using co-occurrences. In Proceedings of the Workshop on Content-Based Multimedia Indexing (CBMI'05), 2005. Submitted.

[34] S. Siegel. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 1956.

[35] D.M.J. Tax and R.P.W. Duin. Outlier detection using classifier instability. In Proceedings of the Joint IAPR International Workshop SSPR'98 and SPR'98, pages 593–601, 1998.

[36] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, 2002.

[37] B. Whitman and S. Lawrence. Inferring descriptions and similarity for music from community metadata. In Proceedings of the 2002 International Computer Music Conference (ICMC), pages 591–598, Göteborg, Sweden, September 2002.

[38] M. Zadel and I. Fujinaga. Web services for music information retrieval. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR'04), Barcelona, Spain, 2004.

[39] M. Zadel and I. Fujinaga. Web Services for Music Information Retrieval. In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR'04), Barcelona, Spain, October 2004.

[40] E. Zwicker and H. Fastl. Psychoacoustics: Facts and Models. Springer, 2nd edition, 1999.