Text-Based Description of Music for Indexing, Retrieval, and Browsing
Total Page:16
File Type:pdf, Size:1020Kb
JOHANNES KEPLER UNIVERSITAT¨ LINZ JKU Technisch-Naturwissenschaftliche Fakult¨at Text-Based Description of Music for Indexing, Retrieval, and Browsing DISSERTATION zur Erlangung des akademischen Grades Doktor im Doktoratsstudium der Technischen Wissenschaften Eingereicht von: Dipl.-Ing. Peter Knees Angefertigt am: Institut f¨ur Computational Perception Beurteilung: Univ.Prof. Dipl.-Ing. Dr. Gerhard Widmer (Betreuung) Ao.Univ.Prof. Dipl.-Ing. Dr. Andreas Rauber Linz, November 2010 ii Eidesstattliche Erkl¨arung Ich erkl¨are an Eides statt, dass ich die vorliegende Dissertation selbstst¨andig und ohne fremde Hilfe verfasst, andere als die angegebenen Quellen und Hilfsmittel nicht benutzt bzw. die w¨ortlich oder sinngem¨aß entnommenen Stellen als solche kenntlich gemacht habe. iii iv Kurzfassung Ziel der vorliegenden Dissertation ist die Entwicklung automatischer Methoden zur Extraktion von Deskriptoren aus dem Web, die mit Musikst¨ucken assoziiert wer- den k¨onnen. Die so gewonnenen Musikdeskriptoren erlauben die Indizierung um- fassender Musiksammlungen mithilfe vielf¨altiger Bezeichnungen und erm¨oglichen es, Musikst¨ucke auffindbar zu machen und Sammlungen zu explorieren. Die vorgestell- ten Techniken bedienen sich g¨angiger Web-Suchmaschinen um Texte zu finden, die in Beziehung zu den St¨ucken stehen. Aus diesen Texten werden Deskriptoren gewon- nen, die zum Einsatz kommen k¨onnen zur Beschriftung, um die Orientierung innerhalb von Musikinterfaces zu ver- • einfachen (speziell in einem ebenfalls vorgestellten dreidimensionalen Musik- interface), als Indizierungsschlagworte, die in Folge als Features in Retrieval-Systemen f¨ur • Musik dienen, die Abfragen bestehend aus beliebigem, beschreibendem Text verarbeiten k¨onnen, oder als Features in adaptiven Retrieval-Systemen, die versuchen, zielgerichtete • Vorschl¨age basierend auf dem Suchverhalten des Benutzers zu machen. Im Rahmen dieser Dissertation werden verschiedene Strategien zur Extraktion von Deskriptoren, sowie zur Indizierung und zum Retrieval von Musikst¨ucken erar- beitet und evaluiert. Weiters wird das Potenzial Web-basierter Retrieval-Ans¨atze, die um signalbasierte Ahnlichkeitsinformation¨ erweitert werden, sowie das Poten- zial Audio¨ahnlichkeitsbasierter Suchans¨atze, die mit Web-Daten erweitert werden, untersucht und anhand von Prototypanwendungen demonstriert. v vi Abstract The aim of this PhD thesis is to develop automatic methods that extract textual descriptions from the Web that can be associated with music pieces. Deriving de- scriptors for music permits to index large repositories with a diverse set of labels and allows for retrieving pieces and browsing collections. The techniques presented make use of common Web search engines to find related text content on the Web. From this content, descriptors are extracted that may serve as labels that facilitate orientation within browsing interfaces to music collections, • especially in a three-dimensional browsing interface presented, indexing terms, used as features in music retrieval systems that can be queried • using descriptive free-form text as input, and features in adaptive retrieval systems that aim at providing more user-targeted • recommendations based on the user’s searching behaviour for exploration of music collections. In the context of this thesis, different extraction, indexing, and retrieval strate- gies are elaborated and evaluated. Furthermore, the potential of complementing Web-based retrieval with acoustic similarity extracted from the audio signal, as well as complementing audio-similarity-based browsing approaches with Web-based de- scriptors is investigated and demonstrated in prototype applications. vii viii Acknowledgments First and foremost thanks for everything are due to my family. Without them, this thesis would not be. At least equally responsible for making this thesis possible is my supervisor Gerhard Widmer. I would like to thank him for giving me the opportunity to join his research group in Linz, for giving me space to work on this and other interesting topics, and for supporting me where ever possible and whenever necessary. Further- more, I want to thank him for enabling me to visit many interesting conferences in amazing places and to meet the most extraordinary people. Also, I want to thank my second reader, Andreas Rauber, for his inspirational ideas and for providing support over the years. I would like to thank all my co-authors and colleagues from the Department of Computational Perception and the OFAI in Vienna I got to work and be around with in the last years, namely Elias Pampalk, Tim Pohle, Markus Schedl, Dominik Schnitzer, Klaus Seyerlehner, Andreas Arzt, Arthur Flexer, Sebastian Flossmann, Harald Frostel, Martin Gasser, Werner Goebl, Maarten Grachten, Bernhard Nie- dermayer, Josef Scharinger, Claudia Kindermann, and, most recently, Reinhard Sonnleitner and Sebastian B¨ock. I also want to acknowledge all the students con- tributing to the projects I was involved in. There are numerous people I have met during conferences that I had insightful discussions with and that have influenced my research. I can not and do not want to list them all here, but I would particularly like to mention Oscar` Celma, Gijs Geleijnse, and Noam Koenigstein for the good cooperation and for promoting the field of Web-MIR. For all the others, if you think we had a good time, please feel acknowledged, as I had a good time too. Special credits for additional motivation to finish this thesis — unless not mentioned before — are due to Simon Dixon and KatieAnna Wolf. This research was supported by the Austrian Fonds zur F¨orderung der Wissen- schaftlichen Forschung (FWF) under project numbers L112-N04 (“Operational Mod- els of Music Similarity for Music Information Retrieval”) and L511-N15 (“Music Retrieval Beyond Simple Audio Similarity”). ix x Contents 1 Introduction 1 1.1 Contributions............................... 2 1.2 Why Automatic Extraction from the Web? . 4 1.3 OrganisationofthisThesis . 5 2 Related Work 7 2.1 MusicSimilarityandIndexing. 7 2.1.1 Content-BasedSimilarity . 8 2.1.2 Context-Based Indexing and Similarity . 9 2.1.2.1 Manual Annotations . 11 2.1.2.2 Collaborative Tags . 12 2.1.2.3 Web-TextTermProfiles . 13 2.1.2.4 Web-based Music Information Extraction . 15 2.1.2.5 SongLyrics....................... 16 2.1.2.6 Web-based Co-Occurrences and Page Counts . 17 2.1.2.7 Playlists ........................ 19 2.1.2.8 Peer-to-Peer Network Co-Occurrences . 19 2.1.2.9 Collaborative Filtering-based Approaches . 20 2.1.2.10 Other Sources of Context-Based Information . 21 2.1.3 HybridApproaches. 21 2.2 Web Information Retrieval and Search Engines . 22 2.2.1 WebDataMining ........................ 23 2.2.2 MultimediaIndexing. 25 2.3 MusicSearchEngines .......................... 26 2.3.1 Symbolic-Representation-Based Retrieval . 26 2.3.2 Audio-BasedRetrieval . 27 2.3.3 Text-BasedRetrieval. 27 2.4 UserInterfacestoMusicCollections . 29 2.4.1 Music Information Systems . 29 2.4.2 Map-basedInterfaces. 29 2.4.3 OtherIntelligentInterfaces . 31 3 Methodological Background 33 3.1 WebMiningandDocumentIndexing . 33 3.1.1 WebDataRetrieval ....................... 33 3.1.2 InvertedIndex .......................... 35 3.1.3 Term Weighting and Document Ranking . 37 xi Contents 3.1.4 TextCategorisation . 38 3.1.5 TextAlignment.......................... 40 3.2 AudioSimilarity ............................. 43 3.2.1 MFCC-based Spectral Features . 44 3.2.2 Post-Processing the Distance Matrix . 45 3.2.3 FluctuationPatterns. 45 3.2.4 CombinedSimilarity . 46 3.3 MusicMaps................................ 47 3.3.1 Self-OrganizingMap . 47 3.3.2 IslandsofMusic ......................... 48 4 Automatically Deriving Music Labels for Browsing 51 4.1 Motivation ................................ 51 4.2 MusicDescriptionMap ......................... 54 4.2.1 ArtistTermProfileRetrieval . 54 4.2.2 MapLabelling .......................... 55 4.2.3 ConnectingSimilarClusters. 56 4.2.4 Example.............................. 57 4.2.5 FurtherEnhancements. 57 4.3 Augmented Exploration in Virtual Music Landscapes . ..... 60 4.3.1 Prototype: The nepTune Interface . 61 4.3.2 Incorporating Web-Based Labels and Images for Orientation 64 4.3.3 ExtensionsandFutureDirections. 66 4.4 Evaluation................................. 68 4.4.1 User Evaluation of the Music Description Map . 69 4.4.2 Qualitative Evaluation of the nepTune Interface . 70 4.5 RecapitulationandDiscussion. 70 5 Larger Scale Indexing and Retrieval of Music via Web-Data 73 5.1 Motivation ................................ 73 5.2 Web-DataAcquisition .......................... 76 5.3 IndexingandRetrieval. 77 5.3.1 Pseudo Document Vector Space Model Approach . 77 5.3.2 Web-Document-Centred Approach . 78 5.4 Selection of Pages Using Filtering Techniques . ..... 80 5.4.1 UnsupervisedFiltering. 80 5.4.1.1 Alignment-Based Noise Removal . 80 5.4.1.2 Too-Many-Artists Filtering . 81 5.4.2 SupervisedFiltering . 81 5.4.2.1 Query-Based Page Blacklisting . 82 5.4.2.2 Query-Trained Page Classification . 83 5.5 Post-Hoc Re-Ranking Based on Audio Similarity . 84 5.6 Evaluation................................. 85 5.6.1 TestCollections.......................... 86 5.6.1.1 c35kCollection. 86 5.6.1.2 CAL500Set ...................... 87 5.6.2 EvaluationMeasures . 87 xii Contents 5.6.3 WebSearchEngineImpact . 89 5.6.4 PageFilteringImpact . 91 5.6.5 Audio-Based Re-Ranking Impact . 93 5.6.6 Impact of Audio-Based Combination on Long-Tail Retrieval . 94 5.7 Prototype: The Gedoodle Music Search Engine . 97 5.8