A Music Search Engine Built Upon Audio-Based and Web-Based Similarity Measures
Total Page:16
File Type:pdf, Size:1020Kb
A Music Search Engine Built upon Audio-based and Web-based Similarity Measures Peter Knees1, Tim Pohle1, Markus Schedl1, and Gerhard Widmer1,2 1Department of Computational Perception, Johannes Kepler University Linz, Austria 2Austrian Research Institute for Artificial Intelligence (OFAI) [email protected], [email protected], [email protected], [email protected] ABSTRACT track name, or year) to index the underlying music collec- An approach is presented to automatically build a search tion. Thus, the person issuing the query (i.e. the potential engine for large-scale music collections that can be queried customer) must already have a very precise conception of through natural language. While existing approaches de- the expected result set. The intrinsic problem of these sys- pend on explicit manual annotations and meta-data assigned tems is the limitation to a rather small set of meta-data, to the individual audio pieces, we automatically derive de- whereas the musical, or more general, the cultural context scriptions by making use of methods from Web Retrieval of music pieces is not captured. and Music Information Retrieval. Based on the ID3 tags of However, as digital catalogs rapidly become larger and a collection of mp3 files, we retrieve relevant Web pages via more inconvenient and inefficient to access, the need for Google queries and use the contents of these pages to charac- more sophisticated methods that enable intuitive searching terize the music pieces and represent them by term vectors. inside large music collections increases. For example, in- By incorporating complementary information about acous- stead of just finding tracks that are labeled as rock, it would tic similarity we are able to both reduce the dimensionality be valuable to be able to formulate a query like “rock with of the vector space and improve the performance of retrieval, great riffs” to emphazise the importance of energetic gui- i.e. the quality of the results. Furthermore, the usage of tar phrases. Another query could be “relaxing music” that audio similarity allows us to also characterize audio pieces should return relaxing pieces regardless of which “genre” when there is no associated information found on the Web. they are assigned to. Clearly, music resellers with very large music databases or music information systems could benefit Categories and Subject Descriptors: from such an engine, as it provides access to their catalog in H.3.3 [Information Systems]: Information Storage and the most common and most accepted manner. Retrieval; H.5.5 [Information Systems]: Information In- Recent developments in music information systems re- terfaces and Presentation – Sound and Music Computing spond to the fact that new and possibly unconventional ap- General Terms: Algorithms proaches are necessary to support the user in finding de- sired music. Most of them make use of content-based anal- Keywords: Music search engine, Music Information Re- ysis of the audio files (e.g. [12, 6]) or use collaborative rec- trieval, music similarity, context-based retrieval, cross-media ommendations to point the users to music they might like retrieval (e.g. [9], Last.fm1, or on-line retailers like Amazon). Also, there are techniques that incorporate information from dif- 1. INTRODUCTION ferent sources to build interactive interfaces [15, 21]. Over the past years, using text-based search engines has In this paper, we present first steps toward the challeng- become the “natural” way to find and access all types of ing task of automatically building a search system that is multimedia content. While there exist approaches to auto- capable of finding music that satisfies arbitrary natural lan- matically derive and assign semantic, natural language de- guage queries. For each track in a collection of mp3 files, scriptors for images, videos, and – of course – text, the broad we retrieve a set of relevant Web pages via Google. This field of (popular) music has not drawn that much of atten- enables us to represent music pieces in a traditional term tion. Basically all existing music search systems, e.g. those vector space. Additionally, we make use of a state-of-the- offered by commercial music resellers, make use of manu- art audio similarity measure. Thus, we combine information ally assigned subjective meta-information like genre or style about the context of music with information about the con- (in addition to objective meta-data like artist, album name, tent. With the integration of acoustic similarity, we are able to (a) reduce the dimensionality of the feature space, (b) describe music pieces with no (or little) related information present on the Web, and (c) improve the quality of the re- Permission to make digital or hard copies of all or part of this work for trieval results. The evaluation carried out on a collection personal or classroom use is granted without fee provided that copies are consisting of more than 12,000 music pieces gives strong in- not made or distributed for profit or commercial advantage and that copies dication that a natural language search engine for music bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific collections is in reach. permission and/or a fee. SIGIR’07, July 23–27, 2007, Amsterdam, The Netherlands. 1 Copyright 2007 ACM 978-1-59593-597-7/07/0007 ...$5.00. http://www.last.fm 2. RELATED WORK A system that is not restricted to a pre-defined set of Several methods to retrieve music from large databases meta-data is Last.fm. Last.fm integrates into music player have been proposed. Most of these approaches are based software and keeps track of each user’s listening habits. Based on query-by-example methods (e.g. [17]). Thus, the query on the collected data, similar artists or tracks can be recom- must consist of a piece of information that has the same mended. Additionally, users can assign tags to the tracks representation as the records in the database. For exam- in their collection. These tags provide a valuable source of ple, in Query-by-Humming/Singing (QBHS) systems [11] information on how people perceive and describe music. A the user has to sing or hum a part of the searched piece drawback of the system is that most tags are highly incon- into a microphone. In most cases, these systems operate sistent and noisy. We discuss this issue in more detail in on a symbolic representation of music (i.e. MIDI). Other Section 4 where we use part of the Last.fm data to evaluate query-by-example systems use a small, low quality recorded our own approach. portion of a music piece as input, identify the piece, and re- Beside music information systems that deal solely with turn the associated meta-data (artist, title, etc.). Typically, popular music, there exist a number of search engines that such services are accessible via cellular phone (e.g. from the use specialized (focused) crawlers to find all types of sounds commercial service Shazam2). on the Web. The traced audio files are indexed using con- An even more challenging task is to design systems that textual information extracted from the text surrounding the links to the files. Examples of such search engines are Arooo- enable cross-media retrieval. In our case, systems that al- 6 low queries consisting of arbitrary natural language text, e.g. ga [16] and FindSounds . descriptions of sound, mood, or cultural events, and return Finally, we briefly review approaches that aim to bridge music pieces that are semantically related to this query, are the semantic gap for music by extracting information from of interest. Unfortunately, the number of systems enabling the pure audio signal and assigning some sort of meta-data. its users to perform such queries is very little. The most In [15], music landscapes created from audio similarity infor- elaborate approach so far has been presented by Baumann mation are labeled with music related terms extracted from et al. [5]. Their system is supported by a semantic ontol- the Web. [21] uses similar techniques to arrange a music ogy which integrates information about artist, genre, year, collection around a circle for easy access. In [24], low-level lyrics, and automatically extracted acoustic properties like characteristics of audio signals are mapped to semantic con- loudness, tempo, and instrumentation and defines relations cepts to learn the “meaning” of certain acoustic properties. between these concepts. Beside the correct extraction of the In contrast, in this paper, we directly derive culturally as- features, also the mapping of the query to the concepts in sociated descriptors for audio files by extracting the desired the ontology has to be accomplished. In the end, the sys- information from related Web pages. Additionally, we make tem allows for semantic queries like “something fast from...” use of audio-based similarity to improve this technique. As or “something new from...”. Also phonetic misspellings are a result, we obtain a representation of each music piece in corrected automatically. a term vector space. To search for music pieces, queries are In [8], Celma et al. present the music search engine Search also transformed to this vector space where distances to all Sounds3. The system uses a special crawler that focuses on music pieces can be calculated. a set of manually defined “audio blogs”, which can be ac- cessed via RSS links. In these blogs, the authors explain and 3. TECHNICAL REALIZATION describe music pieces and make them available for download In this section, we describe our technique to build a natu- (whether legally or illegally depends on the blog). Hence, ral language search engine for music.