Content-Based Music Information Retrieval
INVITED PAPER

Content-Based Music Information Retrieval: Current Directions and Future Challenges

Current retrieval systems can handle tens-of-thousands of music tracks but new systems need to aim at huge online music collections that contain tens-of-millions of tracks.

By Michael A. Casey, Member IEEE, Remco Veltkamp, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney, Senior Member IEEE

Manuscript received September 3, 2007; revised December 5, 2007. This work was supported in part by the U.K. Engineering and Physical Sciences Research Council under Grants EPSRC EP/E02274X/1 and EPSRC GR/S84750/01.
M. A. Casey and C. Rhodes are with the Department of Computing, Goldsmiths College, University of London, SE14 6NW London, U.K. (e-mail: [email protected]; [email protected]).
R. Veltkamp is with the Department of Information and Computing Sciences, Utrecht University, 3508TB Utrecht, The Netherlands (e-mail: [email protected]).
M. Goto is with the National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki 305-8568, Japan (e-mail: [email protected]).
M. Leman is with the Department of Art, Music, and Theater Sciences, Ghent University, 9000 Ghent, Belgium (e-mail: [email protected]).
M. Slaney is with Yahoo! Research Inc., Santa Clara, CA 95054 USA (e-mail: [email protected]).
Digital Object Identifier: 10.1109/JPROC.2008.916370

ABSTRACT | The steep rise in music downloading over CD sales has created a major shift in the music industry away from physical media formats and towards online products and services. Music is one of the most popular types of online information and there are now hundreds of music streaming and download services operating on the World-Wide Web. Some of the music collections available are approaching the scale of ten million tracks and this has posed a major challenge for searching, retrieving, and organizing music content. Research efforts in music information retrieval have involved experts from music perception, cognition, musicology, engineering, and computer science engaged in truly interdisciplinary activity that has resulted in many proposed algorithmic and methodological solutions to music search using content-based methods. This paper outlines the problems of content-based music information retrieval and explores the state-of-the-art methods using audio cues (e.g., query by humming, audio fingerprinting, content-based music retrieval) and other cues (e.g., music notation and symbolic representation), and identifies some of the major challenges for the coming years.

KEYWORDS | Audio signal processing; content-based music information retrieval; symbolic processing; user interfaces

I. INTRODUCTION

Music is now so readily accessible in digital form that personal collections can easily exceed the practical limits on the time we have to listen to them: ten thousand music tracks on a personal music device have a total duration of approximately 30 days of continuous audio. Distribution of new music recordings has become easier, prompting a huge increase in the amount of new music that is available. In 2005, there was a three-fold growth in legal music downloads and mobile phone ring tones, worth $1.1 billion worldwide, offsetting the global decline in CD sales; and in 2007, music downloads in the U.K. reached new highs [1]–[3].

Traditional ways of listening to music, and methods for discovering music, such as radio broadcasts and record stores, are being replaced by personalized ways to hear and learn about music. For example, the advent of social networking Web sites, such as those reported in [4] and [5], has prompted a rapid uptake of new channels of music discovery among online communities, changing the nature of music dissemination and forcing the major record labels to rethink their strategies.
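As a rough check on the scale quoted in the introduction (ten thousand tracks amounting to approximately 30 days of continuous audio), the following Python sketch converts a track count into continuous listening time. The average track length of 4.3 minutes is an assumption chosen for illustration and is not a figure given in the paper; the function name is likewise hypothetical.

```python
# Back-of-envelope check of the listening-time estimate in the introduction.
# ASSUMPTION: the average track length below is illustrative only and is
# not a figure taken from the paper.
AVG_TRACK_MINUTES = 4.3
MINUTES_PER_DAY = 24 * 60


def continuous_listening_days(num_tracks, avg_track_minutes=AVG_TRACK_MINUTES):
    """Return the days needed to play every track once, back to back."""
    return num_tracks * avg_track_minutes / MINUTES_PER_DAY


if __name__ == "__main__":
    # Collection sizes mentioned in the text: a personal device and a
    # large online collection.
    for label, num_tracks in [("personal device", 10_000),
                              ("large online collection", 10_000_000)]:
        days = continuous_listening_days(num_tracks)
        print(f"{label}: {num_tracks:,} tracks = {days:,.0f} days "
              f"({days / 365:.1f} years)")
```

Under that assumption, ten thousand tracks come to roughly 30 days, and a ten-million-track collection to about 80 years of continuous playback, which suggests why collections at that scale cannot be organized by listening alone.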
Proceedings of the IEEE | Vol. 96, No. 4, April 2008 | 0018-9219/$25.00 ©2008 IEEE

It is not only music consumers who have expectations of searchable music collections; along with the rise of consumer activity in digital music, there are new opportunities for research into using large music collections for discovering trends and patterns in music. Systems for trend spotting in online music sales are in commercial development [6], as are systems to support musicological research into music evolution over the corpus of Western classical music scores and available classical recordings [7], [8]. Musicology research aims to answer questions such as: which musical works and performances have been historically the most influential?

Strategies for enabling access to music collections, both new and historical, need to be developed in order to keep up with expectations of search and browse functionality. These strategies are collectively called music information retrieval (MIR) and have been the subject of intensive research by an ever-increasing community of academic and industrial research laboratories, archives, and libraries. There are three main audiences that are identified as the beneficiaries of MIR: industry bodies engaged in recording, aggregating and disseminating music; end users who want to find music and use it in a personalized way; and professionals: music performers, teachers, musicologists, copyright lawyers, and music producers.

At present, the most common method of accessing music is through textual metadata. Metadata can be rich and expressive so there are many scenarios where this approach is sufficient. Most music download services currently use metadata-only approaches and have reached a degree of success with them. However, when catalogues become very large (greater than a hundred thousand tracks) it is extremely difficult to maintain consistent expressive metadata descriptions because many people created the descriptions and variation in concept encodings impacts search performance. Furthermore, the descriptions represent opinions, so editorial supervision of the metadata is paramount [9].

An example of a commercial metadata-driven music system is pandora.com where the user is presented with the instruction "type in the name of your favorite artist or song and we'll create a radio station featuring that music and more like it." The system uses metadata to estimate artist similarity and track similarity; then, it retrieves tracks that the user might want to hear in a personalized radio station. Whereas the query was simple to pose, finding the answer is costly. The system works using detailed human-entered track-level metadata enumerating musical-cultural properties for each of several hundred thousand tracks. It is estimated that it takes about 20–30 minutes per track of one expert's time to enter the metadata.¹ The cost is therefore enormous in the time taken to prepare a database to contain all the information necessary to perform similarity-based search. In this case, it would take approximately 50 person-years to enter the metadata for one million tracks.

¹ http://www.pandora.com/corporate.

Social media web services address the limitations of centralized metadata by opening the task of describing content to public communities of users and leveraging the power of groups to exchange information about content. This is the hallmark of Web 2.0. With millions of users of portals such as MySpace, Flickr, and YouTube, group behavior means that users naturally gravitate towards those parts of the portal (categories or groups) with which they share an affinity; so they are likely to find items of interest indexed by users with similar tastes. However, the activity on social networking portals is not uniform across the interests of society and culture at large, being predominantly occupied by technologically sophisticated users, therefore social media is essentially a type of editorial metadata process.

In addition to metadata-based systems, information about the content of music can be used to help users find music. Content-based music description identifies what the user is seeking even when he does not know specifically what he is looking for. For example, the Shazam system (shazam.com), described in [10], can identify a particular recording from a sample taken on a mobile phone in a dance club or crowded bar and deliver the artist, album, and track title along with nearby locations to purchase the recording or a link for direct online purchasing and downloading. Users with a melody but no other information can turn to the online music service Nayio (nayio.com) which allows one to sing a query and attempts to identify the work.

In the recording industry, companies have used systems based on symbolic information about musical content, such as melody, chords, rhythm, and lyrics, to analyze the potential impact of a work in the marketplace. Services such as Polyphonic HMI's Hit Song Science and Platinum Blue Music Intelligence use such symbolic information with techniques from Artificial Intelligence to make consultative recommendations about new releases.

Because there are so many different aspects to music information retrieval and different uses for it, we cannot address all of them here. This paper addresses some of the recent developments in content-based analysis and retrieval of music, paying particular attention to the methods by which important information about music signals and symbols can be automatically extracted and processed for use with music information retrieval systems. We consider both audio recordings and musical scores, as it is beneficial to look at both, when they are available, to find clues about what a user is looking for.

The structure of this paper is as follows: Section II introduces the types of tasks, methods, and approaches to evaluation of content-based MIR systems; Section III presents methods for high-level audio music analysis; Section IV discusses audio similarity-based retrieval; symbolic music analysis and retrieval are presented in Vol.