Musicbrainz for the World: the Chilean Experience

MUSICBRAINZ FOR THE WORLD: THE CHILEAN EXPERIENCE Gabriel Vigliensoni1, John Ashley Burgoyne2, and Ichiro Fujinaga1 1 CIRMMT 2 ILLC McGill University University of Amsterdam Canada The Netherlands [gabriel,ich]@music.mcgill.ca [email protected] ABSTRACT 1.1 Music Metadata Metadata is structured information that identifies, describes, In this paper we present our research in gathering data from locates, relates, and expresses several, different layers of several semi-structured collections of cultural heritage— data about an information resource [3, 5, 8]. It can be of Chilean music-related websites—and uploading the data three basic types: descriptive, for purposes such as iden- into an open-source music database, where the data can be tification and discovery; structural, for expressing rela- easily searched, discovered, and interlinked. This paper tions among resources; and administrative, for managing also reviews the characteristics of four user-contributed, resources [8]. Descriptive music metadata commonly pro- music metadatabases (MusicBrainz, Discogs, MusicMoz, vides information about recordings, expressing the song and FreeDB), and explains why we chose MusicBrainz as title and length, the artist name, and the release name of a the repository for our data. We also explain how we col- musical object, usually stored in a MP3 ID3 tag. Structural lected data from the five most important sources of Chilean music metadata is used to document relationships within music-related data, and we give details about the context, and among digital musical objects to allow navigation, such design, and results of an experiment for artist name com- as the song’s order in an album or a playlist, linking song parison to verify which of the artists that we have in our names to video clips, artists to their biographies, and so database exist in the MusicBrainz database already. Al- on. Administrative music metadata can include informa- though it represents a single case study, we believe this tion such as software and hardware used to digitize the information will be of great help to other MIR researchers musical resource, system requirements to read the files, use who are trying to design their own studies of world music. restrictions or license agreements that constrain the use of the resource. Thus, music metadata has been called a digital music commodity because it adds value to the musical 1. INTRODUCTION objects, mediating the experience of listeners with music and artists, helping them to browse, explore, sort, collect, Thousands of commercial and non-commercial websites use, and finally enjoy music [4, 7]. offer information and metadata about different aspects of For this project, we collected data about Chilean mu- music and artists, for example, their recordings, biogra- sic from several websites and databases of different scope phies, discographies, video clips, and other resources. How- and size. 1 Centralizing and interlinking these resources ever, the data they provide is often disorganized and not should create synergies in the data because one single query interlinked, and the websites disappear frequently. Hence, will give access to all resources, creating relations that were it is likely that the information collected over years can be not established in their previous locations, thus contribut- lost. It seems sensible to gather all music-related data in a ing to a larger web of data. Hence, finding the best repos- centralized database that can be accessed by several web- itory capable in storing and expressing these data relation- sites or systems. The goal of our project is to take data ships was considered significant and crucial for the success from several semi-organized collections of Chilean-music of our endeavour. cultural heritage—websites and databases which combined represent almost all music that has been composed and per- 1.2 Music-Related Metadatabases formed in Chile—and integrate it into an open-source mu- Although there are many commercial, professionally re- sic database, where information is easy to search and will viewed, online music libraries, we focused only on user- last for a longer time. built, open-data, music metadatabases. By contrast to commercial libraries, open databases provide user-access to en- Permission to make digital or hard copies of all or part of this work for ter data and have similar types of licenses for the use of its personal or classroom use is granted without fee provided that copies are data as the Chilean websites. We present now a list of the not made or distributed for profit or commercial advantage and that copies main services available for free, public use in terms of their bear this notice and the full citation on the first page. 1 Data from all databases available at c 2013 International Society for Music Information Retrieval. http://www.vigliensoni.com/McGill/BDMC/8_DATA Scope Size Information Relations IDs Data Language API Other resources stored quality guidelines assurance FreeDB CD tracklists ∼2M albums Album, title, year, Disc to genre FreeDB Unspecified No No Limited search genre, time offsets propriety automatic engine. Small, fixed Disc-ID method set of genres. Two artists can have the same ID. MusicMoz Music-related factual >136K items Artist’s biographies, No relations Artist MusicMoz’s No No Many links are no data and Internet discographies, allowed name- authorized longer available. links profile, reviews, and based editors Unusual ontology of articles. URI musical categories. Music-related links and resources Discogs Physical >2M artists Artist, release, Small set of Artist Community- English RESTful, 30s previews. Well discographies and >3M albums master, label, image relations name- based only XML- designed solution for music releases based high- based artist name variation. URI quality API Marketplace data available. Could evolve to a commercial site. MusicBrainz Any kind of music >600K artists Core entities (artist, Advanced MB IDs are Community- Guidelines RESTful, MB links data from release >1M releases label, recording, Relationships universally based for 30 XML- all other >11M tracks release, can be unique high- languages based metadatabases. release-group, expressed for identifier quality API MB’s Advanced work), entities, and all core and (UUID) data Relationships can be their relationships external entities mapped into RDF. MB stores acoustic fingerprints. Table 1. Comparison of free, user-built, open-data, music metadatabases: FreeDB, MusicMoz, Discogs, and MusicBrainz. scope, size, stored information resources, relations among as well as for providing efficient ways of managing and these resources, ID system, data quality and assurance, lan- interlinking them. For example, the MusicBrainz univer- guage guidelines, and API. sal unique identifier-based IDs (MBIDs) practically ensure unique identifiers for all its core entities (i.e., artist, la- FreeDB is a license-free database of CDs track-listings bel, recording, release, release-group, and work), and they from user-contributed data, originally based on the Com- have been already widely used for linking music data in the pact Disc Database (CDDB). 2 semantic-web community. Furthermore, the MusicBrainz community decided to develop a set Advanced Relation- MusicMoz is a user-contributed database that stores music- ships to describe relations between its core entities, and to 3 related, factual data and Internet links. publish all the MusicBrainz database, resources and their Discogs is a large, user-populated database of discogra- relationships, as Linked Data. The recently-finished Linked- 6 phies and music releases from physical sources. It pro- Brainz subproject was intended to map all these rela- vides data of high quality that is ensured by a strict in- tionships into RDF [6]. MusicBrainz also stores acous- put form mechanism and a large community of users. 4 tic fingerprints (PUIDs) that can be used to search for a resource even if its descriptive metadata is not available. MusicBrainz is a large, community-based, user-contribu- Users, as well as performance right organizations, could ted metadatabase that stores the three aforementioned benefit from this feature for music rights identification pur- types of metadata for any kind of music release. Its poses. Moreover, MusicBrainz has also a strict, language- high-quality database is managed by an open commu- specific, style guidelines for 30 different languages, which nity of non-professionals that negotiate periodically and explain how data should be formatted in their database. consistently, with strict standards and routines, about This fact is very important to this project—and also to the orientations, developments, style guidelines, and other similar projects coming from countries and regions mostly everything on MusicBrainz [4]. 5 where English is not the primary language—because it al- lows, and forces, the non-English-speaking users to fol- Table 1 shows a comparison of the four metadatabases. low the correct standard for their language. Finally, Mu- It can be seen that, among all these databases, MusicBrainz sicBrainz is the most “open” of the four reviewed meta- is the database with the broadest scope, not being restricted databases because it provides methods to link data from to only physical copies, as in the case of Discogs, or CDs, other websites and databases to MusicBrainz (e.g., by ex- as in FreeDB. Also, MusicBrainz is the only music database tracting CD track lists from FreeDB, linking artists’ images capable of storing not only descriptive, but also structural from Discogs, data from MusicMoz, Allmusic, BBC Music and administrative metadata. This fact is critical for ex- and Wikipedia; or album covers and videos from Amazon pressing the many relationships among musical resources, and YouTube, respectively). 2 http://www.freedb.org/ 3 http://www.musicmoz.org/ 4 http://www.discogs.com/ 5 http://www.musicbrainz.org/ 6 http://wiki.musicbrainz.org/LinkedBrainz By analyzing all the aforementioned characteristics, we Mus (MUS) is a website devoted to new album and con- decided to work with the MusicBrainz metadatabase in or- cert reviews.

Load more