Interlinking Music-Related Data on the Web
The Many Faces of Semantics

Yves Raimond, BBC Audio & Music Interactive
Christopher Sutton, Intrasonics
Mark Sandler, Queen Mary, University of London

This article describes how Semantic Web technologies can be used to interlink musical data sources that have traditionally been isolated and difficult to integrate.

April–June 2009. 1070-986X/09/$25.00 © 2009 IEEE. Published by the IEEE Computer Society.

Information management is an important part of multimedia, covering the administration of public and personal collections, the construction of large editorial databases, and the storage of analysis results. Applications for each of these aspects of multimedia management have emerged, with notable examples being Greenstone (see http://www.greenstone.org) for digital libraries, iTunes for personal media-collection management, MusicBrainz (see http://musicbrainz.org) for classification data, and traditional relational databases for managing analysis results.

However, despite the ability of these applications to work with different facets of multimedia information, they are typically isolated from one another. Sharing and reusing data, even between instances of the same tool, is difficult and often involves manual effort for one-time data migration. Common data formats reduce the need for such efforts, but restrict the expressivity of the applications' data. The problem becomes more difficult if we extend our range of interest to data that can be produced, for example, by audio-analysis algorithms that provide higher-level representations than the audio signals themselves.

A promising solution to such problems is to take a data-oriented rather than an application-oriented view, with a web of data that doesn't limit the formats or ontologies used to record information, and instead allows arbitrary mixing and reuse of information by applications. For example, an ethnomusicological archive might benefit from being linked to a geographical data set such as GeoNames (see http://geonames.org). In this way, an archive could be tightly focused on its primary topic and leave the burden of ancillary descriptions to other focused data sets.

In the same web of data, we could publish items corresponding to the potential output of a music-analysis algorithm. Such results could then be reused for further research. In this way, a research group publishing a new algorithm could leave the burden of computing its supporting data to other algorithms published by other groups. In this article, we describe our efforts toward building such a web of data for music-related information.

Toward a web of data

The need to make currently published information on multimedia resources available in a common, structured, interlinked format is a topic frequently discussed in this publication. Tim Berners-Lee's vision of the Semantic Web [1], and the vast array of technologies already established in pursuit of it, provide the functionality required to begin building such a web of data. This section provides a brief overview of the technologies currently being used.

Identifiers and descriptions

The W3C's Resource Description Framework, or RDF (see http://www.w3.org/RDF), allows the description of resources by expressing statements about them in the form of triples: subject, predicate, and object. Each element of such a triple is specified by a uniform resource identifier (URI). A set of triples can be interpreted as a graph of these resources, with arcs corresponding to the relationships between them.

RDF alone provides a common, structured format for expressing data. Interlinking data sets can be achieved by ensuring URIs are unique across data sets, and providing a common access mechanism for following references between data sets. In practice, HTTP proves ideal for this task. If each resource is identified by an HTTP URI (such as http://example.com/resource7341), we gain an established system for ownership of URIs, and we can traverse data sets using simple HTTP GET operations.

A user agent that wishes to know more about a resource x dereferences the URI of x by performing an HTTP GET operation on the URI address, and receives RDF data containing triples related to the resource. Doing so allows dynamic exploration of linked data sets that can be distributed across geographic locations, institutions, and owners, much like documents on the traditional Web [2].

This idea of using HTTP addresses to provide machine-processable resource descriptions might seem at odds with the current use of the HTTP namespace. However, various techniques (such as specification of content type by user agents; content negotiation by 303 redirects; and embedded microformats, or RDFa) allow the same HTTP address to provide both machine-processable data and human-readable HTML data describing a resource. The Semantic Web can therefore be built alongside the current Web, and a content publisher with knowledge of Semantic Web technologies can ensure the published data is useful to a human reader via a traditional Web browser and to a Semantic Web user agent performing data integration, reasoning, and deduction on behalf of a human user.

Semantic Web user agents

The term user agent describes any software acting directly on user requests. A Semantic Web user agent is one that accesses resources on the Semantic Web to satisfy a user's demands. One example of a Semantic Web user agent would be a simple browser, analogous to a modern Web browser, that allows the user to navigate data on the Semantic Web just as a Web browser allows a user to navigate Web sites.

Although we are beginning to see quite sophisticated uses of Web resources (such as scripts that modify Web page content on the fly and mash-ups that dynamically combine the functionality of multiple sites), considerable effort has gone toward working around the fact that the traditional Web is designed for documents rather than data. The Semantic Web, on the other hand, is designed from the outset to allow much more complex interaction with available data sources, so the term Semantic Web user agent encompasses more complex modes of data consumption, including programs that automatically explore and dereference extra resources to satisfy a user's query. A simple example is the Tabulator [3], which automatically dereferences resources if they are semantically marked "the same as" or "see also" from a retrieved resource, then provides the user with a view of all the information found.

Semantic Web user agents can also express complex queries directly to remote data publishers. One example is using the SPARQL Protocol and RDF Query Language, known as SPARQL (see http://www.w3.org/TR/rdf-sparql-query), to submit a query to a publisher's SPARQL endpoint over HTTP. The query language is SQL-like, and allows requests ranging from simple describes ("return all information about resource x") to complex queries about the endpoint's database ("return the latest album from each artist who had an album in the US charts in the 70s"). These queries are typically issued to a single endpoint, but there is ongoing research into efficient mechanisms for streamlined querying of multiple endpoints (see http://darq.sourceforge.net).

Because Semantic Web ontologies (identifying important concepts and relations in a particular domain) are themselves part of the web of data, domain-specific user agents might encounter new ontologies. By reasoning on their relationships to known ontologies, these domain-specific user agents can handle data expressed using those ontologies.

Music-related web of data

There is a vast amount of music-related data currently online, some of it provided without restrictions (such as through the MusicBrainz database, FreeDB CD listings, the MusicMoz directory, Wikipedia articles, and the Jamendo and Magnatune labels) and some of it provided with copyright restrictions (such as through the All Music Guide, Gracenote, Amazon, and iTunes Music Store). Although interlinking between these resources would benefit all concerned, each data source instead uses its own identifiers, data formats, and APIs.

Providing unfettered access to the data is a first step toward flexible integration [4], but doing so necessitates writing code to combine data sources (for example, a mash-up that uses your Last.FM (see http://www.last.fm) listening profile to plot your recently heard artists on a map). In addition, new code must be written for each desired combination. If this data were instead integrated into the Web, such code would be unnecessary, and a generic user interface could allow arbitrary reuse and new data combinations.

[Figure 1. Describing a music production process using level 2 of ... Classes shown: mo:MusicArtist, mo:Composition, mo:MusicalWork, mo:Performance, mo:Sound, mo:Recording, mo:Signal, mo:Record. Properties shown: mo:compose, mo:produced_work, mo:performance_of, mo:produced_sound, mo:recorded_as, mo:recorded_in, mo:produced_signal, mo:published_as.]

No single ontology could hope to cover the requirements of all music descriptions [6]. The Music Ontology, like any ontology that provides URIs for its terms, is designed to be extended with specialized ontologies. For example, the ontology itself provides only basic instrument and genre terms, but can be extended by using the Simple Knowledge Organization System adaptation of the MusicBrainz instrument taxonomy (see http://purl.org/ontology/mo/mit) and the DBpedia [7] adaptation of Wikipedia's genre taxonomy. In addition, some more complex extensions are available, dealing with chords and symbolic music notation (see http://purl.org/ontology/chord and http://purl.org/ontology/symbolic-music).

Linking open data

The open-data movement aims to make data freely available to everyone. We contribute to the Linking Open Data on the Semantic Web community project [8], which aims to interlink such open sources of information using the technologies described previously.
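
As a rough illustration of the link-following behavior attributed to the Tabulator earlier, the sketch below stores triples as (subject, predicate, object) tuples and explores them by following "same as" and "see also" links. It is a minimal sketch under stated assumptions, not the Tabulator's actual implementation: the dictionary FAKE_WEB stands in for real HTTP GET requests, and every example.org, example.com, and example.net URI, along with the function names dereference and explore, is invented for illustration. Only the owl:sameAs and rdfs:seeAlso predicate URIs are real.

```python
from collections import deque

# Real predicate URIs from the OWL and RDF Schema vocabularies.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
FOLLOWED_PREDICATES = {OWL_SAME_AS, RDFS_SEE_ALSO}

# Hypothetical stand-in for the Web: maps a URI to the triples that an
# HTTP GET on that URI would return. All URIs below are invented.
FAKE_WEB = {
    "http://archive.example.org/artist/1": [
        ("http://archive.example.org/artist/1",
         "http://example.org/vocab#name", "Example Artist"),
        ("http://archive.example.org/artist/1",
         OWL_SAME_AS, "http://other.example.com/a/42"),
    ],
    "http://other.example.com/a/42": [
        ("http://other.example.com/a/42",
         "http://example.org/vocab#basedNear", "http://geo.example.net/place/7"),
        ("http://other.example.com/a/42",
         RDFS_SEE_ALSO, "http://archive.example.org/artist/1"),
    ],
}


def dereference(uri):
    """Simulate an HTTP GET on `uri`; return its (s, p, o) triples."""
    return FAKE_WEB.get(uri, [])


def explore(start_uri):
    """Collect triples reachable from start_uri via sameAs/seeAlso links."""
    visited = set()
    queue = deque([start_uri])
    triples = []
    while queue:
        uri = queue.popleft()
        if uri in visited:
            continue  # avoid loops between mutually linked data sets
        visited.add(uri)
        for s, p, o in dereference(uri):
            triples.append((s, p, o))
            if p in FOLLOWED_PREDICATES and o not in visited:
                queue.append(o)
    return triples, visited


triples, visited = explore("http://archive.example.org/artist/1")
for triple in triples:
    print(triple)
```

In this toy data, the geographic resource is not dereferenced because it is reached through an ordinary predicate rather than a "same as" or "see also" link; a more exploratory agent could choose to follow such links as well.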
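
The simple "describe" request mentioned earlier ("return all information about resource x") could be prepared along the following lines. This is a hedged sketch: the endpoint address http://example.org/sparql and the helper name build_describe_request are assumptions, and the request is only constructed, never sent. It relies on the SPARQL protocol convention of passing the query text in a query parameter of an HTTP GET, and asks for RDF/XML because a DESCRIBE query returns an RDF graph.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request


def build_describe_request(endpoint, resource_uri):
    """Build (but do not send) an HTTP GET carrying a SPARQL DESCRIBE query."""
    query = f"DESCRIBE <{resource_uri}>"
    # The SPARQL protocol passes the query text in the `query` parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/rdf+xml"}, method="GET")


# Hypothetical endpoint; the resource URI is the example used in this article.
request = build_describe_request(
    "http://example.org/sparql",
    "http://example.com/resource7341",
)
print(request.full_url)
```

Passing the resulting object to urllib.request.urlopen would issue the query against a real endpoint; more complex queries only change the query string, not the transport.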