The Many Faces of Semantics

Interlinking Music-Related Data on the Web

Yves Raimond, BBC Audio & Music Interactive
Christopher Sutton, Intrasonics
Mark Sandler, Queen Mary, University of London

IEEE MultiMedia, April-June 2009. 1070-986X/09/$25.00 (c) 2009 IEEE. Published by the IEEE Computer Society.

This article describes how Semantic Web technologies can be used to interlink musical data sources that have traditionally been isolated and difficult to integrate.

Information management is an important part of multimedia, covering the administration of public and personal collections, the construction of large editorial databases, and the storage of analysis results. Applications for each of these aspects of multimedia management have emerged, with notable examples being Greenstone (see http://www.greenstone.org) for digital libraries, iTunes for personal media-collection management, MusicBrainz (see http://musicbrainz.org) for classification data, and traditional relational databases for managing analysis results.

However, despite the ability of these applications to work with different facets of multimedia information, they are typically isolated from one another. Sharing and reusing data, even between instances of the same tool, is difficult and often involves manual effort for one-time data migration. Common data formats reduce the need for such efforts, but restrict the expressivity of the applications' data. The problem becomes more difficult if we extend our range of interest to data that can be produced, for example, by audio-analysis algorithms that provide higher-level representations than the audio signals themselves.

A promising solution to such problems is to take a data-oriented rather than an application-oriented view, with a web of data that doesn't limit the formats or ontologies used to record information, and instead allows arbitrary mixing and reuse of information by applications. For example, an ethnomusicological archive might benefit from being linked to a geographical data set such as GeoNames (see http://geonames.org). In this way, an archive could be tightly focused on its primary topic and leave the burden of ancillary descriptions to other focused data sets. In the same web of data, we could publish items corresponding to the potential output of a music-analysis algorithm. Such results could then be reused for further research. In this way, a research group publishing a new algorithm could leave the burden of computing its supporting data to other algorithms published by other groups. In this article, we describe our efforts toward building such a web of data for music-related information.

Toward a web of data

The need to make currently published information on multimedia resources available in a common, structured, interlinked format is a topic frequently discussed in this publication. Tim Berners-Lee's vision of the Semantic Web,1 and the vast array of technologies already established in pursuit of it, provide the functionality required to begin building such a web of data. This section provides a brief overview of the technologies currently being used.

Identifiers and descriptions

The W3C's Resource Description Framework, or RDF (see http://www.w3.org/RDF), allows the description of resources by expressing statements about them in the form of triples: subject, predicate, and object. Each element of such a triple is specified by a uniform resource identifier (URI). A set of triples can be interpreted as a graph of these resources, with arcs corresponding to the relationships between them.

RDF alone provides a common, structured format for expressing data. Interlinking data sets can be achieved by ensuring URIs are unique across data sets, and by providing a common access mechanism for following references between data sets. In practice, HTTP proves ideal for this task. If each resource is identified by an HTTP URI (such as http://example.com/resource7341), we gain an established system for ownership of URIs, and we can traverse data sets using simple HTTP GET operations.

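The triple-and-graph model just described can be sketched in a few lines of Python. This is a minimal illustration only; the URIs and property names below are invented examples, not part of the article's data sets.

```python
# Minimal sketch of RDF-style data: each statement is a
# (subject, predicate, object) tuple, and a set of them forms a graph.
triples = {
    ("http://example.com/resource7341", "foaf:name", "Some Artist"),
    ("http://example.com/resource7341", "foaf:based_near", "ex:someplace"),
    ("http://example.com/album1", "dc:creator",
     "http://example.com/resource7341"),
}

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Traverse an arc of the graph: who created album1, and what is their name?
creator = match(triples, s="http://example.com/album1", p="dc:creator")[0][2]
names = [o for _, _, o in match(triples, s=creator, p="foaf:name")]
```

The same wildcard-matching idea, with URIs dereferenced over HTTP instead of held in memory, is what lets user agents traverse interlinked data sets.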
A user agent that wishes to know more about a resource x dereferences the URI of x by performing an HTTP GET operation on the URI address, and receives RDF data containing triples related to the resource. Doing so allows dynamic exploration of linked data sets that can be distributed across geographic locations, institutions, and owners, much like documents on the traditional Web.2

This idea of using HTTP addresses to provide machine-processable resource descriptions might seem at odds with the current use of the HTTP namespace. However, various techniques (such as specification of content type by user agents, content negotiation by 303 redirects, and embedded microformats or RDFa) allow the same HTTP address to provide both machine-processable data and human-readable HTML describing a resource. The Semantic Web can therefore be built alongside the current Web, and a content publisher with knowledge of Semantic Web technologies can ensure the published data is useful both to a human reader via a traditional Web browser and to a Semantic Web user agent performing data integration, reasoning, and deduction on behalf of a human user.

Semantic Web user agents

The term user agent describes any software acting directly on user requests. A Semantic Web user agent is one that accesses resources on the Semantic Web to satisfy a user's demands. One example of a Semantic Web user agent would be a simple browser, analogous to a modern Web browser, that allows the user to navigate data on the Semantic Web just as a Web browser allows a user to navigate Web sites.

Although we are beginning to see quite sophisticated uses of Web resources, such as scripts that modify Web page content on the fly and mash-ups that dynamically combine the functionality of multiple sites, considerable effort has gone toward working around the fact that the traditional Web is designed for documents rather than data. The Semantic Web, on the other hand, is designed from the outset to allow much more complex interaction with available data sources, so the term Semantic Web user agent encompasses more complex modes of data consumption, including programs that automatically explore and dereference extra resources to satisfy a user's query. A simple example is the Tabulator,3 which automatically dereferences resources if they are semantically marked "the same as" or "see also" from a retrieved resource, then provides the user with a view of all the information found.

Semantic Web user agents can also express complex queries directly to remote data publishers. One example is using the SPARQL Protocol and RDF Query Language, known as SPARQL (see http://www.w3.org/TR/rdf-sparql-query), to submit a query to a publisher's SPARQL endpoint over HTTP. The query language is SQL-like, and allows requests ranging from simple describes ("return all information about resource x") to complex queries about the endpoint's database ("return the latest album from each artist who had an album in the US charts in the 70s"). These queries are typically issued to a single endpoint, but there is ongoing research into efficient mechanisms for streamlined querying of multiple endpoints (see http://darq.sourceforge.net).

Because Semantic Web ontologies (identifying important concepts and relations in a particular domain) are themselves part of the web of data, domain-specific user agents might encounter new ontologies. By reasoning on their relationships to known ontologies, these domain-specific user agents can handle data expressed using those ontologies.

Music-related web of data

There is a vast amount of music-related data currently online, some of it provided without restrictions (such as through the MusicBrainz database, FreeDB CD listings, the MusicMoz directory, Wikipedia articles, and the Jamendo and Magnatune labels) and some of it provided with copyright restrictions (such as through the All Music Guide, Gracenote, Amazon, and the iTunes Music Store). Although interlinking between these resources would benefit all concerned, each data source instead uses its own identifiers, data formats, and APIs.

Providing unfettered access to the data is a first step toward flexible integration,4 but doing so necessitates writing code to combine data sources (for example, a mash-up that uses your Last.FM listening profile, see http://www.last.fm, to plot your recently heard artists on a map). In addition, new code must be written for each desired combination. If this data were instead integrated into the Web, such code would be unnecessary, and a generic user interface could allow arbitrary reuse and new data combinations.

Music Ontology overview

Integrating and interlinking data sources is possible even when they don't share a common ontology, but is much easier when they do. Toward this end, we contributed to the design of the Music Ontology,5 which provides a standard base ontology for describing musical information. Currently, it can describe a wide range of music information at three levels of detail:

level 1 describes top-level editorial information, such as the data found in an ID3 tag;

level 2 describes the process behind the production of music, whether in the studio, on a home PC, or in concert; and

level 3 describes the structure and component events of the music being played, such as the notes, chords, or samples.

The Music Ontology is interlinked with other ontologies, most notably Functional Requirements for Bibliographic Records, the Timeline and Event ontologies, and the Friend-of-a-Friend (FOAF) ontology. Figure 1 depicts the main concepts in level 2, while the "Music Ontology Example" sidebar contains an example track description.

[Figure 1. Describing a music production process using level 2 of the Music Ontology. The diagram connects mo:MusicArtist, mo:Composition, mo:MusicalWork, mo:Performance, mo:Sound, mo:Recording, mo:Signal, and mo:Record through properties such as mo:produced_work, mo:performance_of, mo:produced_sound, mo:recorded_as, mo:recorded_in, mo:produced_signal, and mo:published_as.]

No single ontology could hope to cover the requirements of all music descriptions.6 The Music Ontology, like any ontology that provides URIs for its terms, is designed to be extended with specialized ontologies. For example, the ontology itself provides only basic instrument and genre terms, but can be extended by using the Simple Knowledge Organization System adaptation of the MusicBrainz instrument taxonomy (see http://purl.org/ontology/mo/mit) and the DBpedia7 adaptation of Wikipedia's genre taxonomy. In addition, some more complex extensions are available, dealing with chords and symbolic music notation (see http://purl.org/ontology/chord and http://purl.org/ontology/symbolic-music).

Linking open data

The open-data movement aims to make data freely available to everyone. We contribute to the Linking Open Data on the Semantic Web community project,8 which aims to interlink such open sources of information using the technologies described previously. For example, when providing a description of a particular artist in the DBTune project (see http://dbtune.org), we link the artist resource to a location in the GeoNames data set, which provides additional knowledge about this location, instead of providing a complete geographic description ourselves. An agent crawling the Semantic Web can jump from our knowledge base to the GeoNames one by following the link.

The Music Ontology helps this process for music-related information, providing a framework for publishing heterogeneous music-related content in RDF. Moreover, as mentioned previously, the Music Ontology can be extended with other ontologies to cover additional domains. For example, we can use the Music Ontology alongside ontologies for reviews, social networking, or geographic information, without having to work around any of the traditional forced boundaries between domains.

Members of the linked-data community have published and interlinked several music-related data sets in this web of data. To date, these data sets include MusicBrainz, DBpedia, and the Jamendo and Magnatune labels, published within our DBTune project. In addition, several BBC data sets have been published.

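The dereference-and-follow behavior of a Semantic Web user agent such as the Tabulator can be sketched in Python. This is a simplified model, not the Tabulator's actual code: the fetch function is injected so that a real HTTP GET with content negotiation could be substituted, and the documents used below are invented fixtures.

```python
# Sketch of a user agent that dereferences a resource's URI and keeps
# following owl:sameAs / rdfs:seeAlso links, aggregating everything found.
# `fetch` is any callable mapping a URI to a list of (s, p, o) triples,
# standing in for an HTTP GET that negotiates an RDF representation.
FOLLOW = {"owl:sameAs", "rdfs:seeAlso"}

def explore(start_uri, fetch, max_hops=10):
    seen, frontier, graph = set(), [start_uri], []
    while frontier and max_hops:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        max_hops -= 1
        for s, p, o in fetch(uri):
            graph.append((s, p, o))
            if p in FOLLOW and o not in seen:
                frontier.append(o)
    return graph

# Invented fixture documents standing in for dereferenceable resources.
docs = {
    "ex:artist": [("ex:artist", "owl:sameAs", "dbpedia:SomeArtist")],
    "dbpedia:SomeArtist": [("dbpedia:SomeArtist", "foaf:name", "Some Artist")],
}
aggregated = explore("ex:artist", lambda uri: docs.get(uri, []))
```

The `max_hops` bound matters in practice: an agent crawling interlinked data sets across hosts needs some policy for when to stop following links.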
Music Ontology Example

The Music Ontology allows the description of a wide range of musical data. Below is one such description that details an artist composing a musical work that is performed in a studio and recorded as an album track. Such a description could be handcrafted by a fan, extracted from an editorial database, or even automatically compiled by the music production tools used by the artist in creating his or her work.

[Figure A. Example of a track description using the Music Ontology. The diagram covers Jonathan Coulton, the composition of "Code Monkey," a studio performance and a live performance by Jonathan Coulton, an a cappella arrangement and its performance by RIT's Surround Sound, a fan remix signal, the album signal, the "Code Monkey" track, and the record "Thing a Week III," with links into the Friend-of-a-Friend and MusicBrainz data sets.]

    @prefix : <...> .

    # Level 1
    :joco
        rdf:type mo:MusicArtist ;
        foaf:name "Jonathan Coulton" ;
        owl:sameAs <...> .

    :taw3
        rdf:type mo:Record ;
        rdfs:label "Thing a Week III" ;
        mo:track :c_m_album_track ;
        owl:sameAs <...> .

    :code_monkey
        rdf:type mo:MusicalWork ;
        dc:title "Code Monkey" .

    :c_m_album_track
        rdf:type mo:Track ;
        dc:title "Code Monkey" ;
        dc:creator :joco ;
        owl:sameAs <...> ;
        mo:free_download <.../music/thingaweek/CodeMonkey.mp3> .

    # Level 2
    :c_m_comp
        rdf:type mo:Composition ;
        rdfs:label "Composition of Code Monkey (Thing a Week 29)" ;
        event:hasAgent :joco ;
        event:hasProduct :code_monkey ;
        event:time <...> .

    :c_m_album_perf
        rdf:type mo:Performance ;
        mo:performance_of :code_monkey ;
        event:time <...> ;
        mo:recorded_as :c_m_album_signal .

    :c_m_album_signal
        rdf:type mo:Signal ;
        mo:published_as :c_m_album_track .

The RDF here corresponds to the shaded region of Figure A, which illustrates how we can extend the description to a performance by the author at the Temple Bar venue in California, an alternative arrangement and its performance by the musical group RIT's Surround Sound, along with a fan remix. The full example is available at <.../examples/cm.n3>.

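The level-2 production chain in the sidebar (composition to work to performance to signal to track) can also be emitted programmatically, for example by a production tool. The helper below is a hypothetical sketch, not part of GNAT or any tool described in this article; it only shows how the mo: terms chain together.

```python
# Sketch: emitting a level-2 Music Ontology production chain as triples.
# The function and its argument names are invented for illustration.
def production_chain(artist, work, performance, signal, track):
    return [
        (artist, "rdf:type", "mo:MusicArtist"),
        (work, "rdf:type", "mo:MusicalWork"),
        (performance, "rdf:type", "mo:Performance"),
        (performance, "mo:performance_of", work),
        (performance, "mo:recorded_as", signal),
        (signal, "rdf:type", "mo:Signal"),
        (signal, "mo:published_as", track),
        (track, "rdf:type", "mo:Track"),
        (track, "dc:creator", artist),
    ]

chain = production_chain(":joco", ":code_monkey", ":c_m_album_perf",
                         ":c_m_album_signal", ":c_m_album_track")
```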
Managing music collections

Personal music collections can be part of such a web of data. The Music Ontology makes the same distinction as the Functional Requirements for Bibliographic Records between manifestations (all physical objects that bear the same characteristics) and items (a concrete entity, such as my copy of the album on CD). A predicate mo:available_as links a manifestation and a corresponding item. Given a set of audio files in a personal music collection, it is possible to keep track of the set of statements linking the collection to identifiers denoting the corresponding manifestations available elsewhere in the Semantic Web. These statements tie the collection into the wider web of data, allowing access to information such as the birth date of the artists responsible for some of the items in the collection and the geographical locations of the recordings.

To facilitate the management of personal music collections in this way, we developed GNAT, which is available as part of the Motools project (see http://www.sourceforge.net/projects/motools). GNAT is an implementation of automatic linking from a personal audio collection to the MusicBrainz data set. It uses audio fingerprinting and available ID3 tags to find corresponding identifiers, then outputs RDF statements to make the links between local audio files and the remote manifestation identifiers. GNAT can be used to build small applications, such as for plotting an audio collection on a timeline to generate playlists of songs composed during a particular decade, or plotting an audio collection on a map to create playlists according to geographical data.

We are using our GNARQL tool in the same project to explore some of these application possibilities by loading the data from GNAT, then crawling the Semantic Web for more information about the user's audio collection. We see GNARQL as a Semantic Web version of the Expose tool.10 With the data sets currently available and interlinked, GNARQL can answer requests such as "create a playlist of performances of works by German composers, written between 1800 and 1850" or "find rock bands from the 1970s that have more than five tribute bands." With only a little further work, we expect GNARQL will be able to handle more useful queries, such as "find gigs by artists similar to my most-played artists that fit with my vacation plan." The following is an example of such a query in SPARQL:

    PREFIX geo: <...>
    PREFIX wgs: <...>
    PREFIX tags: <...>

    SELECT DISTINCT ?an ?lat ?long ?name
    WHERE {
        ?a
            a mo:MusicArtist ;
            foaf:based_near ?place ;
            foaf:name ?an ;
            foaf:made ?alb .
        ?alb tags:taggedWithTag <...> .
        ?place
            geo:name ?name ;
            wgs:lat ?lat ;
            wgs:long ?long
    }

The SPARQL endpoint at http://dbtune.org/jamendo can answer this query for the Jamendo catalog, selecting artists whose records have been tagged as "punk" along with their location.

[Figure 2. Management of personal music collections on top of GNAT and GNARQL. GNAT identifies Web identifiers for the tracks in an audio collection; GNARQL loads this data, aggregates additional information from interlinked RDF data on the Web, and supports user interaction via SPARQL.]

As shown in Figure 2, we can build user interfaces on top of the SPARQL endpoint that GNARQL provides.

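At its core, answering a SPARQL query like the one above means matching a basic graph pattern against the data, binding variables as it goes. The sketch below is a toy evaluator over an invented four-triple data set, not the Jamendo endpoint or any real SPARQL engine; it handles only conjunctive patterns, with variables written as names starting with "?".

```python
# Toy evaluation of a SPARQL-like basic graph pattern by successively
# binding variables. Data and identifiers are invented for illustration.
data = {
    ("ex:band", "rdf:type", "mo:MusicArtist"),
    ("ex:band", "foaf:name", "Some Punk Band"),
    ("ex:band", "foaf:made", "ex:album"),
    ("ex:album", "tags:taggedWithTag", "tag:punk"),
}

def query(graph, patterns, binding=None):
    binding = binding or {}
    if not patterns:
        return [binding]
    results = []
    (s, p, o), rest = patterns[0], patterns[1:]
    for ts, tp, to in graph:
        b, ok = dict(binding), True
        for pat, term in ((s, ts), (p, tp), (o, to)):
            if pat.startswith("?"):
                if b.get(pat, term) != term:
                    ok = False       # conflicts with an earlier binding
                else:
                    b[pat] = term
            elif pat != term:
                ok = False           # constant term doesn't match
        if ok:
            results.extend(query(graph, rest, b))
    return results

rows = query(data, [("?a", "rdf:type", "mo:MusicArtist"),
                    ("?a", "foaf:made", "?alb"),
                    ("?alb", "tags:taggedWithTag", "tag:punk"),
                    ("?a", "foaf:name", "?name")])
```

Real engines add indexing, OPTIONAL and FILTER clauses, and federation across endpoints, but the binding discipline is the same.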
In this case, we are using the /Facet browsing interface11 to interact with the data aggregated by GNARQL and plot the artists in a user's collection on a map.

Dynamic resources

So far, we have discussed only static resources, but the Semantic Web interface can be used to access dynamic resources that are computed only when requested (and possibly then cached for future requests). In the music realm, this means current research algorithms12 for tempo and rhythm estimation, harmonic analysis, partial transcription, or source separation could be exposed as Semantic Web resources. Doing so would be of great benefit to researchers, allowing them to more easily compare or build upon others' algorithms. It would also benefit the general public by letting them use research algorithms without requiring each researcher to design end-user applications. In these cases, the Semantic Web would act as a processing web as well as a data web, providing a possible answer to concerns expressed earlier in this publication.13

Because automated analysis tasks may require a significant amount of computation, care must be taken to avoid wasted effort. We can immediately discard the approach of precomputing all information on all known resources. In addition, because each known resource might have a wide range of computable information, computing all information about a particular resource when its identifier is dereferenced is unlikely to be an acceptable approach. Our approach is to expose algorithms as part of the data web in a nonwasteful manner. We use two examples to illustrate this approach. The first is a toy example corresponding to a trained instrument classifier, while the second is a content-based similarity service.

Notation3

We use Notation31 in all our code snippets. Each block corresponds to a set of statements (subject, predicate, and object) about one subject. Web identifiers are either between angle brackets or in a prefix:name notation. Universally quantified variables start with ?. We denote a set of statements describing an existentially quantified variable with square brackets. Curly brackets denote a literal resource corresponding to a particular RDF graph. The keyword a corresponds to the identifier rdf:type. The keyword <= corresponds to the inverse property of log:implies.

Reference

1. T. Berners-Lee et al., "N3Logic: A Logical Framework for the World Wide Web," to be published in Theory and Practice of Logic Programming (TPLP); http://arxiv.org/abs/0711.1533.

Modeling analysis algorithms

We consider every analysis algorithm as an RDF property, associating the results of an analysis with the inputs and parameters used. For example, we would describe a deterministic instrument classifier and a content-based similarity measure between two audio signals as follows (see the "Notation3" sidebar for more information about the code):

    mt:instrument
        a rdf:Property ;
        a owl:FunctionalProperty ;
        rdfs:domain mo:Signal ;
        rdfs:range mo:Instrument ;
        rdfs:label "instrument" ;
        rdfs:comment """
            Tries to determine the musical instruments involved
            in the creation of an audio signal.
        """ .

    mt:similarity
        a rdf:Property ;
        a owl:FunctionalProperty ;
        rdfs:domain rdf:List ;
        rdfs:range xsd:float ;
        rdfs:label "similarity" ;
        rdfs:comment """
            Computes a similarity measure, between 0 and 1,
            given two signals as an input.
        """ .

We consider these predicates built-in within our particular RDF store implementation. When queried in the right mode (with all input arguments and parameters bound), a computation binds the output arguments to the corresponding results. We can use these predicates to describe the analysis steps that produced a particular result.

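The "built-in predicate" idea can be sketched as a registry that maps a property URI to a function, with results cached so that repeated requests trigger no recomputation. This is a hypothetical model of the behavior described above, not the article's RDF store implementation, and the classifier and distance functions are trivial stand-ins rather than real signal analysis.

```python
# Sketch: analysis algorithms modeled as RDF properties backed by code.
# When a query arrives with the inputs bound, the registered function
# computes the object value; lru_cache memoizes results for later use.
import functools

BUILTINS = {
    "mt:instrument": functools.lru_cache(maxsize=None)(
        lambda signal: "mit:Trumpet"),          # stand-in classifier
    "mt:similarity": functools.lru_cache(maxsize=None)(
        lambda pair: 0.0 if pair[0] == pair[1] else 0.5),  # stand-in distance
}

def resolve(subject, prop):
    """Answer the pattern (subject, prop, ?o) by running the built-in."""
    return (subject, prop, BUILTINS[prop](subject))
```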
Consider the following axiom, which derives a higher-level interpretation of the mt:similarity predicate; such an axiom could be accessed as part of the representation of a Web resource:

    {?signal1 mo:similar_to ?signal2}
    <=
    {
        ?signal1 a mo:Signal.
        ?signal2 a mo:Signal.
        (?signal1 ?signal2) mt:similarity ?distance.
        ?distance math:lessThan "0.8"^^xsd:float
    }.

If we put a Web server on top of our built-in predicates and this derivation rule, a request for the description of any accessible signal will dynamically derive statements holding information about the musical instrument used and about which signals are similar. Because we might have a large number of such built-in predicates, we must take care to ensure that we don't waste computation effort. We detail one approach to avoiding waste below.

Advertising dynamic resources

We developed a simple approach to exposing algorithms on the Semantic Web. It's compatible with existing user agents while avoiding wasteful computation. We begin by publishing only advertisement statements for the computation axioms, providing user agents with a URI that, when dereferenced, will trigger a unique class of computations. The property we use for these advertisement links is the property they advertise (mt:instrument and mo:similar_to, in our example).

For instance, an endpoint providing such a mechanism on top of the axioms mentioned earlier will issue just two statements when a description of an accessible signal ex:signal is requested:

    ex:signal mt:instrument ex:advert1.
    ex:signal mo:similar_to ex:advert2.

Then, when a user agent requests ex:advert1, the built-in mt:instrument analysis is triggered and we append the resulting statements to the returned description. To make the process transparent for the client, we state that ex:advert1 is the same as the matching output; the advertisement resource the client retrieves is the actual result. This process leads to the following RDF document, accessed via an HTTP GET on ex:advert1:

    ex:advert1 owl:sameAs mit:Trumpet.
    ex:signal mt:instrument mit:Trumpet.

When a user agent requests ex:advert2, the built-in mt:similarity analysis computes a content-based distance between the ex:signal audio signal and every other signal we know about. Then, this distance is thresholded according to the previous axiom to give the following RDF statements:

    ex:advert2 owl:sameAs ex:signal23.
    ex:signal mo:similar_to ex:signal23.
    ex:signal mo:similar_to ex:signal36.
    ex:signal mo:similar_to ex:signal42.

The ex:advert2 resource is the same as one of the similar signals. It is inconsequential which signal is chosen, though results should be consistent across multiple requests to the service.

By using such a mechanism, only the computation that the user agent is interested in will be triggered. If the user agent looks for an instrument associated with a signal, it will trigger a call only to mt:instrument. If it looks for similar signals, it will trigger a call only to mt:similarity. Where appropriate, we can cache the results of these underlying built-in predicates for later use.

Content-based similarity

We adapted a content-based similarity measure developed for the playlist-generation tool SoundBite (see http://www.isophonics.net/SoundBite) to use for SBSimilarity (see http://www.isophonics.net/SBSimilarity). We started with a database consisting of 300,000 tracks, and incorporated it into a SPARQL endpoint built using Jena and Joseki (see http://jena.sf.net). With precomputed features for each track, we can quickly perform similarity searches across the database by computing the distance between each track and the requested track. The system performs approximately 300,000 simple distance calculations to satisfy each similarity request. We used dereferenceable identifiers for the tracks from the MusicBrainz database, and set up URIs using URISpace (see http://code.google.com/p/km-rdf). Doing so provided on-demand computation of between-track similarity measures, with track descriptions linking to basic editorial metadata along with audio previews and purchase pages on the Amazon music store.

We built a simple interface that ties in iTunes, illustrated in Figure 3. Users can browse the Semantic Web to find more information about the tracks, preview their audio, and perhaps choose to buy them from Amazon. This user interface demonstrates how an audio collection can easily provide entry points for exploring the growing body of data on the Semantic Web. It also shows how computation can be hidden behind Semantic Web resources to provide user agents with a uniform data-access mechanism for static and dynamic information, which can be either human-authored (coming from an editorial database such as MusicBrainz, for example) or automatically extracted from multimedia content (as in our similarity example).

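The advertisement flow (cheap advertisement triples up front, computation only on dereference, thresholding to derive mo:similar_to) can be sketched end to end. Everything below is an invented stand-in: the distance table replaces real content-based analysis, and the advert URIs mirror the toy example rather than any deployed endpoint.

```python
# Sketch of the advertisement mechanism. Describing a signal returns only
# cheap "advertisement" triples; dereferencing an advertisement URI is
# what triggers the (expensive) analysis it stands for.
DISTANCES = {"ex:signal23": 0.3, "ex:signal36": 0.7,
             "ex:signal42": 0.5, "ex:signal99": 0.9}   # invented fixtures

def describe(signal):
    """Cheap description: one advertisement triple per built-in predicate."""
    return [(signal, "mt:instrument", "ex:advert1"),
            (signal, "mo:similar_to", "ex:advert2")]

def dereference(advert, signal="ex:signal"):
    """Dereferencing an advert runs the computation it stands for."""
    if advert == "ex:advert1":          # stand-in instrument classifier
        return [(advert, "owl:sameAs", "mit:Trumpet"),
                (signal, "mt:instrument", "mit:Trumpet")]
    if advert == "ex:advert2":          # distances thresholded at 0.8
        similar = sorted(s for s, d in DISTANCES.items() if d < 0.8)
        return ([(advert, "owl:sameAs", similar[0])] +
                [(signal, "mo:similar_to", s) for s in similar])
    raise KeyError(advert)
```

Sorting before picking the owl:sameAs target keeps the choice consistent across requests, matching the requirement stated above.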
Future work

The mechanisms we've described here provide a way to break down analysis into component steps and distribute the computation across multiple hosts. For example, we could break down our single built-in predicate mt:similarity into several:14

framing the audio signal as mt:frames;

Mel-frequency cepstral coefficient (MFCC) computation as mt:mfccs;

Gaussian modeling of the resulting MFCCs as mt:model; and

Kullback-Leibler divergence between two such models as mt:kldiv.

[Figure 3. We wrapped a modified version of GNAT as a plug-in for Apple's iTunes media player. (a) A user can click a menu item to identify the currently playing track, and (b) display available information in the Tabulator. (c) This information includes the advertisement resource for similar tracks, which can be dereferenced to trigger the calculation and retrieve the actual similarity statements.]

When computation delay is not a significant issue, we can spread the computation behind the predicates across different hosts. For example, host A (serving resources under the namespace hA) might provide the mechanism described earlier for the mt:frames and the mt:mfccs predicates, therefore providing on-demand MFCCs for all the audio items it knows about. Such MFCCs could be used by host B (namespace hB), providing a modeling facility (values of the mt:model predicate).

Then, such models could be used by host C (namespace hC), itself providing the mt:kldiv predicate. Finally, we can wrap this similarity measure into the mt:similarity predicate used in our earlier examples with the following rule:

    {(?signal1 ?signal2) mt:similarity ?distance}
    <=
    {
        ?signal1 mt:frames ?frames1.
        ?frames1 mt:mfccs ?mfccs1.
        ?mfccs1 mt:model ?model1.
        ?signal2 mt:frames ?frames2.
        ?frames2 mt:mfccs ?mfccs2.
        ?mfccs2 mt:model ?model2.
        (?model1 ?model2) mt:kldiv ?distance.
    }.

In this setup, only host A needs access to the actual content. The other hosts deal only with higher-level representations of the underlying signal.

[Figure 4. Advertised graph for a decomposed content-based similarity measure. hA:Signal1 and hA:Signal2 link through mt:frames and mt:mfccs to the advertisement resources hA:advert1, hA:advert2, hA:advert4, and hA:advert5 (hosted by A); mt:model links to hB:advert3 and hB:advert6 (hosted by B); and mt:kldiv links to hC:advert7 (hosted by C).]

We can depict the advertised graph as shown in Figure 4. Once host C aggregates this graph, the mechanism is, from the user agent's point of view, exactly as described earlier. The only difference lies in the process triggered by dereferencing the similar signal's advertisement resource. First, this process involves dereferencing the advertisement resource for the Kullback-Leibler divergence (hC:advert7), the actual value of which must be thresholded to derive mo:similar_to statements. To compute this divergence, host C must obtain the two MFCC models (advertised through hB:advert3 and hB:advert6). And to compute these models, host B must access the MFCCs for the two signals (advertised through hA:advert2 and hA:advert5). Host A provides these results to host B, which can then derive the actual models for host C to process.

The advertisements serve to define the chain of computations triggered by host C. However, if any of the hosts have aggregated different advertisements for the same information, they can substitute them, making the computation network dynamic. A new algorithm and the structured nature of its output can be shared by publishing a new rule on the Web stating how to stack different operations to derive particular statements (as in the last code snippet). Results are then computed only when needed, for example by a user agent exploiting them for visualization purposes.

Namespaces

The following defines the namespaces used throughout the article:

    @prefix foaf: <...> .
    @prefix mo: <...> .
    @prefix owl: <...> .
    @prefix rdf: <...> .
    @prefix rdfs: <...> .
    @prefix xsd: <...> .
    @prefix math: <...> .
    @prefix dc: <...> .
    @prefix mit: <...> .
    @prefix event: <...> .
    @prefix log: <...> .

    # Toy namespaces:
    @prefix ex: <...> .
    @prefix mt: <...> .
    @prefix hA: <...> .
    @prefix hB: <...> .
    @prefix hC: <...> .

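The decomposed rule (frames to MFCCs to model to divergence) mirrors plain function composition, which is one way to see why the stages can live on different hosts. The sketch below is a toy illustration only: each function plays the role of one predicate, but the actual signal processing is replaced by trivial stand-in arithmetic (real MFCC extraction and Gaussian modeling are far more involved).

```python
# Toy sketch of the decomposed similarity computation; each function
# stands in for one predicate of the rule above.
def frames(signal):                 # mt:frames: split into 2-sample frames
    return [signal[i:i + 2] for i in range(0, len(signal), 2)]

def mfccs(frame_list):              # mt:mfccs stand-in: per-frame mean
    return [sum(f) / len(f) for f in frame_list]

def model(coeffs):                  # mt:model stand-in: (mean, spread) pair
    return (sum(coeffs) / len(coeffs), max(coeffs) - min(coeffs))

def kldiv(m1, m2):                  # mt:kldiv stand-in: parameter gap
    return abs(m1[0] - m2[0]) + abs(m1[1] - m2[1])

def similarity(signal1, signal2):   # mt:similarity, composed as in the rule
    return kldiv(model(mfccs(frames(signal1))),
                 model(mfccs(frames(signal2))))
```

Because each stage only consumes the previous stage's output, host A can serve frames and MFCCs, host B models, and host C divergences, exactly as in the distribution scheme above.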
Related Work

While maintaining the dynamic nature of our framework, we can substitute several alternatives for our advertisement resources mechanism. One suggested approach1 is that within the representation of a resource (a signal, for example), we use a specialized vocabulary to link to a Resource Description Framework (RDF) document, which gives the user agent some indication of what it would find by retrieving this document. The user agent then makes an informed decision as to whether to retrieve the extra information.

For example, a specialized predicate mo:onsetDocument subclassing rdfs:seeAlso could indicate "more information about the onsets detected in this signal can be found in that document." If the user agent is a beat tracker operating on onsets as an input, it would likely retrieve this extra document, whereas if the user agent is an editorial metadata crawler, it would not.

Compared with the approach presented in the main article, the disadvantage of this method is that it requires user agents to understand the semantics of such links ("information about onsets is available there") rather than just the semantics of the desired RDF ("this signal has the following onsets") to browse and query all available data. The main advantage is that the structure is explicit and left to the discretion of the user agent, which might interpret it in unexpected and useful ways.

Description framework

We leave aside a general comparison of MPEG standards and Semantic Web technologies.2,3 Briefly, however, several aspects of MPEG-7 pushed us toward other description frameworks. For example, MPEG-7 tends to limit the expression of multimedia-related information by explicitly restricting the kind of information MPEG-7 documents might contain. While this approach can work well for closed, isolated data sets, it's inappropriate for building a large-scale web of data, where such enforced domain boundaries are artificial and should be avoided if at all possible.

In addition, the inability to use URIs as identifiers within MPEG-7 descriptions makes it difficult to distribute and interlink data across multiple locations on the Web. For example, when annotating a particular recording as involving Glenn Gould, we want to refer to him by his DBpedia URI (see http://dbpedia.org/resource/Glenn_Gould) instead of having to redescribe him. Finally, MPEG-7 lacks an expressive conceptual model for complex music production data. Such concepts are at the core of most music-related data sets we have so far published within the web of data. Despite these issues, there are some efforts toward bridging the gap between MPEG-7 and the Semantic Web.4

Web services

Several projects have attempted to create a distributed framework for automatically extracting content-based data with service-oriented architectures. One group is working on a set of Web services (see http://www.ifs.tuwien.ac.at/mir/webservice) for audio feature extraction for rhythm and timbre. Another group is working on a Web service that allows remote music information retrieval.5 In addition, the EchoNest Analyze API (see http://analyze.echonest.com) provides a Web service that returns a set of content-based features from particular audio files. Finally, the On-Demand Metadata Extraction Framework (OMEN) is the most comparable to our work.6

Music analysis algorithms in the OMEN system are decomposed into several stages, and only higher-level representations of the actual content are exchanged between these stages. However, this framework has several limitations. It does not provide any facilities to go beyond the first level of representation, and it's not fully distributed because a single master node manages all of its aspects. Still, one advantage of this framework is that processing code can be sent to the nodes that have access to the actual audio data, allowing the user to access new higher-level representations. In our own framework, we can achieve similar functionality by deploying feature-extraction systems through a plug-in architecture that can retrieve code on the basis of query parameters.

Web service frameworks require significant effort to combine functionality from several providers. In the data-oriented approach presented in this article, we can avoid this effort because a published algorithm specifies explicitly what it needs. The user agent, then, is free to meet those requirements with suitable data from any provider. Of course, specifying the data requirements of a particular algorithm means using a suitably expressive ontology.7 Several terms in the Music Ontology framework allow such descriptions to be made, but we plan to do additional work to develop more specific audio-feature ontologies.

References

1. T. Berners-Lee, Linked Data, July 2006; http://www.w3.org/DesignIssues/LinkedData.html.
2. J. van Ossenbruggen, F. Nack, and L. Hardman, "That Obscure Object of Desire: Multimedia Metadata on the Web, Part 2," IEEE MultiMedia, vol. 12, no. 4, 2005, pp. 54-63.
3. F. Nack and A.T. Lindsay, "Everything You Wanted to Know about MPEG-7, Part 2," IEEE MultiMedia, vol. 6, no. 4, 1999, pp. 64-73.
4. J. Hunter, "Adding Multimedia to the Semantic Web: Building an MPEG-7 Ontology," Proc. 1st Semantic Web Working Symp. (SWWS), 2001, pp. 261-283.
5. A.F. Ehmann, J.S. Downie, and M.C. Jones, "The Music Information Retrieval Evaluation Exchange 'Do-It-Yourself' Web Service," Proc. Int'l Conf. Music Information Retrieval, Austrian Computer Society, 2007, pp. 323-324.
6. D. McEnnis, C. McKay, and I. Fujinaga, "Overview of OMEN," Proc. Int'l Conf. Music Information Retrieval, Univ. of Victoria, 2006; http://ismir2006.ismir.net/PAPERS/ISMIR06145_Paper.pdf.
7. F. Pachet and P. Roy, "Exploring Billions of Audio Features," Proc. Int'l Workshop Content-Based Multimedia Indexing, 2007, pp. 227-235.
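The advertisement pattern discussed in the sidebar could be sketched in Turtle as follows. The mo:onsetDocument property is the sidebar's hypothetical example rather than a published Music Ontology term, and the ex: URIs are placeholders:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix mo:   <http://purl.org/ontology/mo/> .
@prefix ex:   <http://example.org/> .

# Declaring the predicate as a subproperty of rdfs:seeAlso means generic
# agents can still treat the link as "more information over there," while
# onset-aware agents know exactly what the linked document contains.
mo:onsetDocument rdfs:subPropertyOf rdfs:seeAlso .

# A signal advertises the document describing its detected onsets.
ex:signal1 mo:onsetDocument <http://example.org/onsets/signal1.rdf> .
```

Under this sketch, a beat tracker that understands mo:onsetDocument would dereference the linked document to obtain its input, whereas an editorial metadata crawler could safely ignore it.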

or by another analysis process wanting to build upon them.

The dynamically generated statements can be clustered in Named Graphs9 that serve as anchors for more information about the steps needed to achieve a particular result (rules derived, information about a particular computational step, and so forth):

    (ex:signal1 ex:signal2) mt:similarity "0.8"^^xsd:float.
    <> ex:computation [
        ex:host ex:dsp_cluster;
        ex:computation_time "PT3.12S"^^xsd:duration;
    ];
    ex:confidence "0.7"^^xsd:float;
    ex:premises {
        ex:signal1 mt:frames ex:frames1.
        ex:signal2 mt:frames ex:frames2.
        etc.
    }.

To the statement derived by the rule mentioned previously, we attached the host on which the logical inference occurred, the premises allowing the statement to be concluded, the computation time taken, and an associated overall confidence.

A topic of ongoing discussion in the research community is how to perform analysis across large music data sets when any one laboratory has access to only a subset of the audio data because of copyright considerations. Some argue that higher-level representations (such as rhythm and timbre information that can't be used to reconstruct the actual audio) derived from the copyrighted signal are themselves not covered by the copyright, in which case the system described previously offers a clear advantage: it allows any number of systems to derive additional information about music tracks without violating copyright by accessing the audio directly. Unfortunately, the extent to which these derived representations really are copyright-free is currently unclear.

Conclusion

With the system described here, we can gather automatically generated representations and manually asserted data in a distributed and arbitrarily large data set: the Web. This puts us in a good position to start making inferences from one side to the other: we are well placed to begin closing the so-called semantic gap. In contrast to the traditional application-oriented approach, this data-oriented approach allows us to be far more flexible about what data is published and how it is reused. Further research can be built on top of published algorithms and data sets, and end-user tools can be built directly on top of the web of data. While the example queries and applications presented here are influenced by the data sets currently available, as more information is added to the web of data, some very exciting and unforeseen uses of these technologies will emerge. MM

Acknowledgments

The authors thank the reviewers for their helpful feedback and acknowledge the support of the Centre for Digital Music and the Department of Computer Science at Queen Mary, University of London. This work was partially supported by the Engineering and Physical Sciences Research Council (EPSRC) funded Information and Communication Technologies (ICT) project OMRAS-2 (EP/E017614/1).

References

1. T. Berners-Lee, J. Hendler, and O. Lassila, "The Semantic Web," Scientific American, May 2001, pp. 34-43.
2. T. Berners-Lee, Linked Data, July 2006; http://www.w3.org/DesignIssues/LinkedData.html.
3. T. Berners-Lee et al., "Tabulator: Exploring and Analyzing Linked Data on the Semantic Web," Proc. 3rd Int'l Semantic Web User Interaction Workshop, 2006, p. 6.
4. S. Boll, "MultiTube: Where Web 2.0 and Multimedia Could Meet," IEEE MultiMedia, vol. 14, no. 1, 2007, pp. 9-13.
5. Y. Raimond et al., "The Music Ontology," Proc. Int'l Conf. Music Information Retrieval, Austrian Computer Society, 2007, pp. 417-422; http://ismir2007.ismir.net/proceedings/ISMIR2007_p417_raimond.pdf.
6. J.R. Smith and P. Schirling, "Metadata Standards Roundup," IEEE MultiMedia, vol. 13, no. 2, 2006, pp. 84-88.
7. S. Auer et al., "DBpedia: A Nucleus for a Web of Open Data," Proc. Int'l Semantic Web Conf., Springer, 2007, pp. 722-735.
8. C. Bizer et al., "Interlinking Open Data on the Web," Demonstrations Track, Proc. 4th European Semantic Web Conf., 2007; http://www.eswc2007.org/pdf/demo-pdf/LinkingOpenData.pdf.
9. IFLA Study Group on the Functional Requirements for Bibliographic Records, Functional Requirements for Bibliographic Records—Final Report, UBCIM Publications—New Series, vol. 19, Sept. 1998; http://www.ifla.org/VII/s13/frbr/frbr1.htm.
10. S. Luke and J. Hendler, "Web Agents that Work," IEEE MultiMedia, vol. 4, no. 3, 1997, pp. 76-80.
11. M. Hildebrand, J. van Ossenbruggen, and L. Hardman, "The Semantic Web," Proc. Int'l Semantic Web Conf., LNCS 4273, Springer, 2006, pp. 272-285.
12. P. Herrera et al., "Simac: Semantic Interaction with Music Audio Contents," Proc. 2nd European Workshop Integration of Knowledge, Semantic and Digital Media Technologies, IEEE Press, 2005, pp. 399-406.
13. S. Boll, "Share It, Reveal It, Reuse It, and Push Multimedia into a New Decade," IEEE MultiMedia, vol. 14, no. 4, 2007, pp. 14-19.
14. E. Pampalk, Computational Models of Music Similarity and their Application in Music Information Retrieval, doctoral dissertation, Vienna Univ. of Technology, 2006.

Yves Raimond is a software engineer for BBC Audio & Music Interactive. His research interests include music analysis, knowledge representation, the Semantic Web, and linked data. Raimond has a PhD in electronic engineering from Queen Mary, University of London. Contact him at [email protected].

Christopher Sutton is a research and development engineer for Intrasonics. His research interests include music information retrieval, the Semantic Web, and audio steganography. Sutton has an MSc in digital music processing from Queen Mary, University of London. Contact him at [email protected].

Mark Sandler is a professor of signal processing at Queen Mary, University of London, and director of the Centre for Digital Music. His research interests include music informatics, music information retrieval, the Semantic Web for music, and 3D sound. Sandler has a PhD in digital audio power amplification from the University of Essex, UK. He is a Fellow of the IEE and of the Audio Engineering Society. He is a two-time recipient of the IEE A.H. Reeves Premium Prize. Contact him at [email protected].
