A Distributed Music Information System
Thesis submitted in partial fulfilment of the requirements of the University of London for the Degree of Doctor of Philosophy

Yves Raimond

Submitted: November 2008

Department of Electronic Engineering, Queen Mary, University of London

I certify that this thesis, and the research to which it refers, are the product of my own work, and that any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices of the discipline. I acknowledge the helpful guidance and support of my supervisor, Prof Mark Sandler.

Abstract

Information management is an important part of music technologies today, covering the management of public and personal collections, the construction of large editorial databases and the storage of music analysis results. The information management solutions that have emerged for these use-cases are still isolated from each other. The information one of these solutions manages does not benefit from the information another holds.

In this thesis, we develop a distributed music information system that aims at gathering music-related information held by multiple databases or applications. To this end, we use Semantic Web technologies to create a unified information environment. Web identifiers correspond to any items in the music domain: performance, artist, musical work, etc. These web identifiers have structured representations permitting sophisticated reuse by applications, and these representations can quote other web identifiers leading to more information.

We develop a formal ontology for the music domain. This ontology allows us to publish and interlink a wide range of structured music-related data on the Web. We develop an ontology evaluation methodology and use it to evaluate our music ontology.
We develop a knowledge representation framework for combining structured web data and analysis tools to derive more information. We apply these different technologies to publish a large amount of pre-existing music-related datasets on the Web. We develop an algorithm to automatically relate such datasets among each other. We create automated music-related Semantic Web agents, able to aggregate musical resources, structured web data and music processing tools to derive and publish new information.

Finally, we describe three of our applications using this distributed information environment. These applications deal with personal collection management, enhanced access to large audio streams available on the Web and music recommendation.

Acknowledgements

I would like to thank my supervisor, Prof. Mark Sandler, for his constant support throughout my years in the C4DM group at Queen Mary, University of London. I would also like to thank the people who have been part of C4DM while I've been there. In no particular order: David P, Chris C, Chris H, Chris L, Matthew, Xue, Matthias, Adam, Louis, Juan, Katy, Amelie, Kurt, Emmanuel, Paul, Enrique, Dan S, Dan T, Steve, Andrew N, Andrew R, Mark P, Mark L, Beiming, Peyman, Robert, Nico, Gyorgy, Becky, Josh, Simon, Ivan and Maria.

I would especially like to thank Samer Abdallah for developing the original concepts underlying the Music Ontology framework. His ideas have had a profound impact on my work, and I will never thank him enough for that. I would also like to thank Christopher Sutton, for this amazing year we spent working together, and for taking the time to read this thesis. I also thank everybody in the Music Ontology community, especially Frederick Giasson, and in the Linking Open Data project, for their kindness and motivation, the Musicbrainz community for their outstanding work, as well as all the DBTune users for helping me spot and fix more bugs than I can count.
I also thank the many people with whom I had the chance to work over these three years, in particular Oscar Celma, Alexandre Passant, Michael Hausenblas, Tom Heath, Chris Bizer, Wolfgang Halb and Francois Scharffe. I would also like to thank the people behind SWI-Prolog and ClioPatria, for developing the framework I used for most of the experiments described in this thesis. I would like to thank the BBC Audio & Music interactive team, for providing ideas, support and lots of data. I also acknowledge the support of a studentship from both the Department of Electronic Engineering and the Department of Computer Science at Queen Mary.

Finally, many thanks to my family for their continued support. I would especially like to thank Anne-Sophie for her constant love, support and understanding, even when I had to work overnight, on week-ends or on holidays.

Contents

Abstract

I Introduction and background

1 Introduction
  1.1 Problem definition
  1.2 Requirements
  1.3 Research method
    1.3.1 Constructs and models
    1.3.2 Methods and instantiations
  1.4 Thesis outline
  1.5 Major contributions
  1.6 Published Work
    1.6.1 International Journals
    1.6.2 International Conferences
    1.6.3 Refereed Workshops
2 Knowledge Representation and Semantic Web technologies
  2.1 Knowledge Representation
    2.1.1 Propositional logic
    2.1.2 First-Order Logic
      2.1.2.1 FOL syntax
      2.1.2.2 FOL semantics
      2.1.2.3 Entailment and proof
      2.1.2.4 Inference mechanisms
    2.1.3 Higher-order logic and reification
    2.1.4 Semi-decidability of FOL
    2.1.5 The need for ontologies
    2.1.6 Description Logics
  2.2 Web technologies
    2.2.1 Introduction
    2.2.2 Towards the Semantic Web
    2.2.3 A web of data
    2.2.4 Machine-processable representations
      2.2.4.1 XML
      2.2.4.2 The RDF model
      2.2.4.3 RDF syntaxes
      2.2.4.4 Microformats and GRDDL
      2.2.4.5 N3 data model and syntax
    2.2.5 Semantic Web user agents
    2.2.6 Web ontologies
      2.2.6.1 SHOE
      2.2.6.2 RDF Schema
      2.2.6.3 OWL
  2.3 Summary

II A web-based music information system

3 Conceptualisation of music-related information
  3.1 Models of the music domain
    3.1.1 Layered modelling of the music domain
      3.1.1.1 Conceptual layering in the music domain
      3.1.1.2 FRBR
    3.1.2 Hierarchical modelling of the music domain
      3.1.2.1 Performances and scores
      3.1.2.2 Audio items
      3.1.2.3 Dublin Core
      3.1.2.4 MPEG-7
    3.1.3 Workflow-based modelling
  3.2 Core elements of a Music Ontology
    3.2.1 Time
      3.2.1.1 Overview of temporal theories
      3.2.1.2 The Timeline ontology
    3.2.2 Event
      3.2.2.1 Event modelling
      3.2.2.2 The Event ontology
  3.3 The Music Ontology
    3.3.1 Editorial data
      3.3.1.1 FOAF
      3.3.1.2 FRBR Items and Manifestations
      3.3.1.3 Music Ontology terms for editorial information
    3.3.2 Music production workflow
      3.3.2.1 Event, FRBR Expressions and Works, and physical time
      3.3.2.2 Music Ontology terms for production workflow information
    3.3.3 Signals, temporal annotations and event decomposition
      3.3.3.1 Signals and timelines
      3.3.3.2 Temporal annotations
      3.3.3.3 Event decomposition
      3.3.3.4 Web ontologies
    3.3.4 Available Extensions
      3.3.4.1 Instrument taxonomy
      3.3.4.2 Chord Ontology
      3.3.4.3 Symbolic Ontology
      3.3.4.4 Synchronisation and timing
      3.3.4.5 Ontologies of audio features
  3.4 Discussion
    3.4.1 Comparison with related web ontologies
      3.4.1.1 Musicbrainz RDF Schema
      3.4.1.2 Nepomuk ID3 Ontology
      3.4.1.3 Music Vocabulary
      3.4.1.4 SIMAC ontology
    3.4.2 An open music representation system
      3.4.2.1 Web technologies and open music representation systems
      3.4.2.2 Extensibility of music production workflows
  3.5 Summary
4 Evaluation of the Music Ontology framework
  4.1 Techniques for ontology evaluation
    4.1.1 Qualitative evaluation
    4.1.2 Structural and ontological metrics