Semantic Audio Analysis Utilities and Applications
Total Page:16
File Type:pdf, Size:1020Kb
Semantic Audio Analysis Utilities and Applications PhD Thesis Gyorgy¨ Fazekas Centre for Digital Music School of Electronic Engineering and Computer Science, Queen Mary University of London April 2012 I certify that this thesis, and the research to which it refers, are the product of my own work, and that any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices of the discipline. I acknowledge the helpful guidance and support of my supervisor, Professor Mark Sandler. Abstract Extraction, representation, organisation and application of metadata about audio recordings are in the concern of semantic audio analysis. Our broad interpretation, aligned with re- cent developments in the field, includes methodological aspects of semantic audio, such as those related to information management, knowledge representation and applications of the extracted information. In particular, we look at how Semantic Web technologies may be used to enhance information management practices in two audio related areas: music informatics and music production. In the first area, we are concerned with music information retrieval (MIR) and related research. We examine how structured data may be used to support reproducibility and provenance of extracted information, and aim to support multi-modality and context adap- tation in the analysis. In creative music production, our goals can be summarised as follows: O↵-the-shelf sound editors do not hold appropriately structured information about the edited material, thus human-computer interaction is inefficient. We believe that recent developments in sound analysis and music understanding are capable of bringing about significant improve- ments in the music production workflow. Providing visual cues related to music structure can serve as an example of intelligent, context-dependent functionality. The central contributions of this work are a Semantic Web ontology for describing record- ing studios, including a model of technological artefacts used in music production, method- ologies for collecting data about music production workflows and describing the work of audio engineers which facilitates capturing their contribution to music production, and fi- nally a framework for creating Web-based applications for automated audio analysis. This has applications demonstrating how Semantic Web technologies and ontologies can facilitate interoperability between music research tools, and the creation of semantic audio software, for instance, for music recommendation, temperament estimation or multi-modal music tutoring. 3 Acknowledgements I would like to take this opportunity to thank everyone in the Centre for Digital Music at Queen Mary University of London. My research would not have been possible without the help I received, the numerous discussions and wide-ranging collaborations I have had, and the stimulating and friendly atmosphere I have found in this group. I would particularly like to thank my supervisor, Mark Sandler, for his encouragement, guidance and support. His wide-ranging knowledge helped every aspects of my work. I would like to thank to my second supervisor Simon Dixon for his advice and meticulous corrections of my various PhD progress reports, and my internal accessor Nick Bryan-Kinns whose objective advice has helped to steered my research. I owe a great deal to the work of people I frequently collaborated with. In no particular order, thanks to Dan Tidhar, who shared his knowledge of musical temperament with me, Thomas Wilmering for countless discussions that helped me in creating the Studio Ontology, Mathieu Barthet, Am´elie Angalade, and Sefki Kolozali who I very much enjoyed working with at various music hack days, Kurt Jacobson and Matt Bernstein for their invaluable help in server configuration, Matthias Mauch and Katy Noland for enthusiastically sending me bug reports about software I wrote, and Matthew Davies for his encouragement in the early stages of my academic work. Special thanks to Yves Raimond, without his work and the Music Ontology, this thesis would not have been possible. I owe a lot to our early discussions which made me realise the importance of semantics, enabled me to understand Semantic Web technologies, and helped me to formulate my long term research goals. Thanks to all past and present members of C4DM who created software I used in my research. The applications discussed in this thesis would not have been possible without the contributions of Chris Cannam, Mark Levy, Chris Sutton, Chris Landone, and the brilliant research of all members of the group that enable the work of Vamp plugins. I must also acknowledge the work of open source communities, including the enthusiastic people who cre- ated Audacity, Python, Numpy, Scipy, libRDF, rdflib, Matplotlib, BibDesk, LaTeX, TeXshop and numerous other open source libraries and software packages I used. Finally, I thank my Mother and Father and all my family, for their encouragement and continued support in my endeavour in a distant country. 4 A note on language and style Throughout this thesis the term ”we” is used (in line with recent trends in academic writing) to refer to the author’s work, which is influenced by numerous discussions with his thesis supervisor, faculty and colleagues. This fact is acknowledged through the use of the plural form, except in cases where its use invites the reader to think alongside, or may well disagree with the author. The thesis may reflect the sole opinion of the author in some cases however. Statements in this category are delineated using phrases such as ”in the author’s opinion” or the occasional use of first person singular. 5 License This work is copyright c 2012 Gy¨orgy Fazekas, and is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported Licence. To view a copy of this licence, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. 6 Contents Contents 10 1 Introduction 17 1.1 Semantic analysis of musical audio ........................ 17 1.2 Organisation of this work ............................. 21 1.3 Utilities and applications of semantic audio ................... 21 1.3.1 Creative music production ......................... 22 1.3.2 Music Information Retrieval ........................ 26 1.4 Signal processing techniques for semantic audio analysis ............ 30 1.4.1 Physical and perceptual features of sound ................ 31 1.4.2 Semantic features of audio and music .................. 37 1.5 Machine intelligence in music production ..................... 49 1.5.1 Audio engineering workflows ....................... 50 1.5.2 Motivations ................................. 52 1.5.3 Studio specific problems .......................... 56 1.6 Research scope and methodology ......................... 59 2 Information Management and Knowledge Representation for Audio Applications 61 2.1 Data, Metadata, Information and Knowledge .................. 61 2.2 Information Management in Semantic Audio applications ........... 63 2.2.1 A Semantic Audio Tool .......................... 64 2.2.2 Information Management and Knowledge Representation ....... 66 2.3 Logical Foundations ................................ 69 2.3.1 Propositional Logic ............................. 69 2.3.2 First Order Logic .............................. 69 2.3.3 Higher Order Logics and reification .................... 73 2.3.4 Temporal and Modal Logics ........................ 73 2.3.5 Description Logics ............................. 73 2.3.6 Knowledge Representation and Ontologies ................ 74 7 2.3.7 Taxonomies and Partonomies ....................... 74 2.4 Semantic Web Technologies ............................ 75 2.4.1 Information Management and the Semantic Web ............ 75 2.4.2 Resource Description Framework ..................... 76 2.4.3 Semantic Web ontologies ......................... 78 2.4.4 RDFS and the OWL ............................ 79 2.4.5 SPARQL .................................. 80 2.4.6 Notation 3 and RIF ............................ 80 2.4.7 Linked data ................................. 81 2.5 Summary ...................................... 83 3 Ontology Engineering and Multimedia Ontologies 84 3.1 Ontologies and basic ontological decisions .................... 84 3.1.1 Ontology and Philosophy ......................... 85 3.1.2 Ontology engineering ............................ 86 3.1.3 Some use cases for ontology design .................... 87 3.1.4 Ontology design principles ......................... 89 3.1.5 Summary .................................. 93 3.2 Conceptualisations of Music and Multimedia Information ........... 93 3.2.1 Metadata standards ............................ 94 3.2.2 Basic categories of information about intellectual works ........ 95 3.2.3 Bibliographic information ......................... 96 3.2.4 Cultural information ............................ 98 3.2.5 Content-based information ........................ 99 3.2.6 Provenance and workflow information .................. 107 3.3 Metadata harmonisation using core and foundational ontologies ........ 115 3.3.1 ABC ..................................... 116 3.3.2 DOLCE and COMM ............................ 117 3.4 Reflections on design principles .......................... 120 4 Ontologies for Semantic Audio Information Management 121 4.1 Overview of the Music Ontology Framework ................... 121 4.1.1 Utilities