AES 25th International Conference Program
Metadata for Audio
2004 June 17–19
London, UK

J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April

Technical Sessions

Thursday, June 17
TUTORIALS
SESSION T-1: INTRODUCTION

T1-1 Metadata, Identities, and Handling Strategies—Chris Chambers, BBC R&D, Tadworth, Surrey, UK (invited)

With all the potential media material and its associated metadata becoming accessible on IT-based systems, how are systems going to find and associate the elements of any single item? How are the users going to know they have the correct items when assembling audio, video, and information for use within a larger project? This short talk will explore the way areas of our industry are hoping to tackle the problem and some of the standards being introduced to ensure management of this material is possible.

T1-2 Before There Was Metadata—Mark Yonge (invited)

Audio has never existed in isolation. There has always been a mass of associated information, both explicit and implicit, to direct, inform, and enhance the use of the audio. In the blithe days before information theory we didn’t know it was all metadata. This paper reviews the extent of traditional metadata covering a range of forms. Some of them may be surprising; all of them need to be re-appraised in the light of newer, more formal metadata schemes.

Thursday, June 17
SESSION T-2: FILE BASICS

T2-1 Introduction to MXF and AAF—Philip DeNier, BBC R&D, Tadworth, Surrey, UK (invited)

The AAF and MXF file formats provide a means to exchange digital media along with a rich (extendible) set of metadata. This presentation will be a basic introduction into the content of these file formats and will include a description of the metadata scheme used.

T2-2 XML Primer—Claude Seyrat (invited)

Most audio professionals have heard of the term “XML” but not many know for sure what it means or have yet had to work with it. This paper sets out what XML is, what it can do for the user, and various ways that it can be employed in the area of metadata for audio. The contents of this paper form the basis for a number of the papers that appear later in the conference.

T2-3 Keeping it Simple: BWF and AES31—John Emmett, Broadcast Project Research Ltd., Teddington, Middlesex, UK (invited)

Digital audio is spreading outward to the furthest reaches of the broadcast chain. Making the best use of the opportunities presented by this demands a standardization procedure that is adaptable to a vast number of past, present, and future digital audio formats and scenarios. In addition, would it not be just great if it cost nothing? This paper will point out the benefits of what we already have and tell a tale of borrowing economical audio technology from many sources.

Thursday, June 17
SESSION T-3: PRACTICAL SCHEMES

T3-1 The Role of Registries—Philippa Morrell, Metadata Associates Ltd., London, UK (invited)

Some forms of metadata, especially those that identify objects or classes of objects, form classes of their own that need to be administered centrally in order to avoid the risk of duplication and consequent misidentification. The concept of such a registry is not new; for example, International Standard Book Numbers (ISBN) derive from a central registry that was originally set up in 1970. The registry that ensures that every ethernet-connected device in the world is uniquely identifiable is another example. Formal identifiers and other metadata for use in commercial transactions will increasingly use the services of one or more metadata registries, as this paper will discuss.

T3-2 Sound Effect Taxonomy Management in Production Environments—Pedro Cano, Markus Koppenberger, Perfecto Herrera, Oscar Celma, Universitat Pompeu Fabra, Barcelona, Spain

Categories or classification schemes offer ways of navigating and having higher control over the search and retrieval of audio content. The MPEG-7 standard provides description mechanisms and ontology management tools for multimedia documents. We have implemented a classification scheme for sound effects management inspired by the MPEG-7 standard on top of an existing lexical network, WordNet. WordNet is a semantic network that organizes over 100,000 concepts of the real world with links between them. We show how to extend WordNet with the concepts of the specific domain of sound effects. We review some of the taxonomies to acoustically describe sounds. Mining legacy metadata from sound effects libraries further supplies us with terms. The extended semantic network includes the semantic, perceptual, and sound effects specific terms in an unambiguous way. We show the usefulness of the approach, easing the task for the librarian and providing higher control on the search and retrieval for the user.

T3-3 Dublin Core—R. Wright, BBC (invited)

Dublin Core metadata provides card catalog-like definitions for defining the properties of objects for Web-based resource discovery systems. The importance of the Dublin Core is its adoption as a basis for many more elaborate schemes. When the view ahead is obscured by masses of local detail, a firm grasp of the Dublin Core will often reveal the real landscape.

Thursday, June 17
WORKSHOP—MPEG-7

Coordinator: G. Peeters, IRCAM, Paris, France (in association with SAA TC)

Managing Large Sound Databases Using MPEG—Max Jacob, IRCAM, Paris, France

Sound databases are widely used for scientific, commercial, and artistic purposes. Nevertheless there is yet no standard way to manage them. This is due to the complexity of describing and indexing audio content and to the variety of purposes a sound database might address. Recently there appeared MPEG-7, a standard for audio/visual content metadata that could be a good starting point. MPEG-7 not only defines a set of description tools but is more generally an open framework hosting specific extensions for specific needs in a common environment. This is crucial since there would be no way to freeze in a monolithic definition all the possible needs of a sound database. This paper outlines how the MPEG-7 framework can be used, how it can be extended, and how all this can fit into an extensible database design, gathering three years of experience during the CUIDADO project at IRCAM.

Integrating Low-Level Metadata in Multimedia Database Management Systems—Michael Casey, City University, London, UK

[Abstract Not Available at Press Time]

Tools for Content-Based Retrieval and Transformation of Audio Using MPEG-7: The SPOff and the MDTools—Emilia Gómez, Oscar Celma, Emilia Gómez, Fabien Gouyon, Perfecto Herrera, Jordi Janer, David García, University Pompeu Fabra, Barcelona, Spain

In this workshop we will demonstrate three applications for content-based retrieval and transformations of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either manually or automatically and are stored in an XML database. Retrieval services are implemented in the database. A set of musical transformations are defined directly at the level of musically meaningful MPEG-7 descriptors and are automatically mapped onto low-level audio signal transformations. Topics included in the presentation are: (1) Description generation procedure, manual annotation of editorial description: the MDTools, automatic description of audio recordings, the SPOffline; (2) Retrieval functionalities, local retrieval: SPOffline, remote retrieval: Web-based retrieval; and (3) Transformation utilities: the SPOffline.

Using MPEG-7 Audio Low-Level Scalability: A Guided Tour—Jürgen Herre, Eric Allamanche, Fraunhofer IIS, Ilmenau, Germany

[Abstract Not Available at Press Time]

Friday, June 18
CONFERENCE DAY 1
SESSION CD-1: FRAMEWORKS

1-1 Data Model for Audio/Video Production—A. Ebner, IRT, Munich, Germany

When changing from traditional production systems to IT-based production systems the introduction and usage of metadata is unavoidable. Direct access of the information stored in IT-based systems is not possible. Descriptive and structural metadata are the enablers to have proper access of selected material. Metadata does not focus on descriptive information about the content only. It describes the usage of the material, the structure of a program, handling processes, relevant information, delivery information about properties, and storage of information. The basis to achieve a complete collection of metadata is a detailed analysis of a broadcaster's production processes and usage cases. A logical data model expresses the relationship between the information and is the foundation for implementations that enable a controlled exchange and storage of metadata.

1-2 P-META: Program Data Exchange in Practice—Wes Curtis, BBC Television, London, UK (invited)

[Abstract Not Available at Press Time]

Friday, June 18
SESSION CD-2: POSTERS, PART 1

2-1 Low-Complexity Musical Meter Estimation from Polyphonic Music—Christian Uhle¹, Jan Rohden¹, Markus Cremer¹, Jürgen Herre²
¹Fraunhofer AEMT, Erlangen, Germany
²Fraunhofer IIS, Ilmenau, Germany

This paper addresses the automated extraction of musical meter from audio signals on three hierarchical levels, namely tempo, tatum, and measure length. The presented approach analyzes consecutive segments of the audio signal equivalent to a few seconds length each, and detects periodicities in the temporal progression of the amplitude envelope in a range between 0.25 Hz and 10 Hz. The tatum period, beat period, and measure length are estimated in a probabilistic manner from the periodicity function. The special advantages of the presented method reside in the ability to track tempo also in music with strong syncopated rhythms, and its computational efficiency.

2-2 Percussion-Related Semantic Descriptors of signal spectral shape.