Conference Report
AES 25th INTERNATIONAL CONFERENCE
Metadata for Audio
London, UK, June 17–19, 2004
John Grant, conference chair
J. Audio Eng. Soc., Vol. 52, No. 9, 2004 September

Metadata is the key to managing audio information and making sure that audiences know what is available in the way of audio program material. This was the theme for the AES 25th International Conference, held June 17–19 at Church House in London, drawing an audience of over 100 delegates from broadcasting, universities, digital libraries, and audio program-management operations around the world. Chaired by John Grant, with papers cochairs Russell Mason and Gerhard Stoll, the conference spanned topics such as feature extraction, frameworks, broadcast implementations, libraries and archives, and delivery.

I'M AN AUDIO ENGINEER, TEACH ME ABOUT METADATA!
Since metadata is not audio at all, but rather a set of structures for information about audio, it lies squarely in the domain of information professionals. For this reason audio engineers can find themselves at something of a loss when it comes to understanding what is going on in metadata systems and protocols. The metadata world is full of confusing acronyms such as SMEF, P/Meta, MPEG-7, XML, AAF, MXF, and so forth, representing a minefield for the unwary. Addressing the need for education in this important area, the first day of the conference was designed as a tutorial program, containing papers from key presenters on some of the basic principles of metadata. John Grant, conference chair, welcomed everyone and then introduced the BBC's Chris Chambers, who discussed identities and handling strategies for metadata, helping the conference delegates to understand the ways in which systems can find and associate the elements of a single program item.
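The association problem Chambers described can be illustrated with a minimal sketch. Note that the identifier format, element names, and locators below are invented for illustration and are not taken from any of the presentations; the point is only that one shared identifier keys every element of a program item, so any system holding that identifier can find and associate the related material.

```python
from dataclasses import dataclass, field


@dataclass
class ProgramItem:
    """All elements of a single program item, keyed by one identifier."""
    item_id: str                      # invented ID scheme, for illustration
    elements: dict = field(default_factory=dict)

    def register(self, kind: str, locator: str) -> None:
        # Associate another element (audio file, edit list,
        # descriptive record, ...) with this program item.
        self.elements.setdefault(kind, []).append(locator)


# A central lookup lets any system that knows the ID find everything.
registry: dict = {}

item = ProgramItem("prog/2004/0617-001")
item.register("audio", "store://audio/0617-001.wav")
item.register("editlist", "store://edits/0617-001.adl")
item.register("description", "store://meta/0617-001.xml")
registry[item.item_id] = item
```

Any two systems sharing such a registry resolve the same identifier to the same set of elements, which is the essence of the handling strategies Chambers outlined; the standards he surveyed address how such identifiers are allocated and exchanged in practice.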
He outlined some of the standards being introduced by the industry to deal with this issue, neatly setting the scene for the following speakers, who would describe the various approaches in greater detail. Philip de Nier from BBC R&D proceeded to describe the Advanced Authoring Format (AAF) and Material Exchange Format (MXF), which are both file formats for the storage and exchange of audio–visual material together with various forms of metadata describing content and composition. This was followed by an XML primer given by Adam Lindsay from Lancaster University, UK; XML is a very widely used means of encoding metadata according to so-called schemas. Continuing the theme of file formats, John Emmett proceeded to discuss ways of "keeping it simple," describing the broadcast wave format (BWF) and AES31, an increasingly important standard for the exchange of audio content and edit lists that uses BWF as its audio file structure.

Russell Mason chaired the second session, Practical Schemes. Philippa Morrell, industry standards manager of the BOSS Federation in the UK, took delegates through the relatively obscure topic of registries. These are essentially databases that are administered centrally so as to avoid the risk of duplication and consequent misidentification of content. The issue of classification and taxonomy of sound effects is also something that requires wide agreement, and the management of this was outlined by Pedro Cano and his colleagues from the Music Technology Group at Pompeu Fabra University in Spain. Cano spoke about a classification scheme inspired by the MPEG-7 standard and based on an existing lexical network called WordNet, in which an extended semantic network includes the semantic, perceptual, and sound-effects-specific terms in an unambiguous way.

Invited authors, clockwise from top left: Max Jacob, Jürgen Herre, Emilia Gómez, Michael Casey, Wes Curtis, Oscar Celma, Richard Wright, and Alison McCarthy. Photos courtesy Paul Troughton (http://photos.troughton.org).

Richard Wright of BBC Information and Archives was keen to point out that the term metadata is not used in the same way by audio and video engineers as it is in standard definitions. Metadata, correctly defined, is the structure of information, or, to refine the definition more strictly, the organization, naming, and relationships of the descriptive elements. In other words it is everything but the data itself; it is beyond data. Describing so-called core metadata, he showed that this could be either the lowest common denominator or a complete description that fits the most general case. The first has limitations in what it can do and the second does not work; however, the first lends itself to standardization because it fits the requirements of being essential, general, simple, and popular.

Ending the tutorial day, Geoffroy Peeters from IRCAM in France chaired a workshop on MPEG-7, a key international standard involving metadata. There were four presentations on different aspects of the standard, including the first on managing large sound databases, given by Max Jacob of IRCAM, Paris, and a second on integrating low-level metadata in multimedia database management systems, given by Michael Casey of City University, London. Emilia Gómez, Oscar Celma, and colleagues from Pompeu Fabra University described tools for content-based retrieval and transformation of audio using MPEG-7, concentrating on the so-called SPOffline and MDTools. SPOffline stands for Sound Palette Offline and is an MPEG-7-based application for editing, processing, and mixing intended for sound designers and professional musicians. MDTools was designed to help content providers in the creation and production phases of metadata and annotation of multimedia content. Finishing this session, Jürgen Herre of the Fraunhofer Institute in Germany gave a tour of the issues surrounding the use of MPEG-7 low-level scalability.

METADATA FRAMEWORKS AND TOOLKITS
The first main conference day following the tutorial sessions began with a discussion of frameworks for metadata. First, Andreas Ebner of IRT introduced data models for audio and video production, and then Wes Curtis and Alison McCarthy of BBC TV outlined the EBU's P/Meta scheme for the exchange of program data between businesses.

Invited authors, clockwise from top left: Philippa Morrell, Philip de Nier, Richard Nicol, Kate Grant, Chris Chambers, Andreas Ebner, John Emmett, and Adam Lindsay.

The Digital Media Project is led by Leonardo Chiariglione, mastermind of the MPEG standardization forums, and this exciting project was reviewed at the conference by Richard Nicol from the Cambridge-MIT Institute. He spoke about the need to preserve end-to-end management of digital content in a world where such content is potentially widely distributed and easy access is needed to the information about it. Following this, Kate Grant of Nine Tiles provided delegates with an interesting summary of the "what and why" of the MPEG-21 standard, which is essentially a digital rights management structure. "The 21st century is an Information Age, and the ability to create, access, and use multimedia content anywhere at any time will be the fulfillment of the MPEG-21 vision," she proclaimed.

Turning to the issue of spatial scene description, Guillaume Potard of the University of Wollongong in Australia introduced an XML-based 3-D audio-scene metadata scheme. He explained that existing 3-D audio standards such as VRML (X3D) have very basic sound-description capabilities, whereas MPEG-4 has powerful 3-D audio features. However, the problem is that these are based on a so-called scene graph with connected nodes and objects, which can become very complex even for simple scenes. Such structures are not optimal for metadata and cannot easily be searched. Consequently he based the proposed XML scheme on one from Csound, using "orchestra" and "score" concepts, whereby the orchestra describes the content or objects and the score describes the temporal behavior of those objects.

FEATURE EXTRACTION
The topic of feature extraction is closely related to that of metadata. It also has a lot in common with the field of semantic audio analysis, the automatic extraction of meaning from audio. Essentially, feature extraction is concerned with various methods for identifying important aspects of audio signals in the spectral, temporal, or spatial domains, searching for objects and their characteristics and describing them in some way. A variety of work in this field was described in papers presented during the afternoon of the first conference day, many of which related to the analysis of music signals. For example, Claas Derboven of Fraunhofer presented a paper on a system for harmonic analysis of polyphonic music, based on a statistical approach that examines the frequency of occurrence of musical notes to determine the musical key. Chords are determined by a pattern-matching algorithm, and tonal components are found using a nonuniform frequency transform.

POSTERS
A substantial number of poster papers were presented during the 25th International Conference, almost all concerned with the topic of feature extraction and automatic content recognition. Many of the posters described methods by which different musical features can be analyzed and described, such as instrument recognition, song complexity, percussion information, and rhythmic meter. Some unusual presentations in this series included a new means of retrieving spoken documents in accordance with the MPEG-7 standard, given by Nicolas Moreau and colleagues from the Technical University of Berlin, and an opera information system based on MPEG-7, presented by Oscar Celma from Pompeu Fabra. Bee-Suan Ong was interested to find out how representative sections of music can be identified automatically.

Excellent lecture facilities were provided by Church House; located in the heart of London, it also hosts the annual meeting of the Church of England.

BROADCASTING IMPLEMENTATIONS