AES 25th International Conference Program: Metadata for Audio, 2004 June 17–19, London, UK

Technical Sessions

Thursday, June 17
TUTORIALS

SESSION T-1: INTRODUCTION

T1-1 Metadata, Identities, and Handling Strategies—Chris Chambers, BBC R&D, Tadworth, Surrey, UK (invited)

With all the potential media material and its associated metadata becoming accessible on IT-based systems, how are systems going to find and associate the elements of any single item? How are users going to know they have the correct items when assembling audio, video, and information for use within a larger project? This short talk will explore the ways areas of our industry are hoping to tackle the problem and some of the standards being introduced to ensure management of this material is possible.

T1-2 Before There Was Metadata—Mark Yonge (invited)

Audio has never existed in isolation. There has always been a mass of associated information, both explicit and implicit, to direct, inform, and enhance the use of the audio. In the blithe days before information theory we didn’t know it was all metadata. This paper reviews the extent of traditional metadata covering a range of forms. Some of them may be surprising; all of them need to be re-appraised in the light of newer, more formal metadata schemes.

Thursday, June 17
SESSION T-2: FILE BASICS

T2-1 Introduction to MXF and AAF—Philip DeNier, BBC R&D, Tadworth, Surrey, UK (invited)

The AAF and MXF file formats provide a means to exchange digital media along with a rich (extendible) set of metadata. This presentation will be a basic introduction to the content of these file formats and will include a description of the metadata scheme used.

T2-2 XML Primer—Claude Seyrat (invited)

Most audio professionals have heard of the term “XML,” but not many know for sure what it means or have yet had to work with it. This paper sets out what XML is, what it can do for the user, and various ways that it can be employed in the area of metadata for audio. The contents of this paper form the basis for a number of the papers that appear later in the conference.

T2-3 Keeping it Simple: BWF and AES31—John Emmett, Broadcast Project Research Ltd., Teddington, Middlesex, UK (invited)

Digital audio is spreading outward to the furthest reaches of the broadcast chain. Making the best use of the opportunities presented by this demands a standardization procedure that is adaptable to a vast number of past, present, and future digital audio formats and scenarios. In addition, would it not be just great if it cost nothing? This paper will point out the benefits of what we already have and tell a tale of borrowing economical audio technology from many sources.

Thursday, June 17
SESSION T-3: PRACTICAL SCHEMES

T3-1 The Role of Registries—Philippa Morrell, Metadata Associates Ltd., London, UK (invited)

Some forms of metadata, especially those that identify objects or classes of objects, form classes of their own that need to be administered centrally in order to avoid the risk of duplication and consequent misidentification. The concept of such a registry is not new; for example, International Standard Book Numbers (ISBN) derive from a central registry that was originally set up in 1970. The registry that ensures that every ethernet-connected device in the world is uniquely identifiable is another example. Formal identifiers and other metadata for use in commercial transactions will increasingly use the services of one or more metadata registries, as this paper will discuss.

T3-2 Sound Effect Taxonomy Management in Production Environments—Pedro Cano, Markus Koppenberger, Perfecto Herrera, Oscar Celma, Universitat Pompeu Fabra, Barcelona, Spain

Categories or classification schemes offer ways of navigating and having higher control over the search and retrieval of audio content. The MPEG-7 standard provides description mechanisms and ontology management tools for multimedia documents. We have implemented a classification scheme for sound effects management, inspired by the MPEG-7 standard, on top of an existing lexical network, WordNet. WordNet is a semantic network that organizes over 100,000 concepts of the real world with links between them. We show how to extend WordNet with the concepts of the specific domain of sound effects. We review some of the taxonomies used to describe sounds acoustically. Mining legacy metadata from sound effects libraries further supplies us with terms. The extended semantic network includes semantic, perceptual, and sound-effects-specific terms in an unambiguous way. We show the usefulness of the approach, easing the task for the librarian and providing higher control over search and retrieval for the user.

J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April 405

T3-3 Dublin Core—R. Wright, BBC (invited)

Dublin Core metadata provides card-catalog-like definitions for defining the properties of objects for Web-based resource discovery systems. The importance of the Dublin Core is its adoption as a basis for many more elaborate schemes. When the view ahead is obscured by masses of local detail, a firm grasp of the Dublin Core will often reveal the real landscape.

Thursday, June 17
WORKSHOP—MPEG-7

Coordinator: G. Peeters, IRCAM, Paris, France
(in association with SAA TC)

Managing Large Sound Databases Using MPEG—Max Jacob, IRCAM, Paris, France

Sound databases are widely used for scientific, commercial, and artistic purposes. Nevertheless there is as yet no standard way to manage them. This is due to the complexity of describing and indexing audio content and to the variety of purposes a sound database might address. Recently there appeared MPEG-7, a standard for audio/visual content metadata that could be a good starting point. MPEG-7 not only defines a set of description tools but is more generally an open framework hosting specific extensions for specific needs in a common environment. This is crucial, since there would be no way to freeze in a monolithic definition all the possible needs of a sound database. This paper outlines how the MPEG-7 framework can be used, how it can be extended, and how all this can fit into an extensible database design, gathering three years of experience from the CUIDADO project at IRCAM.

Integrating Low-Level Metadata in Multimedia Database Management Systems—Michael Casey, City University, London, UK

[Abstract Not Available at Press Time]

Tools for Content-Based Retrieval and Transformation of Audio Using MPEG-7: The SPOffline and the MDTools—Emilia Gómez, Oscar Celma, Fabien Gouyon, Perfecto Herrera, Jordi Janer, David García, Universitat Pompeu Fabra, Barcelona, Spain

In this workshop we will demonstrate three applications for content-based retrieval and transformation of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either manually or automatically and are stored in an XML database. Retrieval services are implemented in the database. A set of musical transformations is defined directly at the level of musically meaningful MPEG-7 descriptors and automatically mapped onto low-level audio signal transformations. Topics included in the presentation are: (1) description generation procedures: manual annotation of editorial descriptions (the MDTools) and automatic description of audio recordings (the SPOffline); (2) retrieval functionalities: local retrieval (the SPOffline) and remote Web-based retrieval; and (3) transformation utilities: the SPOffline.

Using MPEG-7 Audio Low-Level Scalability: A Guided Tour—Jürgen Herre, Eric Allamanche, Fraunhofer IIS, Ilmenau, Germany

[Abstract Not Available at Press Time]

Friday, June 18
CONFERENCE DAY 1
SESSION CD-1: FRAMEWORKS

1-1 Data Model for Audio/Video Production—A. Ebner, IRT, Munich, Germany

When changing from traditional production systems to IT-based production systems, the introduction and usage of metadata is unavoidable. Direct access to the information stored in IT-based systems is not possible; descriptive and structural metadata are the enablers for proper access to selected material. Metadata does not focus on descriptive information about the content only. It describes the usage of the material, the structure of a program, handling processes, relevant information, delivery information about properties, and storage of information. The basis for achieving a complete collection of metadata is a detailed analysis of a broadcaster's production processes and usage cases. A logical data model expresses the relationships between the information and is the foundation for implementations that enable a controlled exchange and storage of metadata.

1-2 P-META: Program Data Exchange in Practice—Wes Curtis, BBC Television, London, UK (invited)

[Abstract Not Available at Press Time]

Friday, June 18
SESSION CD-2: POSTERS, PART 1

2-1 Low-Complexity Musical Meter Estimation from Polyphonic Music—Christian Uhle1, Jan Rohden1, Markus Cremer1, Jürgen Herre2
1Fraunhofer AEMT, Erlangen, Germany
2Fraunhofer IIS, Ilmenau, Germany

This paper addresses the automated extraction of musical meter from audio signals on three hierarchical levels, namely tempo, tatum, and measure length. The presented approach analyzes consecutive segments of the audio signal, equivalent to a few seconds each, and detects periodicities in the temporal progression of the amplitude envelope in a range between 0.25 Hz and 10 Hz. The tatum period, beat period, and measure length are estimated in a probabilistic manner from the periodicity function. The special advantages of the presented method reside in its ability to estimate tempo even in music with strongly syncopated rhythms, and in its computational efficiency.
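The envelope-periodicity idea in the meter-estimation abstract above (detecting periodicities of the amplitude envelope between 0.25 Hz and 10 Hz) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' algorithm: the frame size, the plain autocorrelation, and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np

def estimate_tempo(signal, sr, frame=500):
    """Estimate tempo (BPM) from periodicities of the amplitude envelope.

    Candidate periodicities are restricted to 0.25-10 Hz, the modulation
    range cited in the abstract. Hypothetical sketch, not the paper's code.
    """
    n = len(signal) // frame
    # Amplitude envelope: mean absolute value per frame.
    env = np.abs(signal[: n * frame]).reshape(n, frame).mean(axis=1)
    env -= env.mean()
    env_sr = sr / frame  # envelope sampling rate in Hz

    # Autocorrelation of the envelope exposes periodic accent structure.
    ac = np.correlate(env, env, mode="full")[n - 1 :]

    # Only lags corresponding to 0.25-10 Hz modulation are considered.
    lo = max(1, int(env_sr / 10.0))
    hi = min(len(ac) - 1, int(env_sr / 0.25))
    lag = lo + int(np.argmax(ac[lo : hi + 1]))
    return 60.0 * env_sr / lag  # dominant periodicity expressed in BPM

# Synthetic check: 440-Hz bursts repeating at 2 Hz, i.e. 120 BPM.
sr = 8000
t = np.arange(sr * 8)
bursts = ((t % (sr // 2)) < 200).astype(float)
x = bursts * np.sin(2 * np.pi * 440 * t / sr)
x += 0.01 * np.random.default_rng(0).standard_normal(t.size)
print(estimate_tempo(x, sr))  # prints 120.0
```

A real system would, as the abstract notes, evaluate the whole periodicity function probabilistically to pick mutually consistent tatum, beat, and measure lengths rather than a single peak.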

2-2 Percussion-Related Semantic Descriptors of Music Audio Files—Perfecto Herrera1, Vegard Sandvold2, Fabien Gouyon1
1Universitat Pompeu Fabra, Barcelona, Spain
2University of Oslo, Oslo, Norway

Automatic extraction of semantic music content metadata from polyphonic audio files has traditionally focused on melodic, rhythmic, and harmonic aspects. In the present paper we will present several music content descriptors that are related to percussion instrumentation. The “percussion index” estimates the amount of percussion that can be found in a music audio file and yields a (numerical or categorical) value that represents the amount of percussion detected in the file. A further refinement is the “percussion profile,” which roughly indicates the existing balance between drums and cymbals. We finally present the “percussiveness” descriptor, which represents the overall impulsiveness or abruptness of the percussive events. Data from initial evaluations, both objective (i.e., errors, misses, false alarms) and subjective (usability, usefulness), will also be presented and discussed.

2-3 Tonal Description of Polyphonic Audio for Music Content Processing—Emilia Gómez, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

The purpose of this paper is to describe a system that automatically extracts metadata from polyphonic audio signals. This metadata describes the tonal aspects of music. We use a set of features to estimate the key of the piece and to represent its tonal structure, but they could also be used to measure the tonal similarity between two songs, to perform key-based segmentation, or to establish the tonal structure of a piece.

2-4 Phone-Based Spoken Document Retrieval in Conformance with the MPEG-7 Standard—Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora, Technical University of Berlin, Berlin, Germany

This paper presents a phone-based approach to spoken document retrieval, developed in the framework of the emerging MPEG-7 standard. The audio part of MPEG-7 encloses a SpokenContent tool that provides a standardized description of the content of spoken documents. In the context of MPEG-7, we propose an indexing and retrieval method that uses phonetic information only and a vector space IR model. Experiments are conducted on a database of German spoken documents with ten city-name queries. Two phone-based retrieval approaches are presented and combined. The first is based on the combination of phone N-grams of different lengths used as indexing terms. The other expands the document representation by means of phone confusion probabilities.

2-5 Efficient Features for Musical Instrument Recognition on Solo Performances—Slim Essid, Gaël Richard, Bertrand David, GET-Télécom Paris (ENST), Paris, France

Musical instrument recognition is one of the important goals of musical signal indexing. While much effort has already been dedicated to this task, most studies were based on limited amounts of data that often included only isolated musical notes. In this paper we address musical instrument recognition on real solo performances based on larger training and test sets. A highly efficient set of features is proposed, obtained from the signal cepstrum but also from the spectrum, with low- and higher-order statistical moments describing signal spectral shape. The use of principal component analysis in conjunction with support vector machine classification yields nearly perfect recognition accuracy on varied musical solo phrases from ten instruments drawn from different instrument families.

Friday, June 18
SESSION CD-3: TOOLKITS

3-1 Digital Media Project—R. Nicol, BT, Ipswich, UK (invited)

[Abstract Not Available at Press Time]

3-2 MPEG-21: What and Why—Jan Bormans1, Kate Grant2 (invited)
1IMEC, Leuven, Belgium
2Nine Tiles, Cambridge, UK

The MPEG-21 vision is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. The technical report “Vision, Technologies and Strategy” describes the two basic building blocks: the definition of a fundamental unit of distribution and transaction (the digital item) and the concept of users interacting with digital items. The digital items can be considered the “what” of the multimedia framework (e.g., a video collection, a music album), and the users can be considered the “who” of the multimedia framework. MPEG-21 is developing a number of specifications enabling the integration of components and standards to facilitate harmonisation of technologies for the creation, modification, management, transport, manipulation, distribution, and consumption of digital items. This paper will explain the relationship of the different MPEG-21 specifications by describing a detailed use-case scenario.

3-3 A 3-D Audio Scene Description Scheme Based on XML—Guillaume Potard, Ian Burnett, University of Wollongong, NSW, Australia

An object-oriented schema for describing time-varying 3-D audio scenes is proposed. The creation of this schema was motivated by the fact that current virtual reality description schemes (VRML, X3D) have only basic 3-D audio description capabilities. In contrast, MPEG-4 AudioBIFS has advanced 3-D audio features but is not designed as a metadata language; MPEG-4 BIFS is particularly targeted as a binary scene description language for scene rendering purposes only. Our proposed 3-D audio scene description schema offers state-of-the-art 3-D audio description capabilities while being usable both as a metadata scheme for describing 3-D audio content (for example, 5.1 or Ambisonics B-format) and as a format for scene rendering.

Friday, June 18
SESSION CD-4: FEATURE EXTRACTION, SESSION A

4-1 A System for Harmonic Analysis of Polyphonic Music—Claas Derboven, Markus Cremer, Fraunhofer IIS AEMT, Ilmenau, Germany

A system for harmonic analysis of polyphonic musical signals is presented. The system uses a transform with a nonuniform frequency resolution for the extraction of prominent tonal components and determines the key and the contained chords of a musical input signal with high accuracy. A statistical approach based on the

frequency of occurrence of musical notes for determining the key is described. An algorithmic solution for chord determination is presented with a concise explanation. Finally, a qualitative evaluation of the system’s performance is conducted to demonstrate the applicability to real-world audio signals.

4-2 Robust Identification of Time-Scaled Audio—Rolf Bardeli, Frank Kurth, University of Bonn, Bonn, Germany

Automatic identification of audio titles on radio broadcasts is a first step toward automatic annotation of radio programs. Systems designed for the purpose of identification have to deal with a variety of postprocessing potentially imposed on audio material at the radio stations. One of the more difficult techniques to be handled is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. This technique has been applied as a feature extractor to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

4-3 Computing Structural Descriptions of Music through the Identification of Representative Excerpts from Audio Files—Bee Suan Ong, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

With the rapid growth of audio databases, many music retrieval applications have employed metadata descriptions to facilitate better handling of huge databases. Music structure gives each musical piece its unique identity. Therefore, structural description is capable of providing a powerful way of interacting with audio content and serves as a link between low-level and higher-level descriptions of audio (e.g., audio summarization, audio fingerprinting, etc.). Identification of representative musical excerpts is the primary step toward the goal of generating structural descriptions of audio signals. In this paper we discuss various approaches to identifying representative excerpts of music audio signals and propose to classify them into a few categories. Pros and cons of each approach will also be discussed.

Friday, June 18
SESSION CD-5: POSTERS, PART 2

5-1 Toward Describing Perceived Complexity of Songs: Computational Methods and Implementation—Sebastian Streich, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

Providing valuable semantic descriptors of multimedia content is a topic of high interest in current research. Such descriptors should merge the two predicates of being useful for retrieval and being automatically extractable from the source. In this paper the semantic descriptor concept of music complexity is introduced, and its benefit for music retrieval and automated music recommendation is addressed. The authors provide a critical review of existing methods and a detailed prospect of new methods for automated music complexity estimation.

5-2 How Efficient Is MPEG-7 for General Sound Recognition?—Hyoung-Gook Kim, Juan José Burred, Thomas Sikora, Technical University Berlin, Berlin, Germany

Our challenge is to analyze and classify video soundtrack content for indexing purposes. To this end we compare the performance of MPEG-7 audio spectrum projection (ASP) features based on several basis decomposition algorithms vs. mel-scale frequency cepstrum coefficients (MFCC). For basis decomposition in the feature extraction we evaluate three approaches: principal component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF). Audio features are computed from these reduced vectors and are fed into a hidden Markov model (HMM) classifier. We found that the established MFCC features yield better performance than MPEG-7 ASP in general sound recognition under practical constraints.

5-3 Automatic Optimization of a Musical Similarity Metric Using Similarity Pairs—Thorsten Kastner, Eric Allamanche, Oliver Hellmuth, Christian Ertel, Marion Schalek, Jürgen Herre, Fraunhofer IIS, Ilmenau, Germany

With the growing amount of multimedia data available everywhere and the necessity to provide efficient methods for browsing and indexing this plethora of audio content, automated musical similarity search and retrieval has gained considerable attention in recent years. We present a system that combines a set of perceptual low-level features with appropriate classification schemes for the task of retrieving similar-sounding songs in a database. A methodology for analyzing the classification results, avoiding time-consuming subjective listening tests for optimum feature selection and combination, is shown. It is based on a calculated “similarity index” that reflects the similarity between specifically embedded similarity pairs. The system’s performance as well as the usefulness of the analysis methodology is evaluated through a subjective listening test.

5-4 Automatic Extraction of MPEG-7 Metadata for Audio Using the Media Asset Management System iFinder—Jobst Löffler, Joachim Köhler, Fraunhofer IMK, Sankt Augustin, Germany

This paper describes the MPEG-7-compliant media asset management system iFinder, which provides a set of automatic methods and software tools for media analysis, archiving, and retrieval. The core technology of iFinder comprises several modules for audio and video metadata extraction that are bundled in the iFinderSDK, a commercial product offered to the media industry. The workflow for audio content processing, together with the pattern recognition methods used, will be presented. Of special note, a technique for precise audio-text alignment together with a browser application for synchronized display of retrieval results will be demonstrated. An insight into using MPEG-7 as a standardized metadata format for media asset management will be provided from a practical point of view.

5-5 An Opera Information System Based on MPEG-7—Oscar Celma Herrada, Enric Mieza, Universitat Pompeu Fabra, Barcelona, Spain

We present an implementation of the MPEG-7 standard for multimedia content description of lyric opera in the context of the European IST project OpenDrama. The project goals are the definition, development, and integration of a novel platform to author and deliver the rich cross-media digital objects of lyric opera. MPEG-7 has been used in OpenDrama as the base technology for a music information retrieval system. In addition to the MPEG-7 multimedia description scheme, different classification schemes have been proposed to deal with operatic concepts such as musical forms (acts, scenes, frames, introduction, etc.),

musical indications (piano, forte, ritardando, etc.), and genre and creator roles (singers, musicians, production staff, etc.). Moreover, this project has covered the development of an authoring tool for the MPEG-7 standard, namely MDTools, which includes segmentation, classification scheme generation, creation and production, and media information descriptors.

5-6 Morphological Sound Description: Computational Model and Usability Evaluation—Julien Ricard, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

Metadata for sound samples is usually limited to low-level descriptors and a source label. In the context of sound retrieval only the latter is used as a search criterion, which makes the retrieval of sounds having no identifiable source (abstract sounds) a difficult task. We propose a description framework focusing on intrinsic perceptual sound qualities, based on Schaeffer’s research on sound objects, that could be used to represent and retrieve abstract sounds and to refine a traditional search by source for nonabstract sounds. We show that some perceptual labels can be automatically extracted with good performance, avoiding the time-consuming manual labeling task, and that the resulting representation is evaluated as useful and usable by a pool of users.

Friday, June 18
SESSION CD-6: FEATURE EXTRACTION, SESSION B

6-1 Drum Pattern-Based Genre Classification from Popular Music—Christian Uhle, Christian Dittmar, Fraunhofer AEMT, Ilmenau, Germany

This paper addresses the identification of drum patterns and the classification of their musical genres. The drum patterns are estimated from audio data automatically. This process involves the transcription of percussive unpitched instruments with a method based on independent subspace analysis and a robust estimation of the tatum grid and the musical meter. The rhythmic patterns are identified from pattern histograms describing the frequency of occurrence of the percussive events. The classification procedure evaluates the meter information and the pattern histogram, as well as other high-level rhythmic features derived from the estimated drum pattern.

6-2 Assessing the Relevance of Rhythmic Descriptors in a Musical Genre Classification Task—Fabien Gouyon1, Simon Dixon2, Elias Pampalk2, Gerhard Widmer2
1Universitat Pompeu Fabra, Barcelona, Spain
2Austrian Research Institute for AI, Vienna, Austria

Organizing or browsing music collections in a musically meaningful way calls for tagging the data in terms of, e.g., rhythmic, melodic, or harmonic aspects, among others. In some cases such metadata can be extracted automatically from musical files; in others, a trained listener must extract it by hand. In this paper we consider a specific set of rhythmic descriptors for which we provide procedures for automatic extraction from audio signals. Evaluating the relevance of such descriptors is a difficult task that can easily become highly subjective. To avoid this pitfall, we assessed the relevance of these descriptors by measuring their rate of success in genre classification experiments.

6-3 Music Genre Estimation from Low-Level Audio Features—Oliver Hellmuth, Eric Allamanche, Thorsten Kastner, Ralf Wistorf, Nicolas Lefebvre, Jürgen Herre, Fraunhofer IIS, Ilmenau, Germany

Despite the subjective nature of associating a certain song or artist with a specific musical genre, this type of characterization is frequently used to provide a convenient way of expressing very coarse information on the basic stylistic and rhythmic elements and/or instrumentation of a song. An audio database that is structured according to different musical genres is a first important step toward providing easy, intuitive access to a large music collection. Thus, a convenient way of indexing large databases by musical genre is desired. This paper describes a system for automatic classification into several musical genres. Different features as well as classification strategies will be evaluated and compared. The system’s performance is assessed by means of a subjective listening test.

Saturday, June 19
CONFERENCE DAY 2
SESSION CD-7: BROADCAST IMPLEMENTATIONS, SESSION A

7-1 Audio Metadata in Radio Broadcasting—Shigeru Aoki1, Masahito Kawamori2
1TokyoFM Broadcasting, Tokyo, Japan
2NTT, Tokyo, Japan

Generally an audio sequence or program is produced on a DAW (digital audio workstation) and delivered as a digital audio file. However, the descriptive data of the audio program, such as the cue sheet of a radio program, is transferred separately from the audio file. This content-descriptive data is commonly known as metadata. The most effective method of transferring the audio data and the metadata is to embed both in one digital file, which an audio player can play while offering the description of that audio sequence simultaneously. This paper describes the format and scheme of the audio file with metadata.

7-2 Integrated Metadata in the Broadcast Environment—Joe Bull1, Kai-Uwe Kaup2
1SADiE UK, Cambridgeshire, UK
2VCS Aktiengesellschaft, Bochum, Germany

In a modern broadcast environment, efficient and effective handling of metadata becomes more important every day. Much time and money can be wasted reentering data that is already present in the digital domain; this money could be better spent on program-making. The authors will describe practical examples of how this can be achieved in a real broadcast environment using real products in use or in development.

Saturday, June 19
SESSION CD-8: BROADCAST IMPLEMENTATIONS, SESSION B

8-1 Broadcast Wave and AES Audio in MXF—Bruce Devlin, Snell & Wilcox

The SMPTE has established MXF as the new open standard for interchange in the broadcast world. One important aspect of the standard is audio mapping. This paper will be a basic tutorial on how MXF and the audio mapping standard work. It will include issues of physically interleaving audio and video as well as adding rich metadata using the MXF data model.

8-2 The Advanced Authoring Format and its Relevance to the Exchange of Audio Editing Decisions—David McLeish1, Phil Tudor2
1SADiE, Cambridgeshire, UK
2BBC R&D, Tadworth, Surrey, UK

This paper explores how the Advanced Authoring Format (AAF) model, a vehicle for exchanging metadata and media-rich content, can be used to describe audio program compositions so that they can be more seamlessly exchanged between audio editors who work with tools designed by different manufacturers. In addition, the extensibility of the format is discussed as a means of looking at its future potential.

Saturday, June 19
SESSION CD-9: LIBRARIES AND ARCHIVES

9-1 Development of a Digital Preservation Program at the Library of Congress—Carl Fleischhauer, Samuel Brylawski, Library of Congress, Washington, DC, USA

This paper will trace the development of a digital preservation program for sound recordings at the Library of Congress. It will outline the Library’s use of METS (Metadata Encoding and Transmission Standard); survey the challenges faced in the Library’s work to create digital objects for public use, comprising sound files and images of packaging and accompanying materials; and review the tools and methods utilized to create metadata.

9-2 Audio Metadata Used in the Radio Nacional de España Sound Archive Project—Miguel Rodeno1, Jesus Nicolas2, Isabel Diaz2
1Alcala University, Madrid, Spain
2Radio Nacional de España, Madrid, Spain

The 20th-century Spanish sound history has been preserved in digital format and may now be consulted online through the Internet. This is a pioneering project in the broadcasting industry worldwide, finished in December 2002. The archive is considered the most important audio archive in the Spanish language in the world. This paper describes the metadata used in the project. Radio Nacional de España followed the 1997/98 European Broadcasting Union (EBU) standard for the interchange of audio files and their broadcasting: the Broadcast Wave Format (BWF). The voice/word material and the different kinds of music (classical, light, international, or Spanish) have different types of metadata. Some examples are shown with the detailed metadata.

9-3 Integration of Audio Computer Systems and Archives Via the SAM/EBU Dublin Core Standard, Tech.doc 3293—Lars Jonsson1, Gunnar Dahl2
1Swedish Radio
2KSAD, Norsk Rikskringkasting, Oslo, Norway

Dublin Core is a well-known metadata initiative from W3C that has been widely spread and used for text and Web pages on the Internet. The Scandinavian SAM group, with 25 archive specialists and engineers, has defined semantic definitions and adapted the commonly used Dublin Core initiative for general use within the audio industry. The 15 basic elements of Dublin Core, together with new subsets, have proven to cover most of the tape protocols and database fields existing in the broadcast production chain, from early capture through various types of production all the way to distribution and archiving. This presentation covers some examples of the use of metadata transfer with Dublin Core expressed in XML in Sweden and Norway. It ends in a discussion of the future possibilities of Dublin Core in comparison with other existing metadata initiatives in an integrated world of interconnected databases coming into all audio-related companies.

Saturday, June 19
SESSION CD-10: DELIVERY OF AUDIO

10-1 Watermarking and Copy Protection by Information Hiding in Soundtracks—Tim Jackson, Keith Yates, Francis Li, Manchester Metropolitan University, Manchester, UK

In this paper digital audio watermarking techniques are reviewed and categorized. Applications of watermarking schemes are discussed, and their capabilities and limitations clarified in the context of audio copyright management and copy protection. Traditional watermarking schemes embed hidden signatures in soundtracks and are found to be effective in ownership authentication and copyright management. Nevertheless, they do not prevent unauthorized copying unless dedicated watermark detectors are added to recording devices. Purpose-chosen hidden signals are known to interfere with some recording devices, for example magnetic tape recorders, offering a potential solution to copy protection. It is therefore reasonable to postulate that watermarking techniques could be extended to general audio copy protection without resort to dedicated detectors.

10-2 Metadata Requirements for Enabling the On-line Music Industry’s New Business Models and Pricing Structures—Nicolas Sincaglia, MusicNow Inc., Chicago, IL, USA

The music industry has begun selling and distributing its media assets online. Online music distribution is vastly different from the normal means of media distribution, and these untested methods of music sales and distribution require experimentation in order to determine which business models and pricing tiers will most resonate with the consumer. This translates into the need for versatile and robust data models to enable these market trials. Copyright owners and media companies require well-designed data structures to enable them to transmit and receive these complicated sets of business rules. This metadata is an essential part of an overall digital rights management system to control and limit access to the associated media assets.

10-3 Audio Meta Data Generation for the Continuous Media Web—Claudia Schremmer1, Steve Cassidy2, Silvia Pfeiffer1
1CSIRO, Epping, NSW, Australia
2Macquarie University, Sydney, Australia

The Continuous Media Web (CMWeb) integrates time-continuous media into the searching, linking, and browsing functions of the World Wide Web. The file format underlying the CMWeb technology, Annodex, streams the media content multiplexed with metadata in CMML format, which contains information relevant to the whole media file (e.g., title, author, language) as well as time-sensitive information (e.g., topics, speakers, time-sensitive hyperlinks). This paper discusses the problem of generating Annodex streams from complex linguistic annotations: annotated recordings collected for use in linguistic research. We are particularly interested in automatically annotated recordings of meetings and teleconferences and see automatically generated CMML files as one way of viewing such recordings. The paper presents some experiments with generating Annodex files from hand-annotated meeting recordings.
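The annotation-to-CMML mapping described in the last abstract can be sketched as a small serializer that turns time-stamped annotations into an XML stream. The element and attribute names used here (cmml, head/title, clip with an npt start time and a desc child) are assumptions modeled loosely on the published CMML drafts, and the helper function and sample data are invented for illustration; check the actual Annodex/CMML specification before relying on the exact markup.

```python
import xml.etree.ElementTree as ET

def annotations_to_cmml(title, annotations):
    """Render time-stamped annotations as a CMML-style XML document.

    `annotations` is a list of (start_seconds, topic, speaker) tuples.
    Element names follow the CMML drafts loosely (hypothetical sketch);
    verify them against the specification before use.
    """
    root = ET.Element("cmml")
    head = ET.SubElement(root, "head")
    ET.SubElement(head, "title").text = title
    for start, topic, speaker in annotations:
        # One clip per annotation, addressed by normal play time (npt).
        clip = ET.SubElement(root, "clip",
                             id=f"c{int(start)}",
                             start=f"npt:{start:g}")
        ET.SubElement(clip, "desc").text = f"{speaker}: {topic}"
    return ET.tostring(root, encoding="unicode")

# Hypothetical meeting annotation track.
doc = annotations_to_cmml("Project meeting",
                          [(0, "agenda", "chair"),
                           (42.5, "budget discussion", "treasurer")])
print(doc)
```

In a full pipeline, each speaker turn or topic boundary from the linguistic annotation tool would become one clip, and the resulting CMML would be multiplexed with the media into an Annodex stream.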
