Issues Surrounding the Preservation of Digital Music Documents

BRENT LEE

RÉSUMÉ Les défis que pose la conservation des musicales ont été compli- qués au cours des dernières décennies par la création de documents sous forme numérique. Cette révolution numérique a affecté toutes les étapes de la création musicale traditionnelle, depuis l’ébauche et la notation des compositions jusqu’à l’enregistrement des performances. De plus, les nouvelles formes de musique unique- ment numériques impliquant la composition à l’aide d’algorythmes informatiques, d’environnements interactifs et de synthèse de son numérique ont amené la création de nouvelles variétés de documents numériques. Cet article décrit les différents types de documents qui sont produits aujourd’hui dans le cours du processus de création musicale et expose les défis que pose le passage du temps sur ces documents. Il s’arrête plus spécifiquement sur la capacité de les lire (comment savoir si nous pourrons récupérer correctement les documents numériques?), leur compréhension (comment pourrons-nous savoir ce que ces documents signifient exactement?), de même que sur l’adéquation entre leur représentation et leur authenticité (comment savoir que la fiabilité, l’identité et l’intégrité des documents n’ont pas été compromis d’une façon ou d’une autre?). Comprendre ces défis est crucial pour la conservation de notre culture musicale contemporaine.

ABSTRACT The challenges of preserving musical archives have been complicated over the last few decades by the generation of documents in digital form. The digital revolution has affected all stages of traditional musical creation, from the sketching and notation of compositions to recordings of performances; also, new forms of uniquely digital music involving computer-aided algorithmic composition, interactive environments, and digital sound synthesis have created corresponding new varieties of digital documents. This study describes the types of documents currently generated in the process of making music and then articulates the challenges the passage of time poses for these documents, specifically: their readability (how do we know we will be able to retrieve the digital documents?) and intelligibility (how will we know what the documents mean?), as well as adequacy of representation and authenticity (how will we know that the reliability, identity, and integrity of the document has not been com- promised in some way?). Understanding these challenges is crucial for the preservation of our contemporary musical culture.

Introduction

Archives of traditional musical documents have enjoyed the same relative 194 Archivaria 50 stability as most other archives; the long shelf life and handling ease of paper have allowed for the preservation1 of documents over centuries. Still today, hitherto unknown manuscripts of even the most famous composers occasionally resurface in the archives of various individuals and institutions, such as the recent discovery of a string quartet movement by Beethoven in the possession of the family of John Ford, for whom the work was composed in 1817.2 The advent of recording technology in the twentieth century engendered a new type of musical document, the sound recording. The wide variety of processes and media developed for the recording of sound has proven to be problematic for and librarians, both within public institutions and in the private sector. Record companies are only now developing strategies to cope with the imminent disintegration of their analogue master tapes, and the extent to which recorded material has already been compromised or completely lost is not yet known.3 The challenges of preserving musical archives have been complicated over the last few decades by the generation of documents in digital form. The digital revolution has affected all stages of traditional musical creation, from the sketching and notation of compositions to recordings of performances; also, new forms of uniquely digital music involving computer-aided algo- rithmic composition, interactive environments, and digital sound synthesis have created corresponding new varieties of digital documents. This study describes the types of documents currently generated in the process of making music and then articulates the challenges the passage of time poses for these documents, specifically: their readability (how do we know we will be able to retrieve the digital documents?), intelligibility (how will we know what the documents mean?), as well as adequacy of representation and authenticity (how will we know that the reliability, integrity, and identity of the document have not been compromised in some way?). Understanding these challenges is crucial for the preservation of our contemporary musical culture.

Data Formats

Data formats designed to represent notated music (scores and sketches), audio recordings of musical performances, and other miscellaneous aspects of musical composition have proliferated over the last several decades. While a

1 I am using the term “to preserve” in a broad sense, i.e., to permanently maintain. 2 Abigail Frymann, “New Beethoven Piece Discovered in Cornwall,” Gramophone 77, no. 922 (December 1999), p. 23. 3 For a further discussion of efforts on the part of the recording industry to preserve their archives, see Bill Holland, “A Management/Preservation Scorecard,” Billboard – The Interna- tional Newsweekly of Music, Video, and Home Entertainment 111, no. 45 (6 November 1999), p. 92. Issues Surrounding the Preservation of Digital Music Documents 195 few data formats have (at least for a period of time) been adopted as industry standards, musical archival documents have been generated in a bewildering array of formats. Their diversity arises from differences among the various computers used to create the music, among the particular operating systems (and versions of these systems) that reside on computers, among the various music software packages for these systems, and among the variety of file interchange formats developed to cope with these differences. The digital representation of music can be broken down into three broad categories based on the type of information represented. The first category includes file formats that represent actual sound (digital recordings), while the second includes formats that represent notated music (notation files). A third category includes formats that represent neither notated nor recorded music, but serve to control computer operations that could then generate notation or sound.

Digital Recording (Audio) Formats

Digital audio files have one thing in common: they all contain a stream of numbers that represent changes in the amplitude of sound pressure over time. When sound is recorded digitally, a measuring device records the amplitude of the sound thousands of times each second. Each of these measurements is called a sample. The frequency at which samples are measured is called the sampling rate, and is described in samples/second. Thus a sampling rate of 44.1KHz (the rate used for CDs) means that the sound amplitude was mea- sured and recorded 44,100 times each second. The higher the sample rate, the better the sound quality, as it gives a more detailed representation of the sound. For example, imagine a curve being represented by the series of numbers 1, 2, 5, 9, 11, 13, 16, 13, 10, 8, 6, 4, 3, 2, 1. If one plotted these numbers on graph paper and connected the dots, one could reconstruct the curve. One could represent the same curve by a series comprising every second number of the original series – 1, 5, 11, 16, 10, 6, 3, 1 – but this reconstruction would not be as detailed as the one with more numbers. Sampling rate is only one of several variable elements that affect the structure of an audio file. Others, which will be explained below, include the precision of the measurement or sample size, the number of channels, the encoding algorithm, the type of compression used (if any), and possibly commands and/or information useful to the operating system for which the file format was developed. The sample size used in the recording of a digital sound document also affects the recording’s fidelity to the original sound; the larger the sample size, the more precise the measurement. The most common sample sizes are 8-bit (a scale of 0 to 255) and 16-bit (a scale of 0 to 65535), though 24-bit (a scale of 0 to 16777215) is becoming an industry standard for professional audio. 196 Archivaria 50

Most of the other variable elements mentioned above affect the file format (the way the information is stored) more than the quality of sound. For example, many audio file formats allow for a variable number of channels. A file could be monophonic (one channel), stereophonic (two channels), or any number of discrete channels. The samples themselves can be encoded in different ways; most encoding schemes are linear, but some are logarithmic. In some encoding schemes, the samples are interpreted as signed or unsigned integers; an 8-bit recording may represent values from zero to 255 or –128 to +127. In addition, some file formats (like MP3) use a compression scheme to greatly reduce the file size. Consider again the series of numbers used in the sampling example: 1, 2, 5, 9, 11, 13, 16, 13, 10, 8, 6, 4, 3, 2, 1. This same series could be encoded with numbers which represent the change from one measurement to the next: +1, +3, +4, +2, +3, –3, –3, –2, –2, –2, –1, –1, –1. In this second set, the numbers are much smaller, and can thus be stored in a smaller file. Lastly, audio files generally contain commands and informa- tion specific to the systems to which they belong. Such files are only readable by certain computer systems. The variety of operating systems used to create music accounts for the wide array of file formats developed for use within these systems, including AU (Macintosh), WAV (Windows), SND (Amiga), and AVR (Atari). The AIFF (Audio Interchange File Format) was developed by Apple in the late 1980s in accordance to the EA IFF 85 standard developed by Electronic Arts; due to its flexibility and non-proprietary nature, AIFF has since become a widely used format for audio files. A number of utility programmes will easily convert files from one format to another. The differences in file format are normally encoded in a header at the beginning of the file that describes the status of all of the above-mentioned variable elements. To some extent, knowledge of the various file formats used can help to determine the chronology of files in an archives. Some formats have become obsolete as technology improved, and some operating systems have decreased in popularity. For example, if one audio file was recorded in 8-bit 32K mono, and a second similar file was recorded in 16-bit, 44.1K stereo, the first is most likely older than the second.4

Notation Formats

File formats that are used to represent the notation of music are graphical in nature, typically using sets of music character fonts to draw music on a screen

4 A more complete description of the variety of file formats and their technical specifications can be found at the Audio File Formats FAQ (Frequently Asked Questions) created by Chris Bagwell: (last visited 18 January 2001). Issues Surrounding the Preservation of Digital Music Documents 197

and then to print music. Some aspects of music notation (such as phrase markings, beams, and layout) must be calculated by the programme in much the same manner as conventional graphics software. Nearly all music notation programmes allow for file playback via Musical Instrument Digital Interface (MIDI), which will be discussed later. Numerous programmes for the notation of music have been developed for personal computers over the last twenty years. Until recently, file formats were software-specific, although a handful of unsuccessful attempts were made to create a standard interchange format. With the advent of music scanning software and the World Wide Web, a number of new initiatives have appeared in the last decade to establish an accepted file exchange format. There are currently several such formats proposed, the most prominent being: NIFF (Notation Interchange File Format, based on Microsoft’s RIFF), GUIDO (not an acronym, but a format which uses ASCII characters in a human- readable way), and SMDL (Standard Music Description Language, based on SGML [Standard Generalized Markup Language]). At the moment, most composers using notation packages store their files in proprietary formats (Finale, Sibelius, NoteAbility, etc.). While archival documents are generated using these formats, their long-term stability is extremely suspect; as such, the migration of information from a proprietary format to a standard format may be a necessary aspect of preservation.

Control Formats

The third broad category of music-related file formats involve those used and generated by various types of music software in the process of creating notation or sound. While published scores and recordings in digital format may conceivably fall outside the scope of a musician’s archives, files in control formats are almost exclusively archival material. Perhaps the most ubiquitous music file type is the MIDI (Musical Instrument Digital Interface) file. MIDI was developed as a communications protocol in the early 1980s by synthesizer manufacturers interested in allowing one digital synthesizer to control (play) the synthesized sounds stored in another synthesizer. When the protocol is used in performance situations it need not entail the creation of MIDI files, but MIDI-encoded information is commonly stored on computers for playback at a later time. Software programmes that record and play back performance MIDI data are called sequencers, and the individual MIDI files created are called sequences. MIDI sequences contain less information about a piece of music than a score; they usually specify only the pitches to be played, their timing, and their loudness. While MIDI can also be used to instruct synthesizers to switch from one sound (patch) to another, to add vibrato, sustain pedal, etc., the way each MIDI event actually sounds depends on the synthesizer that receives the MIDI 198 Archivaria 50 instructions. The same MIDI sequence will thus sound different when played back through different synthesizers. Composers generally use MIDI devices to play back compositions in progress, or as instruments in an audio recording.5 Other control formats fall into four categories: software synthesis, algorith- mic composition files, synthesizer patches and samples, and audio editing files. Software synthesis is the use of a computer’s processing power to create digital audio data by mathematical formulas. Examples include FM (frequency modulation) synthesis, additive synthesis, and granular synthesis. A variety of software packages (CSound, Common Lisp Music, Cmix) allow the user to specify the synthesis method. In each case, a certain number of constraints must be defined by the composer; this information is stored in either text files or in proprietary formats. For example, in using CSound to generate a record- ing, the composer will specify both the synthesis variables (which determine the timbre of the sound) and event information (which specifies pitches and rhythms) in standard text files. Other files may also be used in the synthesis such as samples, filter descriptions, and spectral analyses.6 Algorithmic composition allows a computer to make compositional deci- sions based on rules predetermined by the composer, or on input received during the running of the software. Programmers have created rule bases for compositions in the style of Palestrina, Bach, Mozart, Bartók, and other composers. Similarly, some composers create rules to generate original com- positions. Like synthesis software, algorithmic programmes require input files in text or proprietary formats; they output files in audio, MIDI, or graphical formats.7 One commercial example of algorithmic composition software is Band-in-a-Box, with which the user specifies metre, chords, tempo, and musical style. The software then generates MIDI data based on algorithms associated with that style. Many composers who use algorithms generate notation or MIDI files and then cobble together the final score from the passages they find most success- ful. In effect, these files function as sketches or first drafts. As such, the algorithms that generated files do not form part of the completed work, but they do document a phase of the compositional process. These documents may pose the most profound problems for archivists, as the programmes often

5 Although few professional composers create MIDI files in lieu of scores or audio recordings, amateur composers often create and exchange MIDI files representing their work. 6 A general introduction to music synthesis on computers may be found in Charles Dodge and Thomas A. Jerse, Computer Music: Synthesis, Composition, and Performance (New York, 1985), chapters 3-6. 7 For a more detailed history of the development of algorithmic composition, see the first chapter of David Cope, Computers and Musical Style (Madison, 1991). Issues Surrounding the Preservation of Digital Music Documents 199 rely on formatted audio or MIDI input as well as specific hardware to func- tion; their preservation will rely on our ability to recreate their functionality. In other words, without the algorithmic composition software, the formatted audio or MIDI input, and the requisite hardware, the algorithms will not function properly, if at all. Synthesizer patches (with names like “violin” or “cheezy synth”) are files that contain the information needed to recreate a sound on a particular synthe- sizer. Composers create patches as part of an electroacoustic composition. Patches specify the timbral quality of a sound, but not its pitch or timing; these aspects are controlled by MIDI sequencers, as explained above. Obvi- ously, synthesizer patches are meaningless without the synthesizer for which they were created. Many synthesizer patches are created from short recordings of acoustic instruments lasting from a fraction of a second to several seconds; confusingly, these short recordings are also called samples. Like other audio files, samples (in the context of synthesis) comprise a header with information about sample rate, bit-depth, etc., followed by data representing changes in sound pressure. While these sample files may be used by general purpose computers, they are more commonly used by samplers, which are digital devices designed for the recording, editing, and playing of samples. Each sampler has a proprietary format for its files, but samples lend themselves more easily to migration than synthesizer patches because converting files from one sample format to another is only a matter of editing the header information. In the last category of control files are the numerous files created in the process of editing digital audio files. These edits might include splices, fade- ins, and audio processing (reverb, chorus, etc.). Most editing programmes allow for non-destructive editing of the original audio file: small files describ- ing each edit are created so that the original remains unaltered. The edits and the audio file to which they apply are kept together and constitute a documen- tary history of the process of making a recording. Composers who routinely use computers in the course of their work typi- cally maintain a (nearly invariably chaotic) personal archives of files in each of the three broad categories. As the first generation of these composers donate or bequeath their archives to archival institutions, strategies for the preservation of this material must be developed.

Challenges of Preserving Digital Music Documents

The preservation of digital music documents poses a number of major chal- lenges, many of which are common to traditional music preservation, such as the long-term stability of storage media. The transfer of digital documents to paper or microfilm is not always feasible, and often cannot be accomplished without loss of information. For example, music notation software normally 200 Archivaria 50 allows for playback of a musical score; the ability to play back a file is lost in the printing of the document. Clearly the option of preserving hard copies as substitutes for many digital documents is not a viable one, and will become less viable as time goes by. The major challenges to the preservation of these digital documents lie in their future readability, intelligibility, adequacy of representation, and authenticity.

Readability

Readability refers to the preservation of digital documents in such a way that the data is retrievable in years to come. In other words, will one be able to open a particular file in the future? The readability of a document depends upon the stability of an entire series of components essential to the medium of storage. Much has been written on this topic in relation to the preservation of digital documents in general, so the description of the problem here will be confined to a brief overview of the issue as it relates to music files. At the core of the processes of storage and retrieval is the storage medium itself, be it magnetic (such as floppy diskettes or magnetic tape), optical (such as compact discs), or other. While some media are more stable than others, all have a limited shelf life. Magnetic tape (the most widely used medium for the storage of audio files) is particularly susceptible to degradation, with some documents showing signs of decay in as little as five years. Even with a more stable medium, the readability of a file can be compro- mised by an unstable retrieval system. Mechanical devices, like disk drives or tape drives which read files, are subject to breakdown. With older systems, replacing or even repairing the drive may not be feasible. Even if a file can successfully be read by a drive, the software necessary to play back or view the file may not be available. The requisite operating system and hardware may be unavailable as well. Moreover, if the file can indeed be successfully read by the drive and opened with the appropriate software, certain peripheral hardware components essential for experiencing the music represented by that file the way it was originally experienced may be missing, such as a sound card, a MIDI interface, or a specific synthesizer. Certain aspects of the soft- ware’s functionality can thus be compromised. For example, most composers active in the creation of electroacoustic music have a of patches programmed for various synthesizers they have owned or used, saved on a hard drive in a specific sound librarian file format. While the files may be easily opened, their usefulness is contingent on the presence of the synthesizer and an interface. All of the potential problems described above relate to simply reading the file with a completely functional system. Assuming that these problems can be overcome and the file can be read, the issues of intelligibility, adequacy of representation, and authenticity remain. Issues Surrounding the Preservation of Digital Music Documents 201

Intelligibility

Intelligibility in this context refers to the ability to understand the meaning of the preserved file. Even if the data stream can be successfully preserved to be readable, the preservation of meaning is in some ways a more complex issue. Records managers and archivists must often rely on metadata to under- stand the nature of an electronic document, that is, its administrative, pro- venancial, procedural, documentary, and technological context. Metadata may identify a number of elements, such as the creator of the document, the date of its creation, the type of file, and its classification.8 In the creation of music-related files, such information is rarely included in the file itself, and indeed may not be affixed to the file in any physical way. Essential metadata regarding the creation of a sound recording – the titles, composer, and per- formers – would normally be noted on a tape box or CD insert rather than embedded in the document itself or noted directly on the medium. Metadata may also address the issue of readability described above by identifying the file format used and the various system requirements. Preserving the link between a digital music document and its metadata becomes an integral part of preserving the document itself. Of course, understanding the meaning of musical scores and sketches (digital- ly created or otherwise) is also contingent upon understanding the notation system used. Our inability to interpret some older musical manuscripts with surety is ample evidence of the transitory nature of notational systems.9 Even our highly developed modern system of notation omits quantities of musical information inherent in the performance of the music, including the use of vibrato, subtle changes of pitch, tempo and dynamic, the tuning of instruments, and ornamentation. While composers often allow these characteristics of the music to be determined by the interpreter, the degree to which the composer relies upon the interpreter to follow current performance practice can vary from composer to composer and piece to piece. These issues form the nucleus of the historical performance movement.10 Historically informed performances of early music have proliferated in the last

8 A more thorough description of metadata and record profiling can be found in Heather MacNeil, Trusting Records: Legal, Historical, and Diplomatic Perspectives (Dordrecht, 2000), pp. 96–97. 9 “The study of notation [reveals] that there remain problems in every aspect of early music that are yet unsolved (some perhaps unsolvable), so that in many cases a transcription is merely a personal interpretation, and other interpretations may also be possible, at least in the light of our present knowledge.” See Carl Parrish, The Notation of Medieval Music (New York, 1959, reprinted 1978), p. xvii. 10 Briefly, followers of this movement maintain that research into the performance practices of the composer’s time effectively informs modern performances. 202 Archivaria 50 few decades; should historical performance continue to interest future inter- preters, the performance practice of our own time might need to be made explicit in metadata. This is especially true for works created by a composer using an unusual notation system, or for sketches that use some form of compositional shorthand.

Adequacy of Representation and Authenticity

The related issues of adequacy of representation and authenticity address respectively the trustworthiness of the document and our ability to preserve the identity and integrity of the document over time. Adequacy relates to the content of the document: does the retrieved document represent the music as imagined by the musician? Authenticity relates to the form of the document: does the retrieved document exhibit the necessary characteristics to be deemed an authentic rendering of the original stored document? Both issues pose new challenges for documents which are digital in origin. Adequacy of representation can be understood on a number of levels corresponding to the various stages in the process of creating music. Does a score or sketch adequately represent the music imagined by the composer? Does a performance adequately represent the music as described in the score? Does a recording adequately represent a performance? Although these issues are closely tied to the intelligibility of a document, they are proper to the process of music creation as opposed to the preservation of the documents associated with that process. As such, the responsibility for adequately repre- senting music lies with the creator, although inadequate representation can create subsequent problems for the who may have to determine what constitutes authentic preservation for the document in question. Any archives intended to preserve digital documents must rely on one or more of a handful of preservation strategies such as migration of files from one format to another, emulation of system software and hardware, and refreshing the medium on which the data is stored. In all cases, the preserved document will not be the original document. As such, the authenticity of a digital document becomes a key issue for archivists: what sort of procedures, policies, and standards must be in place for the retrieved document to stand in the place of the original? This question is currently being explored by the InterPARES project (Inter- national Research on Permanent Authentic Records in Electronic Systems), an international research initiative based at the University of British Columbia and comprising representatives from fifteen countries.11 Through the analysis

11 More information on the InterPARES project, its structure, research aims, and methodology is available on the project Web site (last visited 18 January 2001). Issues Surrounding the Preservation of Digital Music Documents 203 of case studies and the modeling of archival processes, the project is endeav- ouring to describe the kinds of electronic records generated in various con- texts and to propose a framework for the development of procedures, policies, and standards to cope with the permanent preservation of electronic records in such a way as to ensure their authenticity. Records generated in the process of creating music constitute a special focus of the Canadian research team; it is expected that a better understanding of the complexities of preserving musical documents, outlined above, will be more broadly applicable to other digital administrative records in a variety of fields as they become more complex. The case studies in music undertaken so far have yielded disquieting results. Individual musicians have learned that works created as little as five years ago have become impossible to reproduce due to the unavailability of functional hardware and software. Within public and private institutions, the expense of migrating archival digital recordings has necessitated a narrow selection of recordings for preservation and the loss, in hindsight, of valuable material. In most cases, digital documents are stored on diskettes, hard drives, or tapes, awaiting an uncertain future. More research needs to be done in order to formulate the best strategies for preserving these documents.

Conclusion

The plethora of digital music file formats and the difficulties inherent in the preservation of records in these formats pose serious challenges. Future listeners and scholars will depend on well-preserved archives to understand the music of our time. I would like to conclude with a few observations that underscore some of the recent changes in the ways in which music is created that will have implications for archivists. Until recently, the making of recordings as a routine part of creating music was limited by the expense of professional recording equipment. The appear- ance of high-quality, inexpensive recording equipment has allowed musicians who previously created their music without generating any records of the process (as was the case with most popular music groups) to maintain person- al archives of practice sessions, live recordings, and preliminary versions. With the advent of CD writers, the storage of these audio files has become inexpensive as well. While notated sketches are becoming less common, recorded sketches are multiplying exponentially. For all the reasons outlined above, preserving these audio records will be expensive and time-consuming. Further, the abundance of digital documents native to different operating systems and saved to a variety of media of different shapes and sizes creates a challenge to physical storage and the maintenance of the archival bond between documents. A typical recording project may generate electronic correspondence, artwork and liner notes, several tapes and CDs, and, of 204 Archivaria 50 course, paper. Storing these records together is space intensive; storing these records apart requires a high (and time intensive) standard of description. Finally, the distribution of music in a variety of digital formats on the Internet has made it simple for music to be modified and redistributed. While the popular vision of the Internet as a sort of publicly accessible archives becomes more widespread, it will become incumbent upon archival institu- tions to maintain digital musical records as readable, intelligible, and authentic so as to faithfully preserve these valuable cultural records.