A Fast Data Structure for Disk-Based Audio Editing

Dominic Mazzoni and Roger B. Dannenberg
Computer Science Department
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213 USA
{dmazzoni,rbd}@cs.cmu.edu

Computer Music Journal, 26:2, pp. 62–76, Summer 2002
© 2002 Massachusetts Institute of Technology.

For some time, ordinary personal computers have been powerful enough to allow people to edit, filter, and mix digital audio without any special added hardware. The earliest editors, such as those described by Freed (1987), Kirby and Shute (1988), and Moorer (1990), were modeled after tape-based editors, with similar control panels and basic operations; the main advantage of this is that edits could be performed non-destructively and then changed or “undone” later. However, these types of editors still force users to keep track of all of the original audio clips that are used to create the final mix, and once the editing is complete, an additional step is required to actually produce the output audio file from the originals.

As personal computers have grown faster and more powerful, new audio editors have emerged that more closely resemble a computer word processor or computer painting program than a reel-to-reel tape editor. These editors allow users to perform many operations on their audio files in place, with all changes affecting the original waveform data on disk. Furthermore, the visual display reflects the results of all edits, which is not always the case for non-destructive editors. This makes editing much simpler and faster, especially for small files, and eliminates the extra step at the end, because the current copy of the entire project is always stored on disk. However, these “in-place” audio editors are not usually able to provide more than a single level of undo, and they are often very slow in dealing with large files.

Today, one can find a variety of both types of audio editors for personal computers. Some popular in-place editors are SoundEdit 16 from Macromedia, CoolEdit from Syntrillium, and Sound Forge from Sonic Foundry. Non-destructive editors include Cubase from Steinberg Media Technologies AG, Digital Performer from Mark of the Unicorn, Inc., and ProTools from Digidesign, a division of Avid Technology, Inc.

Nomenclature is not always consistent; for example Peak from BIAS, Inc., is called a non-destructive editor, but it performs most operations by creating temporary files on disk to hold the changes, and it displays the actual values of edited samples. This is closer in behavior and performance to in-place editors. Also, most in-place editors are designed for single small files with the assumption that a separate, non-destructive editor will be used to combine many smaller files into a final mix. We think a well-designed editor ought to be able to perform well in both sorts of tasks.

This article examines how to combine the strengths of both in-place and non-destructive approaches to audio editing, yielding an editor that is almost as fast and reversible as a non-destructive editor, while almost as simple and space-efficient as an in-place editor. Although we create an interface that looks like that of an in-place editor, we also support multiple tracks with editable amplitude envelopes. This allows us to manipulate and combine many audio files efficiently.

We think this work is particularly interesting to the computer music community for several reasons. First, longer works of music require larger files; editors that suffice for 3-minute pop songs may not work well with larger works. Second, computer music composers may want to see the effects of signal processing effects or to apply effects that are too slow for real-time processing. The non-destructive editors do not support these capabilities directly. Finally, composers might want fast undo for many levels and quick redisplay to facilitate experimentation and creative exploration. We believe our approach offers good support for all of these features and suffers from fewer problems than other editors we have seen. Ultimately, the best approach is determined by the application and personal preferences, and we will compare all three approaches later. We believe computer music composers and researchers will find our approach especially attractive.

The idea for our approach came in part from the work of Charles Crowley of the University of New Mexico, in his examination of the various data structures used by word processors in a paper entitled “Data Structures for Text Sequences” (available online at www.cs.unm.edu/~crowley/papers/sds/sds.html). In designing a word processor, it would not make sense always to store an entire file in one contiguous block, because inserting a character at the beginning would be unreasonably slow. However, it would also make no sense for a word processor to use an entirely non-destructive approach, because as the number of edits grows, the time required just to render a page of text would also grow. Crowley showed that most of the approaches used by existing word processors were variations of a general data structure called a sequence, which is optimized to allow fast insertion and deletion of contiguous blocks of data. The next section describes the functional requirements of a sequence structure for digital audio editing.

In the Implementation section of this article, we propose a particular variant of the sequence data structure that is well suited for storing large audio tracks. Our basic idea is to store a large audio track as a set of small files of approximately equal size. We demonstrate that, by imposing certain simple constraints, only a small constant number of these files must be read or changed on disk to perform simple editing operations such as insertions and deletions of arbitrary size, or undoing the last operation. By reducing the number of disk operations to a small constant, editing operations can be made to seem almost instantaneous. The Undo section of this article discusses how our sequence structure can provide very fast multi-level undo. Following that, we discuss how to store reductions of the audio data for fast screen display, and how this is affected by storing the data in blocks.

To demonstrate the effectiveness of our solution, we measured its performance, as described in the Performance section. We also implemented a freeware cross-platform audio editor called Audacity (see Figure 1), and we invite readers to experience this approach first-hand.

Figure 1. Audacity, a free cross-platform digital audio editor that uses a novel data structure for fast editing.

After the detailed description of our approach, we present a discussion of the advantages and disadvantages of the three approaches we have identified: in-place editors, non-destructive editors, and sequence-based editors. This is followed by conclusions about some of the major trade-offs that seem inherent in these different approaches.

The original motivation for this work included the need for a music data-visualization tool. We want to display audio, continuous parameters, spectral information, and discrete information such as MIDI data and labels. A good tool should allow flexible display and editing of various forms of data. Our program, Audacity, fulfills many of our music data-visualization needs, and its clean design and open source allow researchers to build customized data visualizers, starting from an already powerful base. Although data visualization is not the focus of this article, we encourage readers to use Audacity as a visualization tool.

Data Structure Requirements

Suppose that we have a single sequence of consecutive audio samples that we would like to store in a data structure. The sequence data structure supports the following operations:

    Get(i, l): Retrieve l consecutive samples from the ith sample.
    Set(i, l): Change l consecutive samples from the ith sample.
    Insert(i, l): Insert l consecutive samples before the ith sample.
    Delete(i, l): Delete l consecutive samples from the ith sample.

... data structure and its associated algorithms can be augmented to support an undo history with very little space overhead.

A primary concern for us is that our solution is not only theoretically preferable but actually fast in practice. Specifically, we want to ensure that editing operations take much less than 1 sec to perform on a typical computer and that a large number of tracks can be played from the disk in real time.

Implementation

At first glance, it is tempting to think that a tree-like structure would be a good approach. Ignoring the memory overhead, suppose that we stored each sample in its own node in a balanced binary tree (such as a Red-Black Tree or a Splay Tree). Inserting a single sample would always take O(log n) time, which is reasonable enough, but then inserting l consecutive samples would take O(l log n) time, which is definitely wasteful. Storing samples in a tree this way ignores the fact that most operations tend to work on large consecutive chunks of samples.

The idea of the binary tree is fine, but instead of storing one sample per node, consecutive samples (maybe about 32 kB) could be stored in each node.
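To make the four sequence operations and the idea of storing runs of consecutive samples concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class name `BlockSequence`, the `max_block` parameter, and the use of in-memory lists in place of small files on disk are all hypothetical, and a flat list of blocks stands in for the tree. `Set` and `Insert` take the new samples directly rather than a length.

```python
from typing import List, Tuple


class BlockSequence:
    """Illustrative sequence of samples kept as a list of bounded blocks."""

    def __init__(self, max_block: int = 4) -> None:
        self.max_block = max_block            # hypothetical per-block capacity
        self.blocks: List[List[float]] = []   # each inner list stands in for one small file

    def __len__(self) -> int:
        return sum(len(b) for b in self.blocks)

    def _locate(self, i: int) -> Tuple[int, int]:
        """Return (block index, offset) of sample i; the end maps to (len(blocks), 0)."""
        for k, block in enumerate(self.blocks):
            if i < len(block):
                return k, i
            i -= len(block)
        return len(self.blocks), 0

    def get(self, i: int, l: int) -> List[float]:
        """Get(i, l): retrieve l consecutive samples starting at sample i."""
        out: List[float] = []
        k, off = self._locate(i)
        while l > 0 and k < len(self.blocks):
            chunk = self.blocks[k][off:off + l]
            out.extend(chunk)
            l -= len(chunk)
            k, off = k + 1, 0
        return out

    def set(self, i: int, samples: List[float]) -> None:
        """Set(i, l): overwrite len(samples) consecutive samples starting at sample i."""
        k, off = self._locate(i)
        pos = 0
        while pos < len(samples) and k < len(self.blocks):
            n = min(len(self.blocks[k]) - off, len(samples) - pos)
            self.blocks[k][off:off + n] = samples[pos:pos + n]
            pos += n
            k, off = k + 1, 0

    def insert(self, i: int, samples: List[float]) -> None:
        """Insert(i, l): insert samples before sample i, splitting at most one block."""
        k, off = self._locate(i)
        new_blocks = [list(samples[p:p + self.max_block])
                      for p in range(0, len(samples), self.max_block)]
        if k < len(self.blocks) and off > 0:
            # The insertion point is inside block k: split it around the new blocks.
            head, tail = self.blocks[k][:off], self.blocks[k][off:]
            self.blocks[k:k + 1] = [head] + new_blocks + [tail]
        else:
            # The insertion point is on a block boundary: no existing block changes.
            self.blocks[k:k] = new_blocks

    def delete(self, i: int, l: int) -> None:
        """Delete(i, l): remove l consecutive samples starting at sample i."""
        k, off = self._locate(i)
        while l > 0 and k < len(self.blocks):
            n = min(len(self.blocks[k]) - off, l)
            del self.blocks[k][off:off + n]
            l -= n
            if self.blocks[k]:
                k += 1                 # part of the block survives; move on
            else:
                del self.blocks[k]     # whole block removed; index k now names the next one
            off = 0


seq = BlockSequence(max_block=4)
seq.insert(0, list(range(10)))   # stored as three blocks
seq.insert(3, [100, 101])        # splits only the first block
seq.delete(2, 4)                 # removes 2, 100, 101, 3
seq.set(1, [-1, -4])             # overwrite across a block boundary
```

In this sketch an insertion rewrites at most the one block that contains the insertion point, regardless of how many samples are inserted. The sketch never merges undersized blocks back together, so block sizes can drift after many edits; keeping blocks of approximately equal size requires additional constraints that are not modeled here.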
