SAOL: The MPEG-4 Structured Audio Orchestra Language

Eric D. Scheirer and Barry L. Vercoe
Machine Listening Group, E15-401D
MIT Media Laboratory
Cambridge, Massachusetts 02139-4307, USA
[email protected] [email protected]

Computer Music Journal, 23:2, pp. 31–51, Summer 1999
© 1999 Massachusetts Institute of Technology.

Since the beginning of the computer music era, tools have been created that allow the description of music and other organized sound as concise networks of interacting oscillators and envelope functions. Originated by Max Mathews with his series of “Music N” languages (Mathews 1969), this unit generator paradigm for the creation of musical sound has proven highly effective for the creative description of sound and widely useful for musicians. Languages such as Csound (Vercoe 1995), Nyquist (Dannenberg 1997a), CLM (Schottstaedt 1994), and SuperCollider (McCartney 1996b) are widely used in academic and production studios today.

As well as being an effective tool for marshalling a composer’s creative resources, these languages represent an unusual form of digital audio compression (Vercoe, Gardner, and Scheirer 1998). A program in such a language is much more succinct than the sequence of digital audio samples that it creates, and therefore this method can allow for more dramatic compression than traditional audio coding. The idea of transmitting sound by sending a description in a high-level synthesis language and then performing real-time synthesis at the receiving end, which Vercoe, Gardner, and Scheirer (1998) term structured audio, was suggested as early as 1991 (Smith 1991). A project at the Massachusetts Institute of Technology (MIT) Media Laboratory called NetSound (Casey and Smaragdis 1996) constructed a working system based on this concept, using Csound as the synthesis engine, and allowing low-bit-rate transmission on the Internet. If it were possible to create a broad base of mutually compatible installed systems and musical compositions designed to be transmitted in this manner, this technique could have broad utility for music distribution.

The Motion Pictures Experts Group (MPEG), part of the International Standardization Organization (ISO), finished the MPEG-4 standard, formally ISO 14496, in October 1998; MPEG-4 will be designated as an international standard and published in 1999. The work plan and technology of MPEG-4 represent a departure from the previous MPEG-1 (ISO 11172) and MPEG-2 (ISO 13818) standards. While MPEG-4 contains capabilities similar to MPEG-1 and MPEG-2 for the coding and compression of audiovisual data, it additionally specifies methods for the compressed transmission of synthetic sound and computer graphics, and for the juxtaposition of synthetic and “natural” (compressed audio/video) material.

Within the MPEG-4 standard, there is a set of tools of particular interest to computer musicians called Structured Audio (Scheirer 1998, 1999; Scheirer, Lee, and Yang forthcoming). The MPEG-4 Structured Audio tools allow synthetic sound to be transmitted as a set of instructions in a unit-generator-based language, and then synthesized at the receiving terminal. The synthesis language used in MPEG-4 for this purpose is a newly devised one called SAOL (pronounced “sail”), for Structured Audio Orchestra Language. By integrating a music-synthesis language into a respected international standard, the required broad base of systems can be established, and industrial support for these powerful capabilities can be accelerated. The sound-synthesis capabilities in MPEG-4 have a status equivalent to the rest of the coding tools; a compliant implementation of the full MPEG-4 audio system must include support for real-time synthesis from SAOL code.

In this article, we describe the structure and capabilities of SAOL.
Particular focus is given to the comparison of SAOL with other modern synthesis languages. SAOL has been designed to be integrated deeply with other MPEG-4 tools, and a discussion of this integration is presented. However, it is also intended to be highly capable as a stand-alone music-synthesis language, and we provide some thoughts on the implementation of efficient stand-alone SAOL musical instruments. Strengths and weaknesses of the language in relation to other synthesis languages are also discussed. A discussion of the role of the MPEG-4 International Standard in the development of future computer music tools concludes the article.


SAOL: Structure and Capabilities

SAOL is a declarative unit-generator-based language. In this respect, it is more like Csound (Vercoe 1995; Boulanger forthcoming) than it is like SuperCollider (McCartney 1996a, b; Pope 1997) or Nyquist (Dannenberg 1997a); Nyquist employs a functional-programming model in its design, and SuperCollider employs an object-oriented model. SAOL extends the syntax of Csound to make it more understandable and concise, and adds a number of new features to the Music-N model that are discussed below.

It is not our contention that SAOL is a superior language to the others we cite and compare here. In fact, our belief is somewhat the opposite: the differences between general-purpose software-synthesis languages are generally cosmetic, and features of the languages’ implementations are much more crucial to their utility for composers. For the MPEG-4 project, we developed SAOL anew because it has no history or intellectual-property encumbrances that could impede the acceptance of the standard. SAOL is not a research project that presents major advances in synthesis-language design; rather, it is an attempt to codify existing practice, as expressed in other current languages, to provide a fixed target for manufacturers and tool developers making use of software-synthesis technology.

There were several major design goals in the creation of SAOL. These were: to design a synthesis language that is highly readable (so that it is easy to understand and to modify instrument code), highly modular (so that general-purpose processing algorithms can be constructed and reused without modification in many orchestras), highly expressive (so that musicians can do complex things easily), and highly functional (so that anything that can be done with digital audio can be expressed in SAOL). Additionally, SAOL as a language should lend itself to efficient implementations in either hardware or software.

As well as the new features of SAOL that are described below, many well-established features of Music-N languages (Mathews 1969; Pope 1993) are retained. SAOL, like other Music-N languages, defines an instrument as a set of digital signal-processing algorithms that produces sound. A set of instruments is called an orchestra. Other retained features include: the sample-rate/control-rate distinction, which increases efficiency by reducing sample-by-sample calculation and allowing block-based processing; the orchestra/score distinction, in which the parametric signal-processing instructions in the orchestra are controlled externally by a separate event list called the score (one of Nyquist’s innovations is the removal of this distinction); the use of instrument variables to encapsulate intermediate states within instruments and global variables to share values between instruments; and a heavy dependency on stored-function tables, or wavetables, to allow efficient processing of periodic signals, envelopes, and other functions. These historical aspects of SAOL will not be discussed further here, but excellent summaries of the evolution and syntactic construction of synthesis languages may be found in other references (Roads 1996; Dannenberg 1997a, b; and Boulanger forthcoming, among others).

Readability

Where Csound is “macro-assembly-like,” Nyquist is “Lisp-like,” and SuperCollider is “Smalltalk-like,” SAOL is a “C-like” language. In terms of making the language broadly readable, this is a good step, because C is the most widely used of these languages. The syntactic framework of SAOL is familiar to anyone who programs in C, although the fundamental elements of the language are still signal variables, unit generators, instruments, and so forth, as in other synthesis languages. (The exact syntax of C is not used; there are several small differences that make the language easier to parse.)


Figure 1. A SAOL instrument that makes a short tone.

// This is a simple SAOL instrument that makes a short tone,
// using an oscillator over a stored function table.

instr beep(pitch,amp) {
  table wave(harm,2048,1);   // sinusoidal wave function
  asig sound;                // 'asig' denotes audio signal
  ksig env;                  // 'ksig' denotes control signal

  env = kline(0,0.1,1,dur-0.1,0);           // make envelope
  sound = oscil(wave, pitch) * amp * env;   // create sound by enveloping an oscillator
  output(sound);                            // play that sound
}

The program in Figure 1 shows a simple SAOL instrument that creates a simple beep by applying an envelope to the output of a single sinusoidal oscillator.

A number of features are immediately apparent in this instrument. The instrument name (beep), parameters (or “p-fields”: pitch and amp), stored-function table (wave), and table generator (harm) all have names rather than numbers. All of the signal variables (sound and env) are explicitly declared with their rates (asig for audio rate and ksig for control rate), rather than being automatically assigned rates based on their names. There is a fully recursive expression grammar, so that unit generators like kline and oscil may be freely combined with arithmetic operators. The stored-function tables may be encapsulated in instruments or in the orchestra when this is desirable; they may also be provided in the score, in the manner of Music V (Csound also allows both options). The unit generators kline and oscil are built into the language; so is the wavetable generator harm.

The control signal dur is a standard name, which is a variable automatically declared in every instrument, with semantics given in the standard. There is a set of about 20 standard names defined in SAOL; dur always contains the duration of the note that invoked the instrument.

Modularity

There is a highly capable set of unit generators built into the SAOL specification (100 in all; see Appendix 1). This set is fixed in the standard, and all implementations of SAOL must implement them. However, SAOL may be dynamically extended with new unit generators within the language model. While other Music-N languages require rebuilding the language system itself to add new unit generators, this capability is a fundamental part of SAOL. An example orchestra using this capability is shown in Figure 2.

The beep2 instrument makes use of a unit generator, voscil, which is not part of the standard set. The user-defined opcode below it implements this unit generator from a certain set of parameters. The aopcode tag indicates that the new unit generator produces an a-rate (audio-rate) signal. Each opcode parameter (wave, cps, depth, and rate) is similarly defined with a rate type (table, ivar, ksig, and ksig, respectively) that indicates the maximum rate of change of each parameter. Using the same core functionality as instruments, user-defined opcodes perform a certain calculation and then return their results. In this case, the voscil user-defined opcode makes use of the koscil and oscil core opcodes to calculate its result. Any orchestra containing this user-defined opcode may now make use of the voscil unit generator.
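User-defined opcodes may be declared at other rates as well. As a minimal sketch of our own (the opcode and its name are not part of the standard set, and we assume that core math opcodes such as pow may be called at the control rate), a k-rate helper converting a MIDI note number to a frequency in Hz could be written:

kopcode miditocps(ksig note) {
  // equal-tempered conversion: MIDI note 69 is A440
  return(440 * pow(2, (note - 69) / 12));
}

An instrument would then simply write cps = miditocps(note); wherever a control-rate frequency is needed.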


Figure 2. A SAOL orchestra that uses the user-defined opcode construction. The instrument beep2 makes use of the opcode voscil, which is not defined in the standard. The opcode declaration beneath it implements the opcode so that it is available for use in the orchestra. Every SAOL system is required to provide the capability to dynamically extend the language in this manner.

// This is part of a SAOL orchestra showing the use of the
// user-defined opcode syntax. The instrument 'beep2' makes use
// of the 'voscil' opcode, which is not part of the core SAOL
// syntax. The opcode definition beneath it implements the
// opcode.

instr beep2(pitch,amp) {
  table wave(harm,2048,1);
  asig sound;
  ksig env;

  env = kline(0,0.1,1,dur-0.1,0);
  sound = voscil(wave,pitch,0.05,5) * env;   // 'voscil' is not a built-in ugen...
  output(sound * amp);
}

aopcode voscil(table wave, ivar cps, ksig depth, ksig rate) {
  // ... so we declare it here.
  // It's an 'oscil' with vibrato:
  //   'wave' is the waveshape, 'cps' the carrier freq,
  //   'depth' the vibrato depth as a fraction,
  //   'rate' the vibrato rate
  ksig vib,newfreq;
  asig sound;
  table vibshape(harm,128,1);          // waveshape for vibrato

  vib = koscil(vibshape,rate);         // sinusoidal vibrato
  newfreq = cps * (1 - vib * depth);   // apply vibrato by frequency modulation
  sound = oscil(wave,newfreq);         // new output
  return(sound);                       // return 'sound' to caller
}

It is easy to imagine the construction of standard libraries of desirable opcodes for use in various synthesis applications; for example, mathematical functions such as Bessel functions and Chebyshev polynomials for FM synthesis, or special filters for physical-modeling synthesis. Since the unit-generator set is arbitrarily extensible within the language model, the problem of so-called opcode bloat that other synthesis languages have encountered may be avoided. User-defined opcodes may themselves depend on other user-defined opcodes, so a complete abstraction model is provided. The only limit on this abstraction is that recursive and mutually recursive user-defined opcodes are prohibited; this can simplify the run-time language model, because it means that careful macro expansion can be used to implement user-defined opcodes in a SAOL compiler if desired. However, each user-defined opcode has its own name space, so that procedural abstraction is not affected by this restriction.
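As a hedged illustration of such a library routine (this sketch is ours, not part of the standard set), an i-rate opcode might evaluate the Chebyshev polynomial T(n, x) by its recurrence, for use when building waveshaping tables:

iopcode cheby(ivar x, ivar n) {
  // T(0,x) = 1, T(1,x) = x, T(k,x) = 2*x*T(k-1,x) - T(k-2,x)
  ivar t0, t1, t2, k;

  t0 = 1;
  t1 = x;
  k = 1;
  while (k < n) {
    t2 = 2*x*t1 - t0;   // apply the recurrence
    t0 = t1;
    t1 = t2;
    k = k + 1;
  }
  if (n == 0) { t1 = 1; }   // degree-zero case
  return(t1);
}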


Figure 3. A SAOL orchestra that defines a bus-routing scheme for effects processing. The beep instrument is routed to the bus echo_bus; the output of beep is not turned into sound, but placed on the bus for further processing. The bus echo_bus is sent to the instrument echo, which implements a digital-delay echo.

// This is a complete SAOL orchestra that demonstrates the
// use of buses and routing in order to do effects processing.
// The output of the 'beep' instrument is placed on the bus
// called 'echo_bus'; this bus is sent to the instrument called
// 'echo' for further processing.

global {
  srate 32000;
  krate 500;

  send(echo; 0.2; echo_bus);   // use 'echo' to process the bus 'echo_bus'
  route(echo_bus, beep);       // put the output of 'beep' on 'echo_bus'
}

instr beep(pitch, amp) {
  // as above
}

instr echo(dtime) {
  // a simple digital-delay echo. 'dtime' is the cycle time.
  asig x;

  x = delay(x/2 + input[0], dtime);
  output(x);
}

All instruments and user-defined opcodes in the orchestra live within a single global name space; there is no mechanism for “packages” or similar concepts.

As with unit generators, extensibility is provided for wavetable (function) generators; about 20 built-in wavetable generators are provided (see Appendix 2), but composers may also write opcodes that act as generators for new functions.

Another aspect of modularity in SAOL involves its flow-of-control processing model. In Csound, the only way to allow instruments to post-process sound (for example, to add reverb to another instrument’s output) is to shuttle signals between them with global variables. In SAOL, a metaphor of bus routing is employed that allows the concise description of complex networks. Its use is shown in Figure 3.

In this orchestra, a global block is used to describe global parameters and control. The srate and krate tags specify the sampling rate and control (LFO) rate of the orchestra. The send instruction creates a new bus called echo_bus, and specifies that this bus is sent to the effects-processing instrument called echo. The route instruction specifies that the samples produced by the instrument beep are not turned directly into sound output, but instead are “routed onto” the bus echo_bus for further processing.

The instrument echo implements a simple exponentially decaying digital-echo sound using the delay core opcode. The dtime p-field specifies the cycle time of the digital delay. Like dur in Figure 1, input is a standard name; input always contains the values of the input to the instrument, which in this case is the contents of the bus echo_bus.
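The routing scheme also scales to several sources. In the following hedged sketch (beep2 stands for any second sound-generating instrument, such as the one in Figure 2, and we read the standard as allowing a list of instrument names in one route statement), both instruments feed the bus, and echo itself is untouched:

global {
  srate 32000;
  krate 500;

  send(echo; 0.2; echo_bus);      // as in Figure 3
  route(echo_bus, beep, beep2);   // two instruments now share the bus
}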


Figure 4. A SAOL instrument that can be controlled continuously from a SASL score. The variables amp and off are exposed to the score, allowing their values to be modified there.

// This is a SAOL instrument that can be controlled with
// continuous controllers in the score. The variables 'amp'
// and 'off' are exposed to the score.

instr beep3(pitch) {
  imports ksig amp, off;   // controllers
  ksig vol;
  table wave(harm,2048,1);
  asig sound;

  if (!itime) {            // first time we're called
    amp = 0.5;
  }

  if (off) { turnoff; }    // we got the 'off' control
  vol = port(amp,0.2);     // make a smooth volume signal
  sound = oscil(wave,pitch);
  output(sound * vol);
}

Note that echo is not a user-defined opcode that implements a new unit generator, but an effects-processing instrument.

This bus-routing model is modular with regard to the instruments beep and echo. The beep sound-generation instrument does not “know” that its sound will be modified, and the instrument itself does not have to be modified to enable this. Similarly, the echo instrument does not “know” that its input is coming from the beep instrument; it is easy to add other sounds to this bus without modification to echo. The bus-routing mechanism in SAOL allows easy reusability of effects-processing algorithms. There are also facilities that allow instruments to manipulate busses directly, if such modularity is not desirable in a particular composition.

Expressivity and Control

SAOL instruments may be controlled through Musical Instrument Digital Interface (MIDI) files, real-time MIDI events, or a new score language called SASL (pronounced “sazzle,” an acronym for Structured Audio Score Language). For the cases when MIDI is used, a set of standard names pertaining to MIDI allows access to the standard MIDI control, pitch-bend, and after-touch parameters; channel and preset mappings are also supported in SAOL. In SASL, more-advanced control is possible, as shown in Figures 4 and 5. The SAOL orchestra in Figure 4 can be controlled with the SASL score in Figure 5.

In the orchestra (see Figure 4), the control signals amp and off are declared with the tag imports, which indicates that they may be updated by the score. The amp signal allows continuous control of the amplitude of the instrument output, and the off signal allows the instrument to be instructed to turn itself off. Notice that the meanings of these control signals are not fixed in the standard (unlike MIDI); the composer is free to specify as many controllers as needed, with whatever meanings are musically useful. When the off control is received, the instrument uses the built-in turnoff command to turn itself off; the built-in port (portamento) unit generator is used to convert the discrete changes in the amp control signal into a continuous amplitude envelope.
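The MIDI path mentioned above can be sketched in the same style. In this hedged example (MIDIctrl and MIDIbend are among the MIDI-related standard names; the scalings chosen here are ours and purely illustrative):

instr midibeep(pitch) {
  table wave(harm,2048,1);
  ksig vol;
  asig sound;

  // MIDI controller 7 (volume) arrives in the standard-name array
  // MIDIctrl; smooth it with port() as in Figure 4
  vol = port(MIDIctrl[7] / 127, 0.2);

  // MIDIbend is centered at 8192; this crude mapping treats the
  // bend range as roughly an octave in each direction
  sound = oscil(wave, pitch * MIDIbend / 8192);
  output(sound * vol);
}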


Figure 5. A score that can be used to control the orchestra shown in Figure 4. The control lines set the values of variables in the various instances of instrument beep3. The note labels (n1, n2, n3) control the mapping of control changes to note instances.

// This is a score for controlling the orchestra shown
// above.

n1: 0.0 beep3 1 440
    0.5 beep3 1 480

n2: 1.0 beep3 -1 220
n2: 1.0 beep3 -1 440
n3: 1.0 beep3 -1 660

2.0 control n2 amp 1
2.5 control n2 amp 0.5
3.0 control n3 amp 0.2
3.0 control n2 amp 0.2
4.0 control n2 off 1
4.0 control n3 off 1

The if (!itime) clause is used to control the behavior of the first pass through each instance of the instrument. Like dur in Figure 1, itime is a standard name; itime always contains the amount of time the instrument instance has been executing. Thus, testing it for 0 allows initializations of “k-rate” (control-rate) variables to only be performed once. All variables in SAOL are like static variables in C; that is, they preserve their values between iterations. Thus, the assignment to amp is preserved in the next iteration.

In the SASL score (see Figure 5), n1, n2, and n3 are labels that control the mapping of control information to note events. Two types of score lines are shown; each has an optional label and a time stamp that indicates the time at which the event is dispatched. The instrument lines specify the instrument that is to be used to create a note (beep3 in each case), the duration of the note (-1 indicates that the duration is indefinite), and any other p-fields to be passed to the note, as defined in the orchestra. The control lines begin with a time stamp and the tag control, and then specify a label, a variable name, and a new value. The variable name given will be set to the new value in every note that was instantiated from an instrument line with the given label. In this way, score-based control is more general and flexible than in MIDI or Csound.

More advanced control mechanisms are also possible in SAOL and SASL. The built-in instr and extend commands allow instruments to spawn other instruments (for easy layering and synthetic-performance techniques) and dynamically change the durations of notes. A standard name (see the discussion of Figure 1), cpuload, allows dynamic voice-stealing algorithms to be included in an orchestra; cpuload always contains the current load of the processor on which an instrument is running, and is expressed as a percentage of capability.
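A hedged sketch of the voice-stealing idea follows; the instrument is ours, and the 90-percent threshold is an arbitrary illustrative choice:

instr stealable(pitch,amp) {
  table wave(harm,2048,1);
  ksig env;
  asig sound;

  if (cpuload > 90) {   // processor nearly full: sacrifice this note
    turnoff;
  }
  env = kline(0,0.1,1,dur-0.1,0);
  sound = oscil(wave,pitch) * amp * env;
  output(sound);
}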


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 processing structure of synthesis languages, be- scheme of OSC. Note that there is no prohibition cause more than 30 years of research on this topic in the MPEG-4 standard from making a SAOL have greatly refined and proven the efficacy of the implementation respond to controls expressed in unit-generator paradigm. But sensitive, expressive OSC; there is simply no requirement to do so. control is still a matter of art, not of engineering, The SAOL language is powerful enough for algo- and so it is best not to try to standardize a control rithmic composition and score-generation facili- representation; rather, that is left up to future ex- ties to be written directly in the orchestra code. periments in interface design and composition Using this method, synthetic performers may be tools. The best that can be accomplished at this included in the orchestra to mediate performance time is to create a standardized signal-processing between musical intent (scoring) and sound cre- substrate for such experiments. All that is needed ation (instrument dispatch). For example, the bass to satisfy MPEG-4 requirements for efficient trans- line of a jazz composition could be created by de- mission is to provide a simple format that can livering the algorithms to dynamically improvise flexibly control synthesis and play back a desired note choices and line shapes, and then controlling composition; this format need not be the one “in this virtual bassist with high-level commands which the composer thinks.” about chords, note density, tempo, and playing Other recent work, especially at Berkeley’s Cen- style. Such dynamic, stochastic compositions can ter for New Music and Audio Technologies be written directly in SAOL and do not need an ex- (CNMAT), has explored the creation of more pow- ternal-language interface or powerful score lan- erful protocols for communication and control. guage to generate note sequences. ZIPI (McMillen, Wessel, and Wright 1994; Wright 1994) is a protocol for communication of musical parameters; it expands on MIDI by standardizing a Advanced Functionality large set of specific controls (pitch, volume, pan, spectral information, articulation, spatial position, There are many advanced features of SAOL that etc.) and allowing more extensibility. ZIPI embeds were added during the evaluation stage to enable a Music Parameter Description Language, and in- easier construction of complex instrument mod- cludes a physical-layer specification for transport els. For example, arrays of signals and unit genera- over wires. Most of the functionality of ZIPI could tors may be created and easily manipulated, as be cross-coded into SASL without much difficulty, shown in Figure 6. just as MIDI functionality can be embedded in This instrument creates Shepard tones, octave- MPEG-4 Structured Audio (see below). One notable complex tones that have only relative, not absolute, aspect of ZIPI that SASL cannot emulate is the pitch height (Shepard 1964). The freq, amp, and ability to query the capabilities of a part variables are defined as signal arrays; each and make control decisions based on the results. holds eight signals rather than one. 
Depending on OpenSound Control (Wright and Freed 1997), or the application, such signal arrays may represent OSC, is another new format that moves entirely multichannel sounds or, as in this case, multiple away from the “channel” and “note” fixation of components of a single sound. An opcode array is most other control formats. It uses an open-ended used to declare a bank of eight oscillators, using the symbolic addressing scheme, including a powerful oscil unit generator. Any built-in or user-defined wild-card syntax, to enable hierarchical control opcode may be used in this construction. over real-time synthesis parameters. Researchers The opcode array is used in the sixth line from at CNMAT have been developing powerful tools the bottom of the program: oscil[j](...). It al- to enable the integration of OSC into other sys- lows for the concise description of a bank of paral- tems (Wright 1998). Perhaps in the future it will be lel oscillators with independent state and possible to converge the flexible synthesis capa- parameters. In this case, all of them use the same bilities of SAOL with the advanced control fundamental waveshape, but this could also be


Figure 6. A SAOL instrument that produces Shepard tones. The oparray construction oscil[j] allows multiple instances of the oscil opcode to be easily addressed in a loop. Any built-in or user-defined opcode can be used in this construction.

// This is a SAOL instrument that uses 'opcode arrays' to
// implement Shepard-tone sounds.

instr shepard(pc) {
  table env(window,120,4);   // a Gaussian envelope
  table wave(harm,2048,1);
  ivar i, freq[8], amp[8];   // initialization variables: vectors to
                             // hold component freqs and amps
  asig s, j, part[8];        // an eight-channel sound
  oparray oscil[8];          // an 8-channel bank of oscillators

  // i-time (done at instrument startup)
  freq[0] = 60 * pow(2,pc/12);               // calculate base freq
  amp[0] = tableread(env,10*log(freq[0]));   // look up amplitude in table
  i = 1;
  while (i < 8) {                            // for each component...
    freq[i] = freq[i-1]*2;                   // it's an octave up from the last one
    amp[i] = tableread(env,10*log(freq[i])); // look up amplitude in table
    i = i + 1;
  }

  // a-time (each sample of synthesis)
  s = 0; j = 0;
  while (j < 8) {                       // for each channel...
    part[j] = oscil[j](wave,freq[j]);   // run the j'th oscillator
    s = s + part[j] * amp[j];           // scale it by its amplitude and
                                        // add it to the total
    j = j + 1;
  }

  output(s);   // output the total
}

Arithmetic operations on arrays automatically account for the number of channels involved; pointwise vector operations and operations involving both vectors and scalars are built into the language. Vector operations such as sum, mean, etc. are not built in as core unit generators, but may easily be added if they are useful in a certain composition.
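As a hedged sketch of such an addition (the opcode is ours; we assume, following our reading of the standard, that opcode parameters may be arrays of declared width, here fixed at 8 to match Figure 6), a control-rate sum might be written:

kopcode vecsum(ksig v[8]) {
  ksig i, total;

  total = 0;
  i = 0;
  while (i < 8) {
    total = total + v[i];   // accumulate the components
    i = i + 1;
  }
  return(total);
}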


Other advanced features of SAOL include built-in spectral-manipulation unit generators (fft, for the fast Fourier transform, and ifft, for its inverse), which allow spectral-domain techniques based on overlap-add windowing; a sophisticated fractional and multitap delay-line unit generator (fracdelay); and built-in support for granular synthesis (grain), Karplus-Strong plucked-string synthesis (pluck), waveshaping synthesis, high-quality parametric compression (compressor), and others. A template mechanism is provided for the concise description of multiple instruments that differ only in a simple way. For details on these and other features, readers are referred to the final draft of the standard (International Standardization Organization 1999), which contains the definition of SAOL.

Efficiency in SAOL

The use of control signals and block-based processing has been an area of some recent debate in the literature on music-synthesis languages. Dannenberg (1997b) provides an excellent, thoughtful summary of the relevant issues. In the earliest synthesis languages, the semantics of processing were sample-by-sample; Barry Vercoe’s languages Music-11 and Csound innovated the use of the control rate and block-based processing to add efficiency. In block-based processing, the semantics of each line are evaluated for an entire block of samples before the next line is evaluated. Other modern languages, such as Nyquist and SuperCollider, have adopted this mechanism as well. SAOL returns to the earlier model and uses sample-by-sample semantics for processing, although it preserves the audio-rate/control-rate distinction to allow dual-rate operation (that is, to allow some signals to be updated more slowly than others) and to specify the semantics of controllers and global variables (for example, only control-rate variables may be shared between instruments).

The change back to sample-by-sample semantics has two primary implications. First, instruments that make use of single-delay feedback in their synthesis algorithms can be constructed in the “obvious” manner, without having to artificially set the sampling rate and control rate to be equal.
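For instance, the following hedged sketch (ours, not from the standard) shows a one-pole recursive filter that depends on exactly this property. Because every variable preserves its value between iterations, y still holds the previous output sample when the line is evaluated, and no explicit delay line is needed; we assume the instrument is the target of a send() so that input[0] carries the bus contents:

instr onepole(g) {
  // y[n] = x[n] + g*y[n-1]; keep the magnitude of 'g' below 1 for stability
  asig y;

  y = input[0] + g * y;   // 'y' on the right is last sample's output
  output(y);
}

In a strictly block-processing language, this instrument could only be written by setting the control rate equal to the sampling rate.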
Second, not every algorithm may be implemented in a strictly block-processing manner by the SAOL interpreter or compiler. A SAOL implementation that wishes to make use of block processing must statically detect, through optimization methods, when it is and is not possible in a particular instrument. An informative note in the MPEG-4 standard discusses this point.

In practice, this is not a barrier to efficient orchestras once these sorts of sophisticated implementations become available. Any algorithms that cannot be optimized into block-processing modes are simply algorithms that are impossible to construct in a strictly block-processing language (except with srate = krate). A composer who wishes to have the maximum efficiency in operation for a real-time composition simply refrains from making use of these algorithms.

It is reasonable to believe that such optimizing implementations are not too difficult to develop; the technology required is very similar to that in parallelizing compilers for vector supercomputers. There is a large body of work in the compiler-design literature on this topic (Allan, Jones, Lee, and Allan 1995) that may be readily applied to the design of SAOL compilers.

The specification of SAOL in the MPEG-4 standard does not restrict the form of an implementation. The standard specifies only what an implementation must do—that is, turn SAOL code into sound in a particular manner—not how it must operate. SAOL systems may be implemented in hardware or software, using DSP or native processors, using block-based or single-sample processing, embedded in larger systems or as stand-alone instruments, depending on the needs of the particular music application.

SAOL in the MPEG-4 Standard

Although, as demonstrated above, SAOL is a powerful and flexible music-synthesis language on its own, it has additional capabilities stemming from its design as part of the MPEG-4 International Standard. Space does not permit a full description of these capabilities here; a summary is provided below, and interested readers are referred to other articles for more details.


MPEG-4 Structured Audio

The MPEG-4 Structured Audio standard (Scheirer 1998; Scheirer, Lee, and Yang forthcoming) contains five main sections, two of which (SAOL and SASL) have been described above. The other three pieces are as follows.

A rich wavetable format called SASBF (pronounced “sazz-biff”), for Structured Audio Sample Bank Format, allows banks of samples for use in wavetable synthesis to be efficiently transmitted. The format of SASBF derives from the MIDI Downloadable Sound (DLS) (MIDI Manufacturers Association 1996) and E-Mu SoundFonts formats, and combines the best features of both; it has been designed in collaboration with the MIDI Manufacturers Association (MMA) and various companies interested in this technology. The MMA is independently standardizing the same format as DLS Level 2.

A restricted profile of MPEG-4 Structured Audio allows the use of only wavetable synthesis in low-cost applications where sophisticated sound control is not needed. In the full profile of Structured Audio, wavetable synthesis (based on SASBF) and general-purpose software synthesis (based on SAOL) may be mixed and combined as needed (Scheirer and Ray 1998).

A set of MIDI semantics allows the use of MIDI data to control synthesis in SAOL. These instructions in the standard specify how to generate sound in response to MIDI note events and controllers. Through these semantics, the wealth of existing content and composition tools (sequencers) may be used to create MPEG-4–compatible soundtracks until MPEG-4–native tools are available. Real-time data in the MIDI protocol (or standard MIDI files) may be used to control synthesis in conjunction with, or instead of, SASL control files. As MIDI events are received in the MPEG-4 terminal, they are translated into SAOL events according to the MIDI semantics specified in the standard. Most of the MIDI semantics provide the “obvious” behavior for MIDI events in MPEG-4 when possible.

Finally, a scheduler semantics describes exactly how control events, SASL scores, MIDI data, and the SAOL orchestra interact to produce sound. These instructions are normative in the MPEG-4 standard, which means that every different MPEG-4 implementation must follow these rules to ensure that musical content sounds exactly the same on every synthesis device. The scheduler is specified in terms of a set of careful rules given on the order of execution of instrument instances, order of dispatch of events, and order of processing sound with buses.

Other MPEG-4 Audio Coders

Structured Audio tools are not the only audio tools in MPEG-4. There is also a set of highly functional natural-audio tools that allow traditional compression of streaming audio (Quackenbush 1998). The MPEG-4 General Audio (GA) coder is based heavily on the powerful MPEG-2 Advanced Audio Coding (AAC) method, with extensions that allow for more scalability and better performance at low bit rates. MPEG’s psychoacoustic tests (Meares, Watanabe, and Scheirer 1998) have demonstrated that AAC can provide quality nearly indistinguishable from uncompressed digital audio at 64 kb/sec (kbps) per channel; AAC showed significantly higher quality than the MPEG-1 Layer 3, Dolby AC-3, or Lucent PAC methods in tests at an independent laboratory (Soulodre et al. 1998). Using the MPEG-4 extensions, GA coding provides excellent audio quality at rates as low as 16 kbps per channel.

The MPEG-4 Codebook Excited Linear Prediction (CELP) coder allows the coding of wide-band or narrow-band speech signals, with high quality at 24 kbps and excellent compression down to 12 kbps. The MPEG-4 Parametric Speech coder allows ultra-low-bit-rate compression of speech and simple music signals down to 4 kbps per channel.

AudioBIFS

The tools described above, both synthetic and natural, represent the state of the art in sound synthesis and sound compression. However, another level of power is possible in MPEG-4 with the AudioBIFS system (Scheirer, Väänänen, and Huopaniemi forthcoming), part of the MPEG-4 Binary Format for Scene Description (BIFS).


Figure 7. Two sound streams are processed and mixed using the AudioBIFS scene graph. A speech sound is transmitted with MPEG-4 CELP, and reverb is added with an AudioFX node; then the result is mixed with a musical background transmitted with the MPEG-4 Structured Audio system.

AudioBIFS allows multiple sounds to be transmitted using different coders and then mixed, equalized, and post-processed once they are decoded. This format is structurally based on the Virtual Reality Modeling Language (VRML) 2.0 syntax for scene description (International Standardization Organisation 1997), but contains more powerful sound-description features.

Using MPEG-4 AudioBIFS, each part of a soundtrack may be coded in the format that best suits it. For example, suppose that the transmission of voiceover with background music is desired in the style of a radio advertisement. It is difficult to describe high-quality spoken voice with sound-synthesis techniques, and so Structured Audio alone cannot be used; but speech coders are not adequate to code speech with background music, and so MPEG-4 CELP alone cannot be used. In MPEG-4 with AudioBIFS, the speech is coded using the CELP coder, and the background music is coded in Structured Audio. Then, at the decoding terminal, the two “streams” of sound are decoded individually and mixed together. The AudioBIFS part of the MPEG-4 standard describes the synchronization provided by this mixing process.

AudioBIFS is built from a set of nodes that link together into a tree structure, or scene graph. Each of these nodes represents a signal-processing manipulation on one or more audio streams. In this, AudioBIFS is itself somewhat like a sound-processing language, but it is much simpler (there are only seven types of nodes, and only “filters,” no “generators”). The scene-graph structure is used because it is a familiar and tested mechanism for computer-graphics description, and AudioBIFS is only a small part of the overall BIFS framework. The functions performed by AudioBIFS nodes allow sounds to be switched (as in a multiple-language soundtrack), mixed, delayed, “clipped” for interactive presentation, and gain-controlled.

More-advanced effects are possible by embedding SAOL code into the AudioBIFS scene graph with the AudioFX node. Using AudioFX, any audio effect may be described in SAOL and applied to the output of a natural or synthetic audio decoding process (see Figure 7).

For example, if we want to transmit reverberated speech with background music, we code the speech with MPEG-4 CELP, and provide an AudioFX node containing SAOL code that implements the desired reverberator. As above, we code the background music using MPEG-4 Structured Audio. When the transmission is received, the speech stream is decoded and the reverberation processing is performed; the result is added to the background music, which might or might not also be reverberated. Only the resulting sound is played back to the listener.
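A hedged sketch of the kind of SAOL code such an AudioFX node might carry (a toy recirculating-delay “reverberator,” far simpler than a production design; the delay time and gains are arbitrary illustrative choices):

instr toyverb(mix) {
  asig x, r;

  x = input[0];                // the decoded speech arrives as input
  r = delay(x + r/2, 0.029);   // one recirculating delay as the "reverb"
  output(x * (1 - mix) + r * mix);
}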


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 Audio allows exact, terminal-independent control High-quality synthesis may be performed at very of sound synthesis for the composer, MPEG-4 low bit rates, down to less than 1 kbps, and the AudioBIFS allows exact, terminal-independent synthetic content may be synchronized with natu- control of audio-effects processing for the sound ral sound content and multiplexed into a single designer and producer. program bit stream. The streaming data uses a The combination of synthetic and natural sound tokenized, compressed binary format, instead of in the same sound scene with downloaded mixing the human-readable “textual SAOL format” that is and effects processing is termed synthetic/natural used in this article’s SAOL code examples. hybrid coding, or SNHC audio coding, in MPEG-4. The Structured Audio components have also SNHC is a powerful coding concept that MPEG-4 been designed to be efficient and useful in fixed- is the first standard to use (Scheirer 1999; Scheirer, media applications, such as a studio environment Lee, and Yang forthcoming). or composition workstation. The standard in- Other features of AudioBIFS include 3-D audio cludes the textual SAOL format for this purpose. spatialization for sound presentation in virtual-re- Additionally, the standard contains suggestions for ality applications, and the creation of sound scenes stand-alone implementation authors, regarding that render differently on different terminals, de- good ways to allow access to sound samples and pending (for example) on sampling rate, speaker user data, and ways to provide debugging informa- configuration, or listening conditions. tion. The saolc implementation, which is the Finally, the AudioBIFS component of MPEG-4 is MPEG-4 Structured Audio “reference software” part of the overall BIFS system, which provides so- (the official implementation of the standard), con- phisticated functionality for visual presentation of tains a stand-alone mode in which SAOL is used streaming video, 3-D graphics, virtual-reality from a command line, very much like Csound. scenes, and the handling of interactive events. The Suggestions for implementing real-time local audio objects described with AudioBIFS and the MIDI control by attaching a MIDI keyboard are various audio decoders may be synchronized with also provided in the standard. video or computer-graphics objects, and altered in response to user interaction. Standardization in Computer Music

Streaming Media versus Fixed Media This section discusses the role that the interna- tional standardization process has played in the Like previous MPEG standards, MPEG-4 has been development of SAOL, and the advantage this pro- designed with streaming media in mind. There is cess can serve for the computer music community. an extensive set of tools in the MPEG-4 systems SAOL did not exist as a language prior to the component for efficiently handling the transport MPEG-4 standard; the standards work was not un- and delivery of streaming data, including methods dertaken in order to place an ISO “seal of ap- for multiplexing, synchronization, and back-chan- proval” on previously existing technology. Rather, nel communications. MPEG-4 Structured Audio a concerted effort was made to design the best syn- was designed in this framework, and so the sound- thesis language possible, with the broadest appli- synthesis capabilities described in this article may cation, so that MPEG-4 Structured Audio could all be transmitted as streaming data. In this model, serve the best purpose of standards: to unify a the bitstream header contains the orchestra file sometimes fragmented field and marketplace and and any necessary preparatory data; at the begin- prevent duplication of effort. If the MPEG-4 Struc- ning of the session, the client terminal processes tured Audio standard becomes broadly accepted, this header and prepares the orchestra for synthe- all computer musicians will benefit from the re- sis. Then, the streaming data contains score and sulting explosion of hardware accelerators and control information that drives the synthesis. composition tools around it.


Languages and Implementations

When evaluating synthesis languages as languages, it is important to keep in mind the distinction between language and implementation. For a particular music system, the language is the set of syntactic rules that describe what sequences of characters make up legal programs, and the set of semantic rules that describe how to make sound from a program. An implementation, by contrast, is a particular computer program that can perform the required mapping from a sequence of characters to sound. Depending on the music system, other information such as scores, MIDI files, control sequences, or real-time interaction may affect this mapping process.

It has traditionally been the case in the computer music field that languages have been defined in regard to implementations. That is, the Csound language is not defined anywhere except as a computer program that accepts or rejects particular sequences of characters. As this implementation evolves and changes over time, the set of “legal” Csound orchestras changes as well (generally by expansion, in a backward-compatible manner). This is not the case in the computer-language world at large; languages such as C, Fortran, Lisp, Ada, C++, and Java have clearly written specifications that describe exactly what a legal implementation must and must not do.

The primary advantage of the specified-language model is that it promotes compatibility between implementations. For a large set of nonpathological programs, a programmer is guaranteed that a C program will execute the same way under any legal C compiler. The ANSI C standard tells the developer of a new C compiler exactly what the compiler must do in response to a particular program. This interoperability promotes the development of an efficient marketplace for compilers, debuggers, editors, and other programming tools. If a company develops a new compiler that is much faster than others available, the marketplace can shift quickly because existing code is still useful, and so there is a competitive advantage to a company to develop such powerful compilers. If languages are not compatible between implementations, programmers (musicians) become “locked into” a certain implementation (because it is the only one that will run their instruments), and there is not a competitive reason to develop a better one.

The disadvantages of specified languages are that standards have a certain stifling effect on innovation (C and C++ have had a stronghold on the development of production code for many years), and they might not utilize the resources of a particular computing environment with maximal efficiency. Even after years of compiler development, hand-coded assembly, especially for high-performance architectures such as DSPs or vector processors, still may have performance advantages over code compiled from high-level languages. But since assemblers are generally incompatible from architecture to architecture, the development tools for assembly-language coding are not as advanced as those for high-level languages. With sufficient effort and programming skill, advanced development environments and high-performance signal-processing capabilities can be merged in a single tool.

James McCartney’s language and software system, SuperCollider (McCartney 1996a, b), is an excellent example of this: it provides highly efficient execution, a sophisticated front end, and a synthesis language with more-advanced capabilities than SAOL, but only in a proprietary framework on a single platform. Note, though, that the quality of SuperCollider’s front end and the efficiency of its execution are not properties of SuperCollider as a language, but rather a testament to Mr. McCartney’s skill in creating an implementation of the language. Similarly, the restriction to a single platform is not intrinsic to the language; the language could, once clearly specified, be implemented on other platforms with other front ends.

As computers and signal-processing units get ever faster, eventually the need for high-quality and compatible development tools for synthesis languages will be more pressing than the need to squeeze every cycle out of existing processors. This is the world that the design of SAOL targets: one in which musicians are willing to accept 75-percent performance on state-of-the-art hardware in exchange for inexpensive yet powerful implementations, sophisticated interfaces, cross-platform compatibility, and a broad and competitive marketplace for tools, music, and hardware.


Although, as of this writing, there are not yet implementations of SAOL whose speed is competitive with tools that have more development time behind them, this is not a property of the language. The SAOL standard requires real-time implementation for compliance to the standard, and so real-time SAOL processors will rapidly emerge from companies who wish their tools to be MPEG-4 compliant. The technology required to implement such tools efficiently is fairly well understood today. Thanks to the advances of innovators such as James McCartney and Roger Dannenberg, creating implementations from a well-specified language design is (mostly) a matter of engineering. This is especially true when one considers the amount of development resources that will be directed toward SAOL implementation as a result of its inclusion in MPEG-4.

The MIDI Standard

A widely used existing standard for the representation of sound-control parameters is the MIDI protocol (MIDI Manufacturers Association 1996; Loy 1985). In some ways, representation of music as a set of MIDI bytes is similar to the Structured Audio concept, as compression is achieved in similar ways. However, MPEG-4 Structured Audio is a standard for the representation of sound, not just musical control. In MIDI-based music exchange, musicians have little control over the exact sounds produced (since this is left to the particular hardware/software system that implements the synthesis), and as a result, MIDI is not generally an appropriate exchange format for serious musical content (Moore 1991).

In contrast, sound transmission in SAOL is much more exact; if composers desire the exact re-creation of certain sounds, this is always possible in MPEG-4. The MPEG-4 standard places tight control—in many places, sample-by-sample specification—on the output of the synthesis in response to a particular SAOL program. The MIDI standard was created to enable data exchange between fixed hardware synthesizers. In contrast, the MPEG-4 Structured Audio standard was created to enable data exchange between flexible software synthesizers. The MIDI standard, although perhaps not as capable as many musicians would like, provides a valuable demonstration that manufacturers and tool developers will indeed rally around a public standard for the exchange of music data.

Compatibility

The concrete specification of SAOL as part of the MPEG-4 standard means that any SAOL-compliant tool will be compatible at the SAOL level with any other such tool. Such compatibility has many implications. It makes it much easier to rapidly develop new software-synthesis tools such as graphical user interfaces (GUIs) for instrument construction. Such a tool may be built to interact with a SAOL-compliant hardware synthesizer, and so the GUI author does not have to construct the real-time synthesis capabilities as part of the tool. Similarly, a production musician who wishes to use new sounds does not have to develop them on his or her own. Because of the modular nature of SAOL, instruments, effects algorithms, and unit generators may be easily traded (or bought and sold) between developers, and then rapidly used as part of a new orchestra. Finally, since the MPEG-4 standard will be widely used in low-bit-rate communication applications such as digital broadcasting and satellite-to-mobile-platform transmission, there will be a great need for computer musicians to build the synthesis hardware and composition tools, and to compose the music that will be used in such everyday applications.

Open Standards

The ISO in general, and the MPEG working group in particular, are emblematic of the open-standards process in the development of new technology. Any researcher or company may make suggestions regarding the future development of MPEG-4 and its Structured Audio tools (such as proposals for corrigenda, if parts of the standard are found to be “broken” or incomplete).


According to MPEG’s rules of procedure, these contributions must be formally evaluated, and if they are judged to be important and to move in a positive direction, they will be incorporated into the standard. Anyone can read and evaluate the capabilities of the standard for themselves. In addition, MPEG maintains a software implementation of the entire MPEG-4 standard, which is available to anyone interested in developing his or her own tools compliant with MPEG-4. The MIT Media Lab wrote and maintains the reference source code for the SAOL tools, and has released this source code into the public domain for free use by the community. The Media Lab maintains no intellectual-property rights or proprietary control over the direction of the standard, and will not gain materially from its acceptance in any way.

The reference implementation of SAOL, called saolc, is not intended to be a tool that is usable by musicians or competitive with modern synthesis-programming environments. It is a simple, text-based implementation of SAOL, SASL, SASBF, the MIDI rules, and the scheduler semantics in MPEG-4 Structured Audio that is supposed to be exactly conformant to the standard and relatively easy to read and understand. MPEG provides reference software for MPEG-4 as a secondary reference for normative behavior of the standard. Organizations interested in developing more practical tools can use saolc as a guide.

The source base for saolc is about 32,000 lines of C and C++ code totaling about 1 MB of data. On the Silicon Graphics, Inc. (SGI) platform, this compiles into an executable with a 900-KB footprint; on Windows 95 under Visual C++, the executable footprint is about 380 KB. The code base is intended to be widely portable and compiles as-is on (at least) SGI, Alpha, Linux, Sun, and Win32 platforms. The performance of saolc is extremely poor compared to modern real-time software synthesizers; even on a high-performance machine such as an SGI Octane, only one or two voices of very simple instruments will run in real time. There has been essentially no attempt to optimize saolc for speed, so the performance of saolc should not be taken as indicative of the performance of an optimized SAOL implementation. Moreover, saolc provides no user interface except command-line and bit-stream processing capabilities.

The Big Picture

Julius Smith (1991) wrote an incisive critique of the direction of progress in the digital musical instrument industry. Unfortunately, many of his observations still hold true today: “The ease of using MIDI synthesizers has sapped momentum from synthesis algorithm research by composers… MIDI synthesizers offer only a tiny subset of the synthesis techniques possible in software… The disadvantage [of MIDI] is greatly reduced generality of control, and greatly limited synthesis specification” (Smith 1991). Even with the development of the MIDI-DLS specification, the limitations of MIDI (and now DLS) continue to be a powerful force shaping the capabilities of computer sound processing.

This is a particularly disturbing state of affairs, given the vast developments in graphics processing, multimedia systems, and programming interfaces that have taken place since Julius Smith’s article was written. The world of graphics on personal computers has evolved through several generations and a great deal of technological sophistication since 1991, but the fundamental sound architecture of the average PC has taken only a single step: instead of FM synthesis controllable with “note-on, note-off” instructions, now we have wavetable synthesis controllable with note-on, note-off instructions. Soon, new developments will allow composers to specify wavetables of their own, rather than having them provided by the sound-card manufacturers, and control them with note-on, note-off instructions.

The MPEG-4 Structured Audio tools have the potential to change this situation. The structure and format will be taken seriously, for MPEG standards are an important touchstone in the computer industry at large.

46 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 general-purpose software-synthesis tools in the anonymous referees whose critiques greatly im- MPEG-4 standard, the needs and inventions of the proved the article. computer music community have been exposed to The work described herein has had many con- the computer industry for a very broad hearing. tributors, including, but not limited to: Paris We hope that other computer musicians will join Smaragdis, Bill Gardner, Michael Casey, Adam us in supporting the new standard, developing Lindsay, Giorgio Zoia, Jyri Huopaniemi, Riitta conformant implementations, and creating music Väänänen, Itaru Kaneko, Shigeki Fujii, Lee Ray, and tools that can help it to prosper. Brian Link, Luke Dahl, Dave Sparks, Billy Brackenridge, and Tom White. Thanks to Pete Schreiner, Pete Doenges, and Leonardo Chiariglione Conclusion for overseeing the MPEG-4 project and the Struc- tured Audio components, and to Ron Burns and SAOL, the MPEG-4 Structured Audio Orchestra Don Mead from Hughes Aircraft Company for ener- Language, is a powerful and flexible new synthesis gizing the Media Lab’s contribution to MPEG. language. Further, as part of the MPEG-4 Interna- tional Standard, there will soon be many real-time References implementations, composer’s desktops, and other tools available for using it. The move to accep- Allan, V. H., R. B. Jones, R. M. Lee, and S. J. Allan. tance of powerful open standards in the computer 1995. “Software Pipelining.” ACM Computing Sur- music field will create an explosion of opportunity veys 27(3):367–432. for musicians and technologists with the creativ- Boulanger, R., ed. Forthcoming. The Csound Book. ity and skill these tools demand, and will create a Cambridge, Massachusetts: MIT Press. “rising tide” supporting other real-time software- Casey, M. A., and P. Smaragdis. 1996. “Netsound: Real- synthesis tools. Time Audio from Semantic Descriptions.” Proceed- For readers interested in using or developing ings of the International Computer Music tools based on SAOL, the SAOL home page on the Conference. San Francisco: International Computer World Wide Web may be found at http:// Music Association, p. 143. sound.media.mit.edu/mpeg4. This site contains Dannenberg, R. B. 1997a. “Machine Tongues XIX: current information on the progress of the stan- Nyquist, a Language for Composition and Sound Syn- thesis.” Computer Music Journal 21(3):50–60. dard, up-to-date software implementations, ex- Dannenberg, R. B. 1997b. “The Implementation of ample compositions, a library of user-defined unit Nyquist, a Sound-Synthesis Language.” Computer generators, complete documentation on the SAOL Music Journal 21(3):71–82. language, and mailing lists that support the SAOL International Standardization Organisation (ISO). 1997. community. ISO/IEC 14472–1 International Standard: Virtual Re- ality Modeling Language (VRML). Available at http:// www.vrml.org. Acknowledgments International Standardization Organisation (ISO). 1999. ISO 14496-3:1999 (MPEG-4 Audio). Geneva: Interna- The first author is grateful to the Interval Research tional Standardization Organization. Corporation (Palo Alto, California) for its fellow- Loy, D. G. 1985. “Musicians Make a Standard: The MIDI Phenomenon.” Computer Music Journal 9(4):8–25. ship support over the first year of this work, and to Mathews, M. V. 1969. The Technology of Computer the Digital Life consortium of the MIT Media Music. Cambridge, Massachusetts: MIT Press. 
Laboratory for ongoing research funding. As al- McCartney, J. 1996a. SuperCollider: A Real-Time ways, the Machine Listening Group of the Media Sound Synthesis Programming Language (program Lab, especially Keith Martin and Youngmoo Kim, reference manual). Austin, Texas. Available at http:// have been essential through their comments and www.audiosynth.com. critiques on this article. Thanks also to two McCartney, J. 1996b. “SuperCollider: A New Real-Time

Scheirer and Vercoe 47

McMillen, K., D. L. Wessel, and M. Wright. 1994. "The ZIPI Music Parameter Description Language." Computer Music Journal 18(4):52–73.
Meares, D., K. Watanabe, and E. D. Scheirer. 1998. "Results of the MPEG-2 AAC Stereo Verification Tests." ISO/IEC JTC1/SC29/WG11 (MPEG) document N2006. San Jose, California: International Standardization Organisation. Available at http://www.cselt.it/mpeg.
MIDI Manufacturers Association (MMA). 1996. "The Complete MIDI 1.0 Detailed Specification v. 96.2." Ordering information available at http://www.midi.org.
Moore, F. R. 1988. "The Dysfunctions of MIDI." Computer Music Journal 12(1):19–28.
Pope, S. T. 1993. "Machine Tongues XV: Three Packages for Software Sound Synthesis." Computer Music Journal 17(2):23–54.
Pope, S. T. 1997. Sound and Music Processing in SuperCollider. Available at http://www.create.ucsb.edu/htmls/sc.book.html.
Quackenbush, S. 1998. "Natural Audio Coding in MPEG-4." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington, DC: Institute for Electrical and Electronics Engineers, pp. 3797–3800.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press.
Scheirer, E. D. 1998. "The MPEG-4 Structured Audio Standard." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Seattle: Institute for Electrical and Electronics Engineers, pp. 3801–3804.
Scheirer, E. D. 1999. "Structured Audio and Effects Processing in the MPEG-4 Multimedia Standard." Multimedia Systems 7(1):11–22.
Scheirer, E. D., Y. Lee, and J.-W. Yang. Forthcoming. "Synthetic Audio and SNHC Audio in MPEG-4." In A. Puri and T. Chen, eds. Advances in Multimedia: Signals, Standards, and Networks. New York: Marcel Dekker.
Scheirer, E. D., and L. Ray. 1998. "Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard." Proceedings of the 105th AES Convention. San Francisco: Audio Engineering Society. (Available as reprint #4811.)
Scheirer, E. D., R. Väänänen, and J. Huopaniemi. Forthcoming. "AudioBIFS: Describing Audio Scenes with the MPEG-4 Multimedia Standard." To appear in IEEE Transactions on Multimedia.
Schottstaedt, W. 1994. "Machine Tongues XVII: CLM: Music V Meets Common Lisp." Computer Music Journal 18(2):30–37.
Shepard, R. N. 1964. "Circularity in Judgments of Relative Pitch." Journal of the Acoustical Society of America 36(12):2346–2353.
Smith, J. O. 1991. "Viewpoints on the History of Digital Synthesis." Proceedings of the 1991 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 1–10.
Soulodre, G. A., T. Grusec, M. Lavoie, and L. Thibault. 1998. "Subjective Evaluation of State-of-the-Art Two-Channel Audio Codecs." Journal of the Audio Engineering Society 46(3):164–177.
Vercoe, B. L. 1995. Csound: A Manual for the Audio Processing System (program reference manual). Cambridge, Massachusetts: MIT Media Laboratory.
Vercoe, B. L., W. G. Gardner, and E. D. Scheirer. 1998. "Structured Audio: The Creation, Transmission, and Rendering of Parametric Sound Descriptions." Proceedings of the IEEE 86(5):922–940.
Wright, M. 1994. "A Comparison of MIDI and ZIPI." Computer Music Journal 18(4):86–91.
Wright, M. 1998. "Implementation and Performance Issues with OpenSound Control." Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 224–227.
Wright, M., and A. Freed. 1997. "OpenSound Control: A New Protocol for Communicating with Sound Synthesizers." Proceedings of the 1997 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 101–104.


Appendix 1: Core Opcodes in SAOL

This appendix contains a complete list of the built-in "opcodes," or unit generators, in SAOL. Space does not permit a full description of the parameters, syntax, and semantics of each; interested readers are referred to the standard or to our online documentation. In the standard, the operation of each unit generator is defined at the sample-by-sample level. All of these unit generators must be present in any compliant SAOL implementation.
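By way of orientation, the following small instrument shows how a few of these opcodes combine in practice. It is a sketch of our own devising, not an excerpt from the standard; the names blip, env, and sound and all parameter values are arbitrary, while dur is the standard name holding the duration of the note:

   instr blip(pitch, amp) {
     table wave(harm, 2048, 1);               // wavetable holding one harmonic: a sine wave
     ksig env;                                // control-rate envelope signal
     asig sound;                              // audio-rate output signal

     env = kline(0, 0.05, 1, dur - 0.05, 0);  // linear attack, then decay back to silence
     sound = oscil(wave, cpspch(pitch));      // wavetable oscillator; cpspch converts pch to Hz
     output(sound * amp * env);               // scale and send to the output bus
   }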


Math Functions

Each of these implements the corresponding mathematical function:
int, frac, abs, sgn, exp, log, sqrt, sin, cos, tan, asin, acos, atan, pow, log10, floor, ceil, min, max.

These allow conversion from amplitude multipliers to decibels and back:
dbamp, ampdb.


Pitch Converters

These convert from one of SAOL's pitch formats to another. The pitch formats are: octave-fraction (oct), pitch class (pch), frequency in Hz (cps), and MIDI note number:
octpch, pchoct, cpspch, pchcps, octcps, cpsoct, midipch, pchmidi, midioct, octmidi, midicps, cpsmidi.

These allow the "global tuning" of the orchestra to be inspected and changed:
gettune, settune.


Stored-Function-Table Operations

These allow access to the length, loop point, loop end point, sampling rate, and base frequency (if any) of a stored function table:
ftlen, ftloop, ftloopend, ftsr, ftbasecps.

These allow the loop point, loop end point, base frequency, and sampling rate of a stored function table to be modified:
ftsetloop, ftsetend, ftsetbase, ftsetsr.

These provide direct read/write access to the sample data in a stored-function table:
tableread, tablewrite.


Signal Generators

These generate a periodic audio signal from a wavetable, using simple loops, frequency matching, or sample-rate matching, respectively:
oscil, loscil, doscil.

This generates a periodic control signal from a wavetable:
koscil.

These generate a line-segment signal at the control rate or audio rate, respectively:
kline, aline.

These generate an exponentially curved signal at the control rate or audio rate, respectively:
kexpon, aexpon.

These generate a control-rate or audio-rate continuous-phase signal, respectively:
kphasor, aphasor.

These perform Karplus-Strong or parametric granular synthesis:
pluck, grain.

This generates band-limited pulse-train signals:
buzz.


Noise Generators

These generate "white" random numbers or noise:
irand, krand, arand.

These generate random numbers or noise from a triangular distribution:
ilinrand, klinrand, alinrand.

These generate random numbers or noise from an exponential distribution:
iexprand, kexprand, aexprand.

These generate a Poisson-distributed random-impulse sequence or signal:
kpoissonrand, apoissonrand.

These generate random numbers or noise from a Gaussian distribution:
igaussrand, kgaussrand, agaussrand.
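As a small illustration of the signal and noise generators above, the following sketch of ours shapes audio-rate white noise with an exponentially decaying envelope; the instrument name and all parameter values are arbitrary:

   instr snare(amp) {
     ksig env;                      // control-rate envelope signal
     asig hit;                      // audio-rate noise signal

     env = kexpon(1, dur, 0.001);   // exponential decay over the duration of the note
     hit = arand(amp);              // audio-rate uniform white noise, scaled by amp
     output(hit * env);
   }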


Filters

Turn a discrete sequence of values into a continuous control signal with:
port.

Parametric second-order filters include:
hipass, lopass, bandpass, bandstop.

Exactly normative filtering using the canonical second-order section is:
biquad.

For IIR all-pass and comb filters of specified delay and feedback gain, use:
allpass, comb.

These are general FIR and IIR filters. The first two operate from parametric coefficients; the last two store the coefficients in a stored-function table:
fir, iir, firt, iirt.


Spectral Analysis

To perform windowed sliding-block DFTs, placing the result in a stored-function table, use:
fft.

To perform windowed sliding-block IDFTs, converting spectral frames in stored-function tables into an audio signal, use:
ifft.


Gain Control

Calculate the power in an audio signal with:
rms.

To rescale an audio signal so that it has specified power, or power that matches a reference signal, use:
gain, balance.

To perform parametric power-level compression on an audio signal, use:
compressor.


Sample-Rate Conversion

Use the following to decimate an audio signal to a control signal, or upsample and downsample between audio and control signals:
decimate, upsamp, downsamp.

To gate an audio signal with a control signal, use:
samphold.

Place blocks of samples from an audio signal into a wavetable with:
sblock.


Delays

To delay one sample or to delay a specified amount of time, respectively, use:
delay1, delay.

A flexible fractional and multi-tap delay-line tool is:
fracdelay.


Effects

Apply reverberation, chorusing, flanging, or time shifting (pitch-preserving speed change) to an audio signal with the following:
reverb, chorus, flange, speedt.


Tempo Control

To query or set the playback tempo of the orchestra, use:
gettempo, settempo.
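To suggest how the filter and effects opcodes are used in context, here is a sketch of a simple global effects chain. It assumes a one-channel output bus, and the instrument name wash, the mix level, and all parameter values are our own inventions rather than anything prescribed by the standard:

   global {
     srate 32000;
     krate 500;
     send(wash; 2.0; output_bus);          // route the default output bus through the effect
   }

   instr wash(rt) {
     asig dry, wet;

     dry = input[0];                       // the bus signal arrives via the standard name input
     wet = reverb(lopass(dry, 4000), rt);  // low-pass the signal, then reverberate (rt-sec decay)
     output(dry + 0.3 * wet);              // mix the dry and processed signals
   }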


Appendix 2: Core Function-Table Generators in SAOL

This appendix contains a complete list of the built-in function-table generators in SAOL. Space does not permit a full description of the parameters, syntax, and semantics of each; interested readers are referred to the standard or to our online documentation. In the standard, the operation of each is defined at the sample-by-sample level. All of these function-table generators must be present in any compliant SAOL implementation:

sample—place a sound sample in a stored-function table
data—place a sequence of specific data values in a stored-function table
random—place random values, drawn from one of several distributions, in a stored-function table
step—place a step function in a stored-function table
lineseg—place a function made up of linear segments in a stored-function table
expseg—place a function made up of exponential curves in a stored-function table
polynomial—place an arbitrary polynomial function in a stored-function table
spline—place a spline curve on a given set of control points in a stored-function table
window—place a window function (Boxcar, Hamming, Bartlett, Kaiser, Gaussian) in a stored-function table
harm—place a sum of zero-phase, harmonically related sinusoids in a stored-function table
harm_phase—place a sum of phased, harmonically related sinusoids in a stored-function table
periodic—place an arbitrary sum-of-sinusoids function in a stored-function table
buzz—place a band-limited pulse train in a stored-function table
concat—concatenate two or more function tables together, and place the result in a new stored-function table
empty—create space for an empty stored-function table (composers may write i-rate user-defined opcodes that effectively act as user-defined table generators)
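As an example of how these generators are invoked, table declarations like the following might appear in a SAOL orchestra; the table names, sizes, and parameter lists here are our own illustrative choices, not prescribed values:

   table sine(harm, 2048, 1);                  // one unit-amplitude harmonic: a sine wave
   table organ(harm, 2048, 1, 0.7, 0.5, 0.3);  // four harmonics with decreasing weights
   table ramp(data, 4, 0, 0.25, 0.5, 1);       // four explicit data values

A table declared this way can then be passed to any of the wavetable opcodes of Appendix 1, as in oscil(organ, 440).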

