SAOL: The MPEG-4 Structured Audio Orchestra Language

Eric D. Scheirer and Barry L. Vercoe
Machine Listening Group, E15-401D
MIT Media Laboratory
Cambridge, Massachusetts 02139-4307, USA
[email protected] [email protected]

Computer Music Journal, 23:2, pp. 31–51, Summer 1999
© 1999 Massachusetts Institute of Technology.

Since the beginning of the computer music era, tools have been created that allow the description of music and other organized sound as concise networks of interacting oscillators and envelope functions. Originated by Max Mathews with his series of “Music N” languages (Mathews 1969), this unit generator paradigm for the creation of musical sound has proven highly effective for the creative description of sound and widely useful for musicians. Languages such as Csound (Vercoe 1995), Nyquist (Dannenberg 1997a), CLM (Schottstaedt 1994), and SuperCollider (McCartney 1996b) are widely used in academic and production studios today.

As well as being an effective tool for marshalling a composer’s creative resources, these languages represent an unusual form of digital audio compression (Vercoe, Gardner, and Scheirer 1998). A program in such a language is much more succinct than the sequence of digital audio samples that it creates, and therefore this method can allow for more dramatic compression than traditional audio coding. The idea of transmitting sound by sending a description in a high-level synthesis language and then performing real-time synthesis at the receiving end, which Vercoe, Gardner, and Scheirer (1998) term structured audio, was suggested as early as 1991 (Smith 1991). A project at the Massachusetts Institute of Technology (MIT) Media Laboratory called NetSound (Casey and Smaragdis 1996) constructed a working system based on this concept, using Csound as the synthesis engine, and allowing low-bit-rate transmission on the Internet. If it were possible to create a broad base of mutually compatible installed systems and musical compositions designed to be transmitted in this manner, this technique could have broad utility for music distribution.

The Motion Pictures Experts Group (MPEG), part of the International Standardization Organization (ISO), finished the MPEG-4 standard, formally ISO 14496, in October 1998; MPEG-4 will be designated as an international standard and published in 1999. The work plan and technology of MPEG-4 represent a departure from the previous MPEG-1 (ISO 11172) and MPEG-2 (ISO 13818) standards. While MPEG-4 contains capabilities similar to MPEG-1 and MPEG-2 for the coding and compression of audiovisual data, it additionally specifies methods for the compressed transmission of synthetic sound and computer graphics, and for the juxtaposition of synthetic and “natural” (compressed audio/video) material.

Within the MPEG-4 standard, there is a set of tools of particular interest to computer musicians called Structured Audio (Scheirer 1998, 1999; Scheirer, Lee, and Yang forthcoming). The MPEG-4 Structured Audio tools allow synthetic sound to be transmitted as a set of instructions in a unit-generator-based language, and then synthesized at the receiving terminal. The synthesis language used in MPEG-4 for this purpose is a newly devised one called SAOL (pronounced “sail”), for Structured Audio Orchestra Language. By integrating a music-synthesis language into a respected international standard, the required broad base of systems can be established, and industrial support for these powerful capabilities can be accelerated. The sound-synthesis capabilities in MPEG-4 have a status equivalent to the rest of the coding tools; a compliant implementation of the full MPEG-4 audio system must include support for real-time synthesis from SAOL code.

In this article, we describe the structure and capabilities of SAOL.
Particular focus is given to the comparison of SAOL with other modern synthesis languages. SAOL has been designed to be integrated deeply with other MPEG-4 tools, and a discussion of this integration is presented. However, it is also intended to be highly capable as a stand-alone music-synthesis language, and we provide some thoughts on the implementation of efficient stand-alone SAOL musical instruments. Strengths and weaknesses of the language in relation to other synthesis languages are also discussed. A discussion of the role of the MPEG-4 International Standard in the development of future computer music tools concludes the article.


SAOL: Structure and Capabilities

SAOL is a declarative unit-generator-based language. In this respect, it is more like Csound (Vercoe 1995; Boulanger forthcoming) than it is like SuperCollider (McCartney 1996a, b; Pope 1997) or Nyquist (Dannenberg 1997a); Nyquist employs a functional-programming model in its design, and SuperCollider employs an object-oriented model. SAOL extends the syntax of Csound to make it more understandable and concise, and adds a number of new features to the Music-N model that are discussed below.

It is not our contention that SAOL is a superior language to the others we cite and compare here. In fact, our belief is somewhat the opposite: the differences between general-purpose software-synthesis languages are generally cosmetic, and features of the languages’ implementations are much more crucial to their utility for composers. For the MPEG-4 project, we developed SAOL anew because it has no history or intellectual-property encumbrances that could impede the acceptance of the standard. SAOL is not a research project that presents major advances in synthesis-language design; rather, it is an attempt to codify existing practice, as expressed in other current languages, to provide a fixed target for manufacturers and tool developers making use of software-synthesis technology.

There were several major design goals in the creation of SAOL. These were: to design a synthesis language that is highly readable (so that it is easy to understand and to modify instrument code), highly modular (so that general-purpose processing algorithms can be constructed and reused without modification in many orchestras), highly expressive (so that musicians can do complex things easily), and highly functional (so that anything that can be done with digital audio can be expressed in SAOL). Additionally, SAOL as a language should lend itself to efficient implementations in either hardware or software.

As well as the new features of SAOL that are described below, many well-established features of Music-N languages (Mathews 1969; Pope 1993) are retained. SAOL, like other Music-N languages, defines an instrument as a set of digital signal-processing algorithms that produces sound. A set of instruments is called an orchestra. Other retained features include: the sample-rate/control-rate distinction, which increases efficiency by reducing sample-by-sample calculation and allowing block-based processing; the orchestra/score distinction, in which the parametric signal-processing instructions in the orchestra are controlled externally by a separate event list called the score (one of Nyquist’s innovations is the removal of this distinction); the use of instrument variables to encapsulate intermediate states within instruments and global variables to share values between instruments; and a heavy dependency on stored-function tables, or wavetables, to allow efficient processing of periodic signals, envelopes, and other functions. These historical aspects of SAOL will not be discussed further here, but excellent summaries of the evolution and syntactic construction of synthesis languages may be found in other references (Roads 1996; Dannenberg 1997a, b; and Boulanger forthcoming, among others).

Readability

Where Csound is “macro-assembly-like,” Nyquist is “Lisp-like,” and SuperCollider is “Smalltalk-like,” SAOL is a “C-like” language. In terms of making the language broadly readable, this is a good step, because C is the most widely used of these languages. The syntactic framework of SAOL is familiar to anyone who programs in C, although the fundamental elements of the language are still signal variables, unit generators, instruments, and so forth, as in other synthesis languages. (The exact syntax of C is not used; there are several small differences that make the language easier to parse.)


Figure 1. A SAOL instrument that makes a short tone.

// This is a simple SAOL instrument that makes a short tone,
// using an oscillator over a stored function table.

instr beep(pitch,amp) {
  table wave(harm,2048,1);   // sinusoidal wave function
  asig sound;                // 'asig' denotes audio signal
  ksig env;                  // 'ksig' denotes control signal

  env = kline(0,0.1,1,dur-0.1,0);           // make envelope
  sound = oscil(wave, pitch) * amp * env;   // create sound by enveloping an oscillator
  output(sound);                            // play that sound
}

The program in Figure 1 shows a simple SAOL instrument that creates a simple beep by applying an envelope to the output of a single sinusoidal oscillator.

A number of features are immediately apparent in this instrument. The instrument name (beep), parameters (or “p-fields”: pitch and amp), stored-function table (wave), and table generator (harm) all have names rather than numbers. All of the signal variables (sound and env) are explicitly declared with their rates (asig for audio rate and ksig for control rate), rather than being automatically assigned rates based on their names. There is a fully recursive expression grammar, so that unit generators like kline and oscil may be freely combined with arithmetic operators. The stored-function tables may be encapsulated in instruments or in the orchestra when this is desirable; they may also be provided in the score, in the manner of Music V (Csound also allows both options). The unit generators kline and oscil are built into the language; so is the wavetable generator harm.

The control signal dur is a standard name, which is a variable automatically declared in every instrument, with semantics given in the standard. There is a set of about 20 standard names defined in SAOL; dur always contains the duration of the note that invoked the instrument.

Modularity

There is a highly capable set of unit generators built into the SAOL specification (100 in all; see Appendix 1). This set is fixed in the standard, and all implementations of SAOL must implement them. However, SAOL may be dynamically extended with new unit generators within the language model. While other Music-N languages require rebuilding the language system itself to add new unit generators, this capability is a fundamental part of SAOL. An example orchestra using this capability is shown in Figure 2.

The beep2 instrument makes use of a unit generator, voscil, which is not part of the standard set. The user-defined opcode below it implements this unit generator from a certain set of parameters. The aopcode tag indicates that the new unit generator produces an a-rate (audio-rate) signal. Each opcode parameter (wave, cps, depth, and rate) is similarly defined with a rate type (table, ivar, ksig, and ksig, respectively) that indicates the maximum rate of change of each parameter. Using the same core functionality as instruments, user-defined opcodes perform a certain calculation and then return their results. In this case, the voscil user-defined opcode makes use of the koscil and oscil core opcodes to calculate its result. Any orchestra containing this user-defined opcode may now make use of the voscil unit generator.
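User-defined opcodes may be declared at other rates as well. As a minimal sketch of our own (the opcode and its name are not part of the standard set, and we assume that core math opcodes such as pow may be called at the control rate), a k-rate helper converting a MIDI note number to a frequency in Hz could be written:

kopcode miditocps(ksig note) {
  // equal-tempered conversion: MIDI note 69 is A440
  return(440 * pow(2, (note - 69) / 12));
}

An instrument would then simply write cps = miditocps(note); wherever a control-rate frequency is needed.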


Figure 2. A SAOL orchestra that uses the user-defined opcode construction. The instrument beep2 makes use of the opcode voscil, which is not defined in the standard. The opcode declaration beneath it implements the opcode so that it is available for use in the orchestra. Every SAOL system is required to provide the capability to dynamically extend the language in this manner.

// This is part of a SAOL orchestra showing the use of the
// user-defined opcode syntax. The instrument 'beep2' makes use
// of the 'voscil' opcode, which is not part of the core SAOL
// syntax. The opcode definition beneath it implements the
// opcode.

instr beep2(pitch,amp) {
  table wave(harm,2048,1);
  asig sound;
  ksig env;

  env = kline(0,0.1,1,dur-0.1,0);
  sound = voscil(wave,pitch,0.05,5) * env;   // 'voscil' is not a built-in ugen...
  output(sound * amp);
}

aopcode voscil(table wave, ivar cps, ksig depth, ksig rate) {
  // ... so we declare it here.
  // It's an 'oscil' with vibrato:
  //   'wave' is the waveshape, 'cps' the carrier freq,
  //   'depth' the vibrato depth as a fraction,
  //   'rate' the vibrato rate
  ksig vib,newfreq;
  asig sound;
  table vibshape(harm,128,1);          // waveshape for vibrato

  vib = koscil(vibshape,rate);         // sinusoidal vibrato
  newfreq = cps * (1 - vib * depth);   // apply vibrato by frequency modulation
  sound = oscil(wave,newfreq);         // new output
  return(sound);                       // return 'sound' to caller
}

It is easy to imagine the construction of standard libraries of desirable opcodes for use in various synthesis applications; for example, mathematical functions such as Bessel functions and Chebyshev polynomials for FM synthesis, or special filters for physical-modeling synthesis. Since the unit-generator set is arbitrarily extensible within the language model, the problem of so-called opcode bloat that other synthesis languages have encountered may be avoided. User-defined opcodes may themselves depend on other user-defined opcodes, so a complete abstraction model is provided. The only limit on this abstraction is that recursive and mutually recursive user-defined opcodes are prohibited; this can simplify the run-time language model, because it means that careful macro expansion can be used to implement user-defined opcodes in a SAOL compiler if desired. However, each user-defined opcode has its own name space, so that procedural abstraction is not affected by this restriction.
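As a hedged illustration of such a library routine (this sketch is ours, not part of the standard set), an i-rate opcode might evaluate the Chebyshev polynomial T(n, x) by its recurrence, for use when building waveshaping tables:

iopcode cheby(ivar x, ivar n) {
  // T(0,x) = 1, T(1,x) = x, T(k,x) = 2*x*T(k-1,x) - T(k-2,x)
  ivar t0, t1, t2, k;

  t0 = 1;
  t1 = x;
  k = 1;
  while (k < n) {
    t2 = 2*x*t1 - t0;   // apply the recurrence
    t0 = t1;
    t1 = t2;
    k = k + 1;
  }
  if (n == 0) { t1 = 1; }   // degree-zero case
  return(t1);
}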


Figure 3. A SAOL orchestra that defines a bus-routing scheme for effects processing. The beep instrument is routed to the bus echo_bus; the output of beep is not turned into sound, but placed on the bus for further processing. The bus echo_bus is sent to the instrument echo, which implements a digital-delay echo.

// This is a complete SAOL orchestra that demonstrates the
// use of buses and routing in order to do effects processing.
// The output of the 'beep' instrument is placed on the bus
// called 'echo_bus'; this bus is sent to the instrument called
// 'echo' for further processing.

global {
  srate 32000;
  krate 500;

  send(echo; 0.2; echo_bus);   // use 'echo' to process the bus 'echo_bus'
  route(echo_bus, beep);       // put the output of 'beep' on 'echo_bus'
}

instr beep(pitch, amp) {
  // as above
}

instr echo(dtime) {
  // a simple digital-delay echo. 'dtime' is the cycle time.
  asig x;

  x = delay(x/2 + input[0], dtime);
  output(x);
}

All instruments and user-defined opcodes in the orchestra live within a single global name space; there is no mechanism for “packages” or similar concepts.

As with unit generators, extensibility is provided for wavetable (function) generators; about 20 built-in wavetable generators are provided (see Appendix 2), but composers may also write opcodes that act as generators for new functions.

Another aspect of modularity in SAOL involves its flow-of-control processing model. In Csound, the only way to allow instruments to post-process sound (for example, to add reverb to another instrument’s output) is to shuttle signals between them with global variables. In SAOL, a metaphor of bus routing is employed that allows the concise description of complex networks. Its use is shown in Figure 3.

In this orchestra, a global block is used to describe global parameters and control. The srate and krate tags specify the sampling rate and control (LFO) rate of the orchestra. The send instruction creates a new bus called echo_bus, and specifies that this bus is sent to the effects-processing instrument called echo. The route instruction specifies that the samples produced by the instrument beep are not turned directly into sound output, but instead are “routed onto” the bus echo_bus for further processing.

The instrument echo implements a simple exponentially decaying digital-echo sound using the delay core opcode. The dtime p-field specifies the cycle time of the digital delay. Like dur in Figure 1, input is a standard name; input always contains the values of the input to the instrument, which in this case is the contents of the bus echo_bus.
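The routing scheme also scales to several sources. In the following hedged sketch (beep2 stands for any second sound-generating instrument, such as the one in Figure 2, and we read the standard as allowing a list of instrument names in one route statement), both instruments feed the bus, and echo itself is untouched:

global {
  srate 32000;
  krate 500;

  send(echo; 0.2; echo_bus);      // as in Figure 3
  route(echo_bus, beep, beep2);   // two instruments now share the bus
}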


Figure 4. A SAOL instrument that can be controlled continuously from a SASL score. The variables amp and off are exposed to the score, allowing their values to be modified there.

// This is a SAOL instrument that can be controlled with
// continuous controllers in the score. The variables 'amp'
// and 'off' are exposed to the score.

instr beep3(pitch) {
  imports ksig amp, off;   // controllers
  ksig vol;
  table wave(harm,2048,1);
  asig sound;

  if (!itime) {            // first time we're called
    amp = 0.5;
  }

  if (off) { turnoff; }    // we got the 'off' control
  vol = port(amp,0.2);     // make a smooth volume signal
  sound = oscil(wave,pitch);
  output(sound * vol);
}

Note that echo is not a user-defined opcode that implements a new unit generator, but an effects-processing instrument.

This bus-routing model is modular with regard to the instruments beep and echo. The beep sound-generation instrument does not “know” that its sound will be modified, and the instrument itself does not have to be modified to enable this. Similarly, the echo instrument does not “know” that its input is coming from the beep instrument; it is easy to add other sounds to this bus without modification to echo. The bus-routing mechanism in SAOL allows easy reusability of effects-processing algorithms. There are also facilities that allow instruments to manipulate busses directly, if such modularity is not desirable in a particular composition.

Expressivity and Control

SAOL instruments may be controlled through Musical Instrument Digital Interface (MIDI) files, real-time MIDI events, or a new score language called SASL (pronounced “sazzle,” an acronym for Structured Audio Score Language). For the cases when MIDI is used, a set of standard names pertaining to MIDI allows access to the standard MIDI control, pitch-bend, and after-touch parameters; channel and preset mappings are also supported in SAOL. In SASL, more-advanced control is possible, as shown in Figures 4 and 5. The SAOL orchestra in Figure 4 can be controlled with the SASL score in Figure 5.

In the orchestra (see Figure 4), the control signals amp and off are declared with the tag imports, which indicates that they may be updated by the score. The amp signal allows continuous control of the amplitude of the instrument output, and the off signal allows the instrument to be instructed to turn itself off. Notice that the meanings of these control signals are not fixed in the standard (unlike MIDI); the composer is free to specify as many controllers as needed, with whatever meanings are musically useful. When the off control is received, the instrument uses the built-in turnoff command to turn itself off; the built-in port (portamento) unit generator is used to convert the discrete changes in the amp control signal into a continuous amplitude envelope.
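The MIDI path mentioned above can be sketched in the same style. In this hedged example (MIDIctrl and MIDIbend are among the MIDI-related standard names; the scalings chosen here are ours and purely illustrative):

instr midibeep(pitch) {
  table wave(harm,2048,1);
  ksig vol;
  asig sound;

  // MIDI controller 7 (volume) arrives in the standard-name array
  // MIDIctrl; smooth it with port() as in Figure 4
  vol = port(MIDIctrl[7] / 127, 0.2);

  // MIDIbend is centered at 8192; this crude mapping treats the
  // bend range as roughly an octave in each direction
  sound = oscil(wave, pitch * MIDIbend / 8192);
  output(sound * vol);
}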


Figure 5. A score that can be used to control the orchestra shown in Figure 4. The control lines set the values of variables in the various instances of instrument beep3. The note labels (n1, n2, n3) control the mapping of control changes to note instances.

// This is a score for controlling the orchestra shown
// above.

n1: 0.0 beep3 1 440
    0.5 beep3 1 480

n2: 1.0 beep3 -1 220
n2: 1.0 beep3 -1 440
n3: 1.0 beep3 -1 660

2.0 control n2 amp 1
2.5 control n2 amp 0.5
3.0 control n3 amp 0.2
3.0 control n2 amp 0.2
4.0 control n2 off 1
4.0 control n3 off 1

The if (!itime) clause is used to control the behavior of the first pass through each instance of the instrument. Like dur in Figure 1, itime is a standard name; itime always contains the amount of time the instrument instance has been executing. Thus, testing it for 0 allows initializations of “k-rate” (control-rate) variables to only be performed once. All variables in SAOL are like static variables in C; that is, they preserve their values between iterations. Thus, the assignment to amp is preserved in the next iteration.

In the SASL score (see Figure 5), n1, n2, and n3 are labels that control the mapping of control information to note events. Two types of score lines are shown; each has an optional label and a time stamp that indicates the time at which the event is dispatched. The instrument lines specify the instrument that is to be used to create a note (beep3 in each case), the duration of the note (-1 indicates that the duration is indefinite), and any other p-fields to be passed to the note, as defined in the orchestra. The control lines begin with a time stamp and the tag control, and then specify a label, a variable name, and a new value. The variable name given will be set to the new value in every note that was instantiated from an instrument line with the given label. In this way, score-based control is more general and flexible than in MIDI or Csound.

More advanced control mechanisms are also possible in SAOL and SASL. The built-in instr and extend commands allow instruments to spawn other instruments (for easy layering and synthetic-performance techniques) and dynamically change the durations of notes. A standard name (see the discussion of Figure 1), cpuload, allows dynamic voice-stealing algorithms to be included in an orchestra; cpuload always contains the current load of the processor on which an instrument is running, and is expressed as a percentage of capability.
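A hedged sketch of the voice-stealing idea follows; the instrument is ours, and the 90-percent threshold is an arbitrary illustrative choice:

instr stealable(pitch,amp) {
  table wave(harm,2048,1);
  ksig env;
  asig sound;

  if (cpuload > 90) {   // processor nearly full: sacrifice this note
    turnoff;
  }
  env = kline(0,0.1,1,dur-0.1,0);
  sound = oscil(wave,pitch) * amp * env;
  output(sound);
}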


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 processing structure of synthesis languages, be- scheme of OSC. Note that there is no prohibition cause more than 30 years of research on this topic in the MPEG-4 standard from making a SAOL have greatly refined and proven the efficacy of the implementation respond to controls expressed in unit-generator paradigm. But sensitive, expressive OSC; there is simply no requirement to do so. control is still a matter of art, not of engineering, The SAOL language is powerful enough for algo- and so it is best not to try to standardize a control rithmic composition and score-generation facili- representation; rather, that is left up to future ex- ties to be written directly in the orchestra code. periments in interface design and composition Using this method, synthetic performers may be tools. The best that can be accomplished at this included in the orchestra to mediate performance time is to create a standardized signal-processing between musical intent (scoring) and sound cre- substrate for such experiments. All that is needed ation (instrument dispatch). For example, the bass to satisfy MPEG-4 requirements for efficient trans- line of a jazz composition could be created by de- mission is to provide a simple format that can livering the algorithms to dynamically improvise flexibly control synthesis and play back a desired note choices and line shapes, and then controlling composition; this format need not be the one “in this virtual bassist with high-level commands which the composer thinks.” about chords, note density, tempo, and playing Other recent work, especially at Berkeley’s Cen- style. Such dynamic, stochastic compositions can ter for New Music and Audio Technologies be written directly in SAOL and do not need an ex- (CNMAT), has explored the creation of more pow- ternal-language interface or powerful score lan- erful protocols for communication and control. guage to generate note sequences. ZIPI (McMillen, Wessel, and Wright 1994; Wright 1994) is a protocol for communication of musical parameters; it expands on MIDI by standardizing a Advanced Functionality large set of specific controls (pitch, volume, pan, spectral information, articulation, spatial position, There are many advanced features of SAOL that etc.) and allowing more extensibility. ZIPI embeds were added during the evaluation stage to enable a Music Parameter Description Language, and in- easier construction of complex instrument mod- cludes a physical-layer specification for transport els. For example, arrays of signals and unit genera- over wires. Most of the functionality of ZIPI could tors may be created and easily manipulated, as be cross-coded into SASL without much difficulty, shown in Figure 6. just as MIDI functionality can be embedded in This instrument creates Shepard tones, octave- MPEG-4 Structured Audio (see below). One notable complex tones that have only relative, not absolute, aspect of ZIPI that SASL cannot emulate is the pitch height (Shepard 1964). The freq, amp, and ability to query the capabilities of a part variables are defined as signal arrays; each and make control decisions based on the results. holds eight signals rather than one. 
Depending on OpenSound Control (Wright and Freed 1997), or the application, such signal arrays may represent OSC, is another new format that moves entirely multichannel sounds or, as in this case, multiple away from the “channel” and “note” fixation of components of a single sound. An opcode array is most other control formats. It uses an open-ended used to declare a bank of eight oscillators, using the symbolic addressing scheme, including a powerful oscil unit generator. Any built-in or user-defined wild-card syntax, to enable hierarchical control opcode may be used in this construction. over real-time synthesis parameters. Researchers The opcode array is used in the sixth line from at CNMAT have been developing powerful tools the bottom of the program: oscil[j](...). It al- to enable the integration of OSC into other sys- lows for the concise description of a bank of paral- tems (Wright 1998). Perhaps in the future it will be lel oscillators with independent state and possible to converge the flexible synthesis capa- parameters. In this case, all of them use the same bilities of SAOL with the advanced control fundamental waveshape, but this could also be


Figure 6. A SAOL instrument that produces Shepard tones. The oparray construction oscil[j] allows multiple instances of the oscil opcode to be easily addressed in a loop. Any built-in or user-defined opcode can be used in this construction.

// This is a SAOL instrument that uses 'opcode arrays' to
// implement Shepard-tone sounds.

instr shepard(pc) {
  table env(window,120,4);   // a Gaussian envelope
  table wave(harm,2048,1);
  ivar i, freq[8], amp[8];   // initialization variables: vectors to
                             // hold component freqs and amps
  asig s, j, part[8];        // an eight-channel sound
  oparray oscil[8];          // an 8-channel bank of oscillators

  // i-time (done at instrument startup)
  freq[0] = 60 * pow(2,pc/12);               // calculate base freq
  amp[0] = tableread(env,10*log(freq[0]));   // look up amplitude in table
  i = 1;
  while (i < 8) {                            // for each component...
    freq[i] = freq[i-1]*2;                   // it's an octave up from the last one
    amp[i] = tableread(env,10*log(freq[i])); // look up amplitude in table
    i = i + 1;
  }

  // a-time (each sample of synthesis)
  s = 0; j = 0;
  while (j < 8) {                       // for each channel...
    part[j] = oscil[j](wave,freq[j]);   // run the j'th oscillator
    s = s + part[j] * amp[j];           // scale it by its amplitude and
                                        // add it to the total
    j = j + 1;
  }

  output(s);   // output the total
}

Arithmetic operations on arrays automatically account for the number of channels involved; pointwise vector operations and operations involving both vectors and scalars are built into the language. Vector operations such as sum, mean, etc. are not built in as core unit generators, but may easily be added if they are useful in a certain composition.
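As a hedged sketch of such an addition (the opcode is ours; we assume, following our reading of the standard, that opcode parameters may be arrays of declared width, here fixed at 8 to match Figure 6), a control-rate sum might be written:

kopcode vecsum(ksig v[8]) {
  ksig i, total;

  total = 0;
  i = 0;
  while (i < 8) {
    total = total + v[i];   // accumulate the components
    i = i + 1;
  }
  return(total);
}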


Other advanced features of SAOL include built-in spectral-manipulation unit generators (fft, for the fast Fourier transform, and ifft, for its inverse), which allow spectral-domain techniques based on overlap-add windowing; a sophisticated fractional and multitap delay-line unit generator (fracdelay); and built-in support for granular synthesis (grain), Karplus-Strong plucked-string synthesis (pluck), waveshaping synthesis, high-quality parametric compression (compressor), and others. A template mechanism is provided for the concise description of multiple instruments that differ only in a simple way. For details on these and other features, readers are referred to the final draft of the standard (International Standardization Organization 1999), which contains the definition of SAOL.

Efficiency in SAOL

The use of control signals and block-based processing has been an area of some recent debate in the literature on music-synthesis languages. Dannenberg (1997b) provides an excellent, thoughtful summary of the relevant issues. In the earliest synthesis languages, the semantics of processing were sample-by-sample; Barry Vercoe’s languages Music-11 and Csound innovated the use of the control rate and block-based processing to add efficiency. In block-based processing, the semantics of each line are evaluated for an entire block of samples before the next line is evaluated. Other modern languages, such as Nyquist and SuperCollider, have adopted this mechanism as well. SAOL returns to the earlier model and uses sample-by-sample semantics for processing, although it preserves the audio-rate/control-rate distinction to allow dual-rate operation (that is, to allow some signals to be updated more slowly than others) and to specify the semantics of controllers and global variables (for example, only control-rate variables may be shared between instruments).

The change back to sample-by-sample semantics has two primary implications. First, instruments that make use of single-delay feedback in their synthesis algorithms can be constructed in the “obvious” manner, without having to artificially set the sampling rate and control rate to be equal.
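For instance, the following hedged sketch (ours, not from the standard) shows a one-pole recursive filter that depends on exactly this property. Because every variable preserves its value between iterations, y still holds the previous output sample when the line is evaluated, and no explicit delay line is needed; we assume the instrument is the target of a send() so that input[0] carries the bus contents:

instr onepole(g) {
  // y[n] = x[n] + g*y[n-1]; keep the magnitude of 'g' below 1 for stability
  asig y;

  y = input[0] + g * y;   // 'y' on the right is last sample's output
  output(y);
}

In a strictly block-processing language, this instrument could only be written by setting the control rate equal to the sampling rate.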
Second, not every algorithm may be implemented in a strictly block-processing manner by the SAOL interpreter or compiler. A SAOL implementation that wishes to make use of block processing must statically detect, through optimization methods, when it is and is not possible in a particular instrument. An informative note in the MPEG-4 standard discusses this point.

In practice, this is not a barrier to efficient orchestras once these sorts of sophisticated implementations become available. Any algorithms that cannot be optimized into block-processing modes are simply algorithms that are impossible to construct in a strictly block-processing language (except with srate = krate). A composer who wishes to have the maximum efficiency in operation for a real-time composition simply refrains from making use of these algorithms.

It is reasonable to believe that such optimizing implementations are not too difficult to develop; the technology required is very similar to that in parallelizing compilers for vector supercomputers. There is a large body of work in the compiler-design literature on this topic (Allan, Jones, Lee, and Allan 1995) that may be readily applied to the design of SAOL compilers.

The specification of SAOL in the MPEG-4 standard does not restrict the form of an implementation. The standard specifies only what an implementation must do—that is, turn SAOL code into sound in a particular manner—not how it must operate. SAOL systems may be implemented in hardware or software, using DSP or native processors, using block-based or single-sample processing, embedded in larger systems or as stand-alone instruments, depending on the needs of the particular music application.

SAOL in the MPEG-4 Standard

Although, as demonstrated above, SAOL is a powerful and flexible music-synthesis language on its own, it has additional capabilities stemming from its design as part of the MPEG-4 International Standard. Space does not permit a full description of these capabilities here; a summary is provided below, and interested readers are referred to other articles for more details.


MPEG-4 Structured Audio

The MPEG-4 Structured Audio standard (Scheirer 1998; Scheirer, Lee, and Yang forthcoming) contains five main sections, two of which (SAOL and SASL) have been described above. The other three pieces are as follows.

A rich wavetable format called SASBF (pronounced “sazz-biff”), for Structured Audio Sample Bank Format, allows banks of samples for use in wavetable synthesis to be efficiently transmitted. The format of SASBF derives from the MIDI Downloadable Sound (DLS) (MIDI Manufacturers Association 1996) and E-Mu SoundFonts formats, and combines the best features of both; it has been designed in collaboration with the MIDI Manufacturers Association (MMA) and various companies interested in this technology. The MMA is independently standardizing the same format as DLS Level 2.

A restricted profile of MPEG-4 Structured Audio allows the use of only wavetable synthesis in low-cost applications where sophisticated sound control is not needed. In the full profile of Structured Audio, wavetable synthesis (based on SASBF) and general-purpose software synthesis (based on SAOL) may be mixed and combined as needed (Scheirer and Ray 1998).

A set of MIDI semantics allows the use of MIDI data to control synthesis in SAOL. These instructions in the standard specify how to generate sound in response to MIDI note events and controllers. Through these semantics, the wealth of existing content and composition tools (sequencers) may be used to create MPEG-4–compatible soundtracks until MPEG-4–native tools are available. Real-time data in the MIDI protocol (or standard MIDI files) may be used to control synthesis in conjunction with, or instead of, SASL control files. As MIDI events are received in the MPEG-4 terminal, they are translated into SAOL events according to the MIDI semantics specified in the standard. Most of the MIDI semantics provide the “obvious” behavior for MIDI events in MPEG-4 when possible.

Finally, a scheduler semantics describes exactly how control events, SASL scores, MIDI data, and the SAOL orchestra interact to produce sound. These instructions are normative in the MPEG-4 standard, which means that every different MPEG-4 implementation must follow these rules to ensure that musical content sounds exactly the same on every synthesis device. The scheduler is specified in terms of a set of careful rules given on the order of execution of instrument instances, order of dispatch of events, and order of processing sound with buses.

Other MPEG-4 Audio Coders

Structured Audio tools are not the only audio tools in MPEG-4. There is also a set of highly functional natural-audio tools that allow traditional compression of streaming audio (Quackenbush 1998). The MPEG-4 General Audio (GA) coder is based heavily on the powerful MPEG-2 Advanced Audio Coding (AAC) method, with extensions that allow for more scalability and better performance at low bit rates. MPEG’s psychoacoustic tests (Meares, Watanabe, and Scheirer 1998) have demonstrated that AAC can provide quality nearly indistinguishable from uncompressed digital audio at 64 kb/sec (kbps) per channel; AAC showed significantly higher quality than the MPEG-1 Layer 3, Dolby AC-3, or Lucent PAC methods in tests at an independent laboratory (Soulodre et al. 1998). Using the MPEG-4 extensions, GA coding provides excellent audio quality at rates as low as 16 kbps per channel.

The MPEG-4 Codebook Excited Linear Prediction (CELP) coder allows the coding of wide-band or narrow-band speech signals, with high quality at 24 kbps and excellent compression down to 12 kbps. The MPEG-4 Parametric Speech coder allows ultra-low-bit-rate compression of speech and simple music signals down to 4 kbps per channel.

AudioBIFS

The tools described above, both synthetic and natural, represent the state of the art in sound synthesis and sound compression. However, another level of power is possible in MPEG-4 with the AudioBIFS system (Scheirer, Väänänen, and Huopaniemi forthcoming), part of the MPEG-4 Binary Format for Scene Description (BIFS).


Figure 7. Two sound streams are processed and mixed using the AudioBIFS scene graph. A speech sound is transmitted with MPEG-4 CELP, and reverb is added with an AudioFX node; then the result is mixed with a musical background transmitted with the MPEG-4 Structured Audio system.

AudioBIFS allows multiple sounds to be transmitted using different coders and then mixed, equalized, and post-processed once they are decoded. This format is structurally based on the Virtual Reality Modeling Language (VRML) 2.0 syntax for scene description (International Standardization Organisation 1997), but contains more powerful sound-description features.

Using MPEG-4 AudioBIFS, each part of a soundtrack may be coded in the format that best suits it. For example, suppose that the transmission of voiceover with background music is desired in the style of a radio advertisement. It is difficult to describe high-quality spoken voice with sound-synthesis techniques, and so Structured Audio alone cannot be used; but speech coders are not adequate to code speech with background music, and so MPEG-4 CELP alone cannot be used. In MPEG-4 with AudioBIFS, the speech is coded using the CELP coder, and the background music is coded in Structured Audio. Then, at the decoding terminal, the two “streams” of sound are decoded individually and mixed together. The AudioBIFS part of the MPEG-4 standard describes the synchronization provided by this mixing process.

AudioBIFS is built from a set of nodes that link together into a tree structure, or scene graph. Each of these nodes represents a signal-processing manipulation on one or more audio streams. In this, AudioBIFS is itself somewhat like a sound-processing language, but it is much simpler (there are only seven types of nodes, and only “filters,” no “generators”). The scene-graph structure is used because it is a familiar and tested mechanism for computer-graphics description, and AudioBIFS is only a small part of the overall BIFS framework. The functions performed by AudioBIFS nodes allow sounds to be switched (as in a multiple-language soundtrack), mixed, delayed, “clipped” for interactive presentation, and gain-controlled.

More-advanced effects are possible by embedding SAOL code into the AudioBIFS scene graph with the AudioFX node. Using AudioFX, any audio effect may be described in SAOL and applied to the output of a natural or synthetic audio decoding process (see Figure 7).

For example, if we want to transmit reverberated speech with background music, we code the speech with MPEG-4 CELP, and provide an AudioFX node containing SAOL code that implements the desired reverberator. As above, we code the background music using MPEG-4 Structured Audio. When the transmission is received, the speech stream is decoded and the reverberation processing is performed; the result is added to the background music, which might or might not also be reverberated. Only the resulting sound is played back to the listener.
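A hedged sketch of the kind of SAOL code such an AudioFX node might carry (a toy recirculating-delay “reverberator,” far simpler than a production design; the delay time and gains are arbitrary illustrative choices):

instr toyverb(mix) {
  asig x, r;

  x = input[0];                // the decoded speech arrives as input
  r = delay(x + r/2, 0.029);   // one recirculating delay as the "reverb"
  output(x * (1 - mix) + r * mix);
}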


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 Audio allows exact, terminal-independent control High-quality synthesis may be performed at very of sound synthesis for the composer, MPEG-4 low bit rates, down to less than 1 kbps, and the AudioBIFS allows exact, terminal-independent synthetic content may be synchronized with natu- control of audio-effects processing for the sound ral sound content and multiplexed into a single designer and producer. program bit stream. The streaming data uses a The combination of synthetic and natural sound tokenized, compressed binary format, instead of in the same sound scene with downloaded mixing the human-readable “textual SAOL format” that is and effects processing is termed synthetic/natural used in this article’s SAOL code examples. hybrid coding, or SNHC audio coding, in MPEG-4. The Structured Audio components have also SNHC is a powerful coding concept that MPEG-4 been designed to be efficient and useful in fixed- is the first standard to use (Scheirer 1999; Scheirer, media applications, such as a studio environment Lee, and Yang forthcoming). or composition workstation. The standard in- Other features of AudioBIFS include 3-D audio cludes the textual SAOL format for this purpose. spatialization for sound presentation in virtual-re- Additionally, the standard contains suggestions for ality applications, and the creation of sound scenes stand-alone implementation authors, regarding that render differently on different terminals, de- good ways to allow access to sound samples and pending (for example) on sampling rate, speaker user data, and ways to provide debugging informa- configuration, or listening conditions. tion. The saolc implementation, which is the Finally, the AudioBIFS component of MPEG-4 is MPEG-4 Structured Audio “reference software” part of the overall BIFS system, which provides so- (the official implementation of the standard), con- phisticated functionality for visual presentation of tains a stand-alone mode in which SAOL is used streaming video, 3-D graphics, virtual-reality from a command line, very much like Csound. scenes, and the handling of interactive events. The Suggestions for implementing real-time local audio objects described with AudioBIFS and the MIDI control by attaching a MIDI keyboard are various audio decoders may be synchronized with also provided in the standard. video or computer-graphics objects, and altered in response to user interaction. Standardization in Computer Music

Streaming Media versus Fixed Media This section discusses the role that the interna- tional standardization process has played in the Like previous MPEG standards, MPEG-4 has been development of SAOL, and the advantage this pro- designed with streaming media in mind. There is cess can serve for the computer music community. an extensive set of tools in the MPEG-4 systems SAOL did not exist as a language prior to the component for efficiently handling the transport MPEG-4 standard; the standards work was not un- and delivery of streaming data, including methods dertaken in order to place an ISO “seal of ap- for multiplexing, synchronization, and back-chan- proval” on previously existing technology. Rather, nel communications. MPEG-4 Structured Audio a concerted effort was made to design the best syn- was designed in this framework, and so the sound- thesis language possible, with the broadest appli- synthesis capabilities described in this article may cation, so that MPEG-4 Structured Audio could all be transmitted as streaming data. In this model, serve the best purpose of standards: to unify a the bitstream header contains the orchestra file sometimes fragmented field and marketplace and and any necessary preparatory data; at the begin- prevent duplication of effort. If the MPEG-4 Struc- ning of the session, the client terminal processes tured Audio standard becomes broadly accepted, this header and prepares the orchestra for synthe- all computer musicians will benefit from the re- sis. Then, the streaming data contains score and sulting explosion of hardware accelerators and control information that drives the synthesis. composition tools around it.


Languages and Implementations

When evaluating synthesis languages as languages, it is important to keep in mind the distinction between language and implementation. For a particular music system, the language is the set of syntactic rules that describe what sequences of characters make up legal programs, and the set of semantic rules that describe how to make sound from a program. An implementation, by contrast, is a particular computer program that can perform the required mapping from a sequence of characters to sound. Depending on the music system, other information such as scores, MIDI files, control sequences, or real-time interaction may affect this mapping process.

It has traditionally been the case in the computer music field that languages have been defined in regard to implementations. That is, the Csound language is not defined anywhere except as a computer program that accepts or rejects particular sequences of characters. As this implementation evolves and changes over time, the set of “legal” Csound orchestras changes as well (generally by expansion, in a backward-compatible manner). This is not the case in the computer-language world at large; languages such as C, Fortran, Lisp, Ada, C++, and Java have clearly written specifications that describe exactly what a legal implementation must and must not do.

The primary advantage of the specified-language model is that it promotes compatibility between implementations. For a large set of nonpathological programs, a programmer is guaranteed that a C program will execute the same way under any legal C compiler. The ANSI C standard tells the developer of a new C compiler exactly what the compiler must do in response to a particular program. This interoperability promotes the development of an efficient marketplace for compilers, debuggers, editors, and other programming tools. If a company develops a new compiler that is much faster than others available, the marketplace can shift quickly because existing code is still useful, and so there is a competitive advantage to a company to develop such powerful compilers. If languages are not compatible between implementations, programmers (musicians) become “locked into” a certain implementation (because it is the only one that will run their instruments), and there is not a competitive reason to develop a better one.

The disadvantages of specified languages are that standards have a certain stifling effect on innovation (C and C++ have had a stronghold on the development of production code for many years), and they might not utilize the resources of a particular computing environment with maximal efficiency. Even after years of compiler development, hand-coded assembly, especially for high-performance architectures such as DSPs or vector processors, still may have performance advantages over code compiled from high-level languages. But since assemblers are generally incompatible from architecture to architecture, the development tools for assembly-language coding are not as advanced as those for high-level languages. With sufficient effort and programming skill, advanced development environments and high-performance signal-processing capabilities can be merged in a single tool.

James McCartney’s language and software system, SuperCollider (McCartney 1996a, b), is an excellent example of this: it provides highly efficient execution, a sophisticated front end, and a synthesis language with more-advanced capabilities than SAOL, but only in a proprietary framework on a single platform. Note, though, that the quality of SuperCollider’s front end and the efficiency of its execution are not properties of SuperCollider as a language, but rather a testament to Mr. McCartney’s skill in creating an implementation of the language. Similarly, the restriction to a single platform is not intrinsic to the language; the language could, once clearly specified, be implemented on other platforms with other front ends.

As computers and signal-processing units get ever faster, eventually the need for high-quality and compatible development tools for synthesis languages will be more pressing than the need to squeeze every cycle out of existing processors. This is the world that the design of SAOL targets: one in which musicians are willing to accept 75-percent performance on state-of-the-art hardware in exchange for inexpensive yet powerful implementations, sophisticated interfaces, cross-platform compatibility, and a broad and competitive marketplace for tools, music, and hardware.


Although, as of this writing, there are not yet implementations of SAOL whose speed is competitive with tools that have more development time behind them, this is not a property of the language. The SAOL standard requires real-time implementation for compliance to the standard, and so real-time SAOL processors will rapidly emerge from companies who wish their tools to be MPEG-4 compliant. The technology required to implement such tools efficiently is fairly well understood today. Thanks to the advances of innovators such as James McCartney and Roger Dannenberg, creating implementations from a well-specified language design is (mostly) a matter of engineering. This is especially true when one considers the amount of development resources that will be directed toward SAOL implementation as a result of its inclusion in MPEG-4.

The MIDI Standard

A widely used existing standard for the representation of sound-control parameters is the MIDI protocol (MIDI Manufacturers Association 1996; Loy 1985). In some ways, representation of music as a set of MIDI bytes is similar to the Structured Audio concept, as compression is achieved in similar ways. However, MPEG-4 Structured Audio is a standard for the representation of sound, not just musical control. In MIDI-based music exchange, musicians have little control over the exact sounds produced (since this is left to the particular hardware/software system that implements the synthesis), and as a result, MIDI is not generally an appropriate exchange format for serious musical content (Moore 1991).

In contrast, sound transmission in SAOL is much more exact; if composers desire the exact re-creation of certain sounds, this is always possible in MPEG-4. The MPEG-4 standard places tight control—in many places, sample-by-sample specification—on the output of the synthesis in response to a particular SAOL program. The MIDI standard was created to enable data exchange between fixed hardware synthesizers. In contrast, the MPEG-4 Structured Audio standard was created to enable data exchange between flexible software synthesizers. The MIDI standard, although perhaps not as capable as many musicians would like, provides a valuable demonstration that manufacturers and tool developers will indeed rally around a public standard for the exchange of music data.

Compatibility

The concrete specification of SAOL as part of the MPEG-4 standard means that any SAOL-compliant tool will be compatible at the SAOL level with any other such tool. Such compatibility has many implications. It makes it much easier to rapidly develop new software-synthesis tools such as graphical user interfaces (GUIs) for instrument construction. Such a tool may be built to interact with a SAOL-compliant hardware synthesizer, and so the GUI author does not have to construct the real-time synthesis capabilities as part of the tool. Similarly, a production musician who wishes to use new sounds does not have to develop them on his or her own. Because of the modular nature of SAOL, instruments, effects algorithms, and unit generators may be easily traded (or bought and sold) between developers, and then rapidly used as part of a new orchestra. Finally, since the MPEG-4 standard will be widely used in low-bit-rate communication applications such as digital broadcasting and satellite-to-mobile-platform transmission, there will be a great need for computer musicians to build the synthesis hardware and composition tools, and to compose the music that will be used in such everyday applications.

Open Standards

The ISO in general, and the MPEG working group in particular, are emblematic of the open-standards process in the development of new technology. Any researcher or company may make suggestions regarding the future development of MPEG-4 and its Structured Audio tools (such as proposals for corrigenda, if parts of the standard are found to be “broken” or incomplete).


According to MPEG’s rules of procedure, these contributions must be formally evaluated, and if they are judged to be important and to move in a positive direction, they will be incorporated into the standard. Anyone can read and evaluate the capabilities of the standard for themselves. In addition, MPEG maintains a software implementation of the entire MPEG-4 standard, which is available to anyone interested in developing his or her own tools compliant with MPEG-4. The MIT Media Lab wrote and maintains the reference source code for the SAOL tools, and has released this source code into the public domain for free use by the community. The Media Lab maintains no intellectual-property rights or proprietary control over the direction of the standard, and will not gain materially from its acceptance in any way.

The reference implementation of SAOL, called saolc, is not intended to be a tool that is usable by musicians or competitive with modern synthesis-programming environments. It is a simple, text-based implementation of SAOL, SASL, SASBF, the MIDI rules, and the scheduler semantics in MPEG-4 Structured Audio that is supposed to be exactly conformant to the standard and relatively easy to read and understand. MPEG provides reference software for MPEG-4 as a secondary reference for normative behavior of the standard. Organizations interested in developing more practical tools can use saolc as a guide.

The source base for saolc is about 32,000 lines of C and C++ code totaling about 1 MB of data. On the Silicon Graphics, Inc. (SGI) platform, this compiles into an executable with a 900-KB footprint; on Windows 95 under Visual C++, the executable footprint is about 380 KB. The code base is intended to be widely portable and compiles as-is on (at least) SGI, Alpha, Linux, Sun, and Win32 platforms. The performance of saolc is extremely poor compared to modern real-time software synthesizers; even on a high-performance machine such as an SGI Octane, only one or two voices of very simple instruments will run in real time. There has been essentially no attempt to optimize saolc for speed, so the performance of saolc should not be taken as indicative of the performance of an optimized SAOL implementation. Moreover, saolc provides no user interface except command-line and bit-stream processing capabilities.

The Big Picture

Julius Smith (1991) wrote an incisive critique of the direction of progress in the digital musical instrument industry. Unfortunately, many of his observations still hold true today: “The ease of using MIDI synthesizers has sapped momentum from synthesis algorithm research by composers… MIDI synthesizers offer only a tiny subset of the synthesis techniques possible in software… The disadvantage [of MIDI] is greatly reduced generality of control, and greatly limited synthesis specification” (Smith 1991). Even with the development of the MIDI-DLS specification, the limitations of MIDI (and now DLS) continue to be a powerful force shaping the capabilities of computer sound processing.

This is a particularly disturbing state of affairs, given the vast developments in graphics processing, multimedia systems, and programming interfaces that have taken place since Julius Smith’s article was written. The world of graphics on personal computers has evolved through several generations and a great deal of technological sophistication since 1991, but the fundamental sound architecture of the average PC has taken only a single step: instead of FM synthesis controllable with “note-on, note-off” instructions, now we have wavetable synthesis controllable with note-on, note-off instructions. Soon, new developments will allow composers to specify wavetables of their own, rather than having them provided by the sound-card manufacturers, and control them with note-on, note-off instructions.

The MPEG-4 Structured Audio tools have the potential to change this situation. The structure and format will be taken seriously, for MPEG standards are an important touchstone in the computer industry at large.

46 Computer Music Journal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/014892699559742 by guest on 26 September 2021 general-purpose software-synthesis tools in the anonymous referees whose critiques greatly im- MPEG-4 standard, the needs and inventions of the proved the article. computer music community have been exposed to The work described herein has had many con- the computer industry for a very broad hearing. tributors, including, but not limited to: Paris We hope that other computer musicians will join Smaragdis, Bill Gardner, Michael Casey, Adam us in supporting the new standard, developing Lindsay, Giorgio Zoia, Jyri Huopaniemi, Riitta conformant implementations, and creating music Väänänen, Itaru Kaneko, Shigeki Fujii, Lee Ray, and tools that can help it to prosper. Brian Link, Luke Dahl, Dave Sparks, Billy Brackenridge, and Tom White. Thanks to Pete Schreiner, Pete Doenges, and Leonardo Chiariglione Conclusion for overseeing the MPEG-4 project and the Struc- tured Audio components, and to Ron Burns and SAOL, the MPEG-4 Structured Audio Orchestra Don Mead from Hughes Aircraft Company for ener- Language, is a powerful and flexible new synthesis gizing the Media Lab’s contribution to MPEG. language. Further, as part of the MPEG-4 Interna- tional Standard, there will soon be many real-time References implementations, composer’s desktops, and other tools available for using it. The move to accep- Allan, V. H., R. B. Jones, R. M. Lee, and S. J. Allan. tance of powerful open standards in the computer 1995. “Software Pipelining.” ACM Computing Sur- music field will create an explosion of opportunity veys 27(3):367–432. for musicians and technologists with the creativ- Boulanger, R., ed. Forthcoming. The Csound Book. ity and skill these tools demand, and will create a Cambridge, Massachusetts: MIT Press. “rising tide” supporting other real-time software- Casey, M. A., and P. Smaragdis. 1996. “Netsound: Real- synthesis tools. Time Audio from Semantic Descriptions.” Proceed- For readers interested in using or developing ings of the International Computer Music tools based on SAOL, the SAOL home page on the Conference. San Francisco: International Computer World Wide Web may be found at http:// Music Association, p. 143. sound.media.mit.edu/mpeg4. This site contains Dannenberg, R. B. 1997a. “Machine Tongues XIX: current information on the progress of the stan- Nyquist, a Language for Composition and Sound Syn- thesis.” Computer Music Journal 21(3):50–60. dard, up-to-date software implementations, ex- Dannenberg, R. B. 1997b. “The Implementation of ample compositions, a library of user-defined unit Nyquist, a Sound-Synthesis Language.” Computer generators, complete documentation on the SAOL Music Journal 21(3):71–82. language, and mailing lists that support the SAOL International Standardization Organisation (ISO). 1997. community. ISO/IEC 14472–1 International Standard: Virtual Re- ality Modeling Language (VRML). Available at http:// www.vrml.org. Acknowledgments International Standardization Organisation (ISO). 1999. ISO 14496-3:1999 (MPEG-4 Audio). Geneva: Interna- The first author is grateful to the Interval Research tional Standardization Organization. Corporation (Palo Alto, California) for its fellow- Loy, D. G. 1985. “Musicians Make a Standard: The MIDI Phenomenon.” Computer Music Journal 9(4):8–25. ship support over the first year of this work, and to Mathews, M. V. 1969. The Technology of Computer the Digital Life consortium of the MIT Media Music. Cambridge, Massachusetts: MIT Press. 
Laboratory for ongoing research funding. As al- McCartney, J. 1996a. SuperCollider: A Real-Time ways, the Machine Listening Group of the Media Sound Synthesis Programming Language (program Lab, especially Keith Martin and Youngmoo Kim, reference manual). Austin, Texas. Available at http:// have been essential through their comments and www.audiosynth.com. critiques on this article. Thanks also to two McCartney, J. 1996b. “SuperCollider: A New Real-Time

Scheirer and Vercoe 47

McMillen, K., D. L. Wessel, and M. Wright. 1994. "The ZIPI Music Parameter Description Language." Computer Music Journal 18(4):52–73.
Meares, D., K. Watanabe, and E. D. Scheirer. 1998. "Results of the MPEG-2 AAC Stereo Verification Tests." ISO/IEC JTC1/SC29/WG11 (MPEG) document N2006. San Jose, California: International Standardization Organisation. Available at http://www.cselt.it/mpeg.
MIDI Manufacturers Association (MMA). 1996. "The Complete MIDI 1.0 Detailed Specification v. 96.2." Ordering information available at http://www.midi.org.
Moore, F. R. 1988. "The Dysfunctions of MIDI." Computer Music Journal 12(1):19–28.
Pope, S. T. 1993. "Machine Tongues XV: Three Packages for Software Sound Synthesis." Computer Music Journal 17(2):23–54.
Pope, S. T. 1997. Sound and Music Processing in SuperCollider. Available at http://www.create.ucsb.edu/htmls/sc.book.html.
Quackenbush, S. 1998. "Natural Audio Coding in MPEG-4." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington, DC: Institute for Electrical and Electronics Engineers, pp. 3797–3800.
Roads, C. 1996. The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press.
Scheirer, E. D. 1998. "The MPEG-4 Structured Audio Standard." Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Seattle: Institute for Electrical and Electronics Engineers, pp. 3801–3804.
Scheirer, E. D. 1999. "Structured Audio and Effects Processing in the MPEG-4 Multimedia Standard." Multimedia Systems 7(1):11–22.
Scheirer, E. D., Y. Lee, and J.-W. Yang. Forthcoming. "Synthetic Audio and SNHC Audio in MPEG-4." In A. Puri and T. Chen, eds. Advances in Multimedia: Signals, Standards, and Networks. New York: Marcel Dekker.
Scheirer, E. D., and L. Ray. 1998. "Algorithmic and Wavetable Synthesis in the MPEG-4 Multimedia Standard." Proceedings of the 105th AES Convention. San Francisco: Audio Engineering Society. (Available as reprint #4811.)
Scheirer, E. D., R. Väänänen, and J. Huopaniemi. Forthcoming. "AudioBIFS: Describing Audio Scenes with the MPEG-4 Multimedia Standard." To appear in IEEE Transactions on Multimedia.
Schottstaedt, W. 1994. "Machine Tongues XVII: CLM: Music V Meets Common Lisp." Computer Music Journal 18(2):30–37.
Shepard, R. N. 1964. "Circularity in Judgments of Relative Pitch." Journal of the Acoustical Society of America 36(12):2346–2353.
Smith, J. O. 1991. "Viewpoints on the History of Digital Synthesis." Proceedings of the 1991 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 1–10.
Soulodre, G. A., T. Grusec, M. Lavoie, and L. Thibault. 1998. "Subjective Evaluation of State-of-the-Art Two-Channel Audio Codecs." Journal of the Audio Engineering Society 46(3):164–177.
Vercoe, B. L. 1995. Csound: A Manual for the Audio Processing System (program reference manual). Cambridge, Massachusetts: MIT Media Laboratory.
Vercoe, B. L., W. G. Gardner, and E. D. Scheirer. 1998. "Structured Audio: The Creation, Transmission, and Rendering of Parametric Sound Descriptions." Proceedings of the IEEE 86(5):922–940.
Wright, M. 1994. "A Comparison of MIDI and ZIPI." Computer Music Journal 18(4):86–91.
Wright, M. 1998. "Implementation and Performance Issues with OpenSound Control." Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 224–227.
Wright, M., and A. Freed. 1997. "OpenSound Control: A New Protocol for Communicating with Sound Synthesizers." Proceedings of the 1997 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 101–104.


Appendix 1: Core Opcodes in SAOL

This appendix contains a complete list of the built-in "opcodes," or unit generators, in SAOL. Space does not permit a full description of the parameters, syntax, and semantics of each; interested readers are referred to the standard or to our online documentation. In the standard, the operation of each unit generator is defined at the sample-by-sample level. All of these unit generators must be present in any compliant SAOL implementation.
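By way of orientation, the following small instrument shows how a few of these opcodes combine in practice. It is a sketch of our own devising, not an excerpt from the standard; the names blip, env, and sound and all parameter values are arbitrary, while dur is the standard name holding the duration of the note:

   instr blip(pitch, amp) {
     table wave(harm, 2048, 1);               // wavetable holding one harmonic: a sine wave
     ksig env;                                // control-rate envelope signal
     asig sound;                              // audio-rate output signal

     env = kline(0, 0.05, 1, dur - 0.05, 0);  // linear attack, then decay back to silence
     sound = oscil(wave, cpspch(pitch));      // wavetable oscillator; cpspch converts pch to Hz
     output(sound * amp * env);               // scale and send to the output bus
   }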


Math Functions

Each of these implements the corresponding mathematical function:
int, frac, abs, sgn, exp, log, sqrt, sin, cos, tan, asin, acos, atan, pow, log10, floor, ceil, min, max.

These allow conversion from amplitude multipliers to decibels and back:
dbamp, ampdb.


Pitch Converters

These convert from one of SAOL's pitch formats to another. The pitch formats are: octave-fraction (oct), pitch class (pch), frequency in Hz (cps), and MIDI note number:
octpch, pchoct, cpspch, pchcps, octcps, cpsoct, midipch, pchmidi, midioct, octmidi, midicps, cpsmidi.

These allow the "global tuning" of the orchestra to be inspected and changed:
gettune, settune.


Stored-Function-Table Operations

These allow access to the length, loop point, loop end point, sampling rate, and base frequency (if any) of a stored function table:
ftlen, ftloop, ftloopend, ftsr, ftbasecps.

These allow the loop point, loop end point, base frequency, and sampling rate of a stored function table to be modified:
ftsetloop, ftsetend, ftsetbase, ftsetsr.

These provide direct read/write access to the sample data in a stored-function table:
tableread, tablewrite.


Signal Generators

These generate a periodic audio signal from a wavetable, using simple loops, frequency matching, or sample-rate matching, respectively:
oscil, loscil, doscil.

This generates a periodic control signal from a wavetable:
koscil.

These generate a line-segment signal at the control rate or audio rate, respectively:
kline, aline.

These generate an exponentially curved signal at the control rate or audio rate, respectively:
kexpon, aexpon.

These generate a control-rate or audio-rate continuous-phase signal, respectively:
kphasor, aphasor.

These perform Karplus-Strong or parametric granular synthesis:
pluck, grain.

This generates band-limited pulse-train signals:
buzz.


Noise Generators

These generate "white" random numbers or noise:
irand, krand, arand.

These generate random numbers or noise from a triangular distribution:
ilinrand, klinrand, alinrand.

These generate random numbers or noise from an exponential distribution:
iexprand, kexprand, aexprand.

These generate a Poisson-distributed random-impulse sequence or signal:
kpoissonrand, apoissonrand.

These generate random numbers or noise from a Gaussian distribution:
igaussrand, kgaussrand, agaussrand.
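As a small illustration of the signal and noise generators above, the following sketch of ours shapes audio-rate white noise with an exponentially decaying envelope; the instrument name and all parameter values are arbitrary:

   instr snare(amp) {
     ksig env;                      // control-rate envelope signal
     asig hit;                      // audio-rate noise signal

     env = kexpon(1, dur, 0.001);   // exponential decay over the duration of the note
     hit = arand(amp);              // audio-rate uniform white noise, scaled by amp
     output(hit * env);
   }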


Filters

Turn a discrete sequence of values into a continuous control signal with:
port.

Parametric second-order filters include:
hipass, lopass, bandpass, bandstop.

Exactly normative filtering using the canonical second-order section is:
biquad.

For IIR all-pass and comb filters of specified delay and feedback gain, use:
allpass, comb.

These are general FIR and IIR filters. The first two operate from parametric coefficients; the last two store the coefficients in a stored-function table:
fir, iir, firt, iirt.


Spectral Analysis

To perform windowed sliding-block DFTs, placing the result in a stored-function table, use:
fft.

To perform windowed sliding-block IDFTs, converting spectral frames in stored-function tables into an audio signal, use:
ifft.


Gain Control

Calculate the power in an audio signal with:
rms.

To rescale an audio signal so that it has specified power, or power that matches a reference signal, use:
gain, balance.

To perform parametric power-level compression on an audio signal, use:
compressor.


Sample-Rate Conversion

Use the following to decimate an audio signal to a control signal, or upsample and downsample between audio and control signals:
decimate, upsamp, downsamp.

To gate an audio signal with a control signal, use:
samphold.

Place blocks of samples from an audio signal into a wavetable with:
sblock.


Delays

To delay one sample or to delay a specified amount of time, respectively, use:
delay1, delay.

A flexible fractional and multi-tap delay-line tool is:
fracdelay.


Effects

Apply reverberation, chorusing, flanging, or time shifting (pitch-preserving speed change) to an audio signal with the following:
reverb, chorus, flange, speedt.


Tempo Control

To query or set the playback tempo of the orchestra, use:
gettempo, settempo.
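To suggest how the filter and effects opcodes are used in context, here is a sketch of a simple global effects chain. It assumes a one-channel output bus, and the instrument name wash, the mix level, and all parameter values are our own inventions rather than anything prescribed by the standard:

   global {
     srate 32000;
     krate 500;
     send(wash; 2.0; output_bus);          // route the default output bus through the effect
   }

   instr wash(rt) {
     asig dry, wet;

     dry = input[0];                       // the bus signal arrives via the standard name input
     wet = reverb(lopass(dry, 4000), rt);  // low-pass the signal, then reverberate (rt-sec decay)
     output(dry + 0.3 * wet);              // mix the dry and processed signals
   }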


Appendix 2: Core Function-Table Generators in SAOL

This appendix contains a complete list of the built-in function-table generators in SAOL. Space does not permit a full description of the parameters, syntax, and semantics of each; interested readers are referred to the standard or to our online documentation. In the standard, the operation of each is defined at the sample-by-sample level. All of these function-table generators must be present in any compliant SAOL implementation:

sample—place a sound sample in a stored-function table
data—place a sequence of specific data values in a stored-function table
random—place random values, drawn from one of several distributions, in a stored-function table
step—place a step function in a stored-function table
lineseg—place a function made up of linear segments in a stored-function table
expseg—place a function made up of exponential curves in a stored-function table
polynomial—place an arbitrary polynomial function in a stored-function table
spline—place a spline curve on a given set of control points in a stored-function table
window—place a window function (Boxcar, Hamming, Bartlett, Kaiser, Gaussian) in a stored-function table
harm—place a sum of zero-phase, harmonically related sinusoids in a stored-function table
harm_phase—place a sum of phased, harmonically related sinusoids in a stored-function table
periodic—place an arbitrary sum-of-sinusoids function in a stored-function table
buzz—place a band-limited pulse train in a stored-function table
concat—concatenate two or more function tables together, and place the result in a new stored-function table
empty—create space for an empty stored-function table (composers may write i-rate user-defined opcodes that effectively act as user-defined table generators)
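As an example of how these generators are invoked, table declarations like the following might appear in a SAOL orchestra; the table names, sizes, and parameter lists here are our own illustrative choices, not prescribed values:

   table sine(harm, 2048, 1);                  // one unit-amplitude harmonic: a sine wave
   table organ(harm, 2048, 1, 0.7, 0.5, 0.3);  // four harmonics with decreasing weights
   table ramp(data, 4, 0, 0.25, 0.5, 1);       // four explicit data values

A table declared this way can then be passed to any of the wavetable opcodes of Appendix 1, as in oscil(organ, 440).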

