Proceedings of the Audio Conference 2010

May 1st – 4th, 2010, Hogeschool voor de Kunsten Utrecht, Utrecht, The Netherlands

Published by Hogeschool voor de Kunsten Utrecht, Utrecht, The Netherlands, May 2010

All copyrights remain with the authors

http://lac.linuxaudio.org/2010

Credits
Cover design: Tijs Ham
Layout: Frank Neumann
Editor: Maurits Lamers
Typesetting: LaTeX and pdfLaTeX

About the cover
The cover text is created using a Pure Data patch. When you play the patch, you should get an idea of what the penguin at the bottom is singing...

Thanks to: Martin Monperrus for his webpage "Creating proceedings from PDF files"

Printed in Hilversum by Print service "de Toekomst", http://www.detoekomst.nl

Partners and Sponsors

Hogeschool voor de Kunsten

Loohuis Consulting

Linux Weekly News

Linuxaudio.org

Elastique

Tonehammer

Muziekinstituut MultiMedia


LAC 2010: a Linux Audio Conference

Since 2003, the Linux Audio Conference has grown into a yearly recurring international event where users and developers of open source music software, and Linux Audio in particular, get together.

This year the LAC, as it is known by some, is hosted by the Hogeschool voor de Kunsten Utrecht. The HKU offers bachelor and master programmes for various disciplines such as Music, Theatre, Design, Media, Games, Interaction and Music & Technology. Students and alumni of the HKU participate in the organisation and contribute to the conference.

So why is the HKU doing this? And why am I doing this? To answer the first question: because the HKU recognises the importance of Open Source Software for education and, vice versa, by involving the generations that will shape the future we hope to enrich the Linux Audio community.

As for the second question: I’m doing this because I am a long time Linux Audio user, an Open Source advocate, a lecturer at the HKU and I need the freedom that Open Source Software gives me. Oh, and because at last year’s LAC in Parma, Italy, I said to someone in the pub how nice it would be to host this conference in a different country or even continent every year. From that moment on my fate was sealed: I was going to get this party to Utrecht.

Originally a Linux Audio developers meeting, the LAC has developed into a conference that comprises not only Linux Audio but also other Open Source Music Software and Open Content. Of course, with Linux being a true Open Source platform, Linux Audio remains the backbone of the venue. The Linux Audio community is more alive than ever, as is demonstrated by this book with its collection of high-standard papers.

The conference offers enough diversity to attract people with different backgrounds and one common denominator: their devotion to open systems. With over 75 scheduled events such as presentations, workshops, discussion panels and concerts, there are lots of opportunities for everyone to participate, learn and find new inspiration, work relations and even friends. My personal hope for the LAC 2010 is that it will contribute to the acceptance and use of open systems for creating, producing, understanding and enjoying music.

To conclude I would like to thank Ad Wisman for giving his approval and Jan IJzermans and Rens Machielse for backing me up. During the entire process, Robin Gareus and Frank Neumann did a tremendous job with advice, support, creating nifty solutions and handling the entire process from papers to proceedings. Thanks guys, I owe you one! Ico Bukvic, thanks for keeping the server up and running. Thank you Farida Ayojil for all your support with respect to the location, security and catering.

And then there are Than van Nispen tot Pannerden and Bertus van Dalen who generated lots of ideas, helped out in numerous situations and gave a whole new dimension to this conference. To all the others I didn’t mention: your contributions are highly appreciated.

I wish you all a very good LAC 2010 and hope you enjoy your stay in Utrecht!

Marc Groenewegen

Special thanks to ...

Conference Organisation Marc Groenewegen Robin Gareus Frank Neumann Marcel Wierckx Than van Nispen tot Pannerden Bertus van Dalen

Program Marc Groenewegen Than van Nispen tot Pannerden Bertus van Dalen Robin Gareus Frank Neumann

Streaming Jörn Nettingsmeier Christian Thäter Emile Bijk Herman Robak Wouter Verwijlen Raffaella Traniello Ernst Roza

T-shirt Design Tijs Ham

Public Relations Marc Groenewegen Femke Langebaerd

Conference Website Marc Groenewegen Robin Gareus

Logo Design Niek Kok Tamara Houtveen

Timetable database and web front-end Robin Gareus Marc Groenewegen

Concert Organisation Marc Groenewegen

Concert Sound Stefan Janssen

Composition Contest ”150 Years of Music Technology” Than van Nispen tot Pannerden

Magazine ”The LAC Times” Niek Kok Sander Oskamp Marc Groenewegen Than van Nispen tot Pannerden Bertus van Dalen Maarten Brinkering Marcel Wierckx

Consulting and Assistance Luc Nieland Than van Nispen tot Pannerden Bertus van Dalen Danique Kamp Farida Ayojil Henk Schraa Jos Bertens Roderick van Toor Sanne Smits Maurits Lamers Armijn Hemel Marije Baalman Fons Adriaensen

Operational team as known by April 25, 2010 Bertus van Dalen Ciska Vriezenga Daan van Hasselt Danique Kamp Eric Magnee Farida Ayojil Frank Neumann Giel Dekkers Harry van Haaren Henk Schraa Johan Rietveld Jos Bertens Luc Nieland Marcel Wierckx Mario van Etten Martijn van Gessel Maurits Lamers Roald van Dillewijn Roderick van Toor Sanne Smits Simon Johanning Stefan Janssen Than van Nispen tot Pannerden Thijs Koerselman Tijs Ham

Photography Brendon Heinst

Paper Administration and Proceedings Frank Neumann

...and to everyone else who helped in numerous places after the editorial deadline of this publication.

Review Committee

Fons Adriaensen, Casa della Musica, Parma, Italy
Marije Baalman, Concordia University, Montreal, Canada
Frank Barknecht, Footils, Cologne, Germany
Ico Bukvic, Virginia Tech, Blacksburg, VA, United States
Götz Dipper, ZKM, Karlsruhe, Germany
Florian Faber, Marburg, Germany
Robin Gareus, University of Paris, France
Marc Groenewegen, Hogeschool voor de Kunsten Utrecht, Utrecht, The Netherlands
Frank Neumann, Harman International, Karlsbad, Germany
Yann Orlarey, GRAME, Lyon, France
Dave Phillips, United States
Martin Rumori, Academy of Media Arts, Cologne, Germany
Pieter Suurmond, Hogeschool voor de Kunsten Utrecht, Utrecht, The Netherlands
Victor Lazzarini, NUI Maynooth, Ireland

Music Jury Hans Timmermans Than van Nispen tot Pannerden Bertus van Dalen Pieter Suurmond Marc Groenewegen

Table of Contents

• QuteCsound, a Csound Frontend – Andrés Cabrera ... 1

• Implementing a Polyphonic MIDI Software Synthesizer using Coroutines, Realtime Garbage Collection, Closures, Auto-Allocated Variables, Dynamic Scoping, and Continuation Passing Style Programming – Kjetil Matheussen ... 7

• Writing Audio Applications using GStreamer – Stefan Kost ... 15

• Emulating a Combo Organ Using Faust – Sampo Savolainen ... 21

• LuaAV: Extensibility and Heterogeneity for Audiovisual Computing – Graham Wakefield, Wesley Smith, Charles Roberts ... 31

• The WFS system at La Casa del Suono, Parma – Fons Adriaensen ... 39

• General-purpose Ambisonic playback systems for electroacoustic concerts - a practical approach – Jörn Nettingsmeier ... 47

• Real-Time Kernel for Audio and Visual Applications – John Kacur ... 57

• Re-Generating Stockhausen’s ”Studie II” in Csound – Joachim Heintz ... 65

• Using open source music software to teach live electronics in pre-college music education – Hans Roels ... 75

• Field Report: A pop production in Ambisonics – Jörn Nettingsmeier ... 83

• 5 years of using SuperCollider in real-time interactive performances and installations - retrospective analysis of Schwelle, Chronotopia and Semblance – Marije Baalman ... 91

• Applications of Blocked Signal Processing (BSP) in Pd – Frank Barknecht ... 99

• OrchestralLily: A Package for Professional Music Publishing with LilyPond and LaTeX – Reinhold Kainhofer ... 109

• Term Rewriting Extensions for the Faust Programming Language – Albert Gräf ... 117

• Openmixer: a routing mixer for multichannel studios – Fernando Lopez-Lezcano, Jason Sadural ... 123

• Best Practices for Open Sound Control – Andrew Schmeder, Adrian Freed, David Wessel ... 131

• Supernova - a multiprocessor-aware synthesis server for SuperCollider – Tim Blechmann ... 141

• Work Stealing Scheduler for Automatic Parallelization in Faust – Stephane Letz, Yann Orlarey, Dominique Fober ... 147

• A MusicXML Test Suite and a Discussion of Issues in MusicXML 2.0 – Reinhold Kainhofer ... 153

• 3DEV: A tool for the control of multiple directional sound source trajectories in a 3D space – Oscar Pablo Di Liscia, Esteban Calcagno ... 161

• Sense/Stage - low cost, open source wireless sensor and data sharing infrastructure for live performance and interactive realtime environments – Marije Baalman, Joseph Malloch, Harry Smoak, Joseph Thibodeau, Vincent De Bellebal, Marcelo Wanderley, Brett Bergmann, Christopher Salter ... 167

• Education on Music and Technology, a Program for a Professional Education – Hans Timmermanns, Jan IJzermans, Rens Machielse, Gerard van Wolferen, Jeroen van Iterson ... 177

• Concerts ... 181

• Workshops ... 187

QuteCsound, a Csound Frontend

Andrés CABRERA
Sonic Arts Research Centre, Queen’s University Belfast, UK
[email protected]

Abstract

QuteCsound is a front-end application for Csound written using the Qt toolkit. It has been developed since 2008, and is now part of the Csound distribution for Windows and OS X. It is a code editor for Csound, and provides many features for real-time control of the Csound engine, through graphical control interfaces and live score processing.

Keywords

Csound, Front-end, Qt, Widgets, Interface builder

1 Introduction

QuteCsound was born out of the desire to have a cross-platform front-end for Csound [Boulanger, 2000] like MacCsound [Ingalls, 2005], which only runs on OS X. It is based on the idea of having real-time control of Csound through graphical widgets, and making Csound accessible for novice users. It is however designed to also be a powerful editor for advanced users and for offline (non-realtime) work. The most recent version is 0.5.0. It has been tested on Linux, Mac OS X, Windows and Solaris. QuteCsound has been translated to Spanish, French, Italian, German and Portuguese.

One of the main goals was also to make a new interface for Csound which would be comfortable for a musician whose background is not in programming. Existing cross-platform interfaces for Csound were either too basic or somewhat impractical. The Csound Manual [Cabrera, 2010] is very comprehensive, but few front-ends take full advantage of its resources, like the opcode listing in XML format.

QuteCsound uses the Csound API [ffitch, 2005], which enables tight integration and control of Csound.

It is licensed under the GPLv3 and LGPLv2 for compatibility with the Csound licence, to ease distribution alongside Csound. More information on QuteCsound can be found on the QuteCsound front page (http://qutecsound.sourceforge.net), and the QuteCsound sources and binaries can be found on its Sourceforge page (http://www.sourceforge.net/projects/qutecsound).

(Figure 1: The main QuteCsound window)

(Figure 2: The transport bar)

2 GUI

The main application window is shown in figure 1. The main component is a text editor with syntax highlighting. Documents can be opened as separate tabs in this area. There is a large icon bar with the main actions, which contains the usual open/save and cut/copy/paste action icons, a section with transport controls (see figure 2), and a section for handling visibility of the rest of the application windows and panels.

Around the editor many dockable widgets can be positioned freely. These dockable widgets are:

Widget Panel: In the widget panel, control widgets like sliders and knobs can be created. The design and manipulation of the widgets is all graphical, and requires no textual programming. See 2.1 below.

Manual Panel: The html version of the Csound Manual can be displayed in this panel. The reference for an opcode under the editor cursor can be called with the default short-cut Shift+F1.

Console Panel: The output from Csound for the current document is displayed in this panel.

Inspector Panel: Shows a dynamically generated hierarchical list of important sections from the current file. It shows instrument definitions and labels, definition of User defined opcodes, f-table definitions, and score sections.

When the current document tab is changed, all the panels which depend on the document, like the widget panel and the inspector panel, change to show the current data.

There are three additional windows in the GUI:

Configuration dialog: Allows setting options for Csound execution, Environment and interface options.

Utilities dialog: Simplifies usage of the Csound utilities (applications which preprocess files for certain opcodes), which can be called and controlled through this dialog window.

Live Event Panels: These windows, which vary in number according to the current document, contain a spreadsheet-like interface for manipulating Csound score events, which can be sent, manipulated and looped while Csound is running. They can also be processed using the simple python qutesheet API (see section 2.3).

2.1 Widgets

The widget panel allows creation of versatile graphical control interfaces. Data widgets provide bi-directional interchange of values with Csound through the API. The values can be updated synchronously or asynchronously depending on the QuteCsound configuration and the usage of invalue/outvalue or chnget/chnset opcodes in the csd file. (The invalue/outvalue opcodes call a registered callback when they are used, while chnget/chnset only change internal values which must be polled.)

An example of how the widget panel can be used is shown in figure 3.

(Figure 3: Widget Panel detail)

These are the available data widgets:

Slider: Ordinary sliders which become horizontal or vertical depending on the relation between width and height.

Knob: Simple rotary knobs.

Labels and Displays: Text widgets that display immutable text (labels), or text that can be changed only from Csound, not using the mouse. (While this separation is not really necessary or useful, it follows the MacCsound format choices and was kept for compatibility purposes only.)

Text Editor: A text entry widget.

Number widgets: The ScrollNumber and SpinBox widgets offer number value inputs and outputs with mouse and keyboard control.

Menus: The menu widget allows creating a Drop-down or Combo Box for selection from a list.

Controller: The controller widget can be a slider, a meter or an XY controller. It can also be used as a “LED” display.

Button: Buttons can be data widgets (sending different values depending on whether they are pressed or not) or score event generators. They can also hold images.

CheckBox: A simple checkbox which can take a value of 1 or 0.

There are three other widgets which are used to display information from Csound:

Graphs: The Graph Widget displays Csound F-tables, as they are created by Csound, and also displays data from the dispfft and display opcodes, which allow monitoring any signal or its spectrum. There is a ComboBox which allows selection of the current tables. The table shown can also be selected through a Csound API channel.

Scopes: The Scope widget shows visual representations of the Csound output buffer. It can act as a traditional Oscilloscope or as a Lissajous or Poincare display.

Consoles: The Console widget shows the Csound console output as a widget in the widget panel.

Widgets can be created by right-clicking on the widget panel and selecting a type of widget from a list. They can be moved and resized by entering the ‘Edit Mode’ with the keyboard short-cut Ctrl+E.

All widgets have a configuration dialog which can be shown by right clicking on them and selecting ‘Properties’, or by double clicking when in edit mode. The Properties dialog for a text widget is shown in figure 4. Properties like size and position can be set in these dialogs. The channel name through which the widget transmits (if it can according to its type) can also be set here.

(Figure 4: The Widget Preferences dialog)

To receive data from the widgets, they must be assigned to a control variable (k-rate) using the invalue opcode like this:

kval invalue "channel"

Conversely, to send data to a widget, the outvalue opcode can be used:

outvalue "chan", kval

There are some handy organization functions for selected widgets, like align and distribute, to aid in creating better looking widget panels.

The widgets are saved as text in the csd file, but this text is always hidden from the user (this behavior will probably change in the future, to allow text modification of the widgets). Each widget is currently represented by a single line of text, which looks something like this:

ioSlider {23, 244} {414, 32} -1.000000 1.000000 -1.000000 amount

This follows the original MacCsound format strictly, which enables QuteCsound to open and generate files which are interchangeable with MacCsound. It is somewhat human-readable and editable, but impossible to extend without breaking compatibility. A lot of internal work has already been done to move past the MacCsound widget format to an XML based format, which will allow easier extensibility and parsing, and new widget types.

2.2 Live events panel

Each document can have any number of Live Event Panels. A live event panel is a place where Csound score events can be placed for interactive usage during a Csound run. It is shown in figure 5. The live event panel can display the information in the traditional Csound score format or as a spreadsheet with editable cells. It offers a group of common processing functions which can be applied quickly to a group of cells, like addition, multiplication, and generation functions like fill linearly or exponentially, or random number generation. There is also support for some of the basic Csound Score preprocessor features, like tempo control and the carry operator ‘.’.

Score events can be copied/pasted to/from text view and sheet view seamlessly.

Live Event Panels are also stored as text in the main csd file, and are hidden from the user in the main text editor.

(Figure 5: The Live Events Panel)

2.3 The QuteSheet Python API

A simple python API has been devised to enable transformation of data from the Live Events Panel using Python. The cells (both the selected and the complete set) are passed to the python script as arrays in any particular organization (by rows, by columns, by individual cells), and can be returned with a single function specifying the new data and where it should be placed in the sheet. The python scripts can be stored in a directory which is scanned every time the right-click menu in the event sheet is triggered, effectively allowing “live coding” of score transformations while the Csound audio engine is running.
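As a rough illustration of this workflow, the sketch below shows the shape such a score-transformation script could take. The module and function names (qutesheet, selected_rows, set_selected_rows) are assumptions made for this example only; the example scripts shipped with QuteCsound define the actual API.

# Hypothetical QuteSheet script: double the duration (p3) of the selected
# score events. The "qutesheet" module and its helpers are assumed names;
# check the scripts distributed with QuteCsound for the real API.
from qutesheet import selected_rows, set_selected_rows

def double_durations(rows):
    """Return a copy of the selected rows with the p3 field doubled."""
    result = []
    for row in rows:
        row = list(row)                  # rows arrive as sequences of p-fields
        row[2] = float(row[2]) * 2.0     # index 2 is p3 when row[0] holds p1
        result.append(row)
    return result

# Write the transformed events back into the sheet.
set_selected_rows(double_durations(selected_rows()))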

2.4 Code Graph

QuteCsound can parse the currently active document to generate a dot language file, which can be rendered using graphviz (Graphviz is a package for generating flowchart style graphics using the dot language; more information can be found on www.graphviz.org). Although complicated files produce diagrams that are too complex, this feature is useful for beginners to see how the variables and opcodes are connected in a visual way. The output of this action can be seen in figure 6.

(Figure 6: The Code Graph Output)

3 Architecture

QuteCsound requires Qt 4 [Nokia, 2010], Csound and libsndfile [de Castro Lopo, 2010] to build.

Qt is a mature cross-platform graphical toolkit, which is used in several well known free-software audio projects like QJackCtl. It also has cross-platform libraries for non-GUI tasks like networking, XML parsing, printing and threading, which has saved time and avoided the need for additional dependencies. It also has extensive facilities for internationalization and translation. It is distributed in separate dynamic libraries which can be distributed with the executable.

Libsndfile is a well established audiofile read/write library, which is very well maintained, stable and efficient.

All audio and MIDI I/O is handled by Csound itself, except the function which copies the Csound output buffer after every control block to a ring buffer and writes it to disk from another thread. This means that QuteCsound supports output to Jack, Coreaudio and Win-MME as well as generic interfaces like Portaudio.

Csound can be run either as a completely independent process in a separate Terminal application, on the same application thread (usually undesirable), or on a separate thread using the API through the CsoundPerformanceThread C++ interface from interfaces/csPerfThread.hpp in the Csound sources (this class handles some of the threading issues associated with passing events to Csound and with starting/pausing/stopping the engine).
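As a rough sketch of that last mode (QuteCsound itself is C++, but the Csound 5 distribution exposes the same classes to Python through the csnd bindings), driving Csound from a separate performance thread while the GUI thread sends control data could look like the following. The class and method names are recalled from the csnd bindings and may differ between Csound versions, and the csd file name is made up.

# Sketch: running Csound on a separate performance thread through the API,
# the way a front-end does. Assumes the "csnd" Python bindings shipped with
# Csound 5; names may differ slightly in other versions.
import csnd

cs = csnd.Csound()
cs.Compile("example.csd")                  # any valid csd file (assumed name)

perf = csnd.CsoundPerformanceThread(cs)    # wraps interfaces/csPerfThread.hpp
perf.Play()                                # audio rendering runs in its own thread

# The application thread stays free to talk to the engine, e.g. like a
# widget writing to a named channel, or a live event panel sending events:
cs.SetChannel("amp", 0.5)
perf.InputMessage("i 1 0 1 0.5 440")

perf.Stop()
perf.Join()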

Data update is requested by Csound synchronously through its callback functions (set using the csoundSetInputValueCallback and csoundSetOutputValueCallback functions) when the invalue/outvalue opcodes are used. Data for Csound is polled in that callback, but data for the widgets is queued, to be processed at a slower rate. The values can also be updated asynchronously with the chnset/chnget opcodes. The values are then updated from a timer triggered thread in QuteCsound, forcing locking of values.

An interesting feature of MacCsound which has been emulated in QuteCsound is that widgets with the same channel name update each other even when Csound is not running. This is useful to avoid Csound code for common things like showing the numeric value of a slider widget, and it also gives a (false but expected and rewarding) sense of an “always running” engine. In practical terms, this implies that when Csound is running, widgets must first take values generated from Csound, and then propagate their values to other widgets.

3.1 Documentation integration

The Csound documentation is highly integrated in QuteCsound. It can be easily called from the Help menu, or to find the reference for the opcode under the editor cursor. The documentation also contains an XML file of all the opcode definitions organized by category. This file is used for syntax highlighting, but also for showing opcode syntax in the status bar, populating an opcode selector organized by categories, and building the code graph (to know the input/output variable names and types).

4 Examples Menu

The Examples menu in QuteCsound includes a large set of introductory examples and tutorials, reference examples for the widgets, and a group of ‘classic’ Csound compositions like Boulanger’s Trapped in Convert, and Csound realizations of important historical pieces like Chowning’s Stria and Stockhausen’s Studie II. The examples menu also contains a set of realtime synths, some useful files for things like I/O monitoring and file processing, and a ‘Favorites’ menu which shows the csd files from a user specified directory.

5 The future

Plans for the future include completion of the new widget format, and development of new widgets like a table widget. It would be nice to have a synchronization option to make loops sync to a master loop. A lot of work has been done to enable the exporting of stand-alone applications, so this will hopefully be finished soon. And of course fix some of the bugs along the way.

6 Acknowledgements

Many thanks to the Csound developers, especially John ffitch and Victor Lazzarini, for their assistance with usage of the API during the development of QuteCsound, and their quick response to issues and particular needs. Also, many thanks go to the users who have contributed many ideas, translations, bug reports and testing. Thanks to people like Joachim Heintz, Andy Fillebrown and François Pinot, for their contribution to development and packaging.

References

Richard Boulanger, editor. 2000. The Csound Book: Perspectives in Software Synthesis, Sound Design, Signal Processing and Programming. MIT Press.

Andres Cabrera, editor. 2010. The Csound 5.12 Manual.

Erik de Castro Lopo. 2010. Libsndfile. http://www.mega-nerd.com/libsndfile/api.html.

John ffitch. 2005. On the design of Csound 5. In Proceedings of the 2005 Linux Audio Conference, ZKM, Karlsruhe, Germany.

Matt Ingalls. 2005. MacCsound. http://www.csounds.com/matt/MacCsound/.

Nokia. 2010. Qt – cross-platform application and UI framework. http://qt.nokia.com/.

Implementing a Polyphonic MIDI Software Synthesizer using Coroutines, Realtime Garbage Collection, Closures, Auto-Allocated Variables, Dynamic Scoping, and Continuation Passing Style Programming

Kjetil Matheussen
Norwegian Center for Technology in Music and the Arts (NOTAM)
[email protected]

Abstract

This paper demonstrates a few programming techniques for low-latency sample-by-sample audio programming. Some of them may not have been used for this purpose before. The demonstrated techniques are: Realtime memory allocation, realtime garbage collector, storing instrument data implicitly in closures, auto-allocated variables, handling signal buses using dynamic scoping, and continuation passing style programming.

Keywords

Audio programming, realtime garbage collection, coroutines, dynamic scoping, Continuation Passing Style.

1 Introduction

This paper demonstrates how to implement a MIDI software synthesizer (MIDI soft synth) using some unusual audio programming techniques. The examples are written for Snd-RT [Matheussen, 2008] (http://archive.notam02.no/arkiv/doc/snd-rt/), an experimental audio programming system supporting these techniques. The techniques firstly emphasize convenience (i.e. few lines of code, and easy to read and modify), and not performance. Snd-RT runs on top of Snd (http://ccrma.stanford.edu/software/snd/), which again runs on top of the Scheme interpreter Guile (http://www.gnu.org/software/guile/guile.html). Guile helps gluing all parts together.

It is common in music programming only to compute the sounds themselves in a realtime priority thread. Scheduling new notes, allocation of data, data initialization, etc. are usually performed in a thread which has a lower priority than the audio thread. Doing it this way helps to ensure constant and predictable CPU usage for the audio thread. But writing code that way is also more complicated. At least, when all samples are calculated one by one. If however the programming only concerns handling blocks of samples where we only control a signal graph, there are several high level alternatives which make it relatively easy to do a straightforward implementation of a MIDI soft synth. Examples of such high level music programming systems are SuperCollider [McCartney, 2002], Pd [Puckette, 2002], CSound (http://www.csound.com) and many others.

But this paper does not describe use of block processing. In this paper, all samples are individually calculated. The paper also explores possible advantages of doing everything, allocation, initialization, scheduling, etc., from inside the realtime audio thread. At least it looks like everything is performed inside the realtime audio thread. The underlying implementation is free to reorganize the code any way it wants, although such reorganizing is not performed in Snd-RT yet.

Future work is making code using these techniques perform equally, or perhaps even better, than code where allocation and initialization of data is explicitly written not to run in the realtime audio thread.

2 MIDI software synthesizer

The reason for demonstrating a MIDI soft synth instead of other types of music programs, such as a granular synthesis generator or a reverb, is that the behavior of a MIDI soft synth is well known, plus that a MIDI soft synth contains many common challenges in audio and music programming:

1. Generating samples. To hear sound, we need to generate samples at the Audio Rate.

2. Handling Events. MIDI data are read at a rate lower than the audio rate. This rate is commonly called the Control Rate.

7 3. Variable polyphony. Sometimes no notes coroutine to make it run. The spawned are playing, sometimes maybe 30 notes are coroutine will run automatically as soon6 playing. as the current coroutine yields (by calling yield or wait), or the current coroutine 4. Data allocation. Each playing note re- ends. quires some data to keep track of frequency, phase, envelope position, volume, etc. The Coroutines are convenient in music pro- challenges are; How do we allocate memory gramming since it often turns out practi- for the data? When do we allocate memory cal to let one dedicated coroutine handle for the data? How do we store the memory only one voice, instead of mixing the voices holding the data? When do we initialize manually. Furthermore, arbitrarily placed the data? pauses and breaks are relatively easy to im- plement when using coroutines, and there- 5. Bus routing. The sound coming from the fore, supporting dynamic control rate simi- tone generators is commonly routed both lar to ChucK [Wang and Cook, 2003] comes through an envelope and a reverb. In ad- for free. dition, the tones may be autopanned, i.e. panned differently between two loudspeak- (wait n) waits n number of frames before con- ers depending on the note height (similar tinuing the execution of the current corou- to the direction of the sound coming from tine. a piano or a pipe organ). (sound ...) spawns a special kind of coroutine where the code inside sound is called one time per sample. (sound coroutines are 3 Common Syntax for the Examples stored in a tree and not in a priority queue The examples are written for a variant of since the order of execution for sound the programming language Scheme [Steele and coroutines depends on the bus system and Sussman, 1978]. Scheme is a functional lan- not when they are scheduled to wake up.) guage with imperative operators and static A simple version of the sound macro, scoping. called my-sound, can be implemented like A number of additional macros and special this: operators have been added to the language, and some of them are documented here because of (define-stalin-macro (my-sound . body) the examples later in the paper. ‘(spawn (while #t ,@body (...) is a macro which first trans- (wait 1)))) forms the code inside the block into clean R4RS code [Clinger and Rees, 1991] un- However, my-sound is inefficient compared derstood by the Stalin Scheme compiler.5 to sound since my-sound is likely to do (Stalin Scheme is an R4RS compiler). Af- a coroutine context switch at every call ter Stalin is finished compiling the code, the to wait.7 sound doesn’t suffer from this produced object file is dynamically linked problem since it is run in a special mode. into Snd-RT and scheduled to immediately This mode makes it possible to run tight run inside the realtime audio thread. loops which does not cause a context switch (define-stalin signature ...) defines variables until the next scheduled event. and functions which are automatically in- (out sample) sends out data to serted into the generated Stalin scheme the current bus at the current time. (the code if needed. The syntax is similar to current bus and the current time can be define. thought of as global variables which are im- (spawn ...) spawns a new coroutine [Conway, plicitly read from and written to by many 1963; Dahl and Nygaard, 1966]. Corou- 6Unless other coroutines are placed earlier in the tines are stored in a priority queue and it is queue. not necessary to explicitly call the spawned 7I.e. 
if two or more my-sound blocks or sound blocks run simultaneously, and at least one of them is a my- 5Stalin - a STAtic Language ImplementatioN, sound block, there will be at least two coroutine context http://cobweb.ecn.purdue.edu/ qobi/software.html. switches at every sound frame.

8 operators in the system)8 By default, the :where is just another way to declare local current bus is connected to the sound card, variables. For example, but this can be overridden by using the (+ 2 b in macro which is explained in more de- :where b 50) tail later. If the channel argument is omitted, the is another way of writing sample is written both to channel 0 and 1. (let ((b 50)) It makes sense only to use out inside a (+ 2 b)) sound block. The following example plays a 400Hz sine sound to the sound card: There are three reason for using :where instead of let. The first reason is that ( the use of :where requires less parenthe- (let ((phase 0.0)) sis. The second reason is that reading the (sound code sometimes sounds more natural this (out (sin phase)) (inc! phase (hz->radians 400))))) way. (I.e “add 2 and b, where b is 50” in- stead of “let b be 50, add 2 and b”.) The (range varname start end ...) is a simple lo- third reason is that it’s sometimes easier op iterator macro which can be imple- to understand the code if you know what mented like this:9 you want to do with a variable, before it is defined. (define-macro (range varname start end . body) (define loop (gensym)) 4 Basic MIDI Soft Synth ‘(let ,loop ((,varname ,start)) (cond ((<,var ,end) We start by showing what is probably the sim- ,@body plest way to implement a MIDI soft synth: (,loop (+ ,varname 1))))))

(range note-num 0 128 (wait- :options ...) waits until MIDI data ( is received, either from an external inter- (define phase 0.0) face, or from another program. (define volume 0.0) (sound wait-midi has a few options to specify the (out (* volume (sin phase)))) kind of MIDI data it is waiting for. In the (inc! phase (midi->radians note-num))) examples in this paper, the following op- (while #t tions for wait-midi are used: (wait-midi :command note-on :note note-num (set! volume (midi-vol))) :command note-on (wait-midi :command note-off :note note-num Only wait for a note on MIDI message. (set! volume 0.0)))) :command note-off This program runs 128 instruments simul- Only wait for a note off MIDI mes- taneously. Each instrument is responsible for sage. playing one tone. 128 variables holding volume :note number are also used for communicating between the Only wait for a note which has MIDI parts of the code which plays sound (running at note number number. the audio rate), and the parts of the code which reads MIDI information (running at the control Inside the wait-midi block we also have rate10). access to data created from the incom- There are several things in this version which ing midi event. In this paper we use are not optimal. Most important is that you (midi-vol) for getting the velocity (con- 10Note that the control rate in Snd-RT is dynamic, verted to a number between 0.0 and 1.0), similar to the music programming system ChucK. Dy- and (midi-note) for getting the MIDI namic control rate means that the smallest available note number. time-difference between events is not set to a fixed num- ber, but can vary. In ChucK, control rate events are 8Internally, the current bus is a coroutine-local vari- measured in floating numbers, while in Snd-RT the mea- able, while the current time is a global variable. surement is in frames. So In Chuck, the time difference 9The actual implementation used in Snd-RT also can be very small, while in Snd-RT, it can not be smaller makes sure that “end” is always evaluated only one time. than 1 frame.

9 would normally not let all instruments play all to free memory used by sounds not playing the time, causing unnecessary CPU usage. You anymore to avoid running out of memory. would also normally limit the polyphony to a Luckily though, freeing memory is taken care fixed number, for instance 32 or 64 simultane- of automatically by the Rollendurchmesserzeit- ously sounds, and then immediately schedule sammler garbage collector, so we don’t have new notes to a free instrument, if there is one. to do anything special:

1| (define-stalin (softsynth) 5 Realtime Memory Allocation 2| (while #t 3| (wait-midi :command note-on As mentioned, everything inside 4| (define osc (make-oscil :freq (midi->hz (midi-note)))) 5| (define tone (sound (out (* (midi-vol) (oscil osc))))) runs in the audio realtime thread. Allocating 6| (spawn memory inside the audio thread using the OS al- 7| (wait-midi :command note-off :note (midi-note) 8| (stoptone)))))) location function may cause surprising glitches 9| 10| ( in sound since it is not guaranteed to be an 11| (softsynth)) O(1) allocator, meaning that it may not al- ways spend the same amount of time. There- In this program, when a note-on message is fore, Snd-RT allocates memory using the Rol- received at line 3, two coroutines are scheduled: lendurchmesserzeitsammler [Matheussen, 2009] garbage collector instead. The memory alloca- 1. A sound coroutine at line 5. tor in Rollendurchmesserzeitsammler is not only 2. A regular coroutine at line 6. running in O(1), but it also allocates memory extremely efficiently. [Matheussen, 2009] Afterwards, the execution immediately jumps In the following example, it’s clearer that in- back to line 3 again, ready to schedule new strument data are actually stored in closures notes. which are allocated during runtime.11 In ad- So the MIDI soft synth is still polyphonic, dition, the 128 spawned coroutines themselves and contrary to the earlier versions, the CPU require some memory which also needs to be is now the only factor limiting the number of allocated: simultaneously playing sounds.12

( (range note-num 0 128 7 Auto-Allocated Variables (spawn (define phase 0.0) In the following modification, the (define volume 0.0) CLM [Schottstaedt, 1994] oscillator oscil (sound will be implicitly and automatically allocated (out (* volume (sin phase)))) (inc! phase (midi->radians note-num))) first time the function oscil is called. After (while #t the generator is allocated, a pointer to it is (wait-midi :command note-on :note note-num stored in a special memory slot in the current (set! volume (midi-vol))) (wait-midi :command note-off :note note-num coroutine. (set! volume 0.0))))) Since oscil is called from inside a sound coroutine, it is natural to store the generator in 6 Realtime Garbage Collection. the coroutine itself to avoid all tones using the (Creating new instruments only same oscillator, which would happen if the auto- when needed) allocated variable had been stored in a global variable. The new definition of softsynth now The previous version of the MIDI soft synth did looks like this: allocate some memory. However, since all mem- ory required for the lifetime of the program were (define-stalin (softsynth) (while #t allocated during startup, it was not necessary to (wait-midi :command note-on free any memory during runtime. (define tone (sound (out (* (midi-vol) But in the following example, we simplify the (oscil :freq (midi->hz (midi-note))))))) code further by creating new tones only when (spawn (wait-midi :command note-off :note (midi-note) they are needed. And to do that, it is necessary (stop tone)))))) 11Note that memory allocation performed before any sound block can easily be run in a non-realtime thread 12Letting the CPU be the only factor to limit before scheduling the rest of the code to run in realtime. polyphony is not necessarily a good thing, but doing so But that is just an optimization. in this case makes the example very simple.

10 The difference between this version and the similar functionality) are necessary, so it would previous one is subtle. But if we instead look be nice to have a better way to handle them. at the reverb instrument in the next section, it would span twice as many lines of code, and the 9 Routing Signals with Dynamic code using the reverb would require additional Scoping. (Getting rid of manually logic to create the instrument. handling sound buses) A slightly less verbose way to create, read and 8 Adding Reverb. (Introducing write signal buses is to use dynamic scoping signal buses) to route signals. The bus itself is stored in a A MIDI soft synth might sound unprofessional coroutine-local variable and created using the or unnatural without some reverb. In this ex- in macro. ample we implement John Chowning’s reverb13 Dynamic scoping comes from the fact that and connect it to the output of the MIDI soft out writes to the bus which was last set up synth by using the built-in signal bus system: by in. In other words, the scope for the current bus (the bus used by out) follows (define-stalin (reverb input) the execution of the program. If out isn’t (delay :size (* .013 (mus-srate)) (somehow) called from in, it will instead write (+ (comb :scaler 0.742 :size 9601 allpass-composed) (comb :scaler 0.733 :size 10007 allpass-composed) to the bus connected to the soundcard. (comb :scaler 0.715 :size 10799 allpass-composed) (comb :scaler 0.697 :size 11597 allpass-composed) :where allpass-composed For instance, instead of writing: (send input :through (all-pass :feedback -0.7 :feedforward 0.7) (define-stalin bus (make-bus)) (all-pass :feedback -0.7 :feedforward 0.7) (all-pass :feedback -0.7 :feedforward 0.7) (define-stalin (instr1) (all-pass :feedback -0.7 :feedforward 0.7))))) (sound (write-bus bus 0.5)))

(define-stalin bus (make-bus)) (define-stalin (instr2) (sound (write-bus bus -0.5))) (define-stalin (softsynth) (while #t ( (wait-midi :command note-on (instr1) (define tone (instr2) (sound (sound (write-bus bus (out (read-bus bus)))) (* (midi-vol) (oscil :freq (midi->hz (midi-note))))))) (spawn we can write: (wait-midi :command note-off :note (midi-note) (stop tone)))))) (define-stalin (instr1) (sound (out 0.5))) (define-stalin (fx-ctrl input clean wet processor) (+ (* clean input) (define-stalin (instr2) (* wet (processor input)))) (sound (out -0.5)))

( ( (spawn (sound (softsynth)) (out (in (instr1) (sound (instr2))))) (out (fx-ctrl (read-bus bus) 0.5 0.09 reverb)))) What happened here was that the first time in was called in the main block, it spawned a Signal buses are far from being an “unusual new coroutine and created a new bus. The new technique”, but in text based languages they coroutine then ran immediately, and the first are in disadvantage compared to graphical mu- thing it did was to change the current bus to sic languages such as Max [Puckette, 2002] or the newly created bus. The in macro also made Pd. In text based languages it’s inconvenient to sure that all sound blocks called from within write to buses, read from buses, and most im- the in macro (i.e. the ones created in instr1 and portantly; it’s hard to see the signal flow. How- instr2 ) is going to run before the main sound ever, signal buses (or something which provides block. (That’s how sound coroutines are stored in a tree) 13as implemented by Bill Schottstaedt in the file ”jc-reverb.scm” included with Snd. The fx-ctrl function When transforming the MIDI soft synth to is a copy of the function fxctrl implemented in Faust’s use in instead of manually handling buses, it Freeverb example. will look like this:

11 ;; source, but used two times from the same corou- ;; Don’t need the bus anymore: tine, both channels will use the same coroutine- (define-stalin bus (make-bus)) local variables used by the reverb; a delay, four ;; softsynth reverted back to the previous version: (define-stalin (softsynth) comb filters and four all-pass filters. (while #t There are a few ways to work around this (wait-midi :command note-on (define tone problem. The quickest work-around is to re- (sound (out (* (midi-vol) code ’reverb’ into a macro instead of a function. (oscil :freq (midi->hz (midi-note))))))) (spawn However, since neither the problem nor any so- (wait-midi :command note-off :note (midi-note) (stop tone)))))) lution to the problem are very obvious, plus that it is slower to use coroutine-local variables than ;; A simpler main block: ( manually allocating them (it requires extra in- (sound structions to check whether the data has been (out (fx-ctrl (in (softsynth)) 14 0.5 0.09 allocated ), it’s tempting not to use coroutine- reverb)))) local variables at all. 10 CPS Sound Generators. (Adding Instead we introduce a new concept called CPS Sound Generators, where CPS stands for stereo reverb and autopanning) Continuation Passing Style. [Sussman and Using coroutine-local variables was convenient Steele, 1975] in the previous examples. But what happens if we want to implement autopanning and (a very simple) stereo reverb, as illustrated by the 10.1 How it works graph below? Working with CPS Sound Generators are simi- lar to Faust’s Block Diagrams composition. [Or- +-- reverb -> out ch 0 larey et al., 2004] A CPS Sound Generator can / also be seen as similar to a Block Diagram in softsynth--< Faust, and connecting the CPS Sound Genera- \ tors is quite similar to Faust’s Block Diagram +-- reverb -> out ch 1 Algebra (BDA). CPS Sound Generators are CPS functions First, lets try with the tools we have used so which are able to connect to other CPS Sound far: Generators in order to build up a larger function (define-stalin (stereo-pan input c) for processing samples. The advantage of build- (let* ((sqrt2/2 (/ (sqrt 2) 2)) (angle (- pi/4 (* c pi/2))) ing up a program this way is that we know what (left (* sqrt2/2 (+ (cos angle) (sin angle)))) data is needed before starting to process sam- (right (* sqrt2/2 (- (cos angle) (sin angle))))) (out 0 (* input left)) ples. This means that auto-allocated variables (out 1 (* input right)))) don’t have to be stored in coroutines, but can (define-stalin (softsynth) be allocated before running the sound block. (while #t (wait-midi :command note-on For instance, the following code is written in (define tone generator-style and plays a 400Hz sine sound to (sound (stereo-pan (* (midi-vol) the sound card: (oscil :freq (midi->hz (midi-note)))) (/ (midi-note) 127.0)))) (let ((Generator (let ((osc (make-oscillator :freq 400))) (spawn (lambda (kont) (wait-midi :command note-off :note (midi-note) (kont (oscil osc)))))) (stop tone)))))) (sound (Generator (lambda (sample) ( (out sample))))) (sound (in (softsynth) (lambda (sound-left sound-right) The variable kont in the function Generator (out 0 (fx-ctrl sound-left 0.5 0.09 reverb)) is the continuation, and it is always the last ar- (out 1 (fx-ctrl sound-right 0.5 0.09 reverb)))))) gument in a CPS Sound Generator. A continua- At first glance, it may look okay. But tion is a function containing the rest of the pro- the reverb will not work properly. The rea- gram. 
In other words, a continuation function son is that auto-generated variables used for 14It is possible to optimize away these checks, but coroutine-local variables are identified by their doing so either requires restricting the liberty of the position in the source. And since the code programmer, some kind of JIT-compilation, or doing a for the reverb is written only one place in the whole-program analysis.

12 will never return. The main reason for program- (gen-sound :options generator) is the same ming this way is for generators to easily return as writing more than one sample, i.e have more than one output.15 (let ((gen generator)) (sound :options Programming directly this way, as shown (gen (lambda (result0) above, is not convenient, so in order to make (out 0 result0))))) programming simpler, some additional syntax ...when the generator has one output. If have been added. The two most common oper- the generator has two outputs, it will look ators are Seq and Par, which behave similar to like this: the ’:’ and ’,’ infix operators in Faust.16 (let ((gen generator)) Seq creates a new generator by connecting gen- (sound :options (gen (lambda (result0 result1) erators in sequence. In case an argument is (out 0 result0) not a generator, a generator will automat- (out 1 result1))))) ically be created from the argument. ...and so forth. For instance, (Seq (+ 2)) is the same as writing The Snd-RT preprocessor knows if a variable or expression is a CPS Sound Generator by (let ((generator0 (lambda (arg1 kont0) (kont0 (+ 2 arg1))))) looking at whether the first character is capitol. (lambda (input0 kont1) For instance, (Seq (Process 2)) is equal to (generator0 input0 kont1))) (Process 2), while (Seq (process 2)) is equal to and (Seq (+ (random 1.0)) (+ 1)) is the (lambda (input kont) (kont (process 2 input))), same as writing regardless of how ’Process’ and ’process’ are defined. (let ((generator0 (let ((arg0 (random 1.0))) (lambda (arg1 kont0) (kont0 (+ arg0 arg1))))) 10.2 Handling auto-allocated variables (generator1 (lambda (arg1 kont1) (kont1 (+ 1 arg1))))) oscil and the other CLM generators are macros, (lambda (input kont2) (generator0 input (lambda (result0) and the expanded code for (oscil :freq 440) looks (generator1 result0 kont2))))) like this: ;; Evaluating ((Seq (+ 2) (+ 1)) 3 display) ;; will print 6! (oscil_ (autovar (make_oscil_ 440 0.0)) 0 0)

Par creates a new generator by connecting gen- Normally, autovar variables are translated erators in parallel. Similar to Seq, if an into coroutine-local variables in a separate step argument is not a generator, a generator performed after macro expansion. However, using the argument will be created auto- when an auto-allocated variable is an argu- matically. ment for a generator, the autovar surrounding For instance, (Par (+ (random 1.0)) (+ 1)) is removed. And, similar to other arguments is the same as writing: which are normal function calls, the initializa- tion code is placed before the generator func- (let ((generator0 (let ((arg0 (random 1.0))) tion. For example, (Seq (oscil :freq 440)) is ex- (lambda (arg1 kont0) 17 (kont0 (+ arg0 arg1))))) panded into: (generator1 (lambda (arg1 kont1) (kont1 (+ 1 arg1))))) (let ((generator0 (let ((var0 (make_oscil_ 440 0.0))) (lambda (input2 input3 kont1) (lambda (kont) (generator0 input2 (kont (oscil_ var0 0 0)))))) (lambda (result0) (lambda (kont) (generator1 input3 (generator0 kont))) (lambda (result1) (kont1 result0 result1)))))))

;; Evaluating ((Par (+ 2)(+ 1)) 3 4 +) will return 10! 17Since the Snd-RT preprocessor doesn’t know the number of arguments for normal functions such as oscil , 15Also note that by inlining functions, the Stalin this expansion requires the preprocessor to know that scheme compiler is able to optimize away the extra syn- this particular Seq block has 0 inputs. The preprocessor tax necessary for the CPS style. should usually get this information from the code calling 16Several other special operators are available as well, Seq, but it can also be defined explicitly, for example like but this paper is too short to document all of them. this: (Seq 0 Cut (oscil :freq 440)).

13 10.3 The Soft Synth using CPS Sound ming. Generators 13 Acknowledgments (define-stalin (Reverb) (Seq (all-pass :feedback -0.7 :feedforward 0.7) Thanks to the anonymous reviewers and Anders (all-pass :feedback -0.7 :feedforward 0.7) Vinjar for comments and suggestions. (all-pass :feedback -0.7 :feedforward 0.7) (all-pass :feedback -0.7 :feedforward 0.7) (Sum (comb :scaler 0.742 :size 9601) References (comb :scaler 0.733 :size 10007) (comb :scaler 0.715 :size 10799) William Clinger and Jonathan Rees. 1991. (comb :scaler 0.697 :size 11597)) (delay :size (* .013 (mus-srate))))) Revised Report (4) On The Algorithmic Lan- guage Scheme. (define-stalin (Stereo-pan c) (Split Identity (* left) Melvin E. Conway. 1963. Design of a sepa- (* right) rable transition-diagram compiler. Commu- :where left (* sqrt2/2 (+ (cos angle) (sin angle))) :where right (* sqrt2/2 (- (cos angle) (sin angle))) nications of the ACM, 6(7):396–408. :where angle (- pi/4 (* c pi/2)) :where sqrt2/2 (/ (sqrt 2) 2))) O.-J. Dahl and K. Nygaard. 1966. SIMULA:

(define-stalin (softsynth) an ALGOL-based simulation language. Com- (while #t munications of the ACM, 9(9):671–678. (wait-midi :command note-on (define tone (gen-sound Kjetil Matheussen. 2008. Realtime Music (Seq (oscil :freq (midi->hz (midi-note))) Programming Using Snd-RT. In Proceedings (* (midi-vol)) (Stereo-pan (/ (midi-note) 127))))) of the International Computer Music Confer- (spawn ence. (wait-midi :command note-off :note (midi-note) (stop tone)))))) Kjetil Matheussen. 2009. Conservative Gar- (define-stalin (Fx-ctrl clean wet Fx) gage Collectors for Realtime Audio Process- (Sum (* clean) (Seq Fx ing. In Proceedings of the International Com- (* wet)))) puter Music Conference, pages 359–366 (on- ( line erratum). (gen-sound (Seq (In (softsynth)) James McCartney. 2002. Rethinking the (Par (Fx-ctrl 0.5 0.09 (Reverb)) (Fx-ctrl 0.5 0.09 (Reverb)))))) Computer Music Language: SuperCollider. Computer Music Journal, 26(2):61–68. 11 Adding an ADSR Envelope Y. Orlarey, D. Fober, and S. Letz. 2004. Syn- tactical and semantical aspects of faust, soft And finally, to make the MIDI soft synth sound computing. decent, we need to avoid clicks caused by sud- denly starting and stopping sounds. To do this, Miller Puckette. 2002. Max at Seventeen. we use a built-in ADSR envelope generator (en- Computer Music Journal, 26(4):31–43. tirely written in Scheme) for ramping up and W. Schottstaedt. 1994. Machine Tongues down the volume. Only the function softsynth XVII: CLM: Music V Meets Common Lisp. needs to be changed: Computer Music Journal, 18(2):30–37. (define-stalin (softsynth) Jr. Steele, Guy Lewis and Gerald Jay (while #t (wait-midi :command note-on Sussman. 1978. The Revised Report on (gen-sound :while (-> adsr is-running) SCHEME: A Dialect of LISP. Technical Re- (Seq (Prod (oscil :freq (midi->hz (midi-note))) (midi-vol) port 452, MIT. (-> adsr next)) (Stereo-pan (/ (midi-note) 127)))) Gerald Jay Sussman and Jr. Steele, (spawn Guy Lewis. 1975. Scheme: An inter- (wait-midi :command note-off :note (midi-note) (-> adsr stop))) preter for extended lambda calculus. In :where adsr (make-adsr :a 20:-ms Memo 349, MIT AI Lab. :d 30:-ms :s 0.2 :r 70:-ms)))) Ge Wang and Perry Cook. 2003. ChucK: a Concurrent and On-the-fly Audio Program- 12 Conclusion ming Language, Proceedings of the ICMC. This paper has shown a few techniques for doing low-latency sample-by-sample audio program-

Writing Audio Applications using GStreamer

Stefan KOST
GStreamer community, Nokia/MeeGo
Majurinkatu 12 B 43, Espoo, Finland, 02600
[email protected]

Abstract

GStreamer is mostly known for its use in media players, although the API and the plugin collection have much to offer for audio composing and editing applications as well. This paper introduces the framework, focusing on features interesting for audio processing applications. The author reports about practical experience of using GStreamer inside the Buzztard project.

Keywords

GStreamer, Framework, Composer, Audio.

1 GStreamer framework

GStreamer [1] is an open source, cross platform, graph based, multimedia framework. Development takes place on freedesktop.org. GStreamer is written in C, but in an object oriented fashion using the GObject framework. Numerous language bindings exist, e.g. for Python, C#, Java and Perl. GStreamer has been ported to a wide variety of platforms such as Linux, Windows, MacOS, BSD and Solaris. The framework and most plugins are released under the LGPL license. The project is over 10 years old now and has many contributors. The current 0.10 series is API and ABI stable and has been actively developed for about 5 years.

A great variety of applications is using GStreamer already today. To give some examples, the GNOME desktop is using it for its media players Totem and Rhythmbox and the chat & VoIP client Empathy, and platforms like Nokia's Maemo and Intel's Moblin use GStreamer for all multimedia tasks. In the audio domain, two applications to mention are (a multitrack audio editor) and Buzztard [2] (a tracker like musical composer). The latter is the pet project of the article's author and will be used later on as an example case, and thus deserves a little introduction. A song in Buzztard is a network of sound generators, effects and an audio sink. All parameters on each of the elements can be controlled interactively or by the sequencer. All audio is rendered on the fly by algorithms and by optionally accessing a wavetable of sampled sounds. [3] gives a nice overview of live audio effects in Buzztard.

1.1 Multimedia processing graph

A GStreamer application constructs its processing graph in a GstPipeline object. This is a container for actual elements (sources, effects and sinks). A container element is an element itself, supporting the same API as the real elements it contains. A pipeline is built by adding more elements/containers and linking them through their "pads". A pipeline can be anything from a simple linear sequence to a complex hierarchical directed graph (see Illustration 1).

(Illustration 1: A hierarchical GStreamer pipeline)

As mentioned above, elements are linked by connecting two pads – a source pad of the source element and a sink pad from the target element. GStreamer knows about several types of pads – always-, sometimes- and request-pads:

• always-pads are static (always available)

• sometimes-pads appear and disappear according to data flow (e.g. a new stream on multiplexed content)

15 Illustration 1: A hierarchical GStreamer pipeline according to data­flow (e.g. new stream on transferring buffers from pad to pad. A pipeline multiplexed content) can operate “push based”, “pull based” and in • request­pads are a template that can be “mixed mode”. In push mode, source elements are instantiated multiple times (e.g. inputs for active and deliver data downstream. In pull mode a mixer element) sink elements request data form upstream elements Besides generic containers (containers to when needed. Pure pull based mode is interesting structure a pipeline like GstBin and GstPipeline) for audio applications, but not yet fully supported several specialized containers exist: by all elements. Regardless of the scheduling, • GstPlaybin2: builds a media playback buffers always travel downstream (from sources to pipeline based on the media content, e.g. sinks). used in Totem Events are used for control flow between • GstCamerabin: abstract still image/video elements and from the application to the pipeline. capture workflow, e.g. used in Nokia Events inside the pipeline are bidirectional – some N900 go upstream, some go downstream. E.g. seeking is • Gnonlin: multitrack audio/video editor, done by using events. e.g. used in Jokosher and Pitivi Messages are used to inform the application 1.2 Data formats from the inside of the pipeline. Messages are sent to a bus. There they are marshalled to the The GStreamer core is totally data format application thread. The application can subscribe agnostic. Formats like audio and video (and their to interesting messages. E.g. multimedia metadata properties) are only introduced with the gst­ (such as id3 tags) are send as a tag­message. plugins­base module. New formats can be added Queries are mostly used by the application to without modifying the core (consider adding check the status of the pipeline. E.g. one can use a spectral audio processing plugins like VAMP [4]). position­query to check the current playback Data formats are described by GstCaps which is an position. Queries can also be used by elements, object containing a media­type and a list of {key, both up­ and downstream. value} pairs. Values can be fixed or variable (a range or list of possible values). The GstCaps class Type Direction implements operations on caps (union, Buffer Between elements intersection, iterator,...). This is quite different to Event From application to elements e.g. ladspa where the standard specifies a fix and between elements format like float­audio with 1 channel. Message From elements to application via Elements register template caps for their pads. the message bus This describes the data formats that they can Query From applications to elements accept. When starting the pipeline, elements agree and between elements. on the formats that they will use and store this in their pads. Different pads can use different formats Table 1: Communication primitives and formats can also change while the pipeline is running. All the communication object are lightweight GStreamer plugin packages provide various data object (GstMiniObject). They support subclassing, format conversion plugins such as e.g. but no properties and signals. audioconvert and audioresample. Such plugins 1.4 Multi threading switch to pass­through mode if no conversion is GStreamer pipelines always use at least one needed. For performance and quality reason it is of thread distinct from the main ui thread1. course best to avoid conversion. 
Depending on the scheduling model and pipeline structure more threads are used. The message bus 1.3 Data Flow described in the previous chapter is used to The framework provides various communication transfer messages to the applications main thread. and I/O mechanisms between elements and the application. The main data­flow is done by 1This also applies to commandline applications

16 Together these mechanisms decouple the UI from way. New elements can be used without that the the media handling. application needs special knowledge about them. Sink elements usually run a rendering thread. In 1.6 Plugin wrappers pull based scheduling also all the upstream data processing would run in this thread. The GStreamer plugin packages provide bridge In push based scheduling (which is the default plugins that integrate other plugin standards. In the for most elements), sources start a thread. audio area, GStreamer applications can use ladspa, Whenever data­flow branches out, one would add lv2 and buzzmachine plugins. The bridge plugins queue elements which again start new threads for register elements for each of the wrapped plugins. their source pad (see Illustration 2 for an example). This eases application development, as one does Threads are taken from a thread­pool. not have to deal with the different APIs on that GStreamer provides a default thread­pool, but the level any more. application can also provide its own. This allows 1.7 Input and Outputs (some) processing to run under different scheduling modes. The threads from the default GStreamer supports sources and sinks for pool are run with default scheduling (inherited various platforms and APIs. For audio on Linux from parent and thus usually SCHED_OTHER). these include Alsa, OSS, OSS4, Jack, PulseAudio, ESound, SDL as well as some esoteric options 1.5 Plugin/Element API (e.g. an Apple Airport Express Sink). Likewise GStreamer elements are GObjects subclassing there are video sinks for XVideo, SDL, OpenGL GstElement or any other base­class build on top of and others. that. Also structural components like GstBin and 1.8 Audio plugins GstPipeline are subclassed from GstElement. GStreamer has many specialized base­classes, Besides the plugin wrappers described in section covering audio­sinks and ­sources, filters and 1.6, the input/output elements mentioned in 1.7 many other use cases. and generic data flow and tool elements (tee, Elements are normally provided by installed input/output­selector, queue, adder, audioconvert), plugins. Plugins can be installed system wide or GStreamer has a couple of native audio elements locally. Besides applications can also register already. There is an equalizer, a spectrum analyser, elements directly from their code. This is useful a level meter, some filters and some effects in the for application specific elements. Buzztard uses gst­plugins­good module. The Buzztard project this for loading sounds through a memory­buffer has a simple monophonic synthesizer (simsyn) and sink. a fluidsynth wrapper in the gst­buzztard module. The use of the GObject paradigm provides full Especially simsyn is a good starting point for an introspection of an element's capabilities. This instrument plugin (audio generator). allows applications to use elements in a generic

Illustration 2: Thread boundaries in a pipeline

17 1.9 A/V Sync and Clocks applications usually also need sequencer Time is important in multimedia application. A capabilities. A Sequencer would control element common timebase is needed for mixing and properties based on time. GStreamer provides such synchronisation. A GStreamer pipeline contains a a mechanism with the GstController subsystem. clock object. This clock can be provided by an The application can create control­source objects element inside the pipeline (e.g. the audio clock and attach them to element properties. Then the from the audio sink) or a system clock is used as a application would program the controller and at fall back. The system clock is based on POSIX runtime the element fetches parameter changes by monotonic timers if available. Elements tag buffers timestamps. with timestamps. This is the stream­time for that The control­source functionality comes as a particular media object (see Illustration 3 for the base­class with a few implementations in relation of different timebases). Sink elements can GStreamer core: an interpolation control source now sync received buffers to the clock. If multiple and an lfo control source. The former takes a series sinks are in use, they all sync against the same of {timestamp, value} pairs and can provide clock source. intermediate values by applying various smoothing GStreamer also provides a network clock. This functions (trigger, step, linear, quadratic and allows to construct pipelines that span multiple cubic). The latter supports a few waveforms (sine, computers. square, saw, reverse­saw and triangle) plus One can use seek­events to configure from amplitude, frequency and phase control. which time to play (and until what time or the The GObject properties are annotated to help the end). Seek­events are also used to set the playback application to assign control­sources only to speed to achieve fast forward or playing media meaningful parameters. backwards. [5] has a short example demonstrating the interpolation control source. 1.10 Sequencer The sequencer in Buzztard completely relies on A sequencer records events over time. Usually this mechanism. All events recorded in the the sequence is split into multiple tracks too. timeline are available as interpolation control GStreamer elements are easy targets for event sources on the parameters of the audio generators automation. Each element comes with a number of and effects. GObject properties as part of its interface. The 1.11 Preset handling properties can be enumerated by the application. One can query data types, ranges and display texts. The GStreamer framework defines several Changing the value of an element's property has interfaces that can be implemented by elements. the desired effect almost immediately. Audio One that is useful for music applications is the elements update the processing parameters at least preset interface. It defines the API for applications once per buffer. In video elements a buffers is one to browse and activate element presets. A preset frame and thus parameter changes are always itself is a named parameter set. The interface also immediate. defines where those are stored in the file system The GObject properties are a convenient and how to merge system wide and user local files. interface for live control. Besides multimedia

Illustration 3: Clock times in a pipeline

18 In Buzztard the preset mechanism is used to > sync; echo 3 > /proc/sys/vm/drop_caches > time gst-inspect >/dev/null 2>&1 defines sound preset on sound generators and real 0m2.710s effects. Another use of presets are encoding user 0m0.084s sys 0m0.060s profiles for rendering of content so that it is > time gst-inspect >/dev/null 2>&1 suitable for specific devices. real 0m0.074s user 0m0.036s sys 0m0.040s 2 Development support Example 1: Registry initialisation times Previous chapters introduced the major framework features. One nice side effect of a 3.2 Pipeline construction widely used framework is that people write tools Initial format negotiation can take some time in to support the development. large pipelines. This gets especially worse if a lot The GStreamer core comes with a powerful of conversion elements are plugged (they will logging framework. Logs can be filtered to many increase the number of possible formats). One way criteria on the fly or analysed off­line using the to address this, is to define a desired default format gst­debug­viewer tool. Another useful feature is and plug conversion elements only as needed. the ability to dump pipeline layouts with all kinds > export GST_DEBUG="bt-cmd:3" of details as graphviz dot graphs. This is how > ./buzztard-cmd 2>&1 -c p -i Aehnatron.bmw | Illustration 1 was generated. grep start 0:00:00.038075793 first lifesign from the app Small pipelines can by tested with gst­launch on 0:00:02.583626430 song has been loaded the command­line. The needed elements can be 0:00:03.388285111 song is playing found with gst­inspect or by browsing the installed Example 2: Playback start time for a Buzztard song plugins with its graphical counterpart gst­ inspector. Example 2 is the authors poor mans approach to Gst­tracelib can provide statistics and meassure the time to go to playing. The timings performance monitoring of any GStreamer based already use the optimization suggested in the application. previous paragraph. The song used in the example uses 44 threads and consists of 341 GStreamer 3 Performance elements structured into 62 GstBins. One often asked questions is regarding to the 3.3 Running time framework overhead. This is discussed in the The actual run time overhead is quite low. sections below, looking at it from different angles. Pushing a buffer, will essentially just call the 3.1 Startup time process function on the peer pad. Likewise it happens in pull mode. There is some overhead for When initializing the gstreamer library, it load sanity checking and renegotiation testing. and verifies the registry cache. The cache is a dictionary of all known plugins and the features > time ./buzztard-cmd -c e -i Aehnatron.bmw -o /tmp/Aehnatron.wav they provide. The cache allows applications to 11:48.480 / 11:48.480 lookup elements by features, even before their real 0m29.687s module is loaded. user 0m41.615s The cache is built automatically if it is not sys 0m4.524s present. This spawns a separate process that will Example 3: Render a Buzztard song load and introspect each plugin. Crashing plugins are blacklisted. Obviously this takes some time, Example 3 show how long it takes to render a especially if a large amount of plugins is installed. quite big song to a pcm wav file. Processing nicely This of course depends at lot on the amount of saturates both cpu cores. This benchmark uses the installed plugins (in my case: 235 plugins, 1633 same song as in Example 2. features) and the system2. 
As can been seen from 4 Conclusion the above example, the 2nd round is quick. GStreamer is more that just a plugin API. It 2 Lenovo T60 using a Intel® CPU T2400@ 1.83GHz (dual core) implements a great detail of the plugin­host

19 functionality. To compare GStreamer with some [3] Stefan Kost. 24 June 2007 . Fun with technologies more known in the Linux Audio GStreamer Audio Effects. The Gnome Journal. ecosystem ­ one could understand GStreamer as a [4] VAMP Project. 1999­2010. The Vamp audio mix of the functionality that e.g. Jack + lv2 analysis plugin system. http://vamp­plugins.org provide together. GStreamer manages the whole [5] GStreamer Project. 1999­2010. processing filter graph, but in one process, while GstController audio example. jack would do something similar across multiple http://cgit.freedesktop.org/gstreamer/gstreamer/tr processes. GStreamer also describes how plugins ee/tests/examples/controller/audio­example.c are run together, while APIs like ladspa or lv2 leave that freedom to the host applications. The GStreamer framework comes with a lot of features needed to write a multimedia application. It abstracts handling of various media formats, provides sequencer functionality, integrates with the different media APIs on platforms like Linux, Windows and MacOS. The multithreaded design helps to write applications that make use of modern multicore CPU architectures. The media agnostic design is ideal for application that like to use other media besides audio as well. As a downside all the functionality also brings quite some complexity. Pure audio projects still need to deal with some aspects irrelevant to their area. Having support for arbitrary formats makes plugins more complex. Pull based processing remains an area that needs more work, especially if it should scale as well as the push based processing on multicore CPUs. If there is more interest in writing audio plugins as GStreamer elements, it would also be a good idea to introduce new base­classes. While there is already one for audio filters, there is none yet for audio generators (there is a GstBaseAudioSrc, but that is meant for a sound card source). Thus simsyn introduced in chapter 1.8 is build from 1428 lines of C, while the audiodelay effect in the same package only needs 620.

5 Acknowledgements Thanks go the the GStreamer community for their great passionate work on the framework and my family for their patience regarding my hobby.

References [1] GStreamer Project. 1999­2010. GStreamer: Documentation. ww.freedesktop.org. [2] Stefan Kost. 2003­2010. The Buzztard project. www.buzztard.org

20 Emulating a Combo Organ Using Faust

Sampo SAVOLAINEN Foo-plugins http://foo-plugins.googlecode.com/ [email protected]

Abstract nents. This makes for a good test of Faust’s per- This paper describes the working principles of a 40 formance and parallelization capabilities. Ac- year old transistor organ and how it is emulated with cess to an operational organ was also a factor in software. The emulation presented in this paper is the choice, as it makes it possible to match the open source and written in a functional language emulation quality against the original. called Faust. The architecture of the organ proved The contents of this paper is organized into to be challenging for Faust. The process of writing four sections. The first section covers how the this emulation highlighted some of Faust’s strengths real organ operates. This is followed by a de- and helped identify ways to improve the language. scription of how the organ is emulated. The Keywords third section covers the performance aspects of the emulation. The last section offers analysis Faust, synthesis, emulation of Faust’s strengths and gives proposals on how to improve the language. 1 Introduction Faust[Yann Orlarey, 2009b] stands for Func- 2 YC-20 tional AUdio STream and as the name implies This section covers the operations of the or- it is a functional language designed for audio gan in detail[Yamaha, 1969]. The informa- processing. The Faust compiler is an intermedi- tion in this is section is later referred to and ary compiler, which produces source code for a detailed further when discussing the emulated C++ signal processor class which is integrated parts. Figure 1 shows a block diagram of the into a chosen C++ architecture. This archi- device. tecture file provides a run-time environment, or wrapper, for the processor. This wrapper can 2.1 Features be for example a stand-alone Linux Jack appli- The organ has one physical manual with 61 keys cation or an audio processing plug-in. with a switchable 17 key bass manual and two This paper describes how an emulation of a voice sections. Instead of drawbars like Ham- 1970’s combo-organ was written in Faust. The mond organs, the voices are controlled by a lever Yamaha YC-20 is a fairly typical organ of its system used in Yamaha organs of the time. As time, a transistor based relatively light instru- the word drawbar is commonplace when dis- ment meant for musicians on the road. The cussing organ voicing, the word is used in place organ is a divide-down design and its working of lever for the voice controls. While the con- principles are discussed in detail in section 2. trols are potentiometers, they have notches to The emulation is released as open source under help the user achieve repeatable settings. Each the GNU General Public License (v3). In the drawbar lever in the organ has four positions spirit of open source, the working principles and (end stops and two notches) from off to full vol- decisions taken during writing the emulation are ume. described in this paper for all to read. Section I has drawbars 16’, 8’, 4’, 2-2/3’, 2’, The YC-20 was chosen to be emulated as its 1-3/5’ and 1’ and section II has drawbars 16’, architecture is very different from typical soft- 8’, 4’, 2’ and a continuous brightness lever. The ware and virtual analog synthesizers. Instead sections can be selected or mixed together using of the complex controllability and routability of a continuous balance lever. 
Enabling the bass typical synthesizers, this organ requires a large manual switches the lowest 17 keys (white on number of fixed parallel processes and compo- black) to bass section sounds with drawbars 16’

21 2.3 Sections The drawbar controls are potentiometers which control how much of each bus bar is mixed to each section of the organ. While in section I, drawbars are simply mixed together using resis- tors, Section II drawbars have more complex fil- tering. Each section II drawbar is divided into two streams, each filtered separately with RC networks. One stream is low pass filtered while the other is high pass filtered. The low passed and high passed signals are mixed through re- sistors to combine into bright and dull signals which are then buffered. The brightness poten- tiometer controls a mix of the two which, after further buffering, is the output of section II. The key switches on the lowest 17 keys feed separate bass bus bars. The bass manual switch controls whether these bus bars are kept sepa- rate or mixed with the main bus bars. Thus, with the bass feature disabled, the bass keys work exactly as the rest of the manual. With the switch enabled, the bass bus bars are sepa- rate and only feed the bass section. It is worth noting that when enabled the bass section effec- Figure 1: Block diagram of the YC-20 tively discards the signal of bass bus bars 2-2/3’, 2’, 1-3/5’ and 1’. and 8’. There are controls for pitch, overall vol- Bass section differs from the two main sec- ume, bass volume, vibrato depth, vibrato speed tions in that both bass manual drawbars (16’ and an obscure ”touch vibrato” feature. The and 8’) use a mix of multiple bass bus bars. organ also has a single percussion drawbar and The 16’ drawbar is a mix of bass bus bars 16’, 8’ an integrated volume pedal. Like the drawbars, and 4’. The bus bars are mixed through differ- all vibrato controls have four notches. ent value resistors. 16’ has the least resistance and 4’ the highest resistance. The 8’ drawbar 2.2 Tone generation is a mix of the 8’ bass bus bar with lower re- The synthesizer architecture comprises of 12 os- sistance and 4’ bass bus bar with higher resis- cillators, one per each note in a twelve-tone tance. While section I and section II drawbar equal temperament (12-TET) scale. Each os- control potentiometers are wired in a standard cillator produces a sawtooth wave. For each volume configuration, the bass manual drawbar oscillator, there are 7 frequency divider stages controls are wired center-tap to source with the so each oscillator produces 8 voices totalling 96 other tip grounded and the other tip as the out- voices. The dividers produce square waves. put. This configuration makes the drawbar con- These voices are next fed through a passive trol not only affect how much of the drawbar is filter bank. The divided voices have both a mixed in, but it also varies the impedance of high pass and low pass filters while the oscilla- the signal source. The different mixing resistors tor voices are only high pass filtered to remove and the varying impedance of the drawbar con- bias. This totals 180 resistor-capacitor (RC) trols, combined with a fixed capacitor to ground filter networks. Filtered voices are connected (after summing the drawbar signals) makes the to switches on the actual keys of the keyboard. network act as a fairly complex RC filter. This The keyboard is thus in fact a matrix mixer filter has a varying cutoff point and mix amount connecting the voices to bus bars. A key feeds per bass bus bar. 
each bus bar with the appropriate voice match- ing that key and octave, or the matching har- 2.4 Percussion monic voice in case of the 2-2/3’ and 1-3/5’ bus Percussion manual sounds are created by mix- bars. ing together bus bars 1’, 2-2/3’ and 16’ via resis-

22 tors. There is a substantially larger resistance the organ with, as Faust’s functional semantics on the 1’ bus bar leading to less 1’ voices in fit well with having all processing running at all the percussion section. The volume of this sig- times. Faust also makes it trivial to have a large nal is controlled by a simple envelope generator. amount of streams flowing from one circuit to This envelope generator is triggered by activity another. This emulation was also intended as a on the 1’ bus bar. The envelope attack is in- test of Faust in a real world use. stantaneous and the release time is fast while the sustain level is zero. This causes a fast at- 3.2 Tone generation tack sound, but the effect only works when no The 12 main oscillators produce sawtooth keys are pressed down before. In other words, waves. They share a common, varying bias volt- the percussion effect is not heard, for example, age (see section 3.3) which affects the oscillator when playing legato. However there is a sub- frequencies. The main oscillator voices are di- stantial amount of bleed from the percussion vided by an array of flip flop circuits each di- section which is audible when all other draw- viding the frequency in half and the next one bars are off. dividing the previously divided voice. The flip If the bass manual is enabled, it disengages flops produce square wave signals. This means the bass bus bars from the main bus bars. Thus each oscillator produces a total of 8 phase- the bass keys will not mix sounds onto the 1’ synchronized voices, each one an octave down bus bar. Therefore playing the bass manual from the previous voice. will not trigger the percussion and percussion As the main oscillator frequencies are high sounds will trigger even when bass keys are held (4–8kHz), a naive oscillator implementation down. would suffer massively from aliasing when us- 2.5 Main output ing typical sampling rates (44.1–96kHz). Nat- urally, also the dividers would suffer from The main output of the device is a mix of sec- aliasing as well. After evaluating differ- tions I and II, the percussion part and the bass ent methods to band-limit the signals, the manual. The mix between sections I and II is PolyBLEP1[V¨alim¨aki and Huovilainen, 2007] controlled by the balance lever (a potentiome- method was chosen. While BLEP[Brandt, ter). The amount of percussion mixed in is con- 2001] would produce less aliasing components, trolled by the percussion drawbar. Bass manual it is computationally more expensive. More is summed into the main output via a poten- importantly, the BLEP step function is cur- tiometer controlling the bass volume. This com- rently impossible to calculate in Faust as it re- bined signal is then preamplified. The preampli- quires using Fourier and inverse Fourier transfer fied output can then be attenuated by the main functions. The quality of BLIT-SWS2[Stilson, volume potentiometer of the device and the vol- 1996] is good at high frequencies, but it pro- ume pedal. The volume pedal action controls a duces aliasing components below the funda- mechanical shutter between a small light bulb mental frequency[V¨alim¨aki and Huovilainen, and a light dependent resistor. 2007]. Also, BLIT-SWS is problematic when 3 The emulation it comes to band-limiting hard synchronized os- cillators[Brandt, 2001] which is one strategy to The emphasis was on creating a playable instru- emulate the divider circuits. ment which sounds like the original organ. 
A Second, third, and fourth order polynomial playable emulation needs to be able to work at residual functions were evaluated for the Poly- low latencies and it needs to be efficient enough BLEP. The higher order residual functions re- to be ran on commodity hardware. The emula- duced high frequency aliasing only slightly more tion tries not only to emulate the ideal circuit compared to the second order function. Fur- but also some of the inaccuracies in the real in- thermore a band-limited signal using the sec- strument. ond order function has considerably less aliasing 3.1 Why Faust? components below the fundamental frequency Instead of taking an approach where sounds are when compared to the third and fourth or- synthesized only when needed, this emulation 1BLEP = band-limited step function. PolyBLEP keeps all oscillators, dividers and filters running uses a step function derived from a simple polynomial all the time – just like the real organ. Faust was 2BLIT = band-limited impulse train, SWS refers to chosen as the programming language to emulate the use of windowed sinc functions

23 der functions. This confirmed previous find- Frequency (A) (B) (C) ings[Pekonen, 2007]. The chosen second order 5kHz 352 800 44 100 10 000 polynomial residual function r(t) is shown as 1kHz 352 800 44 100 2 000 equation 1. 500Hz 352 800 44 100 1 000

Table 1: Amount of PolyBLEP calculations per t2/2 + t + 1/2, if 1 t 0 second for a square wave at Fs = 44.1kHz. r(t) = − ≤ ≤ (1) t2/2 + t 1/2, if 0 < t 1 (A) an implementation where per each frame (− − ≤ the PolyBLEP is evaluated for both the pre- vious and current sample for both discontinu- The divider circuits are emulated as slaved os- ities. The branched nature of the residual func- cillators. The main oscillator function outputs tion multiplies this number further by a factor both the signal and phase information. One di- of two. (B) an ideal implementation without visor function divides the phase and feeds this branching where only one PolyBLEP is evalu- to a slave oscillator. The slave again produces ated per frame. (C) is the number of required both a signal and phase information. The com- PolyBLEP calculations. plete divider circuit for one oscillator is seven such divisor functions piggybacked. PolyBLEP works by adding the polynomial band-limited step function to two samples: the to ground after the series capacitor. This al- sample before and after a discontinuity in the ters the next stage impedance compared to the signal. To achieve this, the implementation de- other filters. lays its output by one sample. At each frame, Filtering is very complex to emulate exactly the phase is inspected. If the phase passed a dis- right, as one voice might be connected to mul- continuity in the waveform, the PolyBLEP func- tiple bus bars. This varies the high-pass filter tion is evaluated for and added (rising wave) or load resistance and therefore affects the filter subtracted (falling wave) for the previous sam- cutoff frequency depending on what keys are ple and the current sample. As Faust lacks depressed. This is not emulated. Instead, the true branches, all possible branches of condi- high-pass filter load is estimated based on the tional statements are always calculated. Table expectation that there is only one connection to 1 shows the amount of residual function evalu- a bus bar. ations required under different conditions. The 3.3 Oscillator bias table shows that branching would be far supe- rior to any non-branching solution. This is be- Each oscillator produces a saw wave at a differ- cause without branching the amount of Poly- ent frequency. The frequencies are chosen from BLEP evaluations done is purely a function of a 12-TET scale. However all oscillators share the sampling rate while the amount of required a single bias voltage affecting their frequency. evaluations is a function of the signal frequency. This bias voltage is controlled by the vibrato circuit, touch vibrato and the master pitch po- To be able to run the emulation in real time, tentiometer. The master pitch potentiometer is a truly branching solution had to be developed emulated by a simple slider widget with a DC as an external C++ function. This was however output. Touch vibrato would be controlled by easy to integrate with the Faust processing as horizontal movement of the keys. As such MIDI the ffunction operator lets one use externals just keyboards are extremely rare3 this feature was as native functions. left out of the emulation. The filter bank contains tailored filtering for Vibrato control voltage is created by a simple each of the 96 voices produced by the oscillators sine wave oscillator. The vibrato speed controls and dividers. The main oscillators only have a the speed of this oscillator. The vibrato speed single capacitor in series. 
This is emulated as range (5-8Hz) was measured from the real de- a high-pass RC filter using an approximation vice. The vibrato depth is simply an attenu- of the next stage impedance as the load resis- ation control for the vibrato oscillator output. tor value. The first four divided signals have a Vibrato depth range was empirically selected. trivial RC low-pass filter before a single capac- The depth control deliberately never goes to itor in series similar to the the main oscillator filter. The lowest three voices are filtered like 3Only one keyboard was found claiming such capa- the previous four, except that there is a resistor bility: the Yamaha STAGEA ELS-01C.

24 zero, thus the vibrato has a miniscule effect on the bias voltage even when turned down. 3.4 Keyboard matrix mixer The keyboard matrix mixing, while a relatively simple part, contains many separate operations. Each key is a Faust button4 connecting multi- ple voices to different bus bars. This means each key button is used as a multiplier for 7 differ- ent voices (one per bus bar). So the signals on the emulated bus bars are the outcome of 61 7 = 427 discrete multiplications summed to appropriate∗ busses. There is also added logic for the bass manual where the resulting down- mixes from the 17 bottom keys are fed to either the main bus bars or the bass bus bars. The matrix mixer could be written as sets of floating point tables multiplied together. In Figure 2: Comparison between emulated and Faust however, there are no such array opera- real manual I drawbar 8’ sounds. tors. This results in a number of separate mul- tiplications and sums which are difficult for the organ was measured while playing a stable note Faust compiler to optimize, as vector operations with all drawbars off except one drawbar which are unusable if the data is not in ordered arrays. was tested in all three on positions. The off The key action is not band-limited. The key position is expected to be -inf. Compared to full switches on the real organ are simple switches volume, the two middle positions were measured connecting voices to bus bars. This operation at -12dB and -18dB. causes clicks in the real organ as switches open This can be translated into a continuous and close. The naive non-band-limited key ac- transfer function (see equation 2). p is the posi- tion of the emulation matches the sound of the tion of the lever from 0 (off) to 1 (full volume). real organ surprisingly well. However, this is The function returns a gain coefficient usable in something that could be improved at a later the emulation. The function emulates the taper stage. and can be used with slider controls provided Writing the keyboard matrix mixer also re- by the Faust architecture. Further work on a vealed an issue with the service manual. The graphical user interface should provide four po- service manual indicates wrongly which voices sition levers. are connected to the harmonic bus bars (2-2/3’ and 1-3/5’). The manual states that 2-2/3’ is coeff(p) = 2.81p3 2.81p2 + p connected to a voice five semitones higher than − coeff(0) = inf dB the voice connected to 4’ and 1-3/5’ with a voice − eight semitones higher than 2’. The real organ coeff(1/3) 18.05 dB (2) ≈ − connects voices seven and three semitones above coeff(2/3) 12.03 dB ≈ − respectively. The emulation follows the real or- coeff(1) = +0.00 dB gan instead of the service manual. 3.5 Section I Figure 2 shows C2 played on section I with only the 8’ drawbar engaged. The line depicts Section I is a mix of the main bus bars mixed to- the frequency spectrum of the emulation out- gether through the drawbar potentiometers and put. The crosses show peaks measured from a additional resistors. There is no extra filtering YC-20 organ using the same settings. applied to the signal. Thus the only part left to emulate for section I sounds is the drawbar 3.6 Section II controls. As described earlier in section 2.3, Section II The drawbar potentiometers do not follow the has a controllable brightness feature. The vari- typical linear or logarithmic tapers. 
The real able brightness is done by dividing each bus bar 4Button output signal is 1 when it is being pressed into two streams, one of which is high-passed down and otherwise 0. and the other low-passed. The streams derived

25 from different bus bars are then mixed together into bright and dull streams. The brightness control is a simple balance control between the two streams. The high-pass filtering is done by a two stage RC filter with a resistor in parallel with the ca- pacitor of the first filter. Thus the filter is ef- fectively a shelving high pass filter. The shelf is emulated by calculating a voltage divider be- tween the parallel resistor and the resistor in the RC high pass filter. The voltage divider gives a gain coefficient C which is used to mix together the high-passed and unfiltered signals. Unfil- tered signal is multiplied by C while the filtered is multiplied by 1-C and the results are added together. This keeps gain at high frequencies at 0dB as with the original passive filter. The two Figure 3: Comparison between emulated and passive filters also affect each other and could real bright section II drawbar 8’ sounds. not be emulated by simply chaining two digital filters. The load applied by two filter stages is envelope is then used to control the volume of emulated by dividing the resistance of the sec- a mix of the 1’, 2-2/3’ and 16’ bus bars. Some ond filter by two. This doubles the cutoff fre- bleed is added to the envelope generator out- quency of the filter. This method was found by put to emulate the bleed exhibited by the real trial and error. Figure 4 shows the block dia- organ. gram of the complete filter. The low-pass stage is much more straightfor- 3.8 Bass manual ward. Filtering consists of two chained RC low- Emulating the bass section perfectly is a com- pass filters. However, the best match with the plex task due to how mixing resistors and the original organ was achieved by not compensat- variable impedances of the potentiometers af- ing for load posed by the chained filters. This fect the RC filter. Measurements from the real filter stage is emulated by two digital RC low organ however suggest that the low frequency pass filters in series. fundamentals dominate the signal and an ac- Figure 3 shows C2 played on a fully bright ceptable emulation is achieved by carefully con- section II with only the 8’ drawbar engaged. trolling the mix of the bass bus bars and by This measurement was done at the same overall using a single fixed RC low-pass filter. This one level as measurements in figure 2. The measure- filter is applied to a mix of both bass drawbars. ment shows a slight overall volume difference and that the section II high-pass filters are not 4 Performance perfect. However, while harmonics for this par- Running this emulation requires a substantial ticular sound do not line up exactly, the balance amount of processing power. There are 96 os- between harmonics is good enough. The har- cillators working all of the time and there are monics of many other notes (for example C3) hundreds of filters being applied at many stages line up perfectly. of the design. Faust is designed to easily har- ness multiple processing units in parallel. The 3.7 Percussion design of the emulation has a lot of potential for The percussion uses the signal on the 1’ bus parallelization and as such, it should work as a bar to trigger an envelope generator. There good real world test for Faust’s SMP features. is a root-mean-square envelope follower on the The performance numbers shown here are 1’ bus bar and the envelope is triggered when measured using a modified version of the Faust the follower exceeds a set threshold. The enve- benchmarking suite. 
These numbers are es- lope generator starts off with a signal where a timates of achieved memory bandwidth in one frame unit impulse is created at the trigger megabytes of audio data generated per second. point. This impulse signal is low-pass filtered to The Faust benchmark uses this measurement, match the measured percussion envelope. This as memory bandwidth on SMP systems is a pre-

26 cious commodity. In the case of the YC-20, the a jack application. The measurements shown processing creates much more data than what in table 3 were done by running the emulation comes out of its single mono output. Thus these compiled with different flags and observing the figures can only be compared to each other. DSP percentage meter in qjackctl. This meter Tests were done on a Dell D820 laptop with tells how much of the time between jack engine a Core 2 Duo T7400 (2.16GHz) processor using callbacks is spent doing processing inside jack the internal sound interface. The operating sys- clients. tem was 32 bit Ubuntu 9.10. The processor fre- quency governor was switched to performance Tests Faust flags Load scalar none 49% on both cores before running the tests. The ∼ kernel was a standard Ubuntu package, 2.6.31- vectorized -vec -vs 32 34% scheduler -sch -g -vs 256 ∼ 29% 19-generic and the used gcc version was 4.4.1- ∼ 4ubuntu9. Benchmarks were done on YC-20 code revision 2275. Figures in column C are Table 3: All tests were compiled with the fol- from a slightly modified version of the YC-20 lowing gcc flags: code and it shows the performance impact of -O3 -march=native -mfpmath=sse -msse -msse2 non-branching PolyBLEP calculations. The re- -msse3 -ffast-math -ftree-vectorize. sults in MB/s are shown in table 2. Jack 1 was running real-time with a dummy back-end simulating a 48kHz sample rate with Faust benchmark (A) (B) (C) 512 frame buffers. scalar (galsascal) 0.40 0.43 0.28 vectorized (galsavec) 0.58 0.59 0.35 vectorized 2 (galsavec2) 0.62 0.62 0.34 5 Conclusions OpenMP (galsomp2) 0.51 0.55 0.29 scheduler (galsasch) 0.90 0.88 N/A The software presented in this paper emulates scheduler 2 (galsasch2) 0.93 0.89 N/A the YC-20 organ well. While some parts of the emulation could benefit from more polish, Table 2: The tests were ran with different sets other parts of the emulation can sound very of gcc parameters: much like a real YC-20. A better quality anti- (A) Branching C++ PolyBLEP evaluations. aliasing method could be beneficial, although gcc -O3 -march=native -mfpmath=sse -msse these methods would come at the expense of -msse2 -msse3 -ffast-math -ftree-vectorize CPU cycles. The real organ has considerable (B) Branching C++ PolyBLEP evaluations. bleed and inconsistencies which might also be gcc -O3 -march=native -mfpmath=sse -msse worth emulating. -msse2 -msse3 -ffast-math -ftree-vectorize 5.1 How to improve Faust -fgcse-sm -funsafe-math-optimizations (C) Divider slave oscillators use non-branched As discussed in section 3.2, and shown in the PolyBLEP operations. gcc -O3 -march=native benchmarks in section 4, there is a strong case -mfpmath=sse -msse -msse2 -msse3 -ffast-math for a truly branching select operation. However, -ftree-vectorize -fgcse-sm -funsafe-math- the semantics of Faust requires all of the pro- optimizations cessing graph to be evaluated for every frame. GCC was unable to compile the scheduler Truly branching operations would cause some versions of the emulation with non-branching function evaluations to be skipped. The op- PolyBLEP operations. erating semantics are however compatible with branching operations if the skipped functions are stateless. This is because if the function has no state, the next evaluation of the func- The multi-threaded scheduler tests were no- tion does not depend on previous evaluations. tably faster than other benchmarks. 
There is There are two ways truly branching opera- an increase of 50% in memory bandwidth when tions could be added to Faust: either select2 and compared to vectorized single thread tests. This select3 calls with no stateful function branches data is backed by tests done by measuring the are compiled automatically as truly branching performance of the YC-20 emulation running as or a new branching select call is created. Both 5Subversion repository available at http://foo- cases require the specification of rules on what plugins.googlecode.com/svn/trunk/ a stateless function is. Also, the compiler must

27 check whether the functions being skipped over 96 inputs of the keyboard mixer and the but- comply to these specifications before allowing tons representing the keys of the organ could compilation of such code. be arranged into arrays, the mixing could be This logic could also be generalized to multi- done algorithmically using the aggregate func- plication functions. Function f(x) in equation tions such as sum and vector multiply. Such op- 3 is an example of a function where evaluating erations also open up ways to further optimize t2(x) can be skipped if x 0 as t1(x) evaluates the created C++ code. As it stands, the key- to 0 for x 0 and t2 is stateless.≤ board mixing is compiled into a large amount ≤ of separate discrete multiplication and addition t1(x) = (x > 0) operations which can not be vectorized. The Faust code shown as equation 4 is am- t2(x) = x2 fmod(x, 1.0) (3) − biguous, but can be compiled without any warn- f(x) = t1(x) t2(x) ing. This can lead to anywhere from unexpected ∗ results to complete failure. It would be good The compiler error messages are often unhelp- form for the Faust compiler to at least warn the ful. They do not always specify in which file the user of the situation. compilation error occurred. Optimally the com- piler should identify the file name, line number and if possible, the name of the function the f(x, y) = x + y (4) error occurred in. One specific issue with the with x = y y; ; compiler messages is worth giving special atten- { ∗ } tion to. It is cases where the number of inputs 5.2 Benefits of Faust and outputs of functions in a sequential compo- Functional thinking suits audio applications sition do not match. In such cases, instead of very well as solutions to problems can often printing the names of the offending functions, be expressed as mathematical equations. This Faust outputs a quite exhaustive description of model sidesteps many of the issues procedural both functions. For example, if the YC-20 emu- programs face, as the processes can be written lation would fail to cut the phase output of the at a higher abstraction level. Unlike with proce- last divider (dividers would then output a total dural languages, Faust allows signal processing of 108 instead of 96 streams), the resulting error programmers to not worry about buffers, their message is 67 kilobytes. lengths or loop structures or real-time thread For signal processing, the lack of support for constraints. This also saves time for the pro- Fourier transforms restricts what problems can grammer as less time is spent debugging issues be solved. As mentioned in section 3.2, the bet- not directly connected with the signal process- ter band-limiting method (BLEP) is not possi- ing task at hand. ble to implement in pure Faust. Fourier trans- forms would naturally be useful for a variety of Faust code is easy to keep readable as the syn- other tasks as well. Fortunately there has been tax offers tools, such as sequential composition, promising news of multi-rate support for Faust which help keep functions simple and concise. which would allow processing such as Fourier See listings 1, 2 and 3 for an example. While transforms. listing 2 is pretty compact, one should note that As procedural programming is the prevailing functions are presented to the reader in reverse model of programming, it is safe to say many order and it is also difficult to see to which func- potential users of Faust are familiar with only tion return value the multiplication is applied that model. 
This applies pressure to the quality to. of documentation. The current documentation float d o s t u f f ( float f ) { should abbreviate the definitions of the more f = func1(f); f = func2(f); advanced features of the language. Especially f = f 2 . 5 ; recursion and the rdtable and rwtable functions f = func3(f);∗ could benefit from better explanations. Cur- return f ; rently the features are almost side-stepped and } almost nothing but their syntax is described. Listing 1: Consecutive calls Implementing the matrix mixer required a lot of hand-crafting. Having a concept of index- float d o s t u f f ( float f ) able arrays would help such operations. If the return func3( func2({ func1(f)) 2 . 5 ) ; ∗

28 refactor the modules to match. Faust proces- sors naturally fit with each other. This model works very well with open source as it makes it easier to spread good ideas and implementa- tions. It might even lead to a resource library of truly reusable components usable with any Faust project – as long as the licenses are com- patible. Faust also provides automatic parallelization and vectorization. This allows all Faust pro- grams to benefit from these advanced optimiza- tion methods which usually require expert pro- gramming skills to implement. These capabil- ities have been discussed in detail by Orlarey, Fober and Letz[Yann Orlarey, 2009a]. Faust also allows the developer to easily test and compare different optimization methods with- out refactoring their code. 6 Acknowledgements Thanks to Petri Junno for support and all the help with analyzing the original organ. I would also like to thank Sakari Bergen, Torben Hohn and Yann Orlarey for providing advice. Thanks also to Fons Adriaensen for Japa which was an excellent tool for comparing the emulation to the real organ. References Eli Brandt. 2001. Hard sync without aliasing. In Proceedings of the International Computer Music Conference (ICMC), Havana, Cuba, September. Figure 4: Faust SVG output of section II high pass filtering for a single drawbar. The param- Jussi Pekonen. 2007. Computationally effi- eters for the filters are resistance in ohms and cient music synthesis – methods and sound capacitance in microfarads. design. Master’s thesis, TKK Helsinki Uni- versity of Technology. Tim Stilson. 1996. Alias-free digital synthesis } Listing 2: Enclosed statements of classic analog waveforms. Vesa V¨alim¨akiand Antti Huovilainen. 2007. dostuff = func1 : func2 : (2.5) : func3; Antialiasing oscillators in subtractive syn- ∗ Listing 3: Faust sequential composition thesis. Signal Processing Magazine, IEEE, 24(2):116–125. The SVG output of the Faust compiler is Yamaha, 1969. YC-20 Service Manual. also worth mentioning. The compiler can create Yamaha Corporation. hyper-linked SVG documents which depict the process function. This document can be an ex- Stephane Letz Yann Orlarey, Do- cellent learning and debugging tool as it shows minique Fober. 2009a. Adding automatic how the compiler understood the source code. parallelization to faust. In Grame, editor, Figure 4 shows an example of this output. Linux Audio Conference 2009. The stream concept also makes code reusable. Stephane Letz Yann Orlarey, Do- When combining procedural code from multi- minique Fober. 2009b. FAUST : an Efficient ple sources you often need to either convert Functional Approach to DSP Programming. data types between the different modules or

29 30 LuaAV: Extensibility and Heterogeneity for Audiovisual Computing

Graham WAKEFIELD and Wesley SMITH and Charles ROBERTS Media Arts and Technology, University of California Santa Barbara Santa Barbara, CA 93110, USA, wakefield, whsmith, c.roberts @mat.ucsb.edu { }

Abstract We describe LuaAV, a runtime library and applica- tion which extends the Lua programming language to support computational composition of temporal, sound, visual, spatial and other elements. In this paper we document how we have attempted to maintain several core principles of Lua itself - extensibility, meta-mechanisms, efficiency, portabil- ity - while providing the flexibility and temporal accuracy demanded by interactive audio-visual media. Code generation is noted as a recurrent strategy for increasingly dynamic and extensible environments. Keywords Audio-visual, composition, Lua, scripting language Figure 1: LuaAV in a live-coding performance 1 LuaAV (Asterisk: performed by A McLeran and G LuaAV is an integrated programming envi- Wakefield, 2009). ronment based upon extensions to the Lua programming language enabling the tight 2 LuaAV philosophy real-time integration of computation, time, The broad attitude taken in the development sound and space. of LuaAV draws inspiration from the Lua pro- LuaAV has grown from the needs of stu- gramming language itself: extensibility, meta- dents and researchers in the Media Arts & mechanisms, efficiency, and portability4. Technology program at the University of California Santa Barbara; its origins lie in 2.1 Extensibility earlier Lua-based audio and visual tools [20] Lua is described as an ‘extensible extension lan- [15] [22]. More recently it has formed a central guage’ [8]: a configuration language to embed component of media software infrastructure for within and extend the control of kernel appli- the AlloSphere [1] research space (a 3-storey cation or library code (which is typically writ- immersive spherical cave-like environment with ten in a statically compiled language such as stereographic projection and spatial audio). C). LuaAV follows this methodology directly: Various projects built using LuaAV have the core engine of LuaAV is a library of code been performed, exhibited or installed inter- (libluaav) intended to be embeddable within ap- nationally, for scientific visualization [1], data plications and application plugins, embedding visualization [13], immersive generative art code to manage instances of Lua interpreters, [21], game development1, live-coding (Figure 1) schedulers, an audio driver, and basic commu- and audiovisual performance2. nication protocols (MIDI and OSC). Most as- LuaAV is available under a UC Regents li- pects of the core library have both C and Lua cense similar in nature to the BSD license3. programming interfaces. 1http://www.charlie-roberts.com/projects/circles/ 2http://www.mat.ucsb.edu/˜whsmith/Synecdoche/ 4The reasoning behind the choice of Lua has been 3http://lua-av.mat.ucsb.edu/ documented in prior publications, particularly [17].

The LuaAV application embeds libluaav and adds to it a GUI for managing active script states, a CodePad for adding code to script states at run-time, and a Windowing/Menu system. Most of these components are also scriptable; indeed much of the application logic of LuaAV itself is written using embedded scripts.

2.2 Meta-mechanisms

A fundamental Lua concept is the provision of low-level meta-mechanisms for implementing features, rather than a built-in fixed set. Lua's meta-mechanisms bring an economy of concepts while allowing the semantics to be extended in unconventional ways. We find this open-ended philosophy appropriate to our research domain of computational composition.

We attempt to re-use the mechanisms provided by Lua itself in a consistent and predictable manner. As examples: the addition of temporal scheduling in LuaAV is implemented as an extension of the existing coroutine construct in Lua; new functionality is added to the runtime environment using the existing Lua module system; and many of these capabilities are specified using the existing data description and metamethod features of Lua.

2.3 Efficiency

Lua is widely acknowledged as amongst the most efficient of interpreted or scripting languages; however, there is still an order of magnitude of performance cost relative to statically compiled code: the price paid for the dynamic flexibility that interpreters offer. For this reason, the core scheduler, drivers and other elements of the libluaav runtime library are coded in C/C++. Nevertheless, in order to overcome the usual trade-off between flexibility and efficiency, we have begun to leverage run-time compilation to machine code within LuaAV.

On i386 platforms, the Lua core of libluaav is replaced with the LuaJIT [11] interpreter and trace compiler, allowing performance approximating C for certain algorithms, and a significant performance boost over regular Lua even in interpreted mode.

LuaJIT grants efficiency for pure Lua source code, however it cannot optimize over the C/Lua boundary. For those aspects of an application that talk to existing C code, we have been investigating the potential of LLVM/Clang [9]. We have developed a binding to a large proportion of the LLVM/Clang API from within Lua5, such that abstract syntax trees as Lua data-structures, or pure C code strings, can be JIT-compiled and linked back into the application as it proceeds, under the direction of an executing Lua script.

2.4 Portability

It is near impossible to portably support the complex demands of multimedia applications because of the diversity of platforms and their dependencies. LuaAV currently targets recent Linux and OSX platforms; however, we strive to use established, stable cross-platform libraries where possible (such as PortAudio/JACK, the Apache Portable Runtime, etc.), or else provide abstraction layers for platform-specific code as appropriate (for example, the LuaAV application Windowing and GUI is written using a common abstraction layer over Qt for Linux and Cocoa for OSX).6

3 Related work

LuaAV is one of a family of audio/visual applications in which the primary interface is an embedded programming language, including SuperCollider [10], Impromptu [2], Fluxus [7], ChucK [23], and many more. All of these applications can be used within a performative context, such as live-coding [3].

Impromptu, based on the Scheme programming language, is an OS X only environment that was originally created for audio manipulation; it has been extended to also include visual programming. One element of Impromptu that is of particular interest is its multi-user runtime; a single networked Impromptu environment can be accessed and manipulated by multiple users concurrently. We are actively developing a similar capability in LuaAV to satisfy multi-user demands in the AlloSphere.

Fluxus is another Scheme-based platform, however it is primarily geared towards visual composition and is cross-platform. The Fluxa add-on module adds basic audio synthesis and playback capabilities to the Fluxus environment; however it is limited in terms of the number and breadth of unit generators that it provides.

5 http://code.google.com/p/luaclang
6 Nevertheless, the broad scope of the LuaAV application implies many non-trivial dependencies. We currently include a Lua-based build tool and command-line scripts to try to make installation on Linux more fluid.
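Returning to the LLVM/Clang binding described in section 2.3, the following is a speculative sketch only: the module name, constructor and method names (clang, Compiler, compile, getfunction) are assumptions chosen for illustration and are not the published luaclang interface.

-- Speculative sketch: JIT-compiling a plain C string from a running
-- Lua script, in the spirit of the LLVM/Clang binding described above.
-- All names below (clang, Compiler, compile, getfunction) are assumed
-- for illustration only.
local clang = require "clang"
local cc = clang.Compiler()

cc:compile([[
  /* plain C source, compiled to machine code at run time */
  double gain(double x) { return x * 0.5; }
]])

local gain = cc:getfunction("gain")  -- look up the JIT-compiled symbol
print(gain(0.8))                     -- callable from Lua once linked back in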

ChucK is a live coding language geared towards audio with fine scheduling between synthesis and control ('strongly timed'). LuaAV's scheduling system offers similar control, which will be described in detail below. ChucK's visual capabilities only extend to one of its development environments, the Audicle, and primarily revolve around visualizing currently running audio processes.

SuperCollider is also predominantly a system for audio synthesis and scheduling. Its language is strongly inspired by Smalltalk, whose dynamism provides many possibilities for modifying running programs in real time. SuperCollider can be extended with downloadable modules ('quarks'), some including graphical capabilities7.

4 LuaAV Implementation

4.1 Script states

Opening a script in the LuaAV application creates a luaav state object, a libluaav wrapper around a Lua interpreter state with additional components:

• a logical clock
• a scheduler queue with pending events and coroutines
• a graph of audio processes (Synths)
• a bidirectional message queue (for communication with the audio graph)
• a memory pool for C-allocated objects associated with the lifetime of the luaav state

An opened script can be closed or reloaded from within the LuaAV application, and its source can be viewed by opening the default external editor. The file modification date of the script file is monitored by LuaAV and the script automatically reloaded if changed; thus users can edit scripts in their preferred editor and see the results updated in LuaAV immediately.

Scripts can also be edited via the integrated CodePad (Figure 2), which was added to incorporate support for live coding practices into LuaAV. Code entered into the CodePad does not reload the script; rather, it is injected into the luaav state without interruption. Important features of the CodePad include:

• syntax highlighting via the Leg8 parsing expression grammar
• the ability to edit multiple scripts concurrently in tabs or multiple windows
• the ability to execute a selected portion of a script
• basic visual error reporting

[Figure 2: The LuaAV application running on Ubuntu 9.10, with the CodePad view open.]

4.2 Scripting in real-time

A central problem of interactive computing applications is the translation from the abstract temporality of programming to the concrete and often unpredictable behavior of real-time interaction. The articulation of structure goes beyond matters of efficiency to demand:

• the capacity to maintain required state until the appropriate moment (dynamic memory management)
• the flexibility to re-activate maintained state at unpredictable moments (re-entrancy)
• the ability to delay activity until a chosen or appropriate time (event ordering, scheduling)
• the ability to specify events in measures of real time (clocks, event spacing) as well as logical/causal relations (event handling)

Lua already offers excellent re-entrancy and dynamic memory management: user-driven calls and library callbacks can be made into an interpreter instance after a script has executed, and variables will remain alive so long as they are accessible.

7 e.g. http://sourceforge.net/projects/scgraph/
8 http://leg.luaforge.net/

This re-entrancy extends to the implementation of coroutines9 in Lua, in which lexically scoped local variables will remain 'alive' for as long as the coroutine block runs or awaits to be resumed.

4.2.1 Metrics, scheduling and concurrency

While Lua ensures deterministic sequencing of instructions, it lacks a sense of temporal metric. Adding this metric is one of the roles of libluaav. Within a script in LuaAV, we can ask for the current time using the now() function. The time returned is logical time for the luaav state scheduler, which is anchored in real-time by reference to the audio sample counter.

To grant explicit scriptable control over the scheduling capabilities of a luaav state, we have extended the coroutine mechanism to allow yielding control to the scheduler by means of a wait() function. The arguments to wait can be a duration (after which the coroutine will resume)10 or an arbitrary string (the name of an event trigger to wait for). A scheduled coroutine can be directly created using the go function, whose arguments can specify a duration or event to wait for before starting the coroutine, and a function and arguments to form the body of the coroutine. Of course, any valid Lua code can be placed within a coroutine; in fact the entire script itself is also a coroutine and can wait() and check now() as needed.

A code example may speak a thousand words:

-- define a function to print a message
-- repeatedly, every 1 second
function printer(message)
  while true do
    print(message)
    wait(1) -- wait 1 second
  end
end
-- start ticking:
go(printer, "tick")
-- start tocking after 0.5 seconds
go(0.5, printer, "tock")

This relatively simple interface is a low-level meta-mechanism from which more complex temporal patterns and semantics can be constructed. For example, a coroutine which returns a function will continue execution in that function's body (as a tail call in Lua), and a coroutine which returns a call to its own function will implement temporal recursion [19].

Furthermore, it is possible to create new schedulers whose metrics are driven by events within the script itself; this can be used to create a tempo clock for example.

4.3 Multi-threading and audio

Given the power of coroutines to deterministically model concurrent activities, the decision by the Lua authors to shun multi-threading is easier to understand11. Our own approach is to maintain this single-threaded nature for the Lua interpreter instances: it is consistent with the recommended manner to interact with OpenGL contexts and GPU resources, and its deterministic assurances greatly simplify the code within libluaav.

Unfortunately however, audio processing is better placed in a dedicated independent high-priority thread, in which unbounded calls (such as memory allocations, garbage collections and so on) are avoided [4]. The natural result is two threads: one for the interpreter and graphics, one for the audio processing, and the problem of synchronization between them.

Our solution is to mirror state between the interpreter thread and the audio thread by means of time-stamped synchronization messages along a pair of single-reader/single-writer FIFO (first-in, first-out) message queues (built upon the JACK ringbuffer [12]). Memory allocation/disposal and initialization of audio objects occurs in the main thread, but subsequent state changes triggered from Lua code are serialized and dispatched to the audio thread via the message queue. The audio thread can then retrieve these messages (up to the appropriate timestamp) and apply the state changes in the context of signal processing directly.

9 Coroutines are subroutines that act as master programs (Conway, 1963). A coroutine in Lua is a concurrent asynchronous state with its own instruction pointer, stack and local variables, but with access to shared globals. A coroutine is constructed from a Lua function, which can explicitly yield execution and be resumed later.
10 Durations are measured in seconds by default, however LuaAV supports the creation of arbitrary user clocks and schedulers, with which concepts of tempo and beats can be easily constructed.
11 Introducing threading into standard Lua can be done, however the granularity is so high as to make this feature nearly useless and execution effectively single-threaded anyway.
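As a further illustration (not taken from the original paper), here is a short sketch of the temporal recursion idiom described in section 4.2.1, using only the go(), wait() and now() functions documented above; the event name "bang" and the event-first argument order of go() are assumptions for illustration.

-- Sketch: an accelerating pulse that reschedules itself.
-- Per the discussion above, a coroutine that returns a call to its
-- own function continues as a temporal recursion.
function pulse(period)
  print("pulse at logical time", now())
  wait(period)                  -- suspend for 'period' seconds
  if period > 0.1 then
    return pulse(period * 0.9)  -- temporal recursion with a shorter period
  end
end

go(pulse, 1)          -- start immediately, 1-second initial period
go("bang", pulse, 1)  -- assumed: defer the start until the event "bang" fires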

There is necessarily a latency between the Lua thread time and the audio thread time, which is bounded by the update period and jitter of both. So long as actual latency remains below a (user-specifiable) ideal limit, fully sample-accurate temporal determinism can be achieved12. If ideal latency cannot be kept, events will fire late, but the order of events remains determinate.

The main drawback of this approach is that audio state cannot be immediately retrieved in the Lua script: method calls on audio objects are asynchronous and cannot return concrete values. Similarly any messages sent from the audio thread to the main thread are also latent, preventing temporally accurate triggering of Lua code in response to audio analysis for example.

4.4 Audio engine

The audio engine within LuaAV acts upon time-stamped messages received on the message queue from the Lua thread, and triggers any process calls in active audio objects (from here on denoted Synths13). Synths have notions of signal input and output ports of various kinds which can be connected to each other14. The connections and disconnections of ports are specified by messages from the main thread.

4.4.1 Dynamic graphs with sample accuracy

For efficiency (and to achieve real-time guarantees) signal processing graph nodes are typically computed over blocks of N sample frames with buffered input and output signal streams. The audio engine has the responsibility to ensure that buffers of data for a Synth's input ports are properly filled and the output ports properly prepared before the Synth's signal processing function is executed.

Since the block-rate is purely an implementation detail and carries no musical or aesthetic significance, we aim to hide it completely from the Lua interface; users should be able to code with state changes at any arbitrary time ('strongly timed'). In order to achieve sample-accuracy in graph dynamics while maintaining determinism in the signals, LuaAV traverses sub-sections of the graph for sample-accurate state changes. For CPU efficiency, only the upstream dependencies of the changing node(s) must be computed, and only up to the sub-block timestamp of the scheduled change.

For memory efficiency, it is better to minimize the number of buffers allocated, and re-use existing memory when it can be safely done. A good strategy would maintain a memory pool of re-usable buffers with a lazy allocation, eager recycling policy, under the control of a coloring algorithm akin to register allocation. However this strategy becomes quite complex with the combination of multiple references (buffers with multiple readers and/or multiple writers), feedback connections and sub-block size traversals. We currently only optimize for single-use non-feedback connections and defer recycling until the end of the block, but are researching more optimal algorithms.

4.5 Signal Processing

Toward efficiency and extensibility, LuaAV's Synths are built according to a low-level C API. The API provides as much functionality as possible (such as automatic bindings to Lua) without compromising flexibility. Synth code can be written in C or C++, does not need to inherit or compose any pre-existing objects, nor conform to a particular data layout.

4.5.1 Standard signal processing units

For the purposes of rapid testing and minimal dependency, a concise set of standard Synths are provided within the libluaav library. These units wrap low-level synthesis routines (using code from the Gamma [14] library) within abstract definitions of ports, methods and process routines. Static code-generation in the libluaav build process uses these definitions to automatically create bindings to the LuaAV audio API, Lua bindings, and documentation.

The following example code plays a series of sine bleeps of random frequency, whose durations progressively shorten from 1 second to 1 millisecond:

12 Clock drift is not an issue per se, since we derive our source of real time from the audio sample clock itself.
13 The term 'synth' is used rather than 'unit generator' to indicate a coarser granularity in the graph. Finer granularities are better handled by JIT compilation of synths from sub-components which better deserve to be named unit generators.
14 Feedback between Synths necessarily incurs a block of delay. Feedback within synth implementations incurs a single-sample delay.

local outs = Outs() -- stereo output bus
for i = 1, 1000 do
  local dur = 1/i
  local f = 100 * math.random(10)
  local synth = Sine{ freq = f }
  outs:play(synth, dur)
end

4.5.2 Embedding CSound

Audio synthesis specification is a complex domain with a long history. We considered it practical to re-use existing interfaces and frameworks if possible. CSound in particular has a long heritage and a huge collection of signal processing primitives ('opcodes').

Embedding CSound within LuaAV was a remarkably straightforward process, thanks to the design of the CSound API [5]. CSound instances can be created in a Lua script as LuaAV synth objects using CSound's host-implemented audio option bound to the libluaav audio API. CSound synths can thus be connected with other LuaAV synths. Typically CSound instances in LuaAV have only minimal score specification, turning over the responsibility of the generation of score events to the Lua script. For example, the following code snippet plays an ascending harmonic scale on instrument 1 from the orchestra defined in "demo.csd":

require "csound"
local cs = csound.create("demo.csd")
play(outs, cs, 4)
for h = 1, 16 do
  cs:scoreevent(
    'i',    -- event type
    1,      -- p1 (instrument)
    now(),  -- p2 (start)
    1,      -- p3 (duration)
    1.0,    -- p4 (amplitude)
    100*h   -- p5 (frequency)
  )
  wait(0.25)
end

CSound instances can also be created from strings of valid CSound code, opening up interesting possibilities of code-generating CSound orchestras and scores at run-time.

4.5.3 Run-time generation of signal processing code

Our primary focus for audio synthesis in LuaAV however is the generation and compilation of synthesis code from definitions specified at run-time, leveraging the JIT capabilities of LLVM via luaclang. It is our view that runtime code-generation best serves the goals of extensibility and efficiency (and to a certain degree also portability [6]).

[Figure 3: An audio-visual interactive canvas (mouse-paths converted to hyperbolic lines, rendered with OpenGL and sonified as grain chirps). The script itself is being edited in Gedit.]

A more detailed description of our investigations can be found in [18]; here we will provide a summary for the reader's convenience. Initial experiments constructing abstract syntax trees (ASTs) of complex expressions from elementary nodes were very promising: expression trees have a natural corollary to data-flow networks typical in signal processing, and also to the static single-assignment (SSA) form of LLVM's intermediate representation (IR).

Expression graphs however are limited because they have no internal state. Extending our model to support stateful objects such as filters and variable oscillators called for the run-time generation of data-structures to maintain state across function calls, and associated routines to allocate and free memory and connect to the LuaAV audio system appropriately. We have successfully implemented such a model, and are continuing to pursue this line of development and hope that a user programming interface will stabilize soon.

The performance of the JIT compiled code is close to that of a native static compiler (GCC). The time to JIT a simple Synth can fit within the acceptable latency window between the Lua thread and the audio thread.

4.6 Beyond Audio

We have concentrated on audio in this paper, but it is important to note that LuaAV has very strong capabilities in the visual (2D and 3D) domain (see Figure 3).

A near-complete binding of the OpenGL standard is included as a dynamically loadable Lua module, along with the Muro module which uses a generic Matrix data format to connect image, video files/cameras, GPU textures, shaders and slabs, matrix data processing and analysis, 3D mesh drawing, and supporting utilities for vectors, quaternions and other common 3D tasks.

LuaAV has MIDI and OSC built-in, and extension libraries for numerous devices and systems. And of course, anything that can be loaded in standard Lua can also be used in LuaAV, such as existing SQL database bindings, networking code, Cairo 2D drawing, PEG text parsing, and so on.

5 Future directions

It is notable that code generation appears in all three strategies to embed signal processing within LuaAV: static generation of synthesis units within the library, the potential to generate CSound orchestras programmatically at runtime, and the code-generation of entire synthesis routines to machine code using LLVM/Clang. We believe this is a natural consequence of increasingly dynamic and extensible programming interfaces and environments, and one which will continue to grow.

Embedded scripting language bindings to efficient library code still carry a divide between static and dynamic code which remains immutable during runtime. The incorporation of runtime JIT compilation (such as LLVM/Clang in LuaAV) adds the capacity to dynamically generate new bindings into the environment and to augment existing bindings and binaries on the fly.

For example, computer vision video filters in LuaAV are code generated by bringing together functionality from OpenCV and the Muro Matrix specification to generate a new video filter with full Lua bindings which is properly adaptive to the heterogeneous nature of computer vision data. By compiling these filters at runtime, it is possible to dynamically alter how filters mix and combine with further processing elements in ways that would otherwise be preconceived and fixed.

As we have explored this area of software design, it has become apparent that the trend is toward heterogeneous computational environments that freely mix paradigms, be they typed versus untyped, dynamically compiled versus statically compiled, and so on. What we are working to achieve is a continuum of paradigms as opposed to simply concatenating them together. Currently in LuaAV it is possible to mix Lua code and C code. As we develop the system further and add intermediate languages to generate new code between the paradigms, the boundary will only become more blurred.

6 Acknowledgements

With thanks for the support of the AlloSphere Research Group, University of California Santa Barbara.

References

[1] X. Amatriain, J. Kuchera-Morin, T. Hollerer, and S. T. Pope, "The allosphere: Immersive multimedia for scientific discovery and artistic exploration," IEEE MultiMedia, vol. 16, pp. 64–75, 2009.

[2] A. Brown and A. Sorensen, "Dynamic media arts programming in impromptu," Proceedings of the 6th ACM SIGCHI conference on Creativity & . . . , Jan 2007.

[3] N. Collins, A. McLean, J. Rohrhuber, and A. Ward, "Live coding in laptop performance," Organised Sound, vol. 8, no. 3, pp. 321–330, 2003.

[4] R. B. Dannenberg and R. Bencina, "Design patterns for real-time computer music systems," ICMC 2005 Workshop on Real Time Systems Concepts for Computer Music, 2005.

[5] J. Ffitch, "On the design of csound5," in Proceedings of the 3rd Linux Audio Developers Conference, ZKM, Karlsruhe, Germany, 2004.

[6] M. S. O. Franz, "Code-generation on-the-fly: A key to portable software," 1994.

[7] D. Griffiths, "Fluxus," http://www.pawfal.org/Software/fluxus/, 2007.

[8] R. Ierusalimschy, L. H. de Figueiredo, and W. C. Filho, "Lua — an extensible extension language," Software Practice and Experience, vol. 26, no. 6, pp. 635–652, 1996.

[9] C. Lattner and V. Adve, "The LLVM Compiler Framework and Infrastructure Tutorial," in LCPC'04 Mini Workshop on Compiler Research Infrastructures, West Lafayette, Indiana, 2004.

[10] J. McCartney, "Rethinking the computer music language: SuperCollider," Computer Music Journal, vol. 26, no. 4, pp. 61–68, 2002.

[11] M. Pall, "LuaJIT," http://luajit.org/, 2007.

[12] Paul Davis, "Jack — connecting a world of audio," http://www.jackaudio.org/, 2010.

[13] M. Peljhan, "Common data processing and display unit - tokyo system prototype," http://www.ntticc.or.jp/Exhibition/2009/Openspace2009/Works/commondataprocessinganddisplayunit.html, 2009.

[14] L. Putnam, "Gamma - generic synthesis C++ library," http://mat.ucsb.edu/gamma/, 2009.

[15] W. Smith, "Abelian: A visual and spatial platform for computational audiovisual performance," Master's thesis, University of California Santa Barbara, 2007.

[16] W. Smith and G. Wakefield, "Synecdoche," http://www.mat.ucsb.edu/~whsmith/Synecdoche/, 2007.

[17] ——, "Computational audiovisual composition using Lua," Communications in Computer and Information Science, vol. 7, pp. 213–228, 2008.

[18] ——, "Augmenting computer music with just-in-time compilation," Proceedings of the International Computer Music Conference, 2009.

[19] A. Sorensen and A. Brown, "Aa-cell in practice: an approach to musical live coding," in Proceedings of the 2007 International Computer Music Conference, 2007.

[20] G. Wakefield, "Vessel: A platform for computer music composition, interleaving sample-accurate synthesis and control," Master's thesis, University of California Santa Barbara, 2007.

[21] G. Wakefield and H. Ji, "Artificial nature: Immersive world making," in EvoWorkshops, 2009, pp. 597–602.

[22] G. Wakefield and W. Smith, "Using Lua for audiovisual composition," in Proceedings of the 2007 International Computer Music Conference. International Computer Music Association, 2007.

[23] G. Wang and P. Cook, "ChucK: A programming language for on-the-fly, real-time audio synthesis and multimedia," in MULTIMEDIA '04: Proceedings of the 12th annual ACM international conference on Multimedia. New York, NY, USA: ACM, 2004, pp. 812–815.

The WFS system at La Casa del Suono, Parma

Fons Adriaensen
Casa della Musica
Pzle. San Francesco
43000 Parma (PR), Italy
[email protected]

Abstract

At the start of 2009 a 189-channel Wave Field Synthesis system was installed at the Casa del Suono in Parma, Italy. All audio and control processing required to run the system is performed by four Linux machines. The software used is a mix of standard Linux audio applications and some new ones developed specially for this installation. This paper discusses the main technical issues involved, including sections on the audio hardware, the digital signal processing algorithms, and the software used to control and manage the system.

Keywords

Wave field synthesis, spatial audio, distributed audio systems

1 Introduction

La Casa del Suono in Parma, Italy, is one of the cultural entities managed by the city of Parma, located in a small restored church and open to the general public. It is in the first place a museum dedicated to the history of audio technology, showing a collection of vintage audio equipment. It also aims to provide its visitors with a view on current developments in this area.

One of the installations designed for that purpose is the Wave Field Synthesis system installed in the Sala Bianca, shown in fig. 1. This is a room of around 7.5 by 4.5 metres and 4.5 m high. When the doors (seen in the back) are closed, there is a continuous ring of 189 small speakers running along the complete inner circumference of the room, one every 12 cm (with some small exceptions due to constructional constraints). The speakers are hidden behind a white tissue covering all the walls, and are constructed in 'blocks' of 10, 15, or 17 units. They are a bass-reflex design consisting of a small four inch bass and midrange unit (from Ciare) and a tweeter, driven by a passive crossover network. The height of the ring is a compromise between typical ear height for a seated and a standing audience. The usable frequency range is 50 Hz to 20 kHz, and sound quality is remarkably good for a design of this size. Except for the space taken by the speakers the walls are completely covered by sound absorbing material, leading to reasonably good 'dead' acoustics.

[Figure 1: La Sala Bianca]

As a project the Sala Bianca is the result of a collaboration between the city of Parma and the Engineering Department of the University of Parma. The university in turn contracted the author to specify the audio hardware and design the complete software.

The system is intended to be used both as a public demonstration of WFS technology, and as a scientific research instrument.

This has resulted in some quite peculiar and contradictory requirements. In the first role the system is operated by the museum staff, and it has to run fully automatically every day and without any technical support. As a scientific instrument on the other hand it has to be reconfigurable without any artificial limits, allowing the researchers to e.g. easily replace part of the software with experimental versions, or use it as a listening room for psycho-acoustic experiments using entirely different rendering algorithms. After the system was constructed a third way of using it was added, as the direction of La Casa della Musica (who are managing the installation) also expressed their desire to use the Sala Bianca as an instrument for electro-acoustic music.

This is of course not the first large-scale WFS installation using Linux. A much larger one was constructed in 2007 at the Technical University of Berlin, see [Baalman et al., 2007]. The two systems are however quite different both in the hardware and software solutions that were adopted.

2 A short introduction to WFS

Wave Field Synthesis operates by reconstructing the wavefront that would be generated by a real sound source, called the primary source, by a large number of real secondary sources. The method used is based on the Huygens principle, and formally expressed by the Kirchhoff integral [Verheyen, 1998]. It only works up to a frequency determined by the distance between the secondary sources, which should be less than half the wavelength. Above this frequency, spatial aliasing will lead to false images of the primary source being generated as well as the correct one. For this reason most WFS systems use linear arrays of closely spaced secondary sources, as filling a plane would increase the required number above practical limits. Reducing a WFS system to 2-D operation has some consequences: it complicates the signal processing, and it also leads to an approximate solution that however still works very well in practice. Marije Baalman's recently published book [Baalman, 2008] provides a good overview of the state of WFS technology today, including some features not covered by the system discussed in this paper.

A WFS system can generate virtual sound sources either behind or in front of the secondary sources. The latter, in a complete surround setup such as the Sala Bianca or the Berlin system, means virtual sources that appear to be inside the room.

Each virtual source is in principle reproduced by all speakers (in practice about half of them). For each pair of primary and secondary sources different gain factors, delays and filtering are required. So the complexity of the DSP software is proportional to Nprimary × Nsecondary.

The system installed in the Sala Bianca is designed to create up to 48 moving virtual sources in real time. Movement of the primary sources is continuous - it is not implemented by cross-fading between fixed points but by adjusting the synthesis parameters at the full sample rate. This means that also the Doppler effect of a moving source is reproduced faithfully.

3 The audio system

Figure 2 shows the structure of the audio system. There are four PCs, all of them from the Siemens/Fujitsu Celsius range of workstations. This choice was influenced largely by the IT department of the city of Parma, which imposed its requirements regarding suppliers, warranty conditions etc., but it turned out to be a very good one. At the time of writing all these machines are running Fedora 8 (installed two years ago), but they will all be converted to ArchLinux in the coming weeks. This distro makes it easy to use a much leaner system (e.g. without a fat desktop such as Gnome or KDE).

All machines are fitted with an RME HDSP-MADI interface providing 64 channels of digital audio input and output. RME ADI648 units are used to convert between MADI and ADAT, the latter being the format required by all AD/DA units used in the system. All machines are on a local gigabit LAN, and for audio they are connected to each other using an RME MADI matrix. The matrix has actually 8 inputs and 8 outputs, the remaining ones being used to connect to the 'Lampadario Acustico', another audio system installed at the CdS, and to provide connections for external PCs of researchers and musicians. All units share the same sample clock, transmitted either by the MADI links or by explicit word clock connections.

The wfsmaster machine coordinates the operation of the WFS system, and provides all the access points to external applications using it. It is also the only machine having a human interface.
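As a rough illustration of the scale implied by these numbers (my arithmetic, not a figure from the paper): with 48 primary sources and 189 secondary sources there are

N_paths = N_primary × N_secondary = 48 × 189 = 9072

independent gain/delay/filter combinations, of which roughly half are active at any time for a surround layout, divided over the three rendering machines.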

The 64 outputs of the master machine are connected to the inputs of the rendering machines. The hardware ensures that the audio of the entire system is synchronised - if all three rendering machines would use the same Jack period size and just copy one audio input provided by the master, the signal would appear on all 192 outputs at exactly the same time. But even if the period sizes are the same, the Jack periods on each machine are of course not synchronised, and this has to be taken into account in the processing code. This is discussed in section 4.1.

The three rendering machines each take care of 64, 61, and 64 outputs respectively, for a total of 189 speaker outputs. They run in runlevel 3, fully 'headless' and remotely controlled. All the equipment shown in fig. 2, except the ADI648 units of the rendering machines, is located in a small control room adjacent to the Sala Bianca. The ADI648 units of the three rendering machines, and the DA convertors and amplifiers connected to them, are located in two large racks in a separate technical room. The entire system is powered by a large UPS.

The normal MADI connections are shown in fig. 2. The MADI connections from the rendering machines back to the master can be used to monitor and measure the signals generated by the rendering machines. They have proven to be very useful for verifying the correct operation of the signal processing algorithms running on the rendering machines, but are not used during normal operation.

[Figure 2: The audio system - the wfsmaster PC and the wfsren1, wfsren2 and wfsren3 PCs, with ADCs, DACs, ADI648 ADAT/MADI converters and the MADI matrix.]

3.1 Using network connections

At the time the system was designed (end 2007 and early 2008), the use of a dedicated gigabit network for the transfer of audio between the master and rendering machines was considered, but in the end this solution was rejected for several reasons.

• It would require some means to ensure that the audio signals would remain synchronised. This was actually a minor problem: by providing feedback audio paths for the synchronisation signal described in section 4.1, the delays from the renderers back to the master could easily be measured and automatically adjusted to be equal.

• While the capacity of a gigabit ethernet would be sufficient for the amount of data to be transferred, there were some doubts regarding the performance of the network adaptor drivers in a real-time system.

• Using network connections for the audio would not permit significant savings in audio hardware. The MADI cards and ADI648 units for the renderers would still be required, as would those for the master machine (which has to provide the multichannel audio access point). So the only item saved would be the MADI matrix, and that is actually one of the less expensive units in the system.

In the best case the use of network connections for the audio signals would have increased the latency of the system by an unpredictable amount.

The risk of doing this was deemed to be too high at the time. Today the picture could look different, but at least the third point mentioned above remains valid - there would not be any significant savings.

4 The WFS processing architecture

Figure 3 shows the basic WFS processing system, which consists of 3 applications which communicate with each other using network messages. All of these command and monitoring messages use the OSC format.

[Figure 3: WFS processing structure - wfsmaster sends timestamped positions and an audio timecode / synchronisation signal to the wfsrender signal processing; wfsmonitor provides levels and status display and control.]

The main function of the wfsmaster is to provide a single access point for controlling the WFS rendering system. Apart from generating the synchronisation signal (discussed in section 4.1), this program does not perform any audio processing. It receives commands from external programs specifying the position and movements of each virtual source (more on this in section 5), and transforms these into the format required by the rendering engines.

One instance of wfsrender runs on each of the rendering machines. It performs all the signal processing required for the set of speakers controlled by its host. Its inputs are the audio signals - one for each virtual source - from the MADI interface, and the processing parameters transmitted by the wfsmaster. The details of the DSP algorithms used are discussed in section 6. It also transmits monitoring messages containing e.g. the levels of all its output signals along with its internal state and any errors.

The wfsmonitor is the only program having a graphical user interface. It shows the current position of all virtual sources, the levels of all 189 speaker outputs, and error and status info from the two others. It can also be used to 'solo' a single speaker, or to send signals directly to a speaker for testing and alignment. There can be any number of instances of wfsmonitor - it can be run e.g. on a portable PC inside the Sala Bianca, but the system will also run perfectly without it.

4.1 Synchronisation issues

As mentioned already in section 3 the basic audio architecture ensures that all outputs of the rendering machines will be exactly synchronised. If the system would provide only static virtual sources that would be all that is required. But to support moving sources also the application of the position parameters must be synchronised to sample accuracy on all outputs.

With its standard configuration wfsmaster will update and transmit all the source positions every 1024 frames. This is synchronised to (but independent of) the Jack period on the master machine. To allow the renderers to apply this information to the correct audio samples a separate synchronisation signal is generated by the master, and transmitted to the renderers following the same path as the source audio signals. It is similar to the one used by jack_delay to measure latency, and consists of a mix of six sine tones with frequencies that are chosen such that their phases encode the current position in a sequence that repeats exactly every 2^20 samples (around 22 s). So every sample has an implicit timestamp in the range 0 ... 2^20 - 1 that can be recovered easily by any application receiving the signal.

The timestamp corresponding to the first frame of the current position update period is included in the messages transmitted by the master program. A fixed offset (currently 800 frames) is added to allow for the worst case expected network latency.

The rendering applications use the timestamp and the decoded synchronisation signal to re-align audio and data. They also check the arrival time and integrity of the position messages and the presence of a valid audio time code, and report these in their monitoring messages. An occasional 'late' position message is reported but ignored, as is a short interruption of the time code. Consistent errors will make the renderers mute their output until the situation returns to normal.

4.2 The structure of wfsrender

The wfsrender application is the most complicated one in the system. It has two basic functions: compute the internal WFS processing parameters in function of source position and movement, and perform the actual DSP work.

In order to make this application more easily adaptable it uses two plugins to perform most of this work. Both are loaded at runtime (as specified by a global configuration file), and can be replaced without recompiling the whole application.

The layout plugin defines the complete geometry of the installation. It provides methods to e.g. find out the exact position of each speaker, to which channel on which host it is connected, etc. It also performs some specific calculations, such as determining if a given x, y coordinate is internal or external and finding its distance to the line of speakers.

The engine plugin performs the computation of all required internal parameters and the actual DSP work, i.e. it defines the WFS algorithm used.

The interfaces to these plugins are defined as abstract C++ classes. The 'host' program takes care of the audio and network interfaces, OSC encoding and decoding, the synchronisation signal, status reporting etc. In principle any C++ programmer who understands the required algorithms could create the plugins without being distracted by the system level aspects.

Another feature added recently is multi-threaded audio processing in order to use SMP systems more efficiently (the machines used are dual-core). This divides the DSP work over a configurable number of threads, each of them handling a subset of the output channels. This multithreading is transparent to the plugins.

5 External application interfaces

The system described so far just implements the basic WFS algorithm for up to 48 moving sources. To create content, spatialisation, room simulation etc. it depends on external applications supplying both the audio and source position and/or movement information. Figure 4 shows some possibilities.

[Figure 4: External application interfaces - wfsmixer (LADSPA plugin, shared memory), Python, SuperCollider, Csound, Puredata and Ardour feeding positions via OSC/UDP and audio to the renderers through wfsmaster.]

Audio signals can originate or be processed on the master machine, which has almost no CPU load due to the WFS system itself. External sources can be connected via MADI, ADAT, or analog inputs. In all cases the signals to be used as WFS sources are just sent to the first 48 outputs of the MADI interface. Time code and test signals typically use channels 49-56, while the last eight channels are normally used for control room monitoring or for providing external analog signals (e.g. for driving subwoofers).

The wfsmaster program has two interfaces for source position data.

The shared memory interface can be used locally only, but permits sample-accurate control if the position data is generated by a Jack client. A dummy Jack connection to the master program will ensure correct order of execution in that case. This interface is also used by some LADSPA plugins that allow source positions and movements to be encoded as automation data in Ardour.

The wfsmixer application provides a graphical tool to control source positions. It also allows basic room simulation using Ambisonic convolution reverbs.

The OSC/UDP interface can be used from all the standard synthesis programs, from Python (scripts or interactive), or other languages and applications, and from any PC on the local network. The position commands allow a source to move at a controlled speed, with the master program interpolating the movement if it takes longer than one update period.

Both interfaces also include a per-source gain factor which is applied at the end of the DSP processing. This is necessary to still allow full level for a source positioned at a large distance, for example when simulating virtual speaker sets for Ambisonics.

6 The DSP algorithms

The processing required for WFS is not really very complex, but it has to be repeated Nin × Nout times, or around half that number for a 'surround' layout such as the Sala Bianca.

6.1 The basic real-time processing

The form used in the wfsrender application is based on equation (29) in Sascha Spors' very practical summary of WFS theory [Spors et al., 2008], which shows the driving function for a spherical wave reproduced by a linear array. This translates into the diagram shown in fig. 5. For each single source, and for each speaker reproducing that source, the two filtered signals are delayed by a time delay and then combined using gain factors gain1 and gain2. Using two delay lines allows the second filter to be shared for all outputs, as are the delay lines themselves.

[Figure 5: The basic DSP operation - per source: input delay lines, +3 dB/oct and -6 dB/oct filters, and a time-varying delay(t), gain1(t) and gain2(t) per speaker output, summed with the contributions of other sources.]

The three parameters are computed from the source position and the system geometry. The wfsrender code performs this computation at every position update (i.e. every 1024 frames) and interpolates them linearly in between. Some optimisations are applied in the actual code for cases where one or more of the three parameters remain fixed.

Since the system allows for moving sources and uses the actual delays corresponding to the position of a primary source (which means it will reproduce the Doppler effect exactly), it can't handle the case of plane waves, as these would correspond to infinite distance and delay. It would be easy to add these as a special case, but so far there has been no need to do this.

6.2 Handling sources near the speakers

The equation cited above is an approximation that is valid only for primary sources that are not too close to the line of speakers. This is because its derivation assumes a continuous distribution of the secondary sources. Since the system supports arbitrary movements and virtual sources can cross the line of speakers without restriction, these cases have to be handled separately.

The exact solution becomes quite complex, and it can't be derived as a limit case of the standard equation. To find it one has to go back to the full three-dimensional case, apply the secondary source quantisation there, and then reduce the result to a linear array.

The wfsrender code uses a pragmatic approach to handle this situation. If a primary source comes too close to the secondary ones, two parameter sets are computed: one for a source at a fixed distance (for which the normal equation is still valid) behind the speakers, and one for the same distance in front.

The two parameter sets are then combined according to the actual source position. The result is an approximation of the exact solution, but works well in practice.

7 System organisation and use

All source code, scripts, data files etc. required to install the system are kept on an NFS share available on all machines. A single click on a desktop menu will install or update all necessary components system-wide. Compilation is always on the target machines, i.e. everything gets installed from source.

Binaries, plugins and most scripts are kept in the standard places under /usr/local. The only exceptions would be experimental versions that would be installed per-user.

The basic setup of wfsmaster, wfsrender and wfsmonitor is defined in a global configuration file. 'Technical' users would normally just run wfsmonitor, which then makes it possible to run and control the others without having to use remote logins, set up Jack servers, etc. They would then add anything else they require on the master machine manually. Alternatively the Python components described below provide an easy way to create sessions that need to be recreated automatically many times, and have in fact become the preferred way to use the system.

7.1 Automatic configuration

For the 'museum' user the situation is quite different. A single click on a desktop icon launches the complete system under the control of a Python program that configures everything and that acts as a server to a remote control application. The remote control looks like a typical desktop audio player, with stop/start/loop buttons, volume control, a playlist etc. It runs locally on the master machine and also on one or more 'EEE' notebooks with wireless network access, used by the museum staff. Selecting an item from the playlist reconfigures the complete system according to the requirements for that item.

Apart from the components that implement the remote control server and the playlist, the Python program has classes for all required audio applications, each new instance of them becoming a separate Jack client. These include py_jackctl, py_jplayer, py_ambdec, py_jconvolver and some others. More will be added in the future.

8 Acknowledgements

I'd like to thank Prof. Angelo Farina (University of Parma) and the direction of La Casa della Musica for having provided the opportunity to create the system described in this paper. A warm thanks also to Stefano Cantadori and to all my former colleagues at Audio Link and AIDA.

The realisation of this project would not have been possible without the work of all the developers who have created Linux and its audio system. An uncountable number of people on the Linux Audio mailing lists have provided valuable and often essential help and hints. My sincere thanks go to all of them.

References

Marije Baalman, Torben Hohn, Simon Schampijer, and Thilo Koch. 2007. Renewed architecture of the swonder software for wave field synthesis on large scale systems. In Proceedings of the 6th Linux Audio Conference, Berlin, Germany.

Marije Baalman. 2008. On Wave Field Synthesis and Electro-acoustic Music. VDM Verlag Dr. Mueller AG, Saarbruecken, Germany.

Sascha Spors, Rudolf Rabenstein, and Jens Ahrens. 2008. The theory of wave field synthesis revisited. In Proceedings of the 124th AES Convention, Amsterdam, The Netherlands. Audio Engineering Society.

Edwin Verheyen. 1998. Sound Reproduction by Wave Field Synthesis. Doctoral thesis, Technische Universiteit Delft, Delft, The Netherlands.

General-purpose Ambisonic playback systems for electroacoustic concerts - a practical approach

LAC 2010, Utrecht

Jörn NETTINGSMEIER
Freelance audio engineer and qualified event technician
Lortzingstr. 11
Essen, Germany, 45128
[email protected]

Abstract

Concerts of electro-acoustic music regularly feature works written for various speaker layouts. Except in the most luxurious of circumstances, this implies compromises in placement, frequent interruptions of the concert experience to relocate speakers, and/or error-prone equipment rewiring or reconfiguration during the concert.

To overcome this, an Ambisonic higher-order playback system can be used to create virtual speakers at any position as mandated by the compositions. As a bonus, the performer can then be given realtime control of the source positions in addition to their levels, increasing the creative freedom of live sound diffusion.

Deployments at LAC 2009 and the 2009 DEGEM concert at musikFabrik in Cologne have yielded very good results and were met with general approval. Additionally, two informal listening tests with a group of film sound artists and electro-acoustic composers were conducted to gather expert opinions on advantages and shortcomings of the proposed virtual speaker system.

This paper was originally published at the Ambisonics Symposium 2010, hosted by IRCAM in Paris, France.

Keywords

Ambisonics, surround sound, live sound reinforcement, electro-acoustic music, tape music

Introduction

This paper describes a practical Ambisonic concert system, as well as a setup procedure suitable for live situations. The system consists of a Linux-based rendering machine, and otherwise standard sound reinforcement equipment that should be available at electro-acoustic music facilities or is readily obtained from rental companies.

All software parts of the production toolchain are free and open-source. Hence, they do not incur licensing costs, are reasonably easy to port to and interface with existing Mac OS X, Solaris and (with certain limitations) Windows environments [Sle2010], and are easily customized to accommodate special requirements that frequently arise in any experimental arts context.

The sound engineer will appreciate the ease of creating different speaker layouts or trying small layout variations to better convey the composer's intentions in a given venue, and the ability to compensate for non-optimal speaker positioning due to space constraints, escape routes and other venue safety requirements.

Concert curators will like the added flexibility in accommodating non-standard spatial configurations (whose variety grows even more quickly when height reproduction is desired) and the option of commissioning or accepting native Ambisonic compositions without extra cost or error-prone pre-rendering attempts by composers. With changeover times and distracting stage work minimized, it becomes much easier to create a coherent and focused concert experience.

Overview

A minimal Ambisonic live playback rig consists of a multichannel playback source with appropriate encoders, a decoder, digital-to-analogue converters and a number of speakers and amplification. For pre-produced tape works, the source, the encoders (i.e. the panners) and the decoder will usually run on a single PC. If the work features live electronics or sound is rendered in real time on a machine provided by the artist, it will be necessary to accommodate external audio sources. By providing analog line-ins as well as multichannel digital inputs, all situations should be covered.

While the decoding machine can drive the converters and amps directly, it is highly desirable to have a physical master volume control outside the computer: for convenience, and as an emergency breaker in case something goes horribly wrong (as happens with complex digital systems from time to time). A digital mixer with digital I/O is the obvious choice. It also handles analog signals from an announcer's microphone or traditional instruments, if necessary. For flexibility, the mixer's inputs should be routable through the encoding machine and back into the mixer, so that external signals fed into the mixer can benefit from Ambisonic panning as well. In such a setup, it is important to consider (and minimize) the system latency1.

[Illustration 1: Signal flow overview of an Ambisonic concert playback system, using a digital mixer. Mixer output faders used for speaker gain compensation.]

1 External signal transmission

Currently, there are two suitable multichannel digital interface standards that are well-supported by free software.

ADAT lightpipe connections provide eight channels of 24 bit audio at 48 kHz over a cheap optical link using polymer fibre with plastic toslink connectors. Their maximum transmission distance is specified as 10 metres; under optimum conditions, 20 m are possible. ADAT has become a commodity standard that is available on many consumer and professional computer interfaces, many of which have production-quality Linux drivers. Most live mixing desks support ADAT natively or through optional interface cards. The channel count is usually limited by the number of available extension bays. 32 inputs and outputs (i.e. four ADAT pairs) are a common upper bound, which is suitable for Ambisonics up to fourth order plus room for subwoofers and auxiliary feeds.

MADI (Multichannel Audio Digital Interface) connections carry 56 channels of 24 bit audio at 48 kHz (or 64 if the optional varispeed capability is not needed) over either coaxial 75 ohm cables terminating in BNC connectors (which are cheap and reliable), or optical fibre using ST plugs (slightly more expensive, but an ace in the hole over long distances where hum currents might be an issue).

1 The ffado.org website provides a good starting point for latency tuning. The focus is on firewire devices, but most of the tricks are applicable to all audio systems: http://subversion.ffado.org/wiki/LatencyTuning

Default MADI sockets are rare on mixing desks, but most vendors offer optional expansion cards. On the computer side, as of this writing, the choice is currently limited to the RME MADIface (available in PCIe and ExpressCard flavours), for which complete and reliable Linux support is available. ADAT-to-MADI bridges and vice versa are available from several vendors for more complex setups.

Both ADAT and MADI can accommodate double or quad sample rates by demultiplexing the signals onto two or four physical channels, at the cost of reducing the number of transmission channels accordingly. It should be noted that the only tangible benefit of higher sample rates in a live situation is a slightly reduced latency (since most devices have a constant number of samples of buffering, regardless of the rate), but at a significant cost in bandwidth, CPU and storage.

2 Software components and internal signal flow

The decoding PC should be equipped with at least two ADAT I/O connectors, i.e. 16 channels in each direction. Tape pieces can be played back from the PC itself, and live electronics, microphones or instrument signals will be coming in via the ADAT inputs.

Inside the PC, the audio signals are handled by the JACK Audio Connection Kit [Dav10], a low-latency sound server that allows the user to route audio between different applications and have them co-operate in real time. In the use case at hand, these are:
• a playback engine for tape pieces,
• a virtual mixer/signal router for manipulation and to host the panning plugins, and
• an Ambisonic decoder to generate the speaker feeds.

This author prefers Ardour [Dav10-2] (a very flexible, free digital audio workstation) as the signal handling hub inside the PC, since it can double as a playback engine for complex multichannel pieces and offers parameter automation. If only routing and panning are desired, a more lightweight mixer application such as the Non-Mixer [Lil10] could be considered.

All incoming signals are patched into Ardour buses (or tracks, if recording capability is necessary), tape pieces are imported as single-channel audio regions into mono Ardour tracks, and a master bus suitable for the desired target format is created. In the case of 3rd-order periphonic playback, it is 16 channels wide. Note that Ardour itself is agnostic to surround formats – it is only the panner that cares about (and defines) the meaning of each channel in the master bus.

The default panners in all tracks and buses must be bypassed. Instead, an Ambisonic panning plugin is inserted as the last post-fader plugin of the channel strip. Panners up to third order are freely available as part of the AMB plugin set [Adr10]. An extension to higher orders is trivial.

For tape pieces, the panners can be set in advance to produce the required virtual speaker positions, either statically or with automation, if in-flight adjustments are required. If necessary, the performer can be given real-time control of rotation, azimuths and elevations via MIDI controllers. The mapping of parameters in Ardour is as simple as a shift+control middle-click on a fader widget to initiate MIDI learn mode, followed by a quick wiggle of the desired hardware controller, which is now bound to the widget.

Optionally, an Ambisonic rotator can be inserted into the master bus to correct for angular offsets of the rig (if, for example, a two-in-front octagonal preset is being used for an octagon with a centre speaker), or as an additional control parameter for creative live sound diffusion.

From Ardour's master bus, the signal is then fed to an appropriate JACK-aware decoding software. The author recommends AmbDec [Adr10-2], a state-of-the-art decoder capable of up to 3rd-order horizontal and first-order vertical playback. It supports dual-band decoding and optional but highly recommended near-field compensation. After decoding, the signals can be patched to the ADAT outs, or optionally run through a convolver for additional FIR filtering, and then back into the physical mixing desk, which in turn feeds the speakers.
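Since all of this routing happens through JACK, it can also be scripted. The following minimal C sketch connects two existing ports by name; it is an illustration only – the port names "ardour:master/audio_out 1" and "ambdec:in.0" are placeholders and will differ between sessions (use jack_lsp to list the real ones).

```c
/* Minimal JACK routing sketch: connect an Ardour master-bus output
 * to an AmbDec input.  The port names below are placeholders and
 * must be adapted to the actual session. */
#include <stdio.h>
#include <jack/jack.h>

int main(void)
{
    jack_status_t status;
    /* Open a throw-away client just to issue the connection. */
    jack_client_t *client = jack_client_open("patcher", JackNullOption, &status);
    if (client == NULL) {
        fprintf(stderr, "could not connect to the JACK server\n");
        return 1;
    }
    /* jack_connect() returns 0 on success, EEXIST if already connected. */
    if (jack_connect(client, "ardour:master/audio_out 1", "ambdec:in.0") != 0)
        fprintf(stderr, "connection failed - check the port names\n");

    jack_client_close(client);
    return 0;
}
```

In practice the same connections are usually made and saved with qjackctl or a session manager; the sketch merely shows that the routing layer is fully scriptable, which helps when a concert setup has to be restored quickly.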

Equipment requirements and setup

As noted above, a silent PC with sufficient ADAT I/O is essential. Multi-core CPUs are convenient to ensure a responsive user interface when audio computation load is high, but they are not mandatory. Anything like a 1.0 GHz Pentium or better should be more than adequate for simple playback and decoding. Budget permitting, a digital mixer is highly recommendable. It should have as many analogue outs as there are speakers to drive.

Next, decide on the make and number of speakers. In Ambisonic systems, all speakers will contribute to any one virtual source to some extent, giving you a slightly higher SPL efficiency over the entire listening area. On the other hand, electronic music does require rather more headroom than your average mastered pop material. For a small audience (50-80 people), 8 or more large active near-field monitors might be adequate in a medium-sized room if supported by subwoofers, but for anything larger, P.A.-grade material should be used to avoid sonic compromises and burnt drivers. All speakers and amplification should be identical, to ensure consistent phase response.

Six speakers will give you second-order horizontal coverage with a usable listening area of perhaps one-third the radius of the speaker circle. Third-order operation is recommended for an extended listening area of one-half to two thirds radius, with improved source sharpness that is closer to (but not quite on par with) discrete speaker panning. The minimum number of speakers for 3rd-order horizontal is eight.

1 Selection of a suitable base layout

Assuming that the reader does not have the means of computing her/his own decoding coefficients (much like this author), it is highly advisable to select a speaker layout that can be derived from one of the presets shipped with AmbDec. As of this writing, higher-order capable presets include a number of regular polygons, and regular polyhedra as well as stacked rings for full 3D reproduction [2].

Generally, the angles of incidence should be kept as dictated by the preset. The speaker distances can then be varied within reasonable limits to accommodate the venue, as they only require delay and gain compensation without affecting the decoding coefficients as such. Extreme distance differences should be avoided, because the maximum sound level will be bounded by what the farthest speaker can safely deliver at the sweet spot. Moreover, distant speakers will have more room reverb at the listening position, which will reduce the sharpness of sources in that direction, create uneven tone color across the rig and might cause closer speakers to dominate the directional perception. In practice, the usual starting point will be a circle, which can be elongated to an ellipse or even made into a rectangle if necessary.

2 Speaker positioning

The first step is to mark a point of reference on the floor in the centre of the listening area. This will be the "sweet spot" to which the setup is being calibrated. Now set up the speakers at the correct angles (minor deviations are acceptable). A laser angle gauge will speed up this process considerably - adequate specimens can sometimes be obtained cheaply at your local do-it-yourself store.

3 Distance measurement

Mount a laser range finder at ear height on a tripod over the point of reference. Then measure the distance to each of your speakers precisely (to within 1-3 centimetres). Reproduction in the sweet spot breaks down as the error approaches λ/2 (around 6 kHz at 3 cm). Outside the strict sweet spot and in a diffuse field, the distinction is largely academic, but it is suggested that this precision be maintained to get predictable results.

Aim at the tweeter or some other easily identifiable flat surface, but avoid shooting at the front cloth or mesh grilles, as these may introduce errors depending on whether your beam lands on the surface or goes through the material. If there is a bass port behind your measuring spot, the error may be substantial. Similarly, avoid cheap range finders that do the actual measurement with ultrasound, since you can never be sure what you are measuring. To mislead users, those cheap units usually feature a laser as well, but it is only used for aiming, and the actual measurement beam is much wider and more diffuse.

[2] For special setups such as hemispheres, the author has found that people who are researching decoders are generally very interested in getting field testing and user feedback. Which means you might get a custom-tailored matrix from a friendly Ambisonics research facility near you if you ask nicely. The AmbDec author has also made a standing offer to try his skills on any interesting speaker layout you might care to throw at him.

4 Distance compensation and level adjustment

Now edit the AmbDec preset file you have based your speaker setup on. Leave the angle values alone (they are only there for clarity - changing them will not automagically adjust the coefficients), but do enter your measurement results into the speaker distance column. These values govern the distance correction functions, which you must activate in the setup dialog: be sure to tick both "delay compensation" and "near field compensation". You can also activate gain compensation based on distance, but that assumes that all your speakers are precisely level-matched to begin with. As this is usually not the case, leave that option off and proceed as follows:

On your point of reference, set up a good SPL metre (pointing straight upwards, to avoid measurement errors due to shadowing effects, at least in the horizontal plane) and calibrate all speakers to within +/- 0.1 dB using pink noise. The calibration gains can be applied in the external mixer or, if available, in the software mixer of your audio interface. If neither of these options is available, use an additional Ardour bus for each speaker send.

As an alternative calibration method, use an omni-directional measurement microphone. A cheap one, like the Behringer ECM 8000, will do fine. Again, point it straight upward or downward. Install the JAPA realtime analyzer [Adr10-3] on your rendering machine and feed the microphone signal into it. Conveniently, JAPA provides a pink noise test signal, which you can now route to each speaker in turn, using AmbDec's external test signal input. Use the most distant speaker as your reference, bring it up to the desired level, and make sure it is not clipping or limiting. Make a snapshot of the steady-state curve in noise mode (i.e. with slowest response time) with the "-> X" button once the system has stabilized. Keep it on the display by selecting the "X" curve in the memory section. Now iterate over your speakers, adjusting each gain until the resulting measurements match the reference curve as closely as possible. We are only interested in overall loudness, so ignore any minor deviations in frequency response.

A crude level calibration can also be accomplished with a simple metre and a 1 kHz test tone, but then you are susceptible to large adjustment errors if something funny happens to be going on in that narrow test band, like destructive interference with a floor reflection.
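To make the distance correction described above concrete, the following C sketch shows the arithmetic it boils down to: delay the closer speakers so that all wavefronts arrive at the sweet spot together, and attenuate them to match the farthest one. This is an illustration only – AmbDec performs the correction internally once the measured distances are entered; the speed of sound and the example distances below are assumed values.

```c
/* Sketch of per-speaker distance compensation: delay each speaker so
 * that all wavefronts arrive at the sweet spot simultaneously, and
 * attenuate the closer speakers to match the farthest one (1/r law). */
#include <stdio.h>
#include <math.h>

#define SPEED_OF_SOUND 343.0   /* m/s at room temperature */

int main(void)
{
    /* example measured distances in metres (assumed values) */
    const double dist[] = { 3.02, 2.87, 3.10, 2.95 };
    const int n = sizeof dist / sizeof dist[0];

    double dmax = 0.0;
    for (int i = 0; i < n; i++)
        if (dist[i] > dmax) dmax = dist[i];

    for (int i = 0; i < n; i++) {
        double delay_ms = (dmax - dist[i]) / SPEED_OF_SOUND * 1000.0;
        double gain_db  = 20.0 * log10(dist[i] / dmax);   /* <= 0 dB */
        printf("speaker %d: delay %.2f ms, gain %.2f dB\n",
               i + 1, delay_ms, gain_db);
    }
    return 0;
}
```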

5 Aside: speaker equalisation

You should avoid 30-band EQs per output to compensate for acoustic deficiencies of room and speakers. The phase deviations introduced by filter banks at different settings would severely degrade the recreated sound field. Hence, the best approach is to either do without equalisation altogether, or to ensure matching phase responses in all channels. It is a good (and time-saving) compromise to use the same EQ curve (and algorithm!) for all speakers, which is easily achieved by ganging the EQs on a physical (digital) mixer, or by inserting a mono EQ plugin into Ardour's master bus, which will be replicated automatically for all channels. In theory, it is possible to use custom phase-matched FIR filters for each speaker [3]. However, the benefit will be marginal in most cases, at the cost of additional latency and a setup time and effort that is prohibitive in all but permanent installations.

[3] Another free software package is available for this job: DRC-FIR [Sbr09]. [Net08] shows how DRC can be applied to an Ambisonic listening rig.

6 Adding optional subwoofers

Depending on the number of woofers available, different optimised driving signals can be used. In the case of just one woofer, the obvious choice is to feed it the W channel and to place it in the centre front (where any "residual" localisation cues will be the least distracting). If two are available, they can either be placed centre front and back, or left and right, and be driven with W +/- X, or W +/- Y, respectively. In the ideal case of four woofers, a standard first-order square decoder can be used. If eight units (and safe rigging anchors) are available, place them in a cube and drive them in first-order periphonic. Higher orders have no advantages at low frequencies.

The user should experiment with different ratios of W to the directional components. More W will increase the efficiency and overall "boom" that the system can deliver. If emphasis is placed on the directional components, a nice dry trouser-flapping effect can be observed, without the obnoxious pressure on the ears that is often associated with deep bass in enclosed spaces. Obviously, the latter option reduces the maximum sound pressure that can be delivered.

To derive the signals, you have two options: either run two instances of AmbDec in parallel, one for the tops and one for the subs, or create a custom configuration that includes both. The latter is generally more convenient to use, while the former is less work to set up. Subwoofers should be aligned and calibrated using the procedure described for the tops. With respect to equalisation, the same rules apply. For band separation, use low- and highpass plugins [4].

TESTING AND FINE-TUNING

After the system has been set up and calibrated, it is important to check for defects with well-known program material. As a first step, a mono signal should be routed to each adjacent pair of speakers, one after the other, to check for polarity errors. If the polarity is correct, a stable phantom image should appear in the middle of the speaker pair under test. Next, pan the same signal slowly across the entire rig and ensure there are no major changes in loudness.

After any problems have been corrected, send an Ambisonic test signal with sustained HF content over the system. Depending on the room acoustics, you will experience irritating phasing effects when moving around near the centre spot. They will be quite pronounced in very dry spaces and inconspicuous in reverberant rooms. Decide whether that phasiness is an issue with the program material at hand – actual audibility will depend on HF content and on whether the audience is seated or invited to move around.

You can trade in some of the phasiness in the centre for a slight loss of localisation precision by "de-tuning" your carefully measured delays. As a quick A/B check, temporarily disable delay compensation in AmbDec to hear the effect. Next, experiment with the crossover frequency between the LF and Mid/HF decoder [5]. Start with the default (which is always very conservative), and move it upwards a hundred hertz or so, to see if localisation clarity improves or degrades.

If you have the chance, try and compare a discrete stereo signal sent to two speakers with the same signal panned in Ambisonics, and listen for coloration. Likely, you will observe a slight damping in the treble range, which can be corrected with one or two dB boost at 5-6 kHz with a bandwidth of two octaves.

Sometimes, venue constraints such as escape routes may force you to move a speaker away from its optimum angle. Sources will now be drawn to where two speakers are lumped together, and a "hole" will open up on the other side. In this case, try to raise the level of the lone speaker gently, and bring down the level of the other two speakers accordingly. Test the effect with a pink noise signal panned across the problematic area, and aim for constant loudness rather than smooth movement.

Listening Tests

Composers performing their works on a 12-speaker 3rd-order horizontal Ambisonic system have reported a very pleasing sense of space and envelopment, free from timbral defects or obvious imaging deficiencies even at the peripheral areas of the auditorium. However, it should be noted that due to time and equipment limitations, no A/B comparison with traditional discrete loudspeaker replay could be offered to them.

To corroborate the author's hypothesis that virtual Ambisonic sources are indeed a workable approach to multichannel playback, two informal listening tests with different target groups were scheduled, this time including A/B comparisons.

1 Film sound people

The first test session was conducted at the Kunsthochschule für Medien in Cologne, in a Dolby-certified film mixing room with very dry acoustics. The audience consisted of an experienced film mixing engineer, a film composer, two students of media arts, a film projectionist and a media arts Ph.D. student.

[4] 6 and 12dB/oct variants are available as part of the CALF plugin set [Fol10].
[5] For a detailed discussion of dual-band decoding, see [Ger74].

Due to the limited number and at the same time high expertise of the participants, it was decided not to attempt a statistically valid formalized test, but to collect opinions and discuss impressions instead.

Two rigs of identical loudspeakers (K+H O108TV 2-way 8" systems) were set up for direct comparison: an Ambisonic octagon driven in third order, and an ITU 5.0 setup. Both rigs shared a common center speaker. For the test, 5.0 content (both film and music) was reproduced either natively or as 3rd-order virtual sources on the Ambisonic rig. Obviously, the Ambisonic system could not be expected to perform better than the ITU rig, as it had to suffer from the inherent limitations of the 5.0 source material, and then added its own problems. Rather, the goal was to find out whether Ambisonic reproduction could be considered a workable compromise, and to learn more about the perception of its shortcomings. Participants were encouraged to move around while listening, explore the usable listening area and watch out for position-dependent artifacts.

In the first listening phase (which featured a short animated film), the most strongly evident problem of the rig under test was the tendency of sources to move along with the listener, i.e. to maintain relative direction but not absolute position. This was unanimously deemed unacceptable for the centre channel in film sound reproduction. Listeners also reported a much wider frontal sound stage, which some found pleasant, and others criticized as "loss of frontal focus". All participants expressed a clear preference for native reproduction in the film sound use case. The mixing engineer remarked that she could imagine working with an Ambisonic system, but that the increased stage width would have affected her mixing decisions.

In a second listening phase, various 5.0 music snippets were presented: a Händel aria, a piece for organ and percussion, some lounge jazz and an electric reggae track. The initial reaction to Ambisonic playback was positive – the listeners reported better envelopment and pleasant ambience reproduction. One participant perceived an irritating amount of phasiness and jumping sources depending on pitch (specifically that transient attacks would seem to come from a different direction than the sustained sound). In direct comparison, two test subjects reported audible coloration. Half of the participants still expressed a clear preference for 5.0, while the other half was undecided. General consensus was that the Ambisonic system was far better suited to music reproduction, where its characteristics would be less obtrusive or even beneficial.

A couple of tentative conclusions can be drawn:

1. For cinema use, absolute localisation of the center channel is mandatory. This requirement cannot be met by a third-order Ambisonic system. In a subsequent test run, the centre channel was routed around the Ambisonic encoder and directly to the front speaker, with a gain boost of 4dB to maintain balance (the exact amount will likely vary with the number of speakers and Ambisonic order). This hybrid setup was found acceptable. Since that speaker remained part of the Ambisonic rig, it now carried both the direct C signal and components of L/R/SL/SR. The test content did not show obvious artefacts, but further experiments with LCR-panned material will be necessary to check this method for phasing or comb filter effects.

2. Focus and stability of localisation were considered critical, while holes in the side and rear sound stages or distorted ambience reproduction or localisable speakers were not. Even when listening to native B-format recordings, the test subjects did not express much appreciation for the advantages of Ambisonic reproduction. Instead, they reported localisation ambiguities and coloration (which were clearly present, but not particularly disturbing to this author, compared to the benefits). This experience suggests that people who have had prior exposure to Ambisonics have learned to "decode" a great amount of spatial detail from Ambisonic playback and will usually prefer it, while subjects without such listening experience tend to be irritated by ambiguities and diffuseness. It might well be that Ambisonic listening takes some getting used to, and that the listeners' verdict might improve over time. But then it also implies that Ambi "professionals" tend to overestimate the impact on (and quality perceived by) "laypersons", i.e. casual listeners.

3. The widening of the listening area during Ambisonic playback postulated by the author was not substantiated by listener remarks. Interestingly, the collapsing of the image into a side or rear speaker during 5.0 playback at peripheral positions was not commented on either. Apparently it was considered natural and not perceived as a problem.

4. The dry acoustics of typical cinema rooms make phasing artifacts very evident. Additional measures to reduce phasiness (such as de-correlators for the HF band) should be explored under such circumstances.

2 Electro-acoustic musicians

A second listening test was conducted at the Institut für Computermusik und Elektronische Medien (ICEM) at Folkwang Hochschule Essen. During a 2-day workshop, one sound engineer, five students and one professor of electro-acoustic composition experimented with an octagonal setup of Apogee AE-8 15/2" speakers in a moderately ambient room. On the first day, the students set up their own quadrophonic compositions for playback and evaluated both native and Ambisonic renderings. On the second day, a short survey was taken.

For this survey, the author had selected seven excerpts, five from students' works and two from other compositions. Each excerpt was first played back natively, then via Ambisonics. Participants were encouraged to move around to be able to assess the usable listening area. After some time for note-taking, each excerpt was repeated, again on both systems. The subjects were asked to compare the Ambisonic playback to the discrete reference, specifically coloration, stability of localisation, angular fidelity, area of very good resp. usable reproduction, sound of phantom sources, smoothness of movement, ambience, envelopment, source distance and size, personal preference, and appropriateness of the Ambisonic reproduction for the artistic work.

Subjects reported minor shortcomings throughout, but they were mostly deemed unobtrusive. Only for one piece was the Ambisonic system considered unsuitable: it treated the speakers as individual sources, without any correlated information or intent for phantom imaging. Here, any diffuseness or spatial widening was clearly detrimental.

On average, the subjects expressed "no preference" with a slight overall tendency towards native reproduction, and assessed the Ambisonic system to be "very usable" for conveying the artistic intention. However, individual judgements deviated considerably – in the most extreme case, one person expressed a strong preference for Ambisonics while another considered it clearly unusable for the same excerpt. Again, the subjects did not share the author's impression of a slightly enlarged sweet spot. On average, the listening area for "good" imaging was found to be 0.5 times the array radius for Ambisonics, and 0.6 for discrete; "usable" imaging was perceived up to 0.7 and 0.8, respectively. In comments it became evident that subjects wanted to be able to pinpoint speaker positions – much to the author's surprise, the ability to do so was considered a matter of reproduction fidelity.

After the survey, some music DVDs were played back in both modes, which provided further interesting insights. One specimen in particular, a recent live production of the Eagles' "Hotel California", showed severe comb filtering of frontal sources, to the point where the result was unusable. Detailed inspection showed negative correlation between the C and L/R channels, a production trick that widens frontal sources over discrete speakers, but lets Ambisonics fall flat on its face. A Pink Floyd Live DVD had the main instruments mixed to L/R exclusively, without significant centre channel content. When switched to Ambisonic playback, the intentionally airy and diffuse sound stage would gel into a very focused centre image, certainly "correct" but very audibly different.

It could be argued that these highly media-specific hacks are not valid testing material, but then again composers will always try to stretch the limits of any given playback system. Any engineer trying to employ Ambisonic techniques will have to be aware of such corner cases. Most DVD content showed a slight damping of the treble when played on Ambisonics. This effect usually caused a clear preference for discrete rendering, which disappeared after applying some corrective EQ.

One participant nicely summarized the overall consensus in remarking that in tape music, any form of playback is "interpretation", and that he considered Ambisonic playback to be a different but valid interpretation except in special cases.

Conclusion

Informal listening tests have provided interesting insights into the potential and limitations of virtual speaker sources using Ambisonic panning. While acutely aware of the shortcomings, this author remains convinced that such systems are a very useful tool for electro-acoustic music facilities and can provide a very pleasing and successful concert experience. Since they can be built from commodity hardware and free software, the entrance barrier to a successful deployment is low, in theory. It is the author's hope that this paper contributes to lowering that barrier in actual practice.

Acknowledgements

The author would like to thank Martin Rumori and Judith Nordbrock at Kunsthochschule für Medien, Köln, and Prof. Thomas Neuhaus and Martin Preu at ICEM, Folkwang Hochschule Essen, for their kind support and expertise in conducting the listening tests.

References

[Adr10] Fons Adriaensen, the AMB plugin set, containing Ambisonic panners up to third order: http://kokkinizita.net/linuxaudio/downloads/index.html, 2010
[Adr10-2] Fons Adriaensen, AmbDec, a state-of-the-art Ambisonic decoder, l.c., 2010
[Adr10-3] Fons Adriaensen, JAPA, the JACK audio perceptual analyser: l.c., 2010
[Dav10] Paul Davis, Torben Hohn, et al., The JACK Audio Connection Kit: http://jackaudio.org/, 2010
[Dav10-2] Paul Davis et al., Ardour digital audio workstation: http://ardour.org, 2010
[Fol10] The CALF plugin collection, by Krzysztof Foltman and Thor Harald Johansen: http://calf.sourceforge.net, 2010
[Ger74] Michael A. Gerzon, Surround Sound Psychoacoustics, in: Wireless World 12/1974: http://www.audiosignal.co.uk/Resources/Surround_sound_psychoacoustics_A4.pdf, 1974
[Lil10] Jonathan Moore Liles, Non-Mixer, an alternative JACK mixing/routing application: http://non-mixer.tuxfamily.org/, 2010
[Net08] Jörn Nettingsmeier, Ambisonics at Home: http://stackingdwarves.net/public_stuff/linux_audio/tmt08/Ambisonic_Listening_Rig_with_Free_Software.pdf, 2008
[Sbr09] Denis Sbragion, DRC, the digital room correction package: http://drc-fir.sourceforge.net, 2010
[Sle2010] Stephane Letz et al., JACK2 on Windows: http://trac.jackaudio.org/browser/jack2/trunk/jackmp/windows/Setup/src/README, 2010

Real-Time Kernel For Audio and Visual Applications

John Kacur
Red Hat
19243 Wittenburg, Germany
[email protected]

Abstract

Many of the Linux distributions that are dedicated to audio and video make use of the Linux real-time kernel. This paper explores some of the advantages and disadvantages of using real-time. It explains how the real-time kernel achieves low latency and shows how user-space can take advantage of real-time capabilities. This talk is presented by one of the real-time kernel programmers, and gives an overview of the real-time kernel for audio and video.

Keywords

real-time, linux kernel

1 Introduction

Linux distributions dedicated to audio and video were some of the first distributions to ship the Linux real-time kernel to a general audience. Software such as JACK and Ardour, among many other programs, are designed to take advantage of real-time capabilities. The standard Linux kernel has the facilities to do priority based scheduling, but for optimum low latency, the PREEMPT_RT kernel is required. Indeed many features of the PREEMPT_RT kernel have already been integrated into the standard kernel as an option, most notably soft and hard threaded irqs. Debugging features such as ftrace and lockdep also were originally designed for the real-time kernel. This paper will explore how the real-time Linux kernel achieves such low latency, how to configure a system to take advantage of the capabilities it offers, and give an introduction to the programming interface from user-space.

2 What is real-time?

When we talk about real-time systems, the most important feature is predictability or determinism. Often people who are new to real-time think it is about raw speed, but this isn't so. Real-time actually sacrifices some throughput performance in order to achieve predictability and low latency where it is deemed important. A standard kernel will achieve better throughput on average, but a process may run very quickly some of the time and be delayed the rest of the time. In contrast a real-time system will be less "bursty", but have predictable performance. The measurement of the degree that time deviates from the average is called jitter, and it is this jitter that is greatly reduced with real-time.

The PREEMPT_RT kernel achieves predictability by ensuring that no operation takes more than 100 microseconds. In some cases the latency is as low as 50 microseconds, which is close to the capabilities of the hardware. The low latency is important for high priority processes to accomplish what is required of them on time. By this, we mean the process must do what is required neither too slowly, nor too fast. If you are playing a note in a song, it is just as wrong for it to play too late as it is for it to play too early.

Although some throughput is sacrificed for determinism and low latency, kernel programmers are working to lower the throughput gap as well.

2.1 Hard real-time vs soft real-time

Surprisingly, no single definition exists of hard real-time. Hard real-time can be thought of as a system that never misses its deadlines, and a soft real-time system is one that sometimes can be allowed to miss its deadlines. The reality is more like a spectrum between these two poles. Since no hardware (or software for that matter) is perfect, it is doubtful that the ideal of hard real-time actually exists. So, the interesting question is, what should the system do if it misses a deadline?

For example, if a deadline is missed, should an event be cancelled or delivered late? If we are talking about video, it is often okay to drop some frames without it being noticeable, so that would be a case where we could cancel meeting a deadline. In cases where we are controlling a machine, it might actually be useless to deliver an event that controls, say, a motor after it is too late.

Sometimes a softer version of hard real-time is acceptable. An audio application may have a desired latency of 5 milliseconds, but can occasionally be allowed to miss this deadline, but only if it isn't more than 10 milliseconds. [1]

2.1.1 What is PREEMPT_RT?

PREEMPT_RT is a real-time solution for Linux in which almost everything in the kernel is preemptible. The standard Linux kernel has an option called Preemptible Kernel (Low-Latency Desktop) PREEMPT_DESKTOP, in which all kernel code that is not in a critical section, that means protected by locks, is involuntarily preemptible. The PREEMPT_RT option extends this to make even most critical sections preemptible. It does this by replacing kernel spinlocks with rt-mutexes. In addition, rt-mutexes are supported by priority inheritance – more on that later.

PREEMPT_RT includes threaded soft and hard irq interrupts – this is now an option in the standard kernel. By making interrupts threaded, they are also now preemptible, they can be scheduled just like any other process, and can be given a priority, which can undergo priority inheritance if necessary. Another critical component of a real-time system developed for PREEMPT_RT is high resolution timers. Finally, many tracing and verification features such as ftrace and lockdep were originally developed for PREEMPT_RT, but are now part of the standard kernel.

2.1.2 How does PREEMPT_RT work?

First some background: Voluntary preemption is when kernel code can choose at preselected points in the code to be preempted by higher priority processes. Involuntary preemption is when the scheduler can force a process to stop running in order to allow a higher priority process to run.

The Linux kernel is designed to run on SMP (Symmetric Multiprocessing) systems. This means that code must safely run on multiprocessors. In order for this to work properly, kernel programmers must identify critical sections. Critical sections are code with data that could be corrupted if it were suddenly interrupted, and later rerun on the same or a different processor. In order to protect this data, various types of locks are used. The most prevalent kind of lock is a spinlock. It works by simply waiting (spinning) while another process holds the lock to a critical section. When, hopefully a short time later, the lock is released, the waiting process can then grab the lock and continue to work. PREEMPT_DESKTOP can preempt most code, but not code that is running in a critical section – that is, locked code.

PREEMPT_RT works by converting most spinlocks into sleeping spinlocks, which in turn can be preempted. If you need a lock that can't sleep on -rt, then you identify it by making it a raw_spinlock. Obviously the goal is to have the bare minimum of these raw_spinlocks, since they reduce the areas of code that are involuntarily preemptible.

2.1.3 What is Priority Inversion and Priority Inheritance?

Priority inversion occurs when a lower priority task runs at the expense of a higher priority task. Here is one of the simplest ways in which this can occur: Imagine you have a high priority task A with a resource needed by an even higher priority task B. Task B must block until task A frees its resource. Then imagine a task C with a priority between A and B. Because the priority of C is higher than that of A, it preempts A and steals the processor. Thus we have process C running at the expense of the higher priority task B. The solution that PREEMPT_RT provides to deal with this situation is called priority inheritance. Priority inheritance works by having higher priority tasks temporarily lend their priority to lower priority tasks that hold resources that they require. So in the situation described here, task B would lend its priority to task A, and A's priority would rise to the same level as B's. That way task C would not be allowed to preempt A. Task A would run until it freed the resource required by B. B would then be scheduled to run, and A would return to its original priority.
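Priority inheritance is also visible from user space through pthread mutexes (see section 5 below). As a minimal sketch, with error checking omitted for brevity, a mutex can be created with the priority-inheritance protocol like this:

```c
/* Sketch: create a pthread mutex that uses priority inheritance,
 * so a low-priority thread holding it temporarily inherits the
 * priority of any higher-priority thread blocked on it. */
#include <pthread.h>

static pthread_mutex_t lock;

int init_pi_lock(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* Request the priority-inheritance protocol for this mutex. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    int ret = pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;   /* 0 on success */
}
```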

3 Configuring real-time

3.1 Fetching, building and configuring the kernel

Some of the audio distributions are a little slower than the mainline distributions to update. So, if you need the latest hardware support, or simply want to try out the latest and greatest kernel, you may have to fetch, compile and install it by hand. Here's how to do so. You can fetch the latest rt-patch (and archived ones as well) from: http://www.kernel.org/pub/linux/kernel/projects/rt/

The latest one when I wrote this document (March 2010) was patch­2.6.33.1­rt10.bz2 From the name we can tell that it patches linux­ 2.6.33 with the stable patch linux­2.6.33.1 so we will need to fetch those as well.

wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.33.tar.bz2
wget http://www.kernel.org/pub/linux/kernel/v2.6/patch-2.6.33.1.bz2
wget http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.33.1-rt10.bz2

Now untar and apply the patches

tar xjf linux-2.6.33.tar.bz2
cd linux-2.6.33
bunzip2 -c ../patch-2.6.33.1.bz2 | patch -p1
bunzip2 -c ../patch-2.6.33.1-rt10.bz2 | patch -p1

The rt kernel is also available via git for those who know how. For more information, see the rt­ users mailing list, or Rtwiki. (below)

l inux­rt­[email protected] https://rt.wiki.kernel.org/

You can base your configuration on your distribution's configuration. Look in your /boot directory for the config that matches your current running kernel, or copy it from /proc/config.gz if available as the real­time kernel's .config file. This

will save you from having to answer all the questions when you run make menuconfig and friends. A few choices are necessary for real-time. Under "Processor type and features", it is important that you select Complete Preemption (PREEMPT_RT), which will automatically select Thread Softirqs (PREEMPT_SOFTIRQS) and Thread Hardirqs (PREEMPT_HARDIRQS) for you. Also recommended are TREE_PREEMPT_RCU, Tickless System (Dynamic Ticks) (NO_HZ), and High Resolution Timer Support (HIGH_RES_TIMERS). You can also select FTRACE without any noticeable effect on latency when not enabled. One more note of caution: in the "General Setup" section, for "Choose SLAB allocator", you must choose SLAB, as SLUB is not currently supported.

3.2 Configuring your system

There are many possible ways this can be done, but the key element is a way to specify which users or programs get real-time privileges. One possible scheme is to create a group called "realtime", and make sure any user that requires real-time privileges is assigned to it. Other possible groups that are commonly used are "audio", or "jackuser". The following creates the group "realtime", and adds username jkacur to it.

sudo groupadd realtime
sudo usermod -G realtime jkacur

3.2.1 /etc/security/limits.conf

Here we need to modify users or groups to give them the privileges they require to run with real-time priorities, without being superuser. Typical values would be:

@realtime soft cpu unlimited
@realtime - rtprio 100
@realtime - nice -20
@realtime - memlock unlimited

These allow users that belong to the realtime group to run with unlimited cpu time, the maximum real-time priority, the maximum nice value, and to lock unlimited amounts of memory. You may have to reboot your system before these changes take effect.
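To double-check from within an application that these limits actually apply to the user it runs as, the resource limits can be queried directly. The following is a small sketch for Linux (error handling kept to a minimum):

```c
/* Sketch: verify that the rtprio and memlock limits configured in
 * /etc/security/limits.conf apply to the current process. */
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int resource)
{
    struct rlimit rl;
    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %lu\n", name, (unsigned long)rl.rlim_cur);
}

int main(void)
{
    show("RLIMIT_RTPRIO (max real-time priority)", RLIMIT_RTPRIO);
    show("RLIMIT_MEMLOCK (lockable memory, bytes)", RLIMIT_MEMLOCK);
    return 0;
}
```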

3.2.2 /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us

The default values here are:

sched_rt_period_us 1000000
sched_rt_runtime_us 950000

The first value is the amount of time in microseconds that represents 100% of the CPU bandwidth. [3] The second value is the amount of time in microseconds that real-time processes are allowed to run. This means that 1000000 - 950000 = 50000 microseconds or 0.05 seconds are reserved for non-realtime tasks, in other words for SCHED_OTHER tasks to run. This is a soft real-time behaviour, designed to protect you against runaway real-time processes that could hijack your system. You should definitely leave this at the default setting for testing. However, when you are ready to go to production, and want something closer to hard real-time behaviour, you can set sched_rt_runtime_us to -1. This will now allow real-time tasks to monopolize the processor 100% of the requested time. Once again, use caution here, because this means that a misbehaving user-space program can make your system unusable. To change the setting, do:

su -c 'echo -1 > /proc/sys/kernel/sched_rt_runtime_us'

3.2.3 IRQs

Now that IRQs are threads, they can be given a priority. An -rt distribution will set these values with a start-up script. However, for optimum performance, and because of differences between your particular hardware, you may want to tune these values. To see the current values on your system, use ps with the -o option, which controls the output. We will select pid, cls for the scheduling class (see Table 1), rtprio for the real-time priority, prio for the priority, nice, and finally the cmd. [2]

CLS  Scheduling class reported by ps
-    not reported
TS   SCHED_OTHER
FF   SCHED_FIFO
RR   SCHED_RR
?    unknown value

Table 1.

For example, on an untuned vanilla distribution running a real-time kernel, looking at the tasklets, which are similar to the bottom halves of traditional interrupts (that is, the part of the interrupt that is scheduled to run later than the part that is serviced right away):

$ ps -eLo pid,cls,rtprio,pri,nice,cmd | grep -i tasklet
10 FF 49 89 - [sirq-tasklet/0]
25 FF 49 89 - [sirq-tasklet/1]

These values can be changed using the chrt command. chrt changes or retrieves the real-time scheduling attributes of a process or task. It is usually installed by default on most distributions; it is part of the util-linux-ng package. Tasklets should run higher than most real-time processes, so 82 would be a reasonable value to set them to on this machine. [4] To change the above:

$ su -c "chrt -f -p 82 10"
$ su -c "chrt -f -p 82 25"
$ ps -eo pid,cls,rtprio,pri,nice,cmd | grep -i tasklet
10 FF 82 122 - [sirq-tasklet/0]
25 FF 82 122 - [sirq-tasklet/1]

In general hardirqs (hardware interrupts) should be set slightly higher than the soft interrupts (which you recognize by "sirq"). With tasklets at 82, 85 would be a reasonable priority for hardirqs. Certain threads have the highest real-time priority, of 99:

$ ps -eo pid,cls,rtprio,pri,nice,cmd | grep -i ff | grep 99
3 FF 99 139 - [migration/0]
14 FF 99 139 - [posixcputmr/0]
15 FF 99 139 - [watchdog/0]
17 FF 99 139 - [migration/1]
18 FF 99 139 - [posixcputmr/1]
29 FF 99 139 - [watchdog/1]

These are critical to the system, and it is not recommended that userspace real-time processes compete with them. Some papers suggest, and some distributions and scripts set, the audio and video processes to run between the highest priorities and that of the tasklets and hardirqs. For this to work well, the audio and video processes would have to be very well written and run with real-time priorities for very short periods. I would suggest that audio and video run at the highest priority just under the tasklets. Given the scheme presented here, 80 would be a good value:

@audio - rtprio 80
@audio - nice -20

This should give the audio / video processes high enough priorities to achieve very low latency without interfering with any system-critical real-time processes, which in turn could degrade the overall performance of a system.

4 Tests, Benchmarks, measuring

4.1 Rt-tests

Rt-tests is a suite of tests to stress various parts of the real-time kernel. The core of it is cyclictest, written by Thomas Gleixner. It is now being maintained by Clark Williams, and can be fetched via git here:

git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git

The idea behind cyclictest is simply to fire off a number of high resolution timers from real-time threads and measure the difference between the time the timer is supposed to conclude and the time that it actually concludes. cyclictest can be used to measure your system's minimum, average and maximum latencies, and can thus be useful for tuning. Here is a typical run, just accepting the defaults:

./cyclictest
defaulting realtime priority to 2
policy: fifo: loadavg: 0.13 0.06 0.01 1/319 6879
T: 0 ( 6879) P: 2 I:1000 C: 7889 Min: 16 Act: 117 Avg: 107 Max: 298

The above shows two fifo threads, running at priority 2. The timer is firing at intervals of 1000 microseconds (1 millisecond). So far, 7889 cycles occurred. The results were a minimum latency of 16 microseconds, an average latency of 107 microseconds, and a maximum of 298.
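The measurement principle behind cyclictest can be sketched in a few lines of C: arm a high resolution timer for an absolute deadline, sleep on it, and compare the actual wakeup time with the deadline. This is only an illustration of the idea, not the actual cyclictest code, and it should be run with a real-time scheduling policy to produce meaningful numbers (link with -lrt on older glibc versions):

```c
/* Simplified sketch of the cyclictest idea: wake up every millisecond
 * on an absolute deadline and record how late the wakeup actually was. */
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define INTERVAL_NS  1000000L      /* 1 ms */

static void add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

int main(void)
{
    struct timespec next, now;
    long max_lat = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < 1000; i++) {
        add_ns(&next, INTERVAL_NS);
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        long lat = (now.tv_sec - next.tv_sec) * NSEC_PER_SEC
                 + (now.tv_nsec - next.tv_nsec);
        if (lat > max_lat)
            max_lat = lat;
    }
    printf("max wakeup latency: %ld us\n", max_lat / 1000);
    return 0;
}
```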

4.2 hwlatdetect and SMIs

Hwlatdetect is a tool supplied with rt-tests to try to detect unexplained hardware latencies. One source of hardware latencies that we have very little influence over is SMIs – System Management Interrupts. These interrupts cannot be handled by software and can last tens of microseconds. It can be very dangerous for the stability of your hardware to attempt to turn them off, because they often are in charge of such functions as making sure that your cpu doesn't overheat. In some cases, with extreme caution, if your BIOS allows you to, you can turn some of them off. In some cases the SMI routines have been poorly written and the best we can do is bring to the attention of hardware makers that an SMI on a particular piece of hardware is taking an excessive amount of time.

Hwlatdetect works in cooperation with the hwlat_detect.ko kernel module. This kernel module comes as a standard feature in recent rt kernels. It tries to detect SMIs by monopolizing a cpu for a long time, with interrupts disabled. Since the only thing that could interrupt such a cpu is an SMI, any gaps detected in the time that hwlatdetect calls stop_machine are thus likely due to SMIs.

4.3 Rteval

Rteval is a program for evaluating the latency and performance of your system. The idea is simply to stress your system with some standard linux benchmarks such as dbench and hackbench, to see if they have an effect on the real-time capabilities of your system as measured by cyclictest. You can fetch it via git here:

git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git

5 Userspace Programming, a whirlwind tour and a caution

What does a real-time program look like in userspace? Surprisingly simple. Of course it is a real art to identify which processes need to run with real-time priorities, and in most cases it is optimal to run with a higher priority for the minimum amount of time necessary. Here is a whirlwind tour of some of the calls.

To raise the priority of a process, you can use standard POSIX calls such as:

sched_setscheduler(pid_t pid, int policy, const struct sched_param *param)

This call sets both the policy (SCHED_FIFO in the example below) and the priority. If the process id is set to 0, the parameters are applied to the calling process. For example (error checking code omitted):

struct sched_param param;
struct sched_param *pparam = &param;
pparam->sched_priority = 50;
sched_setscheduler(0, SCHED_FIFO, pparam);

You can retrieve the current scheduling policy with:

sched_getscheduler(pid_t pid)

To retrieve or set the priority, you can use:

sched_getparam(pid_t pid, struct sched_param *param)
sched_setparam(pid_t pid, const struct sched_param *param)

To determine the minimum and maximum priorities allowed for a particular scheduling policy on your system, you can use:

int sched_get_priority_max(int policy);
int sched_get_priority_min(int policy);
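These calls can be combined so that a program derives its priority from the allowed range instead of hard-coding a number. A minimal sketch follows (error checking reduced to the essentials; the helper name make_realtime is just an example):

```c
/* Sketch: run the calling process with SCHED_FIFO at a priority a few
 * steps below the maximum, whatever that maximum is on this system. */
#include <stdio.h>
#include <sched.h>

int make_realtime(int steps_below_max)
{
    struct sched_param param;
    int max = sched_get_priority_max(SCHED_FIFO);
    int min = sched_get_priority_min(SCHED_FIFO);

    param.sched_priority = max - steps_below_max;
    if (param.sched_priority < min)
        param.sched_priority = min;

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```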

To prevent unexpected page faults of real-time programs, it is common to lock all current and future memory:

mlockall(MCL_CURRENT | MCL_FUTURE)

However, one caution: priority inheritance is only supported by rt_mutexes in the kernel. In user space this translates to pthread_mutexes. Ordinary unix semaphores in user-space are not supported by priority inheritance. The reason for this is that it is not possible (or easy in any case) to identify the owner (pid) that blocked around a semaphore. If you don't want to use pthread programming but want support for priority inheritance, you can create a lock using shared memory and a pthread_mutex that the usual POSIX style process calls can access. For an example of how this is done, see pip_stress in the rt-tests suite.

6 Conclusion

Linux is well suited as a low-latency real-time operating system for audio and video. With a little bit of background knowledge, an end user can tune his / her system for maximum performance and real-time determinism. This paper should get you well on your way to experimenting and tuning your real-time Linux system for optimal audio and video performance.

7 Acknowledgements

Ingo Molnar, Thomas Gleixner, Paul McKenney, Steven Rostedt, Peter Zijlstra and countless others for creating PREEMPT_RT.

References

[1] Paul McKenney, SMP and Embedded Real Time, Linux Journal, issue #153, January 2007. There are more examples here illustrating the differences between hard and soft real-time.

[2] http://subversion.ffado.org/wiki/IrqPriorities does an excellent job of explaining how to set irq priorities for real-time audio.

[3] See Documentation/scheduler/sched-rt-group.txt in the Linux kernel source 2.6.33.1-rt11 or later.

[4] The suggested values are based on Red Hat's MRG distribution.

Other Sources of information

"Real Time" vs "Real Fast": How to Choose? Paul McKenney, Eleventh Real-Time Linux Workshop, Dresden, 2009

Programming for the Real World, POSIX.4, Bill O. Gallmeister, O'Reilly & Associates, Inc., 1995

Web Resources

https://rt.wiki.kernel.org

Approaches to Real-time, Jon Corbet: http://lwn.net/Articles/106010/

Realtime preemption, part 2, Jon Corbet: http://lwn.net/Articles/107269/

A real-time preemption overview, Paul McKenney: http://lwn.net/Articles/146861/

Re-Generating Stockhausen's "Studie II" in Csound
A Study About Algorithmic Composition

Joachim HEINTZ
Incontri - Institute for new music, HMT Hannover
Emmichplatz 1
30175 Hannover, Germany
[email protected]

Abstract

Stockhausen's "Studie II" (1954) is one of the classical works of electronic music. Written at a time when computers played no role in the production of sound, it exhibits a way of composing which is quite similar to programming. This can be shown by re-programming the complete piece just with the input of five numbers. Besides this reduction - which is made in Csound - the compositional decisions come to the fore, showing a flexibility and variability of algorithms [1] which can be inspiring and challenging still today.

Keywords

Music Composition, Audio Programming, Csound, Algorithmic Composition, History of Electronic Music

1 Introduction

Stockhausen's Studie II, which was composed and realized in 1954, is well known as the perhaps earliest and strictest application of the serial technique of composition [2] to a piece of electronic music. The interest of Stockhausen and other serial composers in working with synthesized sounds suggests itself, because only in this field can the timbre also be composed in a serial way. But this paper wants to point out yet another relationship between this piece and electronic music: the close correlation between serial composing and programming. As the serial technique can be understood as a bundle of definable rules, it can be programmed.

By doing this, two different goals are pursued. One is more sportive: proving the thesis of programmability by re-programming Studie II in practice, based on a series of 5 numbers as input. The second one is more important as a study about composing in general and about algorithmic composing in particular: How is the "machine" built? Is there something like "put n formulas in it and let it play for m seconds", or are there changes? Did the composer react to the result he sees (or hears) in the process of composition, or is there an automatism? And how can the irregularities we see in the score be judged: as errors, or intentional decisions, or both?

2 Building the Musical Material

When a composer wants to write a piece of purely electronic music, a word from the bible may come to his mind: "And the earth was without form, and void" (Genesis 1, 2). There is nothing which is self-evident; no ambitus, no scales, no chords, no times. If one doesn't want to use scales and durations "because they are there", one has to build one's own structure from the "void". So the first task in re-generating Studie II is to repeat what Stockhausen did in preparing the musical material:
a) to build a collection of series which are used for many different purposes, and
b) to build a pool of distinct values for frequencies, durations and intensities.

[1] I use the terms "algorithm" or "algorithmic composition" in a wide, general way. Algorithm in my use is an instruction which can be executed in a definite way to come from a state A (the input) to a state B (the output). I call "algorithmic composition" a way of composing music in which the whole generation of pitches, durations, sounds etc. can be described by a set of such definite executable instructions. In terms of computers, it means: it can be programmed. So "algorithmic composition" is in my understanding in no way restricted to a simple method like: "apply this formula(s) to these notes and you get the piece." On the contrary, the algorithms can be different and complex, and they are derived from musical decisions. If the term algorithm is used in a strictly formula-like mathematical way of changing state A to state B, it should not be applied to Stockhausen's Studie II (thanks to Gottfried Michael Koenig for pointing me to the issue of the use of "algorithmic" etc. in this piece).

[2] "Serialism" means "integral serialism" here: organizing all relevant parameters (not just the pitch) by series.

65 2.1 Building the Series (Number Squares) with the starting values 3 5 1 4 2 (by which the starting sequence of 5 numbers are simply read The main input for Studie II is a serie of 5 vertically), the first number square (see above) is numbers: 3 5 1 4 2. By a number of procedures, produced.3 this series is expanded to a set of 2 x 5 "Number Squares", each consisting of 5 x 5 numbers. The 2.2 Building the Pools for Frequencies, first one, as an example, is the following: Durations and Intensities For the generation of possible values for 3 5 1 4 2 frequencies, Stockhausen used a method which is 5 2 3 1 4 quite similar to the well-known approach of an 1 3 4 2 5 equal tempered scale. The partition of the octave in 4 1 2 5 3 12 equal steps can be described as 12th root of 2 as 2 4 5 3 1 the relation for one frequency to the next possible. Stockhausen used essentially the same method, but For the re-generation of Studie II, these as he was possessed with the number 5 in this collections of 25 numbers (each building one piece, and also wanted to build new harmonies series) are stored in an array for further use. In instead of the usual cromatic scale, he decided to Csound, an array can be built as a "function table". take the 25th root of 5 (or in other words, the 5*5th So practically the program does at this point the root of 5 ...). This division of the double octave following: plus pure major third in 25 equal steps gives a a) It defines functions (in Csound: "User Defined multiplier of 1.066495... - which is quite close to Opcodes") which perform the various the cromatic semitone of 1.059463.... So his gain is modifications: from the starting 5 numbers to the to have something rather common which first square (series), and then by a different method nevertheless leads to new sounds. from the first square to the next four, and then by It's easy to build this scale by recursively an again different method to the next five ones. multiplying the starting value of 100 Hz with 5 1/25. b) It fills 10 function tables with the results. It is stored again in a function table. As a simple example, this is one of the functions The durations are built by the same method, for transforming: starting with 2.5 cm (0.5 * 5 ...) as the minimal 4 opcode SS2_TP1Val, i, iiii length of the tape, and recursively multiplying by ;calculates a new value from ival and the 51/25 for getting 61 different durations. transposition interval interv, within the For practical reasons, the intensities are built scope of iminval and imaxval ival, interv, iminval, imaxval xin from 0 to -30 dB in one dB steps. For bringing this ivalneu = ival+interv table to the same length as the frequency and ires = (ivalneu>imaxval?\ ivalneu-imaxval : (ivalneu

66 3.1 First Part Now four series are used to get a frequency and a mixture. The first series (A) determines one of The whole piece can be considered as a the 9 groups in abstract. Let the first value of this collection of in total 5 * 75 + 5 = 380 single serie be 3. Then a "transposition" value (B) is events. Each event needs six determinations: applied to this. This "transposition" works like a a) the base frequency transposition in music: a transposition by 1 (prime) b) the mixture built on this frequency lets the note as it is, a transposition by 2 transposes (there are 5 mixtures ...) it 1 step up, and so on. Let this transposition value c) the duration be also 3. The result of the abstract group value, d) the start time transposed by the transposition value (A t B) is e) the maximum intensity then 5. This is the number of the group. Then two f) the envelope other series are used to determine the column of the group (C) and the position in this column (D). The method which Stockhausen used for Let the values of the column and the position be 5 obtaining the data for the events can't be seen as respective 3, then we have as the base frequency of simple and straightforward. It is rather complex, the first mixture 690 Hz. The kind of mixture giving the impression of a kind of organic equals the column value, so we get mixture structure, organic movement, as if there are worms number 5 and by this the frequencies of the other which move under certain circumstances. I will four sine tones above 690 Hz as 952, 1310, 1810 describe the first part more in detail, so that then and 2500 Hz. the other parts can be discussed in relation and deviation to it. It is important that the four series are not running at the same speed but asynchronously. There is one 3.1.1 Frequencies and Mixtures "superior" series (S) which determines how many For obtaining a base frequency for an event, the repetitions of A and C are performed. For instance, frequency table (cf. 2.1) is structured by 9 groups 5: if the first five values of A are 3 5 1 4 2, and the values of S are 2 4 5 3 1, the result is A with these repetitions:

A: 3 3 | 5 5 5 5 | 1 1 1 1 1 | 4 4 4 | 2 S: 2 4 5 3 1 (repetitions)

This is the way A and C are running. The transposition value B does not change during the whole part.6 And the series D (giving the positions) is running entirely without repetitions. So there are three different movements: A and C are moving in a "regular speed", ruled by S; B is not moving at all; D is moving fast. In writing a function for this, it results in two nested loops:

6Obviously it was the plan to have one transposition 5Heinz Silberhorn, Die Reihentechnik in Stock- value for each part: 3 5 1 4 2. But Stockhausen changed hausens Studie II, Rohrdorfer Musikverlag 1980, S. 17 this plan later (see 4.3).

67 In Csound Code:

3.1.2 Durations and Starts of Events In the first part, Stockhausen decided to give the The method for obtaining the durations is events a somehow melodic structure. If one group essentially the same as for the frequencies. So, in consists of 3 events, just the time for the first one is the code the same function (called taken from the ghost serie. Then event 2 starts at SS2_MkParamTab_Meth1) could be used. It the end of event 1, and event 3 at the end of event returns a function table which contains the 2: durations of the 75 events of the first part. As for the starting times of the events, two different decisions have been made. The first one is a general one for all the event starts in Studie II: A second series of durations is generated. But these In the code, instead of calculating all the 75 starts, durations are not read as durations, but as starting just the 25 starts of the groups (sequences) are points: they are imagined to succeed each other calculated. These values are then taken as the start immediately, and the resulting times are captured. values of a loop, in which the durations are shifted For instance, if these ghost-duration serie had the by the previous ones. values 20.9, 45.3, 104.6, 71.2, 91.9, 91.9 (in cm), the resulting starts are 0, 20.9, 66.2, 170.8, 242.0, 333.9, 425.8. The second decision is different for each of the five parts of Studie II. How are these values used?

3.2 Second Part

The main difference of the second part to the first one is the way the single events are treated. They no longer succeed each other but are played simultaneously, building "chords". These chords have either common starts or common ends. The durations work like additions: if the durations of a group of 3 events are 18.4, 14.2 and 23.8 cm, and the 3 events start at the same time, the first event has 18.4 cm, the second has 18.4 plus 14.2 cm, and the third has 18.4 + 14.2 + 23.8 cm (i.e. 18.4, 32.6 and 56.4 cm).

Figure 1: Stockhausen's sketches for Studie II, part 1. © Archiv der Stockhausen-Stiftung für Musik, Kürten (www.stockhausen.org)

The realization as Csound code works again with two nested loops (see above 3.1.1). Here the inner loop has to add up the durations:

note:
;common start: first duration normal, all others are added up
if ienv == 1 || ienv == 2 then
;durations from normal table
idurnorm tab_i indxnotabs, iftdur
;real duration = sum of preceding durations of this sequence plus idurnorm
idur = idurnorm + iduraccum
;this is the next value for the assembled durations
iduraccum = idur
;write durations to iftout
tabw_i idur, indxnotabs, iftout
;common end: last duration normal, all others are added up reversely
else
;counting down
indxback = indxabsseq + icount - indxnote - 1
idurnorm tab_i indxback, iftdur
idur = idurnorm + iduraccum
iduraccum = idur
tabw_i idur, indxback, iftout
endif
indxnotabs = indxnotabs + 1
loop_lt indxnote, 1, icount, note

3.3 Third, Fourth and Fifth Part

In order not to go into too much detail but to discuss more general questions in the next chapter, the focus here is on the main differences between the parts and their methods.

In the third part all events are isolated, like staccato tones on a piano. For widening the ambitus to potentially all frequencies, the method of using the series (cf. 3.1.1) changes in the way that the previously constant B now also changes, like the A and C series (see figure 2).

Figure 2: Stockhausen's sketches for Studie II, part 3. Note that the "Tr" (Transposition) is now changing. © Archiv der Stockhausen-Stiftung für Musik, Kürten (www.stockhausen.org)

In the fourth part this method is kept. Besides, the way of building chords from part 2 is repeated. Part five is a mixture of all the different methods. It consists of "melodic" sequences like in part 1, "chord" structures like in parts 2 and 4, and "staccato" passages like in part 3. So the inner loop of the start times has to differentiate between the cases:

ton:
istartdiff3 tab_i indxtonabs, iftstarts5a
;starts if direct from iftstarts5a/typ=3
istartabs = istartabs + istartdiff3
;typ1: concatenating the notes of one sequence
if ityp == 1 && indxton < icount-1 then
idur1 tab_i indxtonabs, iftdurs
idurshift = idurshift + idur1
istart1 = istartseq + idurshift
;value for the next note after the duration of this note
tabw_i istart1, indxtonabs+1, iftout
;typ2: chords
elseif ityp == 2 then
;for common starts
if (ienv == 1 || ienv == 2 || ienv == 5) && (indxton < icount-1) then
istart2 = istartseq
;else starting point as difference to the maximum duration of the sequence
else
idur2 tab_i indxtonabs, iftdurs
istart2 = istartseq + (imaxdur-idur2)
endif
;value for this note from this calculation
tabw_i istart2, indxtonabs, iftout
;value for the next note from normal table
tabw_i istartabs, indxtonabs+1, iftout
;typ=3: isolated events, or last note from a typ=1 sequence
else
tabw_i istartabs, indxtonabs+1, iftout
endif
indxtonabs = indxtonabs + 1
loop_lt indxton, 1, icount, ton
;indxenv++ if value was used
indxenv = (ityp == 2 ? indxenv+1 : indxenv)

4 Results, Problems, Conclusions

Studying Studie II is not meant as an act of mere museum interest. The principal questions here are:
• Can this composition, which was made entirely by hand - as a composition and as a realization - be regenerated by program code? If yes - why?
• Does everything run smoothly, or are there serious problems in the re-generation?
• What can be learned from Stockhausen's way of generation for today's algorithmic compositions?

4.1 Can Studie II be regenerated as program code?

Yes, it can. It can be proven that in principle all the events of the piece can be re-generated, by just starting with the numbers 3 5 1 4 2 and executing some (describable = programmable) methods. It has been done in Csound, and it should be possible to do it in any other audio programming language with a similar power as Csound. The reason why this is possible leads to the method of composing which Stockhausen applied in this piece (and probably tested for the first time so consistently), the so-called serial way of composing. Serialism can be described as a method which situates the decisions of a composer in the space of structures and methods. Studie II structures the material (frequencies, durations, timbres, intensities), starting "from nothing", and defines methods of generating events, using series which are also generated in a regular way. So there is a very interesting interrelation between a compositional technique which came from a purely musical development (Schoenberg - Webern - Messiaen)

and the emergence of the computer, and in particular the first high-level programming languages (Fortran, Lisp), at exactly the same time in the mid-1950s.

4.2 What are the difficulties in the regeneration of Studie II?

The problems in the regeneration of Studie II can be divided into three different categories.

First, there are obviously a number of errors in Stockhausen's calculations of some frequencies or durations. These deviations from the correct calculation are faithfully recorded in the work of Silberhorn.7 They are easy to understand in a work with 380 events, each event requiring at least four calculations, all done by hand and paper.

But, second, there are certain deviations from the "actual" result which are more complicated. It seems that sometimes Stockhausen intentionally decided for a different single event. In some cases it seems he avoided a near-repetition, in other cases he corrected densities.8 And in part five one gets the impression that he often decided "on the fly", on the basis of the predefined structures, what he preferred for this situation. From the point of view of programming, these latter kinds of deviations are crucial. If they are more than exceptions, the re-generation makes no sense, because it just has to translate the exceptions, and this is no longer a re-generation but a transcription of the score. In my opinion, in the first four parts of Studie II there are just deviations by error or small exceptions, but the fifth part comes close to the point of changing to a new quality. A closer look at the sketches proves it (see figure 3).

Figure 3: Stockhausen's sketches for Studie II, part 5. Note the spontaneous side notes about the musical content. © Archiv der Stockhausen-Stiftung für Musik, Kürten (www.stockhausen.org)

Third, there are some uncertainties with respect to the generation of the envelopes. It can be shown by accurate analysis that parts two and four use series of five shapes for the envelopes.9 This should be possible to show also for part one. Part three uses just one envelope, and part five is also in this respect a combination of all the previous shapes. So I believe the envelopes can be generated by series, too, and I could show the series for parts two and four, but the origins of the series are obscure.10

7 pp 128-167
8 cf. Silberhorn pp 118-125
9 In my analysis and transcription, the series for parts two and four are the following:
1 2 4 3 5   2 5 4 3 1
1 4 3 5 2   2 1 5 4 3
1 4 5 2 3   2 3 1 4 5
4 3 1 5 2   2 5 1 3 4
3 1 4 5 2   2 3 5 1 4
10 Unfortunately, nothing could be found in the sketches of Stockhausen which clarifies the origin of the series mentioned in the previous footnote.

4.3 What does the regeneration of Studie II show regarding questions of algorithmic composition?

With a certain right, Studie II can be considered an early model of algorithmic composition: the application of serial compositional methods in the area of electronic sound (and today: computer music). Re-generating this composition means coming to the root of the musical decisions, being where the composer was while working. From this, some important conclusions can be drawn from this work.

1. Change the Methods Depending on the Musical Structures You Want to Get

It is impressive and inspiring to see how Stockhausen arranges and adapts the flow of the series depending on the musical structures and differences he wants to get. This way of composing can be compared with a landscape architect, regulating the streams of water, planting this here and that there. It is totally different from a way of "composing" which unfortunately cannot be considered an exception in the field of algorithmic composition: put one algorithm in the machine and let the machine play for a long, long time ...

2. Listen to the Results and React as a Musician Instead of Sticking with Principles

It is extremely interesting to see how Stockhausen changed his own plans, obviously because he saw some musical problems if he had kept them. One example is the generation of the frequencies. He started with a constant "transposition" value (cf. 3.1.1) of 3 in the first part and 5 in the second part. We do not need too much inspiration to suspect that the values of the other three parts were planned to be 1, 4 and 2, so representing the king's crown (3 5 1 4 2) in this field. But then, at the beginning of part three, something exciting happens. Stockhausen decided to break the whole method: not to have one constant transposition in the whole part, but to change this value by running a series. Why? Because he saw that insisting on the method would have been "too consistent", that is: predictable, and would not have allowed the staccato sequences of part three to have larger leaps. This is the paradigm of a musically reasonable decision which breaks the "purity" of a method.

3. Listen Also to the Single Events and Change Them if Your Ear Judges So

As has been discussed (see 4.2), Stockhausen changes not just methods, but also single events, or even small groups of events, according to the musical scope or a certain situation. Generalizing this, we can say that a balance must be found in algorithmic compositions between the coherence of methods and of the generation of musical structures on the one hand, and the need to listen to the results - also in a small context - and consequently change some of them by decision on the other hand. Both extremes are to be avoided: just generating in an algorithmic way, even in an intelligent and changeable way, can result in unmusical structures or "untempered" relations at a larger scale; too many deviations lose the structure and the large context.

Figure 4: Stockhausen's early sketch for Studie II, containing a list of possible envelopes. © Archiv der Stockhausen-Stiftung für Musik, Kürten (www.stockhausen.org)

5 Conclusion

Re-generating Stockhausen's Studie II forces one to get to the inner movement and compositional method of this piece. It reveals meaningful correlations between a compositional technique and a formalized way of thinking which becomes reality in programming, and it shows important qualities of a non-dogmatic but musical way of working in this domain. In this early piece, Stockhausen has shown two virtues (which he was not famous for later in life): to be modest ("I'm doing a study") and appropriate, and to be short, knowing that stopping before "all has been said" can often be more than showing all the material one has generated. This work is a worthy model exactly in this way, and a re-generation can convey a modern formulation of this exemplarity.

6 Acknowledgements

Thanks to Anna Buschart for reading the manuscript, and to the students of the analysis seminar on electronic music at the HMT Hannover in winter 2009/2010 for their patience and interest. Special thanks to Kathinka Pasveer and the Stockhausen Stiftung for their generosity in not just letting me study the sketches but also giving permission to print some of the sketches here.

References

The re-generation of Stockhausen's Studie II has been part of the regular QuteCsound distribution since version 0.4.5. See http://sourceforge.net/projects/qutecsound (in the Examples -> Music menu).

It is based on the analysis of Heinz Silberhorn: Heinz Silberhorn, Die Reihentechnik in Stockhausens Studie II, Rohrdorfer Musikverlag 1980.

Using open source music software to teach live electronics in pre-college music education

Hans Roels University College Ghent - Faculty of Music Hoogpoort 64 B-9000 Ghent, Belgium [email protected]

Abstract

A basic course of live electronics is needed in pre-college music education to teach children how to perform on a digital musical instrument. This paper describes the basic components of such a live electronics course, examines whether open source music software is suited to realize these components and finally presents Abunch, a library in Pure Data created by the author, as a solution for the potential educational disadvantages of open source music software.

Keywords

live electronics, music education, Pure Data, digital musical instruments, open source music software

1 Introduction

For more than a decade, home computers and laptops have been powerful enough to process sound in real time. This in fact transformed computers into musical instruments. As this change was taking place, computers became more widely available in households and schools. Anno 2010 the computer has probably become the most widespread musical instrument in a large part of the developed world.

This unique situation urgently prompts us to rethink and redesign our music education, which is still largely built upon our traditional knowledge of acoustical instruments and music, and to question why pre-college music education1 in Europe lacks a course of live electronics2 in which children, teenagers and amateur musicians can learn to perform on a digital musical instrument. This paper describes the basic components of live electronics courses on a pre-college level, examines whether open source music software - and in particular Pure Data - is suited to realize these components and finally presents Abunch, a library in Pure Data created by the author, as a solution for the potential educational disadvantages of open source music software.

2 Live electronics

2.1 Digital Musical Instruments

It is important to know which fundamental differences exist between live electronic and acoustical music before the content and methodology of a live electronics course can be discussed.

First, in a digital musical instrument (DMI) the sound production unit and the user interface can be separated and recombined [1]. An acoustical link between both parts, as in an acoustical instrument, is non-existent and unnecessary. The unique modular nature of a DMI should be central to live electronics education, as it is this particular feature that distinguishes it from other, traditional instruments.

Secondly, we should ask ourselves what 'performing well' means in a live electronic context, as we wish to teach our children and students to play this instrument well. Virtuosity in the digital musical era has an off-stage component which is almost as important as the on-stage one and which is not only different but also more diverse and extensive than in acoustical virtuosity. Developing, building, modifying or adapting a digital instrument is highly important for producing a convincing and expressive performance, apart from the classical training.

1 The term 'pre-college music education' is used to denote all kinds of music education for children, teenagers and grown-ups who haven't taken music courses at a college, university or professional level.
2 Live electronics is used in a broad sense to describe a performance with at least a human performer and an electronic device producing or processing sound.

The third difference can be found in the information stream between composers and performers. In electronic music the number of parameters that can be manipulated (before or during a performance) turns a traditional score into a restricted medium to communicate a message between a composer and a performer. Because not all parameters can be notated (in detail), a performer often needs to improvise along more general guidelines. If a performer wants to learn from a composer or from other performers how to perform, he has to attend a performance, talk directly to a colleague or composer, or consult other media (recordings, texts, source codes, patches, websites, mailing lists, videos, ...) than a score. Information has become multimodal in live electronics.

Live electronic music is clearly different from acoustical music, and the categories of human activities within music making reflect these changes. The boundaries between composer, improviser, performer and instrument-builder have been blurred, and 'performing' has in fact become quite an inadequate term. In general, whenever this term is used in live electronics, a larger amount of composition, improvisation and instrument-building is implied than in acoustical music, because of the reasons mentioned above. Therefore creativity and autonomy have become more outspoken in the live electronic music scene.

2.2 Content

Taking into account the specific character of the musicianship and the instrument in live electronics, a basic content of a beginners course for live electronic music is presented. In short, the following knowledge domains should be part of this content:
1. Digital Signal Processing techniques
2. Basic audio hardware
3. Mapping techniques
4. History of electronic music
5. Auditory training
6. Sound organisation in real time
7. Performance training
Each of these seven domains is very extensive and could be the subject of a new course. In a beginners course of live electronics these subjects need to be treated only very basically and a selection within each domain has to be made. The separate implementation in music education of six of these domains is not new and has been researched and applied in classrooms before [2][3]. Introducing mapping techniques on a pre-college level is new and necessary because they are essential to use the full and unique potential of a DMI (see 2.1). The essential parts of this mapping consist of:
1. basic math
2. basic boolean operators
3. comparison operators
4. assignment operator
5. relay switch
6. a module or system to order all this logic and math in time
Because mapping is in fact programming, the components could be a lot more extensive, but this selection provides a basic set with a lot of possibilities to connect user interface and DSP in many different ways.

2.3 Methodology

In a live electronics course the sound experience and performance should form the basis of the teaching methodology. Performance requires fast skills and automatisms of the human body. These are in fact feedback loops between the sensory information and body gestures. Our brain receives perceptions from our ears, eyes, hands, etc., processes these in a conscious and unconscious way and creates intentions that are embodied in body gestures which adapt and change the sound [4]. This - almost simultaneous - cycle of perception, cognition, intention and movement is action-based, and all the separate units are related to the central act of performance. In an action-based methodology the theoretical knowledge can be integrated in listening and performance experiences and foster further progress in sound imagination, experimentation and performance. Music theory and technical knowledge thus are a tool to develop performance - and in general musical - skills.

The aforementioned mapping techniques might seem boring or very theoretical in a music course, but, if they are integrated in performance and instrument design tasks, they can be fun. One can hear whether the result of a mapping logic is right or wrong and consequently recognize a problem and try to solve it ... (a method that math or programming teachers would certainly find very attractive).

The multimodal information, the lack of detailed scores and the modular nature of a DMI in live electronics (see 2.1) require a high level of creative input from the children and a need to rely on direct aural information (and not on a score). The teaching methodology in live electronics

76 should therefore be mainly auditory based and 3.2 The combination of user software and encourage personal autonomy and creativity. For programming language example, as there are no fixed and absolute rules At the moment there is a wide choice of open about the right coupling of user interface and DSP, source software for live electronic music (Pure it is very important that children have sufficient Data, ChucK, CSound, SuperCollider, etc.). All time and freedom to experiment with those these programs are in fact combinations of user techniques in order to learn more about the software and audio programming languages and different factors (available equipment, physical some pedagogical problems -especially on a pre- skills, artistic demands,...) on which this coupling college level- may arise because of this choice depends. These experiments also help to combination.3 hear to what extent the user interface and mapping First, a beginner can easily get lost in the define the audio result, even when the same DSP massive and confusing range of possibilities and techniques are used. lose his motivation to learn more about electronic music. To summarize, a live electronics course should Second, if a newbie wants to find information center on the immediate perception and (articles, websites, mailing lists, books,...) about performance of sound, use background theory and open source software for live electronics, it is auditory training to improve and develop this often quite technical and requires a lot of inside musical activity, promote the tools to take knowledge. For a beginner some texts or mailing advantage of the modular nature of a DMI and lists look like cryptograms. Although a lot of foster the creativity and autonomy of each pupil. patches and example files for beginners do exist in Any software to learn live electronics should fit in a program like Pure Data, the level is often too this general framework. high for pre-college teenagers, especially for the ones that have no background in electronic music 3 Open Source Music Software in live or software programming. electronics education Third, as I started teaching live electronics with Is open source music software suited to teach Pd to teenagers in a music school, I noticed that and learn live electronics? In the next section Pure there is another disadvantage to this combination: Data (Pd) is used as an example to answer this for a musician who wants to perform or compose it question. takes too long before he can be creative with the instrument design. He has to know too many basic 3.1 A continuous learning environment units and syntax rules before he can produce sound Anyone who wants to learn to play an and start combining them. instrument should regularly have access to his Of course the combination of user software and instrument. This is an almost self-evident truth in programming language has a great pedagogical instrument training. The simple but very powerful advantage, especially in a graphical environment argument in favor of open source music software like Pd where there is no compiler and no split for live electronics education is its accessibility, between the source code and the compiled low cost and ability to run on several operating program. This kind of transparancy helps a tutor to systems. These features enable schools and pupils deal with more theoretical knowledge within a to install and use this software at school and at performance context and corresponds very well to home. 
In this way pupils can have regular access the action-based methodology of live electronics to their digital musical instrument and can start courses. At the same time this transparancy also developing all the subtile automatisms and helps to abstract or transcend the software that is gestures that are required to perform music on an used in the classroom and to learn more about instrument. As in acoustical instrument training, electronic music and its performance in general. the main part can be done at home while the class By seeing the basic source code and learning more room only serves to guide this continuous process. about fundamental techniques, it becomes easier to In this way open source music software enforces the importance of performance in live electronics. 3It is no coincidence that most software packages for live electronic music are more or less programming languages. This reflects the fundamental changes in Digital Musical Instruments and in the performance of live electronic music described in 2.1

77 recognize similar procedures in other software for the normal help procedure in Pd -right-clicking an live electronic music. object-. Moreover there are more than 40 example Finally as programming languages these files that not only demonstrate the general forementioned open source music software application rules of Abunch objects but also the packages have all the basic tools for learning and internal structure of general audio techniques (FM applying the mapping techniques. Understanding synthesizer, loopstation device,...) and general and using the modular nature of a DMI becomes musical 'recipes'. The latter uses guidelines and feasible in a live electronics course using this kind tricks -for the musical application of specific of software. techniques- that have been developed in the last 60 years by composers and performers in electronic 4 Abunch music of different styles. Finally a 'Quick Start' tutorial was made about 4.1 Content Abunch and a mapping tutorial. This last tutorial Abunch4 is a collection of 60 high-level objects gives some simple examples on how to connect (so-called abstractions) in Pd and an additional set user interface and sound production using the of information files. The aim of the main part of basic operators explained in 2.2. the abstractions is to perform electronic music 4.2 Simplified procedures while the remaining abstractions and the info files In the first place the ease of use for beginners demonstrate, analyse or explain techniques and musical applications in live electronic music. was obtained by providing high-level objects as The abstractions provide a set of ready made building blocks but the installation and working objects to procedure was also simplified as much as possible. • record and play sound files (from hard The installation is straightforward and only disk and memory) requires 1. the Pd core version (called 'Vanilla') • manipulate and process sound (effects) which implies that no externals need to be installed and 2. the Abunch folder with files. • generate sounds (synthesizers) • prepare control data (sequencers with If you want to create a new object in Pd, you different graphical interface) have to know the name, use a '~' sign to • synchronize control data (clocks) discriminate audio from control objects and provide a set of arguments which refer to specific • analyse sound and control data parameters and functions of that object. In (oscilloscope, spectrum analyzer,...) • record control data to a score Abunch you can start creating new objects by • receive data from common interfaces typing the name (not caring about the '~' in the • algorithmically generate control data name) and an unique number as an argument. Thus only one type of argument is used (to enable a These Abunch objects use techniques like FM 6 synthesis, granular synthesis and random walk general preset system) . algorithms. The majority of the abstractions were made by the author while approximately one fourth is based on files by other authors.5 All Abunch objects share a common architecture and can easily be connected with each other (and with native Pd objects) to create all kinds of custom made live electronics. Thus in the first place this library is an active toolbox to experience and learn electronic music. The different procedures in Abunch (left) and To start off with performing, Abunch also Pure Data (right). In Abunch the argument (the contains a lot of information and documented number after the object name) refers to the preset patches. 
Every Abunch object has a help file that system. In Pure Data there are more options: explains its workings and that is accessed using '1000' refers to the frequency of the oscillator 'osc~' object and '1' to one audio hardware output 4 Abunch is available for download at of the digital-to-analog converter (dac~). www.hansroels.be/abunch.htm and is released under the Creative Commons GNU General Public Licence. 6Giving every Abunch object an unique argument 5Authors such as Miller Puckette, Frank Barknecht enables to store several instances of the same object in and Tristan Chambers the presets.

78 the students to test and evaluate the other objects Once a new object is created, one can start and their own built patches and thus help them to connecting objects. In Pd objects can receive take control of their own learning. numbers, audio signals, lists or all kinds of 4.4 Advanced features and integration within messages for special functions. In Abunch the Pure Data connection types were reduced to numbers (for control data), audio signals and 2 special Abunch was launched in 2008 and tested and connection types: a 'clock' signal to synchronize used with a large number of children and students. time related objects and a 'record' connection to As they got acquainted with the library and its way combine record and play objects into live of working I started noticing that for some of them recording units. Abunch was becoming too easy and restricted and a mode to combine user friendliness and more advanced features had to be found. One of the solutions was to hide the more advanced procedures and use the 'wireless' sends and receives in Pd. Almost all graphical user interface (GUI) objects (faders, knobs, switches,...) within Abunch objects can be easily controlled by other objects via the inlets, the usage of the GUI elements can thus be automatized. For ease the Three Abunch objects connected to each other, control values are normalized to a range of 0 to the control window of every object can be closed 127. This also facilitates the use of MIDI hardware or opened. to control Abunch objects. The restrictions of this easy MIDI-like procedure can be superseded by using the sends and receives system. Every Abunch object can print out a list of hidden send and receive names to which values within any range can be sent. Via this 'hidden' procedure more parameters can be controlled and adjusted. In a similar way more advanced features were added without changing the easy-to-use layout.7

The control window of the 'play-file' object. 4.3 Goal The main goal of Abunch is to provide a practical open source software tool that enables beginners to learn more about the musical possibilities in real time of a computer. Users can Normal procedure in Abunch: a sequencer listen to basic tools and techniques and actively object called 'timeline' controls the 'pan' fader learn more about these techniques. In this way within the 'panning' object by connecting it to the Abunch fosters an active teaching and learning right input. This input of the 'panning' object is method in which sound is the main focus and normalized to the range 0 - 127 just like the output theory can be integrated in the performance. from 'timeline'. Pupils can immediately start experimenting with computer sound by combining these objects and techniques with each other to create their own desired sound devices and thus their own set of knowledge. An experimental attitude and a critical, personal opinion and methodology is encouraged by the open-ended architecture of Abunch [5][6]. 7Extra features were also added because Abunch was The analysing objects in the Abunch library enable used by the author in his PhD-research.

79 • a neat and uniform structure within each object with information and comment in the source code • a style guide for other developers that want to make new Abunch (or similar) objects • an easy-to-use template for composition and improvisation algorithms Advanced procedure: the range of the values in 'timeline' can be adjusted in the 'extra_options' of 5 Conclusion this object and are sent to the 'send name' 1- The modular nature of a Digital Musical panrpos of the 'pan' fader in the 'panning' object. Instrument and thus mapping techniques are considered as central parts of any live electronics In a further stage pupils and students can start course in pre-college education. Open source using native Pd objects to add more possibilities to music software for live electronic music is well Abunch. One part of the example files adapted to teach these mapping techniques and demonstrates how these native Pd objects can be -because of its low cost and accessibility- ensures combined with Abunch objects. The procedures in the tutor and student regular access to their Abunch are made as similar as possible to Pd musical instrument which is a prerequisite for any procedures to aid the transition from Abunch to course of instrument training. Therefor it is Pd. In general Abunch uses a reduced number of possible to use open source music software like Pd procedures, thus the main challenge for the successfully to teach children to perform live transition is to learn new methods and electronic music. possibilities. Two specific changes were added to A solution for the massive amount of Abunch though (and these were mentioned before possibilities and the insufficient user friendliness in 4.2): the '~' is not used in the name of audio in an audio programming language like Pd is found objects and the argument of an object only refers in the development of a library of high level to the general preset system. objects like Abunch. This library is a balanced mixture of performance and theory-orientated 4.5 Future plans objects with simplified ready-to-use procedures Abunch is of course a work in progress with and more advanced hidden features. room for improvement, especially because until today it is a solo project and the pace of References development is mostly dictated by short-term [1]Miranda, Eduardo Reck, and Marcelo M. educational demands. Wanderley. New Digital Musical Instruments: A frequently heard criticism from pupils is the Control And Interaction Beyond the Keyboard. simple layout. Another problem is the large Pap/Com. A-R Editions, 2006. number of files that is needed to use abstractions and presets within a main file. There is no direct [2]Brown, Andrew. Computers in Music method to bundle all the files in one folder8 and Education: Amplifying Musicality. Routledge, only a workaround solution was found. This 2007. problem is pedagogically relevant because pupils [3]Landy, Leigh. The ElectroAcoustic Resource perform and pratice at home and often forget to Site (EARS). Journal of Music, Technology and copy a number of files when they return to the Education, no. 1 (November 2007): 69-81. classroom. [4]Leman, Marc. Embodied Music Cognition and Future versions of Abunch will hopefully Mediation Technology. 1st ed. MIT Press, 2007. include the following: [5]Holland, Simon. Artificial Intelligence in Music • more example files (about the musical Education: A Critical Review. 
In Readings in application of techniques) Music and Artificial Intelligence, Miranda, • more usage of the data structures in Pd to Eduardo Reck, ed. Routledge, 2000. . create a more attractive and diverse layout [6]Smith, Brian. Artificial Intelligence and Music 8There is no procedure or object in the core version of Education. In Readings in Music and Artificial Pd ('Vanilla') to know the current file name, to copy files and to create a new folder.

80 Intelligence, Miranda, Eduardo Reck, ed. Routledge, 2000.

Field Report: A pop production in Ambisonics

LAC 2010, Utrecht

Jörn NETTINGSMEIER Freelance audio engineer and qualified event technician Lortzingstr. 11 Essen, Germany, 45128 [email protected]

Abstract

This paper describes an acoustic pop production in mixed-order Ambisonics, using an ambient sound field recording augmented with spot microphones panned in third order. After a brief introduction to the hard- and software toolchain, a number of miking and blending techniques will be discussed, geared towards the capturing (or faking) of natural ambience and good imaging. I will then describe some peculiarities of Ambisonic mixing and the struggle to make the resulting mix loud enough for commercial use while retaining a natural and pleasant sound stage and as much dynamics as possible.

Keywords

Ambisonics, surround sound, pop music production, mixing techniques

Introduction

In February 2010, German singer/songwriter Tom Gavron scheduled a recording session featuring three different line-ups: a quartet featuring piano, violin, cello and percussion, a duo with piano and bassoon, and a jazz sextet. Overdubs were to be limited to the vocal tracks, to capture a natural group feel and allow for improvised interaction. With so many interesting acoustic instruments, it became clear that their spatial characteristics and interaction with the room ambience had to be captured, rather than relying on panned mono sources as usual. Although I had no idea of the recording venue acoustics, I decided to try an Ambisonic approach, using a tetrahedral main microphone backed up with a standard close-miking setup.

Since the recording was to serve both as promotional material and merchandise, it was clear that an easily accessible and distributable stereo mixdown was the primary target. In addition, we planned to create a 5.1 mix with an "on-stage" perspective as an extra feature for fans with the necessary equipment, plus a native B-format release for the limited number of Ambisonics enthusiasts out there.

Equipment

The session was recorded on a Lenovo ThinkPad X61 running a heavily customized openSUSE 11.1 with kernel 2.6.31.12-rt20 and current SVN heads of JACK (r3898), FFADO (r1794) and Ardour2 (r6635). The audio data was written to an external USB 2.0 drive1 and backed up to a second harddisk every night.

A Focusrite Saffire Pro 26 served as the recording interface. Its eight microphone preamps were complemented with another eight from an RME Micstasy, and eight cheap Behringer ADA 8000 channels for line signals and less important microphones. All units were connected via ADAT and externally synced to the Saffire's wordclock out.

1 I found external disks to be more dependable than built-in notebook drives, since they have less tendency to overheat, deliver better sustained write rates and generally run more quietly. As an additional bonus, they can help circumvent shared-interrupt problems with the built-in controller and other important parts of the signal chain (as was necessary here, since the SATA chip shares an IRQ with the cardbus controller).

A Core Sound TetraMic (see appendix) was used as the main microphone. On the software side, Ardour2 [Dav10] was used both for tracking and mixing.2

Recording Using a “main microphone” approach in pop music may seem strange, and it does have a few pitfalls. The soundstage is more or less determined during setup. While you can drag instruments away from their natural location with the spot mikes, you will risk incongruent directional cues if you overdo it. From early on, you have to get your Illustration 1: Quartet session. Interesting mikes, top client to cooperate and refrain from fancy panning to bottom, left to right: coincident pair of AKG CK91s suggestions in post­production. The prize will be a for drum set; Core Sound TetraMic as main beautiful, natural ambience. microphone, offset towards the strings; AKG CK91 violin spot; BPM CR­95 cardioid cello spot 1 Quartet session

The first two studio days were dedicated to the The entire set (which also included a number of violin/cello/percussion/piano line­up, and were splash and ride cymbals and a hi­hat) was covered recorded in a rather small room (about 4x7m) that with an overhead XY pair of AKG CK­91 was acoustically treated as a percussionist's cardioids at a height of about 2 metres, pointing rehearsal room, i.e. very dry but pleasant. almost straight down. With no separate control room available, the The electric piano (a Korg StageVintage) has recording gear (and the engineer) ended up in the very nice balanced outputs which were used to same room, which greatly eased the capture the direct signal. Its stereo jack outs were communication with the performers. The mike routed to two active RCF cabinets placed behind setup however was rather tedious, since several the pianist on the floor, tilted upwards like a test recordings of every instrument were required monitor wedge. to be able to judge the recorded sound accurately. This way, the electric instrument blended nicely with the room and the acoustic instruments, and The four musicians were arranged in a circle, the TetraMic could capture some meaningful facing each other, and the TetraMic was placed in directional information. the centre, slightly favoring the strings to ensure a For the violin, we chose another AKG CK­91 good natural balance. and placed it 20­30 cm above the upper part of the Spot microphones were used as follows (from fingerboard, pointing at the bridge. rear left to rear right): The cello was covered with a BPM CR­95 switchable large dual­diaphragm in cardioid On the pedal timpano, a trusty old AKG D­112 setting, 30cm away from an f­hole slightly below about 10cms away from the head, close to the rim. the bridge. It was shielded from the drums with The tom­tom was handled by a Sennheiser some jackets draped over a music stand (not shown e904. in photo), to reduce crosstalk from the snare drum. A cheap Sennheiser vocal mike that we had lying around was put on the after it was snare All musicians were facing each other, and the found that a Røde NT5 could not cope with the nulls of all microphones were kept pointing sound pressure up close, even with the comparably inwards as much as possible, to improve channel light touch of a classical drummer. separation.

2Wiring Ardour for Ambisonics is outside the scope For the entire duration of the session, all tracks of this paper. See [Net09] and [Net09-2] for a detailed were kept armed and recording, with short breaks explanation.

84 every hour or so, to save clean snapshots while the subtended an angle of around 120 degrees. The transport was stopped.3 resulting hole in the soundstage was reserved for Since the material to be recorded was new to the the vocal overdub. musicians, the arrangements evolved considerably 3 Sextet session during the session. It also meant that preciously few compatible parts were available for editing. The jazz sextet consisted of a standard jazz drum In the end, most of the recordings were entire kit, double bass, piano/keyboard, guitar, takes (usually between 13 and 20 per day), the last trumpet/flugelhorn and vocals. Again, the session few of which where usually selected for post­ was captured with a TetraMic in the centre. This production with minor edits to be made. Some solo time, the spots were used rock'n'roll fashion, i.e. parts were recorded as separate takes to reduce extremely close, to get some channel separation. stress and fatigue, but always "live", i.e. with the The drums were covered with three Beyer clip­ entire band. ons for snare and toms, an AKG D­112 for the A click was used to ensure a consistentent base kick, and two AKG CK­91 in XY configuration for tempo before each take, but the recordings overheads. themselves were done without. The double bass was recorded to two tracks: a The vocals were to be overdubbed on a separate DI signal from a piezo pickup and the BPM CR­95 date, allowing time for careful selection of the final in cardioid setting, positioned very close to an f­ material and preliminary editing. hole. 2 Piano/bassoon session Regrettably, no guitar amp was available for the session, so the guitarist plugged his ES­175 and a The room was 5 by 6 metres with a height of Telecaster into a Lexicon MPX­100 which was around 3 metres, and acoustically treated. driving a small active RCF P.A. cabinet miked We were lucky to be able to use a Steinway with a good ol' SM57. Needless to say, the attack grand piano, which was recorded with an ORTF was shoddy. pair of AKG CK­91s. The lid was propped up in The Steinway grand had to be kept closed so as low position, and the mikes were located to the not to upset the natural balance between the rear of the instrument, slightly above lid level, to instruments in the room. Hence, the mike (an avoid the boomy quality of the direct sound ORTF pair made of Røde NT5s) had to be placed coming through the opening. at the only available opening, about 30cms above Another CK­91 covered the bassoon. During the tuning pegs. The coverage of the instrument in warm­up, I learned that the sound of a bassoon the extreme registers was not too good, but for the travels along the entire length of the instrument, reduced jazzy playing style, which mostly featured depending on register: the low notes originate from the middle register, the compromise was the top of the bell (the upper part), whereas the acceptable. high notes emerge from the boot (the lower part), The trumpet and flügelhorn were captured with many positions in­between. To capture this with another CK­91 with 10dB pad, aimed well interesting quality, the spot mike was augmented above the bell. The player was instructed to lift the to an M/S pair with the BPM CR­95 in figure­of­ instrument slightly for emphasis, so that extra eight setting. brilliance would be captured whenever he felt Additionally, a Røde NT­5 was aimed at the bell necessary. 
from above, to have some additional wind noise The vocalist used a hand­held Beyer MCE 91 and more pronounced overtones for flexibility in for a live take of Sinatra classic "Come fly with the mix. me". All instruments and microphones were lined up The second piece to be recorded was an along a common axis, the microphones pointing instrumental rendition of "My funny valentine", for outward to reduce crosstalk. The main microphone which the vocals were to be overdubbed later. was placed well outside the center of the room at a As expected, snare and cymbals leaked into most height of about 2m, so that the instruments microphones. For a few dBs of crosstalk reduction 3 This precaution turned out to be unnecessary, since (which can make a world of difference in the mix), Ardour saved reliably even while transport was rolling. an empty gear case was used as a barrier between

85 piano mike and drums, and a couple of winter 5 Post-production coats hung over a mike stand served as a shield for the bass mike. 5.1 Editing 4 Vocal overdubs Edits were done in the recording configuration, using a standard stereo master bus, and monitored The vocal dubs were done two days after the through stereo speakers. Each edit was then cross­ main session, in a 4x5m living room with wooden checked with headphones. flooring and one wall deadened with a large As expected, the main microphone technique mattress. and associated cross­talk made convincing edits We used the BPM CR­95 in cardioid setting, very difficult, and the assembly process was time­ with a windscreen in front, positioned slightly consuming. After some experimenting, it was below mouth height to ensure a relaxed singing found that staggering the cuts of main mike and posture and avoid the tendency to lift the chin spots helped hiding otherwise questionable edits: while singing. minor problems with overhanging sounds and long About half a metre behind the main mike, the crossfades in the ambience would become TetraMic was running along for some optional acceptable if a clean note onset had been room ambience. established in a spot mike before. The singer was fed a mixture of the main mike Again, the “slide edit” and “lock edit” modes of and a virtual Blumlein array from the TetraMic. It Ardour were used heavily. “Slide” allows regions was pointing upwards, away from the direct sound to be dragged around in time, and is needed to as much as possible, to avoid coloration. I find that align an insert with the groove. When that has a good room signal helps reduce the listening been done, “lock” fixes all regions in time and fatigue induced by closed studio headphones. only permits the trimming of region boundaries – Since it feels natural even at low levels, it does not that way, the edit can be cleaned and made affect intonation as much as artificial reverb, inaudible, without the danger of messing with the which usually has to be turned way up to for a groove unintentionally (which happens rather comfortable listening experience. easily with Ardour when screen space is limited A stereo fold­down of the TetraMic recording of and track heights are small). the basic tracks was used as the primary monitor signal, complemented with some dry piano for 6 Mixing pitch reference, and additional direct signals as For mixdown, a new 16­channel summing bus required by the vocalist. was added for third­order Ambisonic mixdown, and the old "master" bus was deleted. Additional The overdub takes consisted of one four­channel two­, four­, and nine­channel monitoring busses 4 track for ambience and one mono track. They were created for monitoring in UHJ­encoded were recorded on top of one another during the stereo, first and second order Ambisonics. session. To sort the material, we created four new pairs 6.1 Using convolution reverb of tracks, to which the recorded takes were moved Spot mikes need some additional reverb to sound after listening: one for trashcan, one "maybe natural at the listening spot defined by the position usable", one "satisfactory" and one for material of the main microphone. Ideally, this reverb is an deemed very good. Ardour's "Lock edit" mode impulse response of the recording room recorded proved very helpful, as it eliminates time by the main mike, where the excitation speaker is alignment errors when dragging material between placed at the location of each spot microphone. For tracks. 
Good takes with questionable parts in them maximum fidelity, separate IRs should be captured were split to exclude the problematic section. for each instrument group (or even every microphone position).5 4In retrospect, it would have been better to record the vocal takes to a single five-channel track each, to avoid confusion in the editing stage. Such a compound track is 5Aliki [Adr09] is a good tool for the job. The easily split into manageable parts using Ardour's capturing of room responses is described in detail in the flexible buses. Aliki manual.

86 These IRs can then be combined into a Each spot mike was delayed to compensate for convolution matrix for an engine such as the offset to the main microphone, to avoid comb jconvolver6, so that it has N inputs, one for each of filtering effects in the combined signal. the IRs, and either two or four outputs, depending Sometimes, the actual delay used differed from the on whether the room response was recorded in measured value by several milliseconds, if a more stereo or first­order B­format. pleasant timbre could be obtained. Optionally, the early reflections and tail section 6.3 Equalisation of an IR can be separated. One tail can then be used globally (because the tail does not contain As shown in the photo, the recording situation significant directional cues and differences from was rather cramped, resulting in a pronounced bass one position to another are very subtle at best), and boost in the microphone due to proximity effect. only the short early reflection parts are treated When played back, the sonic impression was quite individually. This conserves CPU and allows for obtrusive, although technically correct. Some bass an extra degree of flexibility, namely the ratio of reduction in combination with gentle reverb was early reflections to reverb tail. employed to give the mix a more spacious feel and To plug such a beast into ardour, create an N­ make the instruments back away from the listener. channel bus with a corresponding N channel insert Signal crosstalk was quite bad, and steep high­ connected to the external jconvolver. Only the first pass filters had to be used on almost every few return channels will be used, and the rest can microphone to keep the timpani and bass drum out. be left unconnected. Similarly, only the four active The extreme close­miking of the strings outs of the N­channel bus will be connected to the produced a slightly harsh tone that proved difficult first four channels of the master bus which contain to correct without losing the “shimmer” of the bow the zeroth­ and first­order components. sound. In the end, reverb was more effective than Unfortunately, no adequate speaker for an IR filters. measurement of the recording rooms was During post­production, it was found that an EQ available, so some “foreign” IRs had to be used on setting that works for the UHJ­encoded signal will the spot mikes to blend them with the slightly also sound good in Ambisonics, but not vice­versa. wetter room signal. The full Ambisonic rendering is a lot more spacious and transparent, and can absorb more reverb. One must resist the urge to “fatten it up” 6.2 Source alignment too much, as this will result in a boomy and overly The natural sound stage as recorded by the wet UHJ stereo image. TetraMic was used as the basis for source Frequently, instruments which had been very positioning. With stereo in mind, the sound field pleasant when soloed sounded tinny or otherwise was rotated to provide a not­too­unconventional artifical in the mix. This is a common phenomenon balance when folded down, and to leave space for when many microphones are open, and there is no the singer in the front. Where possible, strings way around it other than to paint the spot mike were placed in the back, since they benefit from sound “with the large brush” (i.e. to over­ the slightly phasy, blurry quality which UHJ stereo exaggerate the desired characteristics a little), to encoding adds to rear sources. 
For the same close unused microphones wherever possible, and reasons, bass instruments should be placed in the to keep fiddling with the delays. In retrospect, front quadrant if possible. hypercardioid patterns wounld have helped to Spot mikes were brought up one by one and reduce crosstalk between adjacent instruments. aligned with the Tetramic sound stage by ear, 6.4 Building the mix moving them until the sources stopped “jumping” when switched. My usual approach is to start with the drum overheads and any room microphones, add spots one by one, assemble a basic track, and bring the vocals in at the end. Since the Ambi setup reacts 6[Adr10]; jconvolver is unique among the freely very differently than a standard stereo system, I available JACK convolvers in that it uses a variable partition size and can be configured to incur only one found that I had trouble finding room for the period of latency regardless of IR length. vocals this way and gave up after a few failed

87 attempts. Instead, I started with the vocals and a 7 Other production approaches little bit of piano, making sure the song would Recording more­or­less “live”, without click, is work as­is. The other instruments were then tucked nice but not always practical. Far more material “under” this basic mix. Afterwards, the room needs to be discarded for mistakes, there are less microphone was brought up for some “air”. options for repair and improvement, and the Additional reverb was added to each signal overall level of perfection that can be achieved is individually, and finally, fader automation was limited unless the musicians are of the very best. used to emphasize dynamics, clean up the mix and But you can easily use traditional single­ add some final polish. instrument overdubbing in an Ambisonic 6.5 Dynamics production. Naturral room ambience (if desired at In the final stages of the ambisonic mixdown, it all) can either be faked using B­Format impulse became apparent that commercial impact would responses as described earlier, or you can keep the require some sort of peak limiting and gentle main microphone running each time a musician overall compression. Regrettably, no free lays down a track. In theory, the result will be the multichannel­capable compression tools are same as if everybody had played in the room at the available at this point7 (and Ardour2 cannot make same time. In practice, you will also get a lot more use of a plugin's side chain port easily), so the hiss. But to blend one or two soloists into the mix, master was left unprocessed and the individual this approach is feasible. channels were treated instead. This is a serious drawback, since it entangles the Conclusion mixing and mastering stages. Keeping them Doing a pop production in Ambisonics is separate (and bringing in a fresh pair of ears) has definitely possible. One must constantly double­ its advantages: the mixing engineer does not have check the mix in both UHJ and Ambisonic to deal with real­world playback systems and can renderings, but that is a fair deal compared to the create an artistic mix under optimum hassle of an extra surround mixing session. With circumstances, and the mastering engineer can then the available free software tools, most recording deal with the necessary compromises to make it problems can be dealt with, and Ambisonic work on Joe Sixpack's car stereo. It takes some panning opens up new creative possibilities. Even effort to focus on mastering processing and not on plain stereo systems, the sound stage can be constantly question and revisit earlier mixing extended well outside the usual stereo triangle, decisions. without complicated manual phase trickery. On the up side, the achieved mix was a lot more A flexible multichannel compression tool with transparent than with sum compression, while the appropriate side­chaining would help get the job loudness was slightly lower than average. To done more quickly. However, with a periphonic compensate, some additional limiting was sound stage encompassing the entire sphere, sum performed on the UHJ­encoded stereo output. compression is even more questionable than for stereo (where likewise the current best practice The automatic 5.1 folddown was done with Fons backs away from global dynamic processing and Adriaensen's hand­optimized second­order ITU moves towards stem­based separation mastering). decoder that comes with AmbDec as a default It will be interesting to take this production preset. 
approach to other genres, such as electronic dance music (where a slim chance of native ambisonic playback might exist in some clubs).

Appendix: The TetraMic 7 JAMin [Har05] or the very promising Calf multiband An implementation of a microphone design compressor [Fol10] come to mind. As it stands, even devised by Gerzon, Craven et al. in the 1970s, the something as kludgy as a compile­time switch to allow TetraMic consists of four cardioid capsules for multichannel work with hardwired side­chaining would be highly welcome. arranged in the edges of a tetrahedron. Its native signal set (called A­format) can be converted into

88 the B­format used in Ambisonics by a simple References matrix operation: [Adr09] Fons Adriaensen, Aliki, W' = LFU + RFD + LBD + RBU http://www.kokkinizita.net/linuxaudio/downloads/ X' = LFU + RFD - LBD - RBU [Adr09­2] Fons Adriaensen, TetraProc Y' = LFU - RFD + LBD - RBU tetrahedral microphone processor, l.c. Z' = LFU - RFD - LBD + RBU [Adr10] Fons Adriaensen, jconvolver, l.c. (where L/R means left/right, F/B is front/back, [Dav10] Paul Davis et al., Ardour digital audio and U/D is up/down, to uniquely identify each workstation, http://ardour.org capsule) [Far06] Angelo Farina, A­format to B­format conversion, http://pcfarina.eng.unipr.it/Public/B­ The signals are primed to indicate that some EQ format/A2B­conversion/A2B.htm correction is still missing to compensate for the [Fol10] Krzysztof Foltman, Thor Harald slight positional error of the capsules (for a perfect Johansen et al, Calf audio plugin pack, multiband microphone, they should be precisely coincident). compressor contributed by Markus Schmidt, For an easy­to­understand discussion of A­to­B http://calf.sourceforge.net format conversion, see [Far06]. [Har05] Steve Harris et al., JAMin, the JACK Audio Mastering interface, On Linux systems, the conversion is handled by http://jamin.sourceforge.net TetraProc [Adr09­2], whose author will provide a [Net09] Jörn Nettingsmeier, Ardour and custom configuration file in cooperation with the Ambisonics, in: eContact! 11.3 (Journal of the microphone manufacturer. Canadian Electroacoustic Community), The B­format can then be used natively for http://cec.concordia.ca/econtact/11_3/nettingsmeie Ambisonic playback (either horizonal­only or full r_ambisonics.html 3D), or an arbitrary number of first­order [Net09­2] Jörn Nettingsmeier, Using microphone patterns can be derived from it. In Ambisonics as production format, in: Beta Ardour practice, one would create a coincident stereo pair, reference manual, or a set of five (hyper­)cardioids for Dolby http://ardour.stackingdwarves.net/ardour­en/8­ Surround. The big advantage is that orientation, ARDOUR/279­ARDOUR/280­ARDOUR.html opening angle(s) and polar characteristics can be selected during post­production, making it a very versatile main microphone.

In terms of localisation precision, the Tetramic is one of the best microphones of its design available today, owing to its small capsules which make the array nearly coincident to begin with, and the theoretically perfect digital post­matrix filtering. Its one great disadvantage is the low signal­to­ noise ratio (a consequence of the small, cheap capsules), which makes it less well suited to very soft music such as a single acoustic guitar at a distance. For the task at hand, however, it was ideal.

The Tetramic is available from Core Sound LLC, http://core­sound.com. Quieter (and more costly) variants of the design are offered by SoundField, http://soundfield.com.

89 90 5 years of using SuperCollider in real-time interactive performances and installations - retrospective analysis of Schwelle, Chronotopia and Semblance. Marije A.J. BAALMAN Design and Computation Arts Concordia University Montr´eal,Qu´ebec Canada, [email protected]

Abstract are also all situated in a collaborative context, Collaborative, interactive performances and instal- where there are several artists collaborating us- lations are a challenging coding environment. Su- ing different output media, as well as different perCollider is an especially flexible audio program- programming environments with which data is ming language suitable to use in this context; and in to be exchanged. As the development of these this paper I will reflect on 5 years of working with three projects has been sequential and taking this language in three professional projects, involv- place over the course of 5 years, certain ap- ing dance and interactive environments. I will dis- proaches and methods using SC3 have emerged, cuss the needs and context of each project, common as well as a re-evaluation of methods used in problems encountered and the solutions as I have implemented them for each project, as well as the earlier projects. In all of these projects, I have resulting tools that have been published online. encountered similar problems, however, as my proficiency at SC3 developed, I have discov- Keywords ered different solutions. In some cases because SuperCollider, interactive performance, sensing, I found that the new project asked for a dif- composition systems, live coding, tool/code devel- ferent approach, based on the specifics of the opment problems, or because I wasn’t content with the solution used previously. 1 Introduction While working on artistic projects there is al- Between 2005 and 2010 I have been involved ways a trade-off between developing “general- in three real-time interactive projects, in which purpose” tools that are robust and flexible in SuperCollider (SC3) was the central core to deal use, and quickly putting something together, with realtime sensor data, audio analysis, inter- that is usable and reliable for the project at action with other programs for data exchange hand, but may not translate well to other and show control, and sound, vibration and projects. This paper reviews the methods I light output. This paper gives an overview of have used over the years, and identifies the com- the techniques used within SC3 and provides ponents that could be, or have been, adapted a critical analysis of the problems encountered to more general purpose tools for use in future along the way and solutions provided. The work projects. on these three projects has resulted in a num- For those readers unfamiliar with SuperCol- ber of tools that have been made available to lider, I have added a section detailing some gen- the general public as open source software, and eral information regarding SC3 at the end of that aid other artists in the realisation of their this paper. Class names within SC3 will be put projects. in bold in the following text. The three projects are all collaborations with 1 2 Coding in the context of artist/researcher Christopher Salter and vari- interactive performance ous other artists. Two of the projects are dance performances, one of them is a one-person ex- Coding in a professional performance context periential installation. Furthermore all three has different demands than product oriented projects have an interactive or responsive com- coding, in the sense that while writing the code, ponent based on sensor inputs and dynamic the purpose of the code and its needed function- mapping of these inputs to output media. 
They ality is not yet known, but will emerge during the artistic process of discussions, experimen- 1http://www.chrissalter.com tation and rehearsals. This is especially true,

91 when the artistic project involves real-time sens- 3 The artistic projects ing, where it is not known beforehand what the 3.1 Schwelle input data will be, and how it will influence the output media, which are also being shaped in Schwelle is a theatrical performance that takes the process of creation. place between a solo dancer/actor (Michael Schumacher) and a “sensate room”. The ex- Within the rehearsal process for all of these erted force of the performer’s movement and projects it is important to have a flexible sys- changing ambient data such as light and sound tem which allows for on-the-fly manipulation of are captured by wireless sensors located on both audio synthesis processes as well as sensor data the body of a performer as well as within the mappings. Part of the preparation for the re- theater space. The continuously generated data hearsal process is to create systems that allow from both the performer and environment is for such flexibility, so that many different kinds then used to influence an adaptive audio scenog- of interactions can be explored. This is only raphy, a dynamically evolving sound design that possible if it is clear in advance what kind of creates the dramatic impression of a living, possibilities there are, i.e. what kind of data breathing room for a spectator. We aimed at is to be expected from the sensors, the type of creating an auditory environment whose sonic audio processes that will be used (its composi- behaviour is determined continuously over dif- tional structure, as well as its sonic quality), and ferent time scales, depending on the current in- what kind of interactions the collaborators in put, past input and the internal state of the sys- the project are interested in. Extensive discus- tem generated by performer and environment in sions about this with the other collaborators, as partnership with one another. well as short exploratory sessions with the per- The data from the sensors is statistically an- formers, and a basic understanding of some of alyzed so the system reacts to changes in the the movement material of the dancers (so that environment, rather than absolute values. The you can e.g. wear an accelerometer and pro- statistical data is then scaled dynamically, be- duce some data yourself while writing and test- fore being fed into a dynamical system inspired ing code) are essential components in this pro- from J.F. Herbart’s theory on the strength of cess. Having some skill at livecoding to quickly ideas [Herbart, 1969]. The dynamic scaling en- develop new interactive processes is also vital sures that when there is little change in the sen- for a succesfull rehearsal process. sor data, the system is more sensitive to it. The For the eventual showtime in the theater or at output of the Herbart systems is mapped to the an exhibition, it is important to have a robust 2 density, as well as to the amplitude of various “show control” system from which the show sounds that comprise a “room” compositional can be run, while at the same time being flexi- structure of more than 16 different layers. An ble to adapt to differences in setup (e.g. audio overview of the data flow is given in figure 1. balance/mix), based on the venue in which the The mapping, which is indicated between the performance takes place. Ideally, you should be dynamic scaling and the Herbart Groups, is a able to adapt “cues” during the show, should matrix which determines the degree to which there be the need. 
Backup solutions, in case each sensor influences which sound [Baalman et sensing infrastructure breaks down, can also be al., 2007]. Additionally, there is a “state” sys- useful (even just as a reassurance). tem, defining different parameter spaces within In the case of installations, the code (and the which the soundscape can move. The system machine it runs on) may need to be prepared moves between these states, depending on long to be started and stopped by gallery personnel term development of the input data. The the- who have no knowledge at all about coding, and atrical light also has distinct behaviours based in some cases even of computer environments. on the state of the room. The information In the ideal case the machine running the code about the current state is transferred to the can boot up and start the code automatically, computer controlling the lights via OpenSound- so the computer only needs to be turned on. Control (OSC)3 [Wright et al., 2003]. The diagram also shows that there is a sec- 2In theater/performance the collective control for all ond data flow path, which constitutes a more events supporting the performer’s action on stage (i.e. classical, instrumental approach of using the sonic, light, mechatronics, video) happening on stage is usually referred to as “show control”. 3http://www.opensoundcontrol.org

92 Sensors Acceleration Light

statistics

dynamic scaling mapping

Sound mapping Light Instruments Herbart Herbart Group Group State system Figure 2: A snapshot of the view on both the Density Amplitude| Sound stage and my computer screen visualising the light matrix.

Figure 1: Data flow diagram an XBee wireless chip. For the control of the lights, I used Synths on sensor data. The mapping between a move- the server sending their output to control rate ment created within the room by the performer busses, which I then polled at a regular interval or objects in the room and a resulting light- in order to send it over a serial protocol to the ing or sonic event is direct, and recognizable as XBee network from within the langauge. This an action-reaction interaction. This interaction approach allowed me to make use of the various has been carefully tuned to certain dramatic se- envelope curves that are available within Syn- quences within the piece, and is easily switched thDefs, and use the extensive Pattern library on and off, depending on which scene is taking for sequencing of these Synths. In the proto- place. In some scenes I change the strength of typing phase for this project, I extensively used the interaction at suitable points in the dance, scgraph4, which allowed me to model the light thus creating a duet with the dancer. matrix and see its behaviour on screen. Dur- Apart from the system mentioned above, ing the setup and performances in the theater, there was also a backup system in case the wire- it allowed me to monitor the behaviours of the less transmission of data from the dancer would system and compare it to the actual output on break down, so in case of need I could mimic the the stage (see figure 2). data using the joysticks of a gamepad device. Additionally, I used camera based motion The sound was spatialised across the perfor- tracking data (from a camera looking down at mance space using several methods of amplitude the stage), to map to the light matrix, as well as panning between speakers. Additionally, there beat and pitch tracking data extracted in real- was an elaborate system for submixing the dif- time from the soundtrack. We also exchanged ferent audio layers, before sending them to the data between the light control and the inter- outputs. In this submix system there were con- active video, both for synchronisation of cues trols, both for direct setting of volumes, and for with the soundtrack (using frametime of the dynamic volume control mapped to the dynam- playback), and for connecting the intensity of ical system output data. the lights to the video image (maximum output value of all the lights was used to control the 3.2 Chronotopia brightness of the video image in specific scenes). Chronotopia is a dance performance by the Bangalore (India) based Attakkalari Centre for 3.3 Semblance Movement, in collaboration with visual artist JND/Semblance is an interactive installation Chris Ziegler; the music score is composed and that explores the phenomenon of cross modal performed by Matthias Duplessy. For this per- perception — the ways in which one sense im- formance we created a responsive light installa- pression affects our perception of another sense. tion: a 6 by 6 matrix of cold cathode fluores- The installation comprises a modular, portable cent lights (CCFL), and 3 handheld lights. We environment, which is outfitted with devices controlled these lights with wireless technology, using a combination of an Arduino board and 4http://scgraph.sourceforge.net

93 that produce subtle levels of tactile, auditory movements there are a couple of different parts, and visual feedback for the visitors, including with varying output combinations, and various a floor of vibrotactile actuators that partici- mappings to the sensor data. pants lie on, peripheral levels of light and au- dio sources, which generate frequencies on the 4 Common techniques thresholds of seeing, hearing and (tactile) feel- 4.1 Collecting sensor data ing. The collection and processing of sensor data is In this installation we use wireless sensing an essential part of working on interactive per- devices to gather data from floor pressure sen- formances. The first step is interfacing with sors. The loudspeaker setup consists of 12 spe- the hardware that actually does the collection cial speakers designed to enhance home the- of data. In Schwelle we used Create USB in- ater sound setups with tactile vibrations. These terfaces5, which show up as HID devices to speakers are laid out in a grid of 2 by 6 under- the operating systems. In Chronotopia mo- neath a platform, or bed, on which the visitor tion tracking data is used, which is received lies down. With the pressure sensors we can de- from another program (see 5.4) through OSC. tect micro-movements of the body. The light In JND/Semblance we use wireless§ XBee based appearing above the visitor is controlled with sensors and the data communication takes place DMX (see 5.5) from Max/MSP. OSC commu- over a serial port6. nication is§ used to send the desired cues to For Schwelle I made an abstraction between a Max/MSP. class named SchwelleSensor and a class inter- The synthesis of vibrational output was a dif- facing with the actual HID device (two classes, ficult process. While sound synthesis methods one for Linux, one for OSX). I then had al- can be used, it is quite different to find vibra- ternate versions of the SchwelleSensor class tions that work — artistically — on a tactile which were using backends like the WiiMote level. While with sound you can sit behind and a “mix” of several other sensors. In later the computer, and code synthesis processes and projects this abstraction was generalized, in- tweak them while listening to them, to lie down stead using a general purpose data framework on a vibrational floor and code at the same time (the SenseWorld DataNetwork) into which any is unpractical. Furthermore, it is a medium kind of device can input data and further use of with which neither of us had any previous expe- the data is agnostic of the way in which the data rience to draw upon. It was also hard to create was initially gathered [Baalman et al., 2009]. vibrations without an acoustical counterpart so 4.2 Processing sensor data that we had to find vibrations that were also sonically interesting. In the class SchwelleSensor, I also stored data calculated from statistics of the data, which was The sensor data was analysed statistically in performed in the class SensorData. These cal- realtime so that changes in pressure, rather than culations all took place in sclang. In the later the absolute pressure, was used to map to the projects I moved the statistical processing to synthesis processes. Since there were 24 areas scsynth, taking advantage of the efficiency of of sensing, in a 6 by 4 grid, and 12 speaker out- the DSP algorithms implemented in the UGens. 
puts, some combining of sensor data was done The resulting, “derived” or “cooked” data was to map local movements of the body to local made available on the DataNetwork — a cen- vibrations. In certain parts, we used a sum of tral hub for all the control data. This approach all the changes in pressure to map to an over- makes it flexible to switch between using the di- all amplitude of the vibration. Furthermore, in rect data and a derived version by simply chang- one part we did amplitude tracking on the vi- ing the data source. bration output, and used that to determine the maximum level of brightness of the light. 4.3 Mapping sensor data For spatialisation we employed various meth- Mapping of the sensor data involves not only ods of panning (in one or more directions), or remapping the value ranges between the input outputting to one or more speakers at the same data and output parameters, but also the merg- time, with either the signals in phase to all ing of data streams, extracting features from speakers, or with a randomized phase. 5 http://www.create.ucsb.edu/~dano/CUI/ and The composition is made up of three move- http://overtone-labs.com ments and lasts about 13 minutes. Within each 6See http://sensestage.hexagram.ca

94 data streams, and creating dynamical processes 4.5 Managing synthesis processes which develop compelling behaviours based on SC3 has two distinct methods for working with realtime sensor inputs. Synths. One is instantiating a Synth directly Schwelle was by far the most complex system on the server and then changing parameters of interaction as described above and shown in of a synth either manually or automated in a figure 1. The interactions between the different task or routine. The other method is using the stages in the dataflow path is organised with the Pattern infrastructure, which provides many class SchwelleSensorSystem, which reads the higher-level mechanisms for creating sequences input data, calculates the dynamical scaling (in in time. DynamicScaleSystem) and maps it to input In Chronotopia I mostly employed the Pat- data for the dynamical system, implemented in tern infrastructure, to create spatial sequences the class SchwelleHerbart. All the processing across the light matrix. Only in a few in- of the data is taking place inside sclang, and stances I found it more convenient to instantiate in custom classes with a lot of interconnection Synths, which would then be mapped to con- between these classes. trol busses with data from the motion tracking. In Chronotopia and JND/Semblance the data For Schwelle I built up an infrastructure to processing is centered around data process- deal with common methods to handle Synths ing units that are integrated with the Sense- and their parameters. The class SchwelleIn- World DataNetwork framework. This approach strument handles common methods for start- makes the creation and alteration of dataflows ing and stopping Synths, including fading in much more flexible than the approach taken in and out, and providing a submix for the given Schwelle. However, some of the data process- instrument for individual volume control; var- ing algorithms used in Schwelle still have to be ious subclasses then implement different vari- ported as units to the general framework. ants of instruments, depending on the use of For JND/Semblance, I started developing a samples (Buffers), audio input from a micro- framework for creating presets, combining set phone, mapping of controls to sensor data, as parameters for use with specific Synths as well “clouds” of Synths dependent on input well as mapping of parameters to specific data data. Each instrument has a default GUI to see streams taken from the DataNetwork. These and manipulate the status of the instrument. presets can then be stored to disk and recalled For JND/Semblance I developed the preset in future sessions, and instantiated and tweaked system mentioned above. In addition to this in realtime, both through a graphical interface preset system, I developed a central engine, the and through code. JNDEngine which manages all JNDSynths and their connections to the DataNetwork. 4.4 Data exchange with other programs JNDSynth provides control over settings and mapping of the synthesis processes to data from In each of the three projects, one of the col- the DataNetwork. JNDEngine also has a laborators was using the software Max/MSP to graphical interface, with volume controls for all control theatrical lighting. In order to exchange running synths, and buttons to open GUIs for data we had to set up OSC-communication pro- editing individual JNDSynths. tocols to exchange data. 
While in Schwelle this The approach in JNDSynth is more gen- was done on an ad-hoc basis, defining a OSC eral in the sense that it doesn’t require many address pattern each time we needed to ex- subclasses for special cases of synths, but the change some data, for the later projects we de- submixing approach of SchwelleInstrument veloped a general purpose framework, namely is absent and it would not yet be able to handle an OSC-interface to the previously mentioned the clouds used in Schwelle. SenseWorld DataNetwork. The framework pro- In future work, I anticipate merging the two vides for robust methods to allow for quick re- approaches into a common class. connection upon restarting the code or patch. The DataNetwork provides a very quick way of 4.6 Spatialisation methods sharing any data that may be needed by more All of the works involved spatialisation of some than one collaborator, and it is used for shar- kind. In Schwelle there was a setup with ing show control data (timing, cues), as well as two “rings” of speakers, four speakers around sensor data and output data. the performance area, and then four (or more)

95 speakers around the audience. Additionally, certain cases they would prompt the performer. there were two speakers mounted at the ceil- Since there is a fair amount of improvisation in ing, one pointed downwards, and one towards the piece, there was no absolute time at which the ceiling, so that the audience would only these cues needed to be executed, although we get reflected sound from that speaker. Sounds had defined relative times between events in cer- were then either routed to specific speakers, or tain occasions (for the latter, I created a sim- panned dynamically between the two rings of ple ShowTimer which showed me a window speakers. In the class SchwelleSurround I of how much time had elapsed since a certain created various standard methods for the sound moment). In addition, some cues needed spe- spatialisation, which could then be applied to cific code to be executed shortly beforehand to parts of the soundscape at specific moments prepare processes (allocation of resources), and during the performance. Thus I was routing some others needed code to clean up afterwards the output from one or more Synths through (freeing resources). While I started writing a another Synth, which did the spatialisation. general purpose class for this, I did not end up In Chronotopia I was working with a matrix using this, being unsure about its robustness in of outputs, so I needed to pan between these showtime (not having tested it extensively dur- outputs, without wrapping around to the other ing rehearsals). Rather I ended up with a file side. As SC3 at the time of creation of Chrono- with code snippets organised according to the topia only had a panner that wraps around timeline of the show, with numerous comments (PanAz), I doubled the amount of output chan- as to when to execute which code. This also nels used within the panner, and then only sent allowed me to make quick changes “on the fly” the channels I actually needed to the output during the performance. of the Synth. But, I also suggested to the On the other hand in Chronotopia, the timing SC3 -community that there should be another of the piece is strictly tied to the music score panner UGen which can pan between multiple and there is no improvisational aspect in the channels, but not wrap around. This led to the piece. Here, I ended up creating a CueList development of the UGen PanX. In order to class, which executes functions at specific given direct the signals to the right outputs, I either frametimes. The cue list is stored in a code file, made synths outputting to a specific channel in where functions can be changed, or alternatively the matrix (using a LightGrid class to select you can use the class instance methods to add the channel numbers from the one-dimensional functions at specific times. The current time array), or I used SynthDefs which embedded was then updated according to the playback of the PanX UGen. the sound file with the music score. In JND/Semblance I was working again with For JND/Semblance, I used three tasks a matrix of outputs, so PanX was used ex- (Tdefs) for the three movements of the piece tensively. Rather than routing Synth out- and used a master task, starting these three puts to various spatialisation Synths, I devel- tasks at the right time. 
While this allowed us to oped a system for dynamic creation of Syn- try out each movement separately, while prepar- thDefs, where you can define a signal function, ing the piece, this approach was not yet com- which is stored in a library (JNDSignalLib), pletely satisfactory in its use, mainly because I and then create a JNDSynthDef with specific had to recalculate the total durations of each types of spatialised outputs. Furthermore all movement (for proper execution of the master the JNDSynthDefs are stored in a separate task) everytime we changed the timing of the SynthDescLib, which can be browsed with a piece during the preparations. graphical interface to test each SynthDef. During rehearsals of the two theatrical pieces that we often had to go back and forth between 4.7 Show control specific scenes; this is actually the main chal- In all projects some kind of show control was lenge for coming up with a general purpose cue needed. All works have distinct scenes or move- system, since skipping back and forth means ments in which certain things need to happen taking care of the proper allocation and freeing at certain times (usually referred to as cues in of resources, depending on where we are in the live theater lingua franca). show. Also certain cues may have set events in In Schwelle the cues were quite closely tied motion, which run during a number of scenes, so to the performer’s movements on stage, and in there has to be a check which events should be

96 turned on at a specific time. Finally, of course, 5.2 GeneralHID during rehearsals the shape of the piece may change, so a quick editing of cues should be pos- Moving back and forth between running code sible too. on Linux and OSX developed the need for a general approach for interfacing with HID de- 4.8 Summary vices. Working from various parallel, platform specific implementations used in Schwelle, and From the above we can see that the different also some previous projects, I developed the ab- projects have led to the exploration of multiple straction GeneralHID, which provides a com- ways of working on similar problems. The expe- mon interface to both HIDDeviceService (the riences made with capturing, manipulating and OSX HID implementation in SC3 ), and LID sharing sensor data has resulted in a consolida- (Linux input device). The GeneralHID ab- tion into the SenseWorld DataNetwork. straction is part of the standard distribution of The work done in JND/Semblance is mov- SC3 since May 2007. ing towards a complete composition system that makes it easy to create signals with different 5.3 WiiOSC and SC3 WII spatialised outputs, and creating presets for dif- implementation ferent instruments and playing and managing the running synths. This system integrates with The use of the WiiMote in Schwelle has resulted the DataNetwork. On the other hand there in the publicly availabe Linux based program are still some approaches deployed in Schwelle, wiiosc9, which captures the data from a Wi- which would be useful to integrate with the JND iMote and sends it to a desired client using the system to create a complete system. OSC-protocol using liblo10. It has also resulted And finally, the different approaches for show in a native implementation in the SC3 language control could still be consolidated into a gener- for direct access to the WiiMote, which has been alised system that integrates with the composi- part of the distribution since June 2007. tion system and the DataNetwork. 5.4 MotionTrackOSC 5 Software tools made public To use videotracking natively on Linux, I cre- My artistic work has resulted in several exten- ated MotionTrackOSC11, based on the OpenCV 7 sions to SuperCollider (available as “Quarks” ), library12 and liblo, a simple adaptation of the additions to the standard capabilities of SC3, as motion tracking example of the OpenCV li- well as standalone programs (supplied with SC3 brary, expanded with OSC control of parame- classes to interact with them). In this section ters, and sending the data to a specific client. I will briefly discuss the various tools that have Furthermore, this program is integrated with been released and are publicly available. SC3, through some classes implementing the OSC-communication. As the output of Motion- 5.1 SenseWorld and SenseWorld TrackOSC is “raw”, namely there is no consis- DataNetwork tent numbering of the motion tracking points, I The SenseWorld Quark is a collection of classes implemented an algorithm to keep track of the used for dealing with sensor data at a higher positions of previously detected moving points level, and some convenience methods. It con- and matching these to the new ones, so that tains some language side methods to calculate the identifiers are consistent. Additionally, I im- statistics of incoming data streams, as well as a plemented some algorithms to filter out “short- number of PseudoUGens8 to do the same. 
lived” tracking points, which can occur when The SenseWorld DataNetwork is a set of the light conditions of the tracked area change classes dealing with sharing data between col- — this typically happens a lot in a theatrical laborators using various programming environ- context. ments. This framework is discussed at length in [Baalman et al., 2009]. 9available at http://www.nescivi.nl since August 2007. 7http://quarks.sourceforge.net 10http://liblo.sourceforge.net 8PseudoUGens are implemented as small code blocks 11available at http://www.nescivi.nl since January in sclang consisting of other UGens, which can be used 2009. just like any other UGens in a SynthDef. 12http://opencv.willowgarage.com/

97 5.5 DMX coding sessions. Thanks to Josh Parmenter for DMX is “the MIDI of theater lighting control”, implementing the PanX UGen. a serial protocol consisting of 8bit control val- SuperCollider in brief ues for up to 512 light channels within one DMX “universe”. Most common theater light SuperCollider consists of two components: an devices can be controlled via DMX, such as dim- audio programming language, called sclang, and mer packs, stroboscopes and motorized lights a audio synthesis engine, called scsynth; these with many individual control channels. Al- two components communicate with each other though there exists a dmx4linux-project13, that via a set of OSC messages. sclang is an object project has a very lowlevel approach and there oriented audio programming language with dy- are hardly any programs that integrate with in- namic type casting and garbage collection. As teractive software. As a result of the projects a reference for some SC3 nomenclature I have discussed in this paper, a DMX extension has been using throughout this paper: been made for SC3, which can currently com- UGen unit generator, or its representation in municate with the EntTec DMX USB Pro14. sclang. This extension will eliminate the dependency on SynthDef “blueprint” for a Synth, like an “in- Max/MSP for DMX control and simplify some strument”, consisting of a set of intercon- of the setups in the future. nected UGens. 5.6 PanX Synth a running synthesis node on scsynth, created from a SynthDef; like a “voice”. The need to spatialise sound on a grid of out- Quark “packaged” set of sclang classes to ex- puts (speakers) rather than a circle led to the tend the default class library of SC3. development of the PanX UGen, which allows for panning similar to the PanAz UGen, but SuperCollider can be found at http:// does not wrap around. This UGen was imple- supercollider.sourceforge.net. mented by Josh Parmenter and is available from References the sc3-plugins project15. Marije A.J. Baalman, Daniel Moody-Grigsby, 6 Conclusions and Christopher L. Salter. 2007. Schwelle: Sensor augmented, adaptive sound design for Interactive live performance is a challenging and live theater performance. In Proceedings of exciting context for coding, and SuperCollider NIME 2007 New Interfaces for Musical Ex- is certainly a suitable choice of language for this pression, New York, NY, USA. purpose. Creating tools for solving problems as they are encountered (or invented) may lead to Marije A.J. Baalman, Harry C. Smoak, ad-hoc solutions for one performance, but result Christopher L. Salter, Joseph Malloch, and in more solid tools in subsequent works as prob- Marcelo Wanderley. 2009. Sharing data in lems reoccur. While most tools were written collaborative, interactive performances: the based on an immediate need, the publication of SenseWorld DataNetwork. In Proceedings of these tools has helped and hopefully will help NIME 2009 New Interfaces for Musical Ex- many other artists working in similar areas. pression, Pittsburgh, PA, USA. This retrospective analysis of my coding Johann Friedrich Herbart, 1969. Kleinere Ab- strategies in these three projects will hopefully handlungen, chapter “De Attentionis Men- give other artists and researchers some insight sura causisque primariis” (orig. published in the creative process of working with code in 1822). E.J. Bonset, Amsterdam. artistic projects, and the specific challenges in this context. M. Wright, A. Freed, and A. Momeni. 2003. OpenSoundControl: State of the art 2003. 
In 7 Acknowledgements 2003 International Conference on New Inter- Thanks to Chris Salter and Harry Smoak for the faces for Musical Expression, McGill Univer- many years of collaboration; also to Alberto de sity, Montreal, Canada 22-24 May 2003, Pro- Campo for a number of pleasant and insightful ceedings, pages 153–160.

13http://llg.cubic.org/dmx4linux/ 14http://www.enttec.com/ 15http://sc3-plugins.sourceforge.net

98 Applications of Blocked Signal Processing (BSP) in Pd

Frank Barknecht Cologne, Germany, [email protected]

Abstract Music V and Cmusic are block-oriented. The Sample processing in Pure Data generally is block- Csound language is also block-oriented, since it based, while control or message data are computed updates synthesis parameters at a control rate one by one. Block computation in Pd can be set by users.”2 suspended or blocked to save CPU cycles. Such The introduction of different rates for audio “blocked signals” can be used as an optimization and control streams makes specific optimiza- technique for computation of control data. This pa- tions for each domain possible. For the spo- per explores possible applications for this “Blocked radic events of control streams, that only hap- Signal Processing” (BSP) technique and presents a system for physical modelling and for feature extrac- pen rarely compared to the calculation of audio tion as examples. samples, redundant or unnecessary calculations can be omitted. Keywords Audio data however can be computed in Pure Data, parallel computation, algorithmic com- blocks of samples. Instead of computing every position, optimization, physical modelling, feature sample and then the next, several samples are extraction computed in one go, which significantly reduces the overhead in systems based on “unit genera- 1 Introduction tors”: “This is done to increase the efficiency of Software for computer music and realtime syn- individual audio operations (such as Csound’s thesis has to deal with two competing require- unit generators and Max/MSP and Pd’s tilde ments: It continuously has to compute audio objects). Each unit generator or tilde object samples at a rate specified by the underlying incurs overhead each time it is called, equal to samplerate (like 44.1 kHz as “CD quality”) and perhaps twenty times the cost of computing one it has to deal with sporadic events, for instance sample on average. If the block size is one, this note events coming from midi streams. In many means an overhead of 2,000%; if it is sixty-four computer music system, the events of such con- (as in Pd by default), the overhead is only some 3 trol streams are computed less often than the 30%.” actual audio samples: In addition to the sample Apart from avoiding the overhead of function- rate as a defining parameter of a music software calls for each unit generator, blocked process- a slower control rate was added: “The control ing also can be implemented in a way that re- rate is the speed at which significant changes duces the number of memory allocations neces- in the sound synthesis process occur. For ex- sary. For this, the block size has to be constant ample, the control rate in the simplest program over a sufficient time. would be the rate at which the notes are played. But block computation also has a major dis- [. . . ] The idea of a control rate is possible be- advantage: It adds latency of at least one block cause many parameters of a sound are ’slowly duration. “Sample-oriented compilers are more varying’.”1 flexible, since every aspect of the computation The separation of control and sample rate has can change for any sample”.4 been a part of computer music software since its Contrary to audio data control streams usu- early days: “Among the languages of the Mu- ally are not computed in blocks. As most con- sic N family, Music IV and its derivatives (in- cluding Music 4C) are sample-oriented, whereas 2[Roads, 1996, p. 801-802] 3[Puckette, 2007, p. 63] 1[Dodge and Jerse, 1985, p. 70] 4[Roads, 1996, p. 801]

99 trol events happen only sporadically there may nal Processing” or BSP to have a catchy and not be enough data to fill a block that would short buzzword available. be useful to compute. For instance midi notes may be produced only every quarter beat which 2 Blocked Signal Processing at a tempo of 120 BPM would amount to only A very simple example will now show the basic one note event every 500 milliseconds. Filling a principle of BSP in Pd, how it can optimize cer- block of 60 quarter notes would then take half tain actions and how it compares to traditional a minute - no sane musician would accept a la- message computations. The task solved by the tency of that duration. following two Pd code examples is simple: Read While this is an extreme and admittedly a 4096 numbers stored in a table “ORIG”, add 0.5 bit silly example, the number of control events to it and store the result in table “RESULT”. often is small enough to not let the overhead of calling the unit generators for every event affect the overall performance of the system. inlet But this is only valid, as long as the number b of events to compute stays small. Especially in algorithmic composition, in simulations or sim- 4096 ilar cases the amount of data in a “score” can t a b become very big. The overhead of control infor- mation now becomes a problem. Computation until 0 in blocks may yield a significant performance gain. The results of the control computations f 0 + 1 counts from 0 to 4096 may still only be needed infrequently compared to the rate of audio signals. Some modern lan- guages like LuaAV or ChucK are designed to t f f deal with this problem right from the start. In ChucK, “the timing mechanism allows for the control rate to be fully throttled by the pro- tabread ORIG grammer - audio rates, control rates, and high- level musical timing are unified under the same + 0.5 timing mechanism.”5 Pure Data was not designed with variable tabwrite RESULT control rates in mind, but a peculiar feature of Pd can be exploited to do block computations on control data, confessedly in a limited way. Pd can suspend its own audio computations lo- cally using the [switch ] object. This object Figure 1: Transposing a table by 0.5 using mes- ∼ can stop all sample computations inside of a Pd sage computations subpatch or canvas and restart it on demand. A common use is to activate parts of a Pd patch Implemented with message objects, a counter only when needed, for example to manage voices is started by an incoming bang-message, its in- in a polyphonic synthesizer: inactive voices can dex number is used to read out the table data, a be switched off when not in use to save CPU control-rate addition object adds 0.5 to it, then cycles. the value is stored. A not so common use case for switched-off The BSP implementation uses a pair of [tabreceive ] and [tabsend ] objects to con- subpatches is introduced and explored in this ∼ ∼ paper. We use subpatches to perform block- stantly read and write the tables as a block of computations at rates much lower than the au- samples. The signal-rate addition object adds 0.5 in between. The [switch 4096 1 1] object dio samplerate. These computations will em- ∼ ploy the unit generators originally intended to resizes the blocksize to 4096 for this sub-canvas, do audio signal computations. The computa- so that the table-accessing objects can process tion rate is adjusted by suspending blocks lo- that many samples. Additionally it switches off cally. 
We will call this approach “Blocked Sig- the DSP computation inside the subpatch at the beginning. So unless some messages to [switch ] ∼ 5[Wang and Cook, 2004] switch on the computation, this part of the

100 A particle simulation applies the Newtonian inlet laws of mechanics to a simulation of point b masses. The physically-inspired rules are used to calculate velocities and accelerations of point switch~ 4096 1 1 particles, that are defined by vectors describ- ing their positions and impulses. Every particle tabreceive~ ORIG also includes a force accumulator that holds any external forces applied to a particles. Forces, +~ 0.5 positions and impulses fully specify the current

tabsend~ RESULT state of a particle system. Transitions from one state to a next are calculated in discrete steps: Usually a world clock is employed that advances one simulation step and initiates a new run of Figure 2: Transposing a table by 0.5 using BSP the physics calculations to find the new posi- computations tions of the particles. As the same set of phys- ical rules has to be applied to many particles, patch doesn’t consume any CPU resources. By the problem is an ideal candidate for doing the sending a “bang”-message to [switch ], the sub- calculations in blocks of particles. ∼ 7 8 patch is switched on for exactly one block of With PMPD and MSD , two implementa- samples, then it’s switched off again. During tions for this already exist as extensions to the time it is on, the actual computation hap- Pd (so called externals). For this paper a pens, so the end result is the same as for the BSP-implementation of a particle system called message version. “[physigs]” was written in Pd and is tested One small difference is important to note against the MSD and PMPD implementations. here: Probably for performance reasons Pd de- It’s help file is shown in Fig. 7 at the end of this fers updates of the graphics in arrays or tables paper. that are used with active [tabsend ] objects. [physigs] is a particle simulation in two di- ∼ Changes to the included data are only visible mensions. In consists of a main Pd abstraction when the graph is closed and opened again. called [physigs] that can be called with a prefix- The two implementations of the “transposer” tag to make using the object several times in don’t show much difference in CPU usage. This a single project possible. The state of the sys- is expected, as the simple calculations made tem is held in a number of Pd [table] objects for here do not involve many objects so the over- positions, velocities, masses, forces and a table head is negligible. Also both the BSP and the that holds meta-data, currently only the mobil- traditional method avoid any memory alloca- ity state of masses is watched: A mass can be tion by directly working on pre-allocated tables. mobile or fixed. A particle is identified by an When dealing with lists of numbers of arbitrary integer number which is used as a lookup index size, a common idiom in Pd is to build these into the state-holding tables. Particle 10 for ex- lists by pre- or appending elements to existing ample would hold its x-position in the 10th ele- lists stored in [list] objects.6 For longer lists ment of the table “pos-x”, its y-position would this can become a major cause of slowdown in be the 10th element in table “pos-y” and so Pd patches. on for tables “force-x”, “force-y”, “mobile” or “mass”. 3 [physigs] - Physical Modelling The size of these tables controls the size of the implemented with BSP system: Tables of size 64 can control up to 64 The BSP technique should show more of its po- particles. All 64 particles are computed regard- tential when applied to algorithms involving a less of particles actually being used. The table bigger number of unit generators and more par- size can be selected when creating the [physigs] allel tasks. We will now turn to such a use case object. and take a look at a physical modelling system In Figure 3 the calculations that advance the based on spring-connected particles. simulation one step are shown. At the top left, the current x-positions and 6See the list-help.pd file in Pd’s documentation for an example. 
This file also shows the opposite operation of 7[Henry, 2004] serializing a list with [list split] which is slow as well. 8[Montgermont, 2005]

101 ber of tables again, where each spring indexes get state tabreceive~ $1-pos-x tables with its integer-ID. Any spring links two

tabreceive~ $1-vel-x tabreceive~ $1-force-x particles. The ID numbers of these two parti- xdot = v tabreceive~ $1-mobile cles are held in two state tables called “link-m1” *~ and “link-m2”. Other state tables hold param- r $0-drag *~ eters like damping, stiffness, relaxed length and tabreceive~ $1-mass *~ 0.99 friction vdot = f/m so on. /~ Springs generate forces that have to be ac- cumulated into the tables holding the forces for +~ v + vdot each particle. This currently is realised in a sep- +~ x + xdot arate step that doesn’t use the BSP technique, because the “vanilla” version of Pd doesn’t of-

pd checkbounds fer an object that can write into a table at a

pd clip-x tabsend~ $1-vel-x position specified by an audio signal. Miller set new velocity Puckette intends to include such an object in tabsend~ $1-pos-x future versions of Pd, the C-code for the ob- set new position ject is rather trivial and it already exists as an external. Unfortunately falling back to tradi- tional control-rate calculations in this part de- stroy many of the performance gains as we will Figure 3: Advancing the world simulation one see in the benchmark section. time step 3.1 Performance comparison x-velocities are read using [tabreceive ] objects. To compare the different implementations of a ∼ Respective [tabsend ] objects write the new po- particle system, a test system has been devised ∼ sitions back, after the physics laws have been that creates a configurable number of particles applied. A table “force-x” holds an accumu- at random positions and with random mass and lation of all forces that have been applied to daisy-chains them with spring links. Such a per- particles. The “mobile” table is 0 when a par- formance test is part of the MSD distribution. ticle should be fixed, and 1 when it is mobile. Pd’s built-in “Load meter” has been used to get Weights are stored in a “mass” table. Some rough CPU usage values. The world advances friction is applied to fight instabilities and sim- one step every 50 milliseconds (or 2 * 25 msec in ulate air drag. The actual algorithm is rather [physigs] for link and mass computations each). simple: Any force on a particle is converted to Table 1 shows the results of several test runs. an acceleration by dividing the mass, the accel- It turned out, that [physigs] runs much slower eration is added to the velocity, which in turn is than both MSD and PMPD when the control- added to the position. The forces then get reset rate calculations are used to distribute the to zero, which is a step not shown in the fig- spring-forces to each particle. Switching off this ure. The subpatches “checkbounds” and “clip” part of the patch in the BSP implementation restrict the possible positions to a configurable will let [physigs] catch up to MSD and beat range and possibly flip the velocity to simulate PMPD. bouncing of the world’s ends. The same calcu- MSD doesn’t have any overhead, as it com- lations are made for the y-coordinates as well. putes the full simulation in just a single object. The system is driven by an outside [metro] PMPD however has separate objects for each whose speed specifies the speed of the physical particle and link. In the example patch 4096 simulations. Metro periods in “haptic” ranges objects for the simulation participants alone are of 20 to 100 milliseconds are good choices. The used. So here the overhead is significant as ex- [metro] then regularly generates bang-messages pected. to switch on and off the DSP-signal computa- [physigs], especially when used without the tions in the subpatch holding the simulation control-rate workaround for a missing “write step. to table”-object, turns out to be a capable A similar construction in [physigs] calculates contender for physical modelling in Pd-vanilla the forces of visco-elastic springs connecting two or Pd-almost-vanilla environments. Note that particles. These springs are defined by a num- these benchmarks are only meant to give a per-

102 Simulation Type CPU Usage ration for these. Detecting ”changed rooms” is MSD 5 a common need of composers working for RjDj. physigs 30 The analysis rate of the FE objects is config- physigs (w/o force distribution) 5 urable similar to the [physigs] object by adapt- PMPD 16 ing the speed of a [metro] object that blocks and activates them. Just as in the introductory ex- Table 1: Simulation of 2048 particles and 2048 ample of a “table transposer” the objects work springs at metro-period of 50 ms using three on shared table data and extract statistical fea- different implementations tures like centroid, mean, energy or flatness. If the table to be analysed holds a magnitude spectrum, the extracted features describe the formance estimation and should not be taken frequency content of the sound environment. Of as “hard numbers”. But the author has suc- course the objects may be used to acquire sta- cessfully run [physigs] on the “RjDj” version of tistical analysis of tables holding other kinds of Pure Data on the iPhone with its much slower data as well. CPU compared to standard computers. As is well-known spectrum analysis with Fourier transformations is a relatively costly op- 4 Feature extraction and analysis eration. In Pd it is already carried out in sub- with BSP patches, so the analysis is a natural candidate [physigs] generates data-heavy control streams to be “blocked” with the BSP technique. Of in a completely artificial manner. Similar course the time resolution gets worse in this amounts of data points have to be handled case, and overlapping analysis is not useful any- when analysing external audio coming in over more, when there are pauses between adjoin- the ADC (soundcard) and looking for certain ing runs of the transformation. But if all that features to guide a musical composition. Pitch is needed is a “snapshot” of an environment’s tracking or onset detection work on single sound spectral status, BSP-blocking is a viable way to objects and must be prepared to react quickly. save CPU resources. The [sigmund ] or [bonk ] objects that are part Results of a spectral analysis written to a ta- ∼ ∼ of Pd, thus run at audio rate. In this case, ap- ble can be analysed by several objects at the plying BSP would not be viable. But if a com- same time, forming feature vectors that can poser is interested in features on a ”slower” time be post-processed for classification. The costly scale, BSP can be applied. Fourier transformation has to be run only once As an example lets consider the differences for each vector. in time scale of physically moving between two rooms compared with playing two differ- 0 ent notes on a clarinet. Changing rooms takes metro 50 much more time than playing notes on instru- b ments. To detect the clarinet’s pitch changes, pd accumulate loadbang the software has to be constantly “alarmed” pd get-mean ; of the spectral content registered through the pd dsp 1 soundcard. However when trying to detect if a 0.392 person holding a microphone changes rooms by $0-spectrum analysing the spectral or reverberant character- istics of the two environments, spectral snap- shots can be made and compared much less fre- quently than in the case of pitch detection. The author has developed a set of BSP fea- ture extraction (FE) objects mainly intended to be used in the RjDj version of Pd that runs on 1024 element table. mobile devices. The timbreID collection of Pd u_loadmeter 4 externals by William Brent9 served as an inspi-

[Figure 4: Calculating the arithmetic mean with BSP, main patch. A [metro 50] bangs the [pd accumulate] and [pd get-mean] subpatches, which operate on the 1024-element $0-spectrum table; the computed mean (here 0.392) is displayed, and a load meter shows the CPU usage.]

9 [Brent, 2009]. Newer versions of timbreID include both audio- and control-rate versions of its analysis objects.
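For reference, such features can be computed from a magnitude-spectrum table as in the following Python/NumPy sketch (purely illustrative; the rj objects' actual names and implementation details are not shown here):

    import numpy as np

    def spectral_features(mag):
        """Features of a magnitude spectrum stored in a table (illustrative sketch)."""
        mag = np.asarray(mag, dtype=float)
        bins = np.arange(len(mag))
        eps = 1e-12                      # guard against zero-valued spectra

        mean = mag.mean()
        energy = np.sum(mag ** 2)
        centroid = np.sum(bins * mag) / (np.sum(mag) + eps)
        # flatness: geometric mean divided by arithmetic mean
        flatness = np.exp(np.mean(np.log(mag + eps))) / (mean + eps)
        return {"mean": mean, "energy": energy,
                "centroid": centroid, "flatness": flatness}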

As an example of an FE-BSP object, we will take a closer look at the calculation of the mean of a table. The arithmetic mean needs an accumulation of all table values, which then is divided by the table size.

While it would be possible to divide every single value by the table size before adding them, here the accumulation is calculated first, then a single division is made. This gives a minor speed-up, but here it should also illustrate the way the execution order for BSP calculations can be controlled.

With Pd's normal message computations, [trigger] objects are used to specify a certain execution order. With signal objects the execution order can only be manipulated by laying out the operations into subpatches that are connected by signal patchcords.10 The connected subpatches are calculated by Pd's main scheduler in the order they are connected, with objects earlier in the chain calculated first (contrasting Pd's depth-first scheduling for messages). In Figure 4 both subpatches are connected like this, so first every signal object in [pd accum] is run, then every signal object in [pd get-mean]. Dummy signal inlets and outlets are included in both subpatches to allow making these order-forcing connections.

10 [Puckette, 2007, p. 212-216]

The accumulation of all values in a table can be coded using only a handful of objects: using a one-pole recursive filter, as supplied by Pd in the [rpole~] object, with a coefficient of 1 will accumulate all data as long as it is switched on. The filter is reset to zero after each run using the [bang~] object that outputs a bang after each DSP cycle.

The output of [rpole~] is written into a summing table using [tabsend~]. The final position in this table will hold the accumulation value.

[Figure 5: Subpatch "accumulate": Accumulating table values with a one-pole filter. [tabreceive~ $0-spectrum] feeds [rpole~ 1], whose output goes to [tabsend~ $0-sum] (table $0-sum, 1024 elements); the subpatch is blocked by [switch~ 1024 1 1], a [bang~] into a "clear" message resets the filter, and a dummy outlet~ forces the correct execution order.]

In the second subpatch (Fig. 6) it is a [bang~] object that will transport this value to the division by N before it is reported to the outlet after the block calculation has completed. Mean calculation has a latency of one block of samples, or blocksize divided by samplerate. Upsampling inside of subpatches to reduce this latency can be achieved by changing the third argument of the switch~ objects or by sending them respective messages.

When implementing the FE objects, the lack of a Pd object to selectively write a value into a table at a certain position specified by an audio signal was especially limiting. For example, while finding local extrema (minima or maxima) in a table is easy by scanning through the table once and comparing two adjacent table values, the author hasn't yet found a way to efficiently store the locations of these extrema for later use.

Indexed table writing would also make another workaround or "hack" in the FE objects unnecessary: to calculate the geometric mean, the product of all values in a table is needed, but there is no "vanilla" way to reuse the result of the multiplication of previous samples, as there is with the one-pole filter that adds previous results (output values) back to its input, the next sample in a block. The workaround currently applied is to transform the multiplications into additions using logarithms.
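For positive table values x_1, ..., x_N this workaround rests on the standard identity

\[ \left( \prod_{i=1}^{N} x_i \right)^{1/N} = \exp\!\left( \frac{1}{N} \sum_{i=1}^{N} \ln x_i \right) \]

so the running product can be replaced by a running sum of logarithms, which the [rpole~]-based accumulator already provides; a final exponentiation and the division by N recover the geometric mean.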
5 Other Applications of BSP

The BSP technique can be applied to various other areas where parallel, block-based processing is needed. Examples would be Cellular Automata, L-Systems or swarm/flock systems.
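As a rough illustration of the kind of whole-table update such a blocked system would perform once per BSP run, here is one synchronous generation of an elementary cellular automaton in Python/NumPy (an illustrative sketch, not part of the BSP objects themselves):

    import numpy as np

    def ca_step(cells, rule=110):
        """One generation of a 1-D binary cellular automaton (wrap-around edges)."""
        cells = np.asarray(cells, dtype=np.uint8)
        rule_bits = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)

        left = np.roll(cells, 1)
        right = np.roll(cells, -1)
        neighbourhood = (left << 2) | (cells << 1) | right   # pattern index 0..7
        return rule_bits[neighbourhood]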

BSP deals only with numbers, so its applicability to text processing is very limited, which affects the use of formal grammars. The symbols used in the alphabets of L-Systems or similar systems based on rewrite rules have to be converted to numbers by implementing a translation map.

But even for much simpler daily tasks in the life of a Pd user, BSP is worth a look. For example, to copy the content of one table to another one, a blocked combination of [tabplay~] and [tabwrite~] can be used.

[Figure 6: Subpatch "get-mean": Calculating the mean from the accumulated table and the table size. A [bang~] triggers [tabread $0-sum] at index "blocksize - 1" (1023) to get the final value, which is divided by 1024 and sent to the outlet; a dummy inlet forces the execution order.]

6 Limitations of BSP

As with many optimization techniques, BSP has several limitations that have to be weighed against the possible performance gains. One important problem is execution order. Pd alternates audio and message computations. BSP, however, lives in a grey area between the audio signal and control computations. New results will be computed in the signal pass, where no other control calculations happen. It is not possible to trigger other control events in the middle of a BSP run.

The algorithms to be used in BSP cannot use recursion inside one block, because feedback of results computed at a later point in the DSP tree to an earlier point is not possible. The minimal feedback delay time in Pd is one block; other constructs result in "DSP loop detected" errors in Pd. This limit is expected to complicate applying BSP in recursive algorithms like L-Systems. The inclusion of a suitable table-writing object in Pd vanilla, as mentioned above, could alleviate this problem a bit.

7 Conclusions and future work

BSP has been successfully applied to a simple physical simulation for this paper and a growing set of feature extraction objects. The [physigs] object will be refined and published under an open source license, while the feature extraction objects will become part of the "rj" library developed for the RjDj application. While BSP is a powerful and so far under-used technique in Pd, it cannot magically transform Pd into true multiple-rate software. Music compilers like ChucK, SND-RT or LuaAV still deal with the competing demands of variable control rates in a cleaner and more flexible way.

8 Acknowledgements

Parts of the research for this paper were kindly supported by Reality Jockey Ltd./RjDj.

References

William Brent. 2009. Cepstral analysis tools for percussive timbre identification. In Proceedings of the 3rd Pure Data Convention in Sao Paulo.

Charles Dodge and Thomas A. Jerse. 1985. Computer Music. Schirmer Books, New York, NY, USA.

Cyrille Henry. 2004. Physical modeling for pure data (pmpd) and real time interaction with an audio synthesis. In Sound and Music Computing '04.

Nicolas Montgermont. 2005. Modeles physiques particulaires en environnement temps-reel: Application au controle des parametres de synthese. Master's thesis, Universite Pierre et Marie Curie.

Miller Puckette. 2007. The Theory and Technique of Electronic Music. World Scientific Press.

Curtis Roads. 1996. The Computer Music Tutorial. MIT Press, Cambridge, MA, USA.

Ge Wang and Perry R. Cook. 2004. On-the-fly programming: Using code as an expressive

musical instrument. In Proceedings of the International Conference on New Interfaces for Musical Expression, pages 138-143.

physigs - physically inspired particle simulation in the signal realm

How to drive [physigs]: once you've set up the particle system, you should alternatingly bang the link- and mass-bang inlets. The link bang calculates all forces generated by links and accumulates them into the respective slots of the mass particles; it requires the time of one audio block. The mass bang advances the particle system one step and clears all forces afterwards; it also requires the time of one audio block.

[In the help patch, a [metro 50] with a small toggle network ([f], [==], [select 0 1]) alternately bangs [s $0-linkbang] and [s $0-massbang], which drive the [physigs $0-name 64 128] object; [r $0-control] feeds its control inlet and a loadbang switches DSP on.]

Arguments:

The link and mass numbers must be powers of two, like 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, ... All masses will be active, but the actual number of usable masses is one less than specified by the first argument, because the mass at index 0 must not be connected with any link (consider it a "guard mass").

The actual calculations are made on tables that are prefixed by the first argument passed to [physigs]. The tables in the following subpatch are available (replace the $0 with the name you passed): [pd particle-system-state]. You can directly read and modify all of these tables by using the correct prefix. Note that the first element (index 0) of all mass tables must not be connected to any link and has to be immobile.

Control messages to the last inlet:
  xbounds, ybounds
  bounce 0/1: turn edge bouncing on or off
  drag: set global viscous drag multiplier. 0 is sticky, 1 is no drag at all. Default is 0.99
  add-link: add a link between mass m1 and mass m2 with rigidity k and damping D
  reset: reset masses and links
  reset-masses: reset only masses
  reset-links: reset only links
  auto-link-length: will set the lengths of all links to their current extrusion
  table: send a message to an internal table

For example, to set the k of all links in the example [physigs] object with prefix "$0-name" on the right at the same time, use something like: [0( -> [const $1] -> [s $0-name-link-k].

Alternatively you can use a "table" control message to the last inlet. Use the table's base name without prefix as the first argument to a "table" command, then add whatever message you need to send. To set all links like above, use this: [0( -> [table link-k const $1] -> [s $0-control].

Here's an example system: [pd scanned-synthesis-string].

Figure 7: Help-file for the [physigs] abstraction

OrchestralLily: A Package for Professional Music Publishing with LilyPond and LaTeX

Reinhold Kainhofer, http://reinhold.kainhofer.com, [email protected]
Vienna University of Technology, Austria and GNU LilyPond, http://www.lilypond.org and Edition Kainhofer, http://www.edition-kainhofer.com, Austria

Abstract namically. Fore a more detailed overview over LilyPond [Nienhuys and et al., 2010] and LATEX pro- LilyPond we refer to the excellent Documen- vide excellent free tools to produce professional mu- tation of the LilyPond project: http://www. sic scores ready for print and sale. Here we present lilypond.org/Documentation/. the OrchestralLily package for LilyPond, which sim- A very simple LilyPond score containing only plifies the creation of professional music scores with one staff has the following form: LilyPond and LATEX even further. All scores are gen- version ”2.13.17” erated on-the-fly without the need to manually spec- \ ify the structure for each individual score or part. relative c’’ \ c4 p d8 [ ( c ] ){ e4 . d . A Additionally, a LTEX package for the prefatory mat- c1 \ bar ” .” − − | ter is available and a templates system to create all \ | files needed for a full edition is implemented. } Keywords    LilyPond, Music scores, Publishing, LaTeX, Soft-    ” ware package All notesp are entered by their note name1, followed optionally by the duration and addi- 1 Introduction tional information like beaming ([ and ]), dy- In professional music publishing applications namics (e.g. \p) and articulations (e.g. -. for like Finale, Sibelius and SCORE are the pre- staccato). When running this file through the dominant software packages used. However, the LilyPond binary, a five-line staff with a treble open source applications LilyPond and LATEX clef, 4/4 time signature and C major key is im- provide excellent free alternatives for producing plicitly created. All layouting and spacing is professionally looking music scores, as well. To done by LilyPond according to best practices ease the production of such scores even further, and standards from music engraving. we developed a package called OrchestralLily for 2.1 Writing Full Scores in Pure LilyPond and LATEX. Instead of having to pro- duce each score and instrumental part manually LilyPond in LilyPond, this package produces these scores To produce a score containing a system with dynamically from the music definitions. more than one staff (e.g. full orchestral scores, choral scores or vocal scores) or to produce a 2 A Short Introduction to LilyPond score with lyrics attached to the notes, LilyPond LilyPond, the music typesetting application de- can no longer automatically create the staves, veloped under the umbrella of the GNU project, but one has to write the score structure manu- is a WYSIWYM (”What you see is what you ally into the LilyPond file: mean”) application, taking text files containing version ”2.13.17” the music definitions and corresponding settings \ and typesetting it into a PDF file. Writing a sopmusic = relative c’’ c4 p d8 [ (\ c ] ) e4 . d . { c1 bar ” .” LilyPond score is very similar to coding a soft- s o p l\ y r i c s = lyricmode− − Oh,| be\ hap| } py now ! \ { −− ware program. } In this section we will give a very short 1Using Dutch names by default: b for the note below overview about the LilyPond syntax, mainly to c and the postfix -is for sharp alterations and -es for highlight our motivation to create the Orches- flat alterations. Other languages can easily be used by tralLily package, which creates the scores dy- including a language file, e.g. include "english.ily". \

109 a l t o m u s i c = relative c’’ Scheme interface to most of the functions re- \ { g4 f 4 e4 f e1 bar ” .” quired to create a score. This interface is the altolyrics = lyricmode| \ |Oh,} be hap py now ! \ { −− key for our OrchestralLily package, where all } scores are generated on-the-fly using Scheme. s c o r e \ { new ChoirStaff << 3 OrchestralLily: An easy example \ new S t a f f \ new Voice{ = ”Soprano” OrchestralLily uses a slightly different approach \ dynamicUp sopmusic { \ \ than manually writing LilyPond scores: Instead } of telling LilyPond explicitly about staves and }new Lyrics = ”SLyrics” groups, the score structure is already built in, \ lyricsto ”Soprano” s o p l y r i c s new\ S t a f f \ using default names, and the user only has to \ new Voice{ = ”Alto” define some specially-named variables, contain- \ dynamicUp a l t o m u{ s i c \ \ ing the music, lyrics, clef, key, special settings, } etc. The score is then generated on-the-fly by 2 }new Lyrics = ”ALyrics” the following call: \ lyricsto ”Alto” a l t o l y r i c s >> \ \ createScore #”MovementName” \ #’(”Instrument1” ”Group2” ”Instrument2”) } Of course, multiple \createScore commands p can be given in a file to produce multiple scores  Â  in the same file (in particular, this is used in  Oh, be  hap py now!  large works with multiple movements, where all movements should be printed sequentially).  Oh, be hap py now! The specially-named variables holding the      music definition mentioned above have the form Here, the actual music definitions require only eight lines of code, while the structure of the [MovementName] InstrumentMusic score requires already more lines. As one can image, creating a full orches- where Instrument is replaced by the (de- tral score with lots of instruments and multi- fault) abbreviation of an instrument or vocal ple movements quickly becomes a nightmare to voice and [MovementName] is optional for pieces produce manually. Each staff and each staff with only one movement. To define a spe- group defined this way takes 3 to 7 lines of code, cial clef, key, time signature, lyrics or spe- quickly leading to a score structure definition of cial settings for a voice, one simply defines a hundreds of lines. For example, a large work variable containing the clef, key, time signa- with 23 instruments and 12 movements has 276 ture, etc. This special variable for each instru- individual staves, not counting groups. Even ment is called [MovementName]InstrumentXXX, worse, each movement typically has the same where XXX is either Clef, Key, TimeSignature, well-defined structure in the orchestral score. Lyrics, ExtraSettings, etc. So, a lot of code is duplicated, with the only To show how this works, the two-voice exam- difference being the music expressions inserted ple from above using OrchestralLily will now into the scores. look like: This makes it extremely hard to maintain version ”2.13.17” \ include ”orchestrallily/orchestrallily.ily” large orchestral scores in pure LilyPond, and \ even small changes to the appearance of only SMusic = relative c’’ one instrument require lots of changes. c4 p d8\ [ ( c ] ) e4 . d {. c1 bar ” .” S L y r\ i c s = lyricmode− − | \ | } 2.2 C++ and Scheme / Guile Oh, be \ hap py{ now ! AMusic = relative−− c’’ } Internally, LilyPond is written in C++ with g4 f 4 e4\ f e1 bar{ ” .” Guile as embedded scripting language. Many ALyrics = lyricmode| \ | } Oh, be hap\ py now{ ! parts of the formatting code (e.g. all graphi- −− } cal objects like note heads, staff lines, etc.) 
are createScore #”” #’(”S” ”A”) defined in Guile and can be modified and over- \ written easily using Scheme code embedded into 2The hash sign # indicates a scheme expression, the the score. LilyPond even provides an extensive #’(...) is a list in Scheme syntax.

[Engraved output of the two-voice example: soprano and alto staves with the lyrics "Oh, be happy now!"]

 Woodwinds

  Oh, be hap py now! It is clear that this automatic creation of staff groups saves a lot of effort for large- Violins scale orchestral projects. It should be noted Strings that OrchestralLily has a large hierarchy of orchestral instruments, including the identifier "FullScore" for a full orchestral score. So in- stead of #’("S" "A") above, we could have also said #’("FullScore") to generate a full score Solo of all defined voices. Voices not defined will be ignored, so #’("S" "A" "T" "B") would have FullScore the same output, as the T and B voices are not defined and thus not included in the output. Vocal 4 Structure of a Score

To understand OrchestralLily’s approach, we Choir have to take a closer look at the organization of a full score. A music score has an intrinsic hier- archy of instruments and instrument groups, as shown in Figure1. This hierarchy is pre-defined in Orchestral- Lily and will be used, unless the LilyPond score Continuo explicitly overrides it: Figure 1: Hierarchy of an orchestral score The instruments are named by their stan- • dard abbreviation (e.g. “V”, “VI”, etc. for violins, “Fag”, “FagI” etc. for bas- 5 More complex examples soon, “Ob” for oboe, “S”, “A”, “T”, “B”, The examples so far placed the music definition “SSolo” etc. for vocal voices, etc.). and the actual score creation via \createScore Each group of instruments has a pre- into the same file. For larger projects, it is • defined name: “Wd” for woodwinds, “Br” advisable to place the music definitions into a for the brass instruments, “Str” for strings separate file, as we need several different score (except continuo, i.e. celli and basses, files, each of which will include this defini- which are typically not included in the tions file. All the examples in this section will strings group, but placed at the bottom of use the following music definition file “music- a full score), “Choir”, “Continuo”, etc. definitions.ily”, which defines a flute, a violin, soprano and alto, as well as a continuo part for Several types of scores are pre- • a movement named Cadenza. Also, a Piano re- defined: “LongScore”, “FullScore” (like duction is defined in this file. Notice also, that “LongScore”, except that two instruments this include file already loads OrchestralLily, so of the same type, e.g. ObI and ObII, are we don’t have to do this again in the file for each combined and share one staff rather than individual score. using separate staves), “VocalScore” (only include ”orchestrallily/orchestrallily.ily” the vocal voices and the piano voice), \ i n c l u d e “ChoralScore” (only the vocal voices, no \ ”orchestrallily/oly s e t t i n g s n a m e s . i l y ” instruments). header \ { The score types pre-defined in OrchestralLily title = ”A cadenza” adhere to the standard instrument order usually }CadenzaPieceNameTacet = ”Cadenza tazet” employed for full orchestral scores.

111 % Flute and Violin: A cadenza CadenzaFlIMusic = relative c’’ e4 a g b , \ { c1 bar ” .” Flauti CadenzaVIMusic| \ =| relative} c’’ ž   \ {     c16[ ege] d[ f a f] e[ ge c] b[ dbg]  c1 bar ” .” | \ | Violino I ž      }               % The vocal voices: CadenzaSMusic = relative c’’ Soprano p \ { ž    c4 p d8 [ ( c ] ) e4 . d . c1 bar ” .”   Oh, be  hap py now! CadenzaSLyrics\ = −lyricmode− | \ | } Alto Oh, be hap \ py now ! { −− }  Oh, be hap py now! CadenzaAMusic = relative c’’       g4 f 4 e4 f \e1 bar ” .” { | \ | } 6 5 CadenzaALyrics = lyricmode Organo 6 5 4 3 Oh, be hap py\ now ! { −− }        % Continuo: Organ / Celli / Bassi / Bassoon The additional \setCreateMIDI ##t line CadenzaBCMusic = r e l a t i v e c c4 f 4 g g , c1 bar ” .” \ { | causes a midi file of the score to be created in CadenzaFiguredBassMusic\ | } = figuremode addition to the PDF file. s4 <6>8 <5> <6 4>4 <5 3>\ s1 { | Notice that we never explicitly said that the } continuo is supposed to be in bass clef. Orches- % Piano reduction: tralLily already knows that the “BC” (Basso CadenzaPIMusic = relative c’’ continuo) voice is in bass clef! Similarly, trom- twoVoice \ { \ c16[ ege]{ d[ f a f] e[ gec] b[ db bone parts will employ the correct C clef, the g ] choir bass will also use the bass clef, etc. | If some instruments should have cue notes, we }{e4 a 4 4 % 2 don’t want to print them in the full score, so in- <}c | g e>1 bar ” .” stead of \createScore, OrchestralLily provides \ | the command \createNoCuesScore, which will }CadenzaPIIMusic = r e l a t i v e c 4 f 4\ %{ 2 additionally remove all cue notes from the 1 bar ” .” | printed score. \ | } 5.2 Instrumental Parts Each instrumental part can be generated just All variables in this file start with Cadenza, like the full score. If one additionally defines the followed by the instrument name, which is how “instrument” header field, then the instrument OrchestralLily detects that these definitions be- name will be printed in the right upper corner, long to a movement name Cadenza. We also like in most printed scores. defined a piece title to print before the score. version ”2.13.17” Another thing to notice here is that \ include ”music definitions.ily” \ − we also include the file “orchestrallily/ i n c l u d e \”orchestrallily/oly s e t t i n g s instrument. ily” oly settings names.ily”. That file contains header instrument = VIInstrumentName many instrument and score name definitions \ { \ } createScore #”Cadenza” #’(”VI”) for most common instruments and causes them \ to be printed before each staff in the score. A cadenza Violino I

5.1 The Full Score                 version ”2.13.17” E  \ i n c l u d e If no music is defined for the given instru- \ ”orchestrallily/oly s e t t i n g s fullscore.ily” include ”music definitions.ily” ment for the desired movement (indicated by \ setCreateMIDI ##t− the first string that you pass to createScore), \ setCreatePDF ##t \ OrchestralLily will instead print a “tacet” head- createScore #”Cadenza” #’(”FullScore”) line. For example, if we try to create a score for \ the oboe, there is no oboe part defined and a “Cadenza tacet” is printed instead:

112 A cadenza version ”2.13.17” \ include ”music definitions.ily” \ header instrument− = ObIInstrumentName \ { \ } p   Â Â Â Â Â createScore #”Cadenza” #’(”ObI”)   \ Oh, be hap py now!

  Oh, be hap py now!     A cadenza Oboe I               Cadenza tazet         b          5.4 Figured Bass 5.3 Vocal Scores and Modifying The continuo part in the music definitions above Individual Staves is simply the bass line of the cadenza. How- ever, most old scores additionally provide a bass figuration to indicate the harmonies to the or- To create a vocal score (remember, we have al- ganist. Creating such a figured bass score is ready defined the piano reduction in the defini- also extremely simple in OrchestralLily: All you tions!), you only have to call \createScore for have to do is to define its corresponding vari- the “VocalScore” score type. To make things able, named CadenzaFiguredBassMusic in our more interesting, here we want the staves for case, where you define the appropriate bass fig- vocal voices to appear smaller than the piano ure in LilyPond’s \figuremode syntax: staff. Furthermore, the note heads of the so- version ”2.13.17” \ include ”music definitions.ily” prano voice should be colored in red and the \ − alto lyrics printed in italic. CadenzaFiguredBassMusic = figuremode s4 <6>8 <5> <6 4>4 <5 3>\ s1 { These special settings for S and A can | } createScore #”Cadenza” #’(”Continuo”) be provided by placing them into \with \ blocks and saving them into appropriately A cadenza named variables called Cadenza[SA](Staff| Voice|Lyrics)Modifications: 6 5 Organo 6 5 4 3    version ”2.13.17”  ” \ include ”music definitions.ily” 5.5 Cue Notes  \ − Suppose that we now want to add a second flute, CadenzaSStaffModifications = with f o n t S i z e = # 3 \ { which will set in on the third beat. In the instru- override StaffSymbol− #’staff space = mental part, we want to print cue notes from the \ #(magstep 3) − − first flute, but in the full score (or in a combined }CadenzaAStaffModifications = flute part) we don’t want the cue notes. CadenzaSStaffModifications In LilyPond, one can simply create cue notes \ CadenzaChStaffModifications = by first defining the music to be quoted via CadenzaSStaffModifications \ \addQuote and then inserting the cue notes via CadenzaALyricsModifications = with \cueDuring #"quotedInstrument" { r2 }. \ { override LyricText #’font shape = First, we add the new flute 2 part in a sepa- #’\ i t a l i c − } rate file “music-definitions-flute2.ily”: CadenzaSVoiceModifications = with addQuote #”Flute1” CadenzaFlIMusic override NoteHead #’color\ = #red{ \ \ \ } CadenzaFlIIMusic = relative c’’ createScore #”Cadenza” #’(”VocalScore”) \ { \ namedCueDuring #”Flute1” #UP ”Fl.1” \ ” Fl . 2 ” R1 g1 bar ” .”{ } | \ | }

113 Note that we quote the first flute directly A cadenza in the music for the second flute, using the method \namedCueDuring (which is equivalent to LilyPond’s built-in function \cueDuring, ex- Violino I  Â Â cept that it also adds the name of the quoted  Â Â Â Â Â Â Â Â Â Â Â instrument). Â Â Â The Flute 2 part now simply is: 5.7 Drum Staves and other staff types Of course, OrchestralLily is also able to print version ”2.13.17” \ include ”music definitions.ily” non-standard staves, like rhythmic staves or \ include ”music−definitions flute2.ily” tablatures: \ − − % The Flute 2 part: version ”2.13.17” \ createScore #”Cadenza” #’(”FlII”) include ”orchestrallily/orchestrallily.ily” \ \ header \ title ={ ”Drum and tab staves” A cadenza composer = ”Anonymous” Fl.1 Fl.2 } Â drumIMusic = drummode crashcymbal4 hihat8 Flauto II Â Â \ { Â halfopenhihat E  drumIIMusic = c4} c4 In the full score (or a combined flutes part), tabularMusic ={ c4 8 d16 r16 however, we do not want to print the cue { } orchestralScoreStructure #’( notes, since the notes from Flute 1 are already \ (”drumI” ”DrumStaff” ()) printed in the score. In this case, we can use (”drumII” ”RhythmicStaff” ()) \createNoCuesScore instead of \createScore (”tabular” ”TabStaff” ())) orchestralVoiceTypes #’( to suppress the creation of any cue notes: \ (”drumI” ”DrumVoice”) (”tabular” ”TabVoice”)) version ”2.13.17” \ include ”music definitions.ily” createScore #”Cadenza” #’(”drumI” ”drumII” \ include ”music−definitions flute2.ily” \ \ − − ” t a b u l a r ”) % remove the cues in Flute 2: createNoCuesScore #”Cadenza” #’(”FlLong”) Drum and tab staves \

A cadenza Anonymous  Flauto I      H H E   Flauto II 4    0   2 0 E 3 5.6 Transposition  6 Tweaking the Score Parts can be easily transposed (e.g. for trans- posing instruments between concert pitch and As the OrchestralLily package is implemented written pitch): entirely in LilyPond syntax and Scheme code, anyone can easily adjust or extend its function- version ”2.13.17” ality directly in a score or in an include file, \ include ”music definitions.ily” \ − without the need to recompile or reinstall any- %We need to give the key explicitly , thing. % so that it will also be transposed: CadenzaVIKey = key c major 7 LATEX for the Preface and Cover % Transpose to\ g major \ CadenzaVITransposeFrom = g So far, we have concentrated on creating the musical score. A professional edition, however, createScore #”Cadenza” #’(”VI”) \ also features a nice title page, a preface and in many cases also a critical report. For these,

we chose LaTeX for typesetting, together with a LaTeX package providing a uniform layout, many macro definitions aiding with the critical report, and a beautiful title page. The music scores are directly included into the LaTeX file using the pdfpages package.

The templates (see next section) provided by OrchestralLily already produce a beautiful layout without the need for any special tweaks. All one has to do is fill in the missing text and the general information about the score, and the LaTeX scores will look like the following pages from a real score, typeset using OrchestralLily:

[Sample page: the table of contents of the Serben-Quadrille edition (Vorwort/Preface; full score with the movements Pantalon, Ete, Poule, Trenis, Pastourelle, Finale), the list of available performance material, and the main source of the edition (Johann Strauss Sohn, Serben-Quadrille, Op. 14, first-edition piano print, Pietro Mechetti, Vienna, 1846).]

[Further sample pages: the imprint (c) 2010 Edition Kainhofer, Vienna, 1st printing, engraved with LilyPond 2.13), a bilingual German/English preface on the history of the Serbian Quadrille and of this string-trio arrangement with its sources, the title page ("Serben-Quadrille / Serbian Quadrille, Op. 14, Bearbeitung fuer Streichtrio / Arrangement for String Trio", arranged by Aleksa and Ana Aleksic, Edition Kainhofer EK-2000-1), and the first two pages of the full score.]

8 Generating a Template

OrchestralLily also provides a template-based script to generate all files required for a full edition of a score: the score information, including instrumentation, movements, voices with lyrics, etc., is defined in one input file. After running the generate_oly_score.py script, one has the full set of files for the edition, including a Makefile. Only the music, lyrics, and the actual text of the preface and the critical report need to be filled in. Running make will always update all scores and produce ready-for-print files for all desired scores and instrumental parts.

A typical input file for the cadenza example used above is shown here:

{"output_dir": "Cadenza",
 "version": "2.13.11",
 "template": "EK_Full",

 "defaults": {
    "title": "A test for OrchestralLily",
    "composer": "Reinhold Kainhofer",
    "composerdate": "1977-",
    "year": "2009",
    "publisher": "Edition Kainhofer",
    "scorenumber": "EK-1040",
    "basename": "Cadenza",
    "parts": [ {"id": "Cadenza", "piece": "A cadenza",
                "piecetacet": "Cadenza tazet"} ],
    "instruments": ["FlI", "FlII", "VI", "S", "A", "Continuo"],
    "vocalvoices": ["S", "A"],
    "scores": ["Full", "Vocal", "Choral"],
 },
 "scores": ["Cadenza"],
 "latex": {},
}

Running this file through the script generates one definitions file for the music definition, LilyPond files for each of the given scores (full, vocal and choral scores), as well as for each individual instrumental part. Each of the scores will also have a LaTeX file that includes the title page, the preface (including the table of contents, which is exported by LilyPond!), the score and optionally a critical report. All these files are tied together via a Makefile, so all one needs to do to create a first version is to copy in the music definition and run make.

9 Availability of OrchestralLily

The OrchestralLily package [Kainhofer, 2010b] is currently dual-licensed under the Creative Commons BY-NC 3.0 license [Creative Commons, 2010] as well as under the GPL v3.0. Its source code can be found in a public git repository [Kainhofer, 2010a]: http://repo.or.cz/w/orchestrallily.git. More information about the OrchestralLily package can be found in the documentation at its homepage http://kainhofer.com/orchestrallily/ (which is unfortunately not always kept up to date) or better directly from the source code.

10 Acknowledgements

A project like OrchestralLily would of course never be possible without the help of many people. The developers of LilyPond and of LaTeX, too many to name them explicitly here, made OrchestralLily possible in the first place by providing excellent open source applications for both music and text typesetting. The enormous flexibility and configurability of both applications (including the possibility to modify the internals and implement required features yourself) laid the foundation to turn a small project into a professional music publishing framework.

The cadenza example used throughout this article was originally written by me, until Ana Aleksic pointed out several harmonic shortcomings and helped me rewrite it. Similarly, Manfred Schiebel greatly improved my dilettantish attempts at producing a piano reduction.

References

Creative Commons. 2010. BY-NC 3.0 AT license. http://creativecommons.org/licenses/by-nc/3.0/at/.

Reinhold Kainhofer. 2010a. Git repository of OrchestralLily. http://repo.or.cz/w/orchestrallily.git.

Reinhold Kainhofer. 2010b. The OrchestralLily package for LilyPond. http://kainhofer.com/orchestrallily. LilyPond and LaTeX package for professional music typesetting.

Han-Wen Nienhuys and Jan Nieuwenhuizen et al. 2010. GNU LilyPond. http://www.lilypond.org/. The music typesetter of the GNU project.

Term Rewriting Extension for the Faust Programming Language

Albert Gräf Dept. of Computer Music, Institute of Musicology Johannes Gutenberg University 55099 Mainz, Germany [email protected]

Abstract is free software distributed under the GPL V2+, This paper discusses a term rewriting extension for see http://faust.grame.fr. the functional signal processing language Faust. The This paper reports on the Faust term rewrit- extension equips Faust with a hygienic macro pro- ing extension which equips the language with cessing facility. Faust macros can be used to de- a kind of hygienic macro facility for specify- fine complicated, parameterized block diagrams, and ing complex block diagrams in a more conve- perform arbitrary symbolic manipulations of block nient fashion. Macros are defined by rewriting diagrams. Thus they make it easier to create elab- rules and are expanded away at compile time orate signal processor specifications involving many complicated components. by rewriting terms in the Faust block diagram algebra. The resulting Faust program is then Keywords compiled as usual. This facility has been devel- Digital signal processing, Faust, functional program- oped by the author in collaboration with Yann ming, macro processing, term rewriting. Orlarey, the principal author of Faust. It has al- ready been available in recent Faust versions for 1 Introduction quite some time but has never been documented Faust is a functional signal processing language, anywhere; this paper attempts to fill this gap. which is used to develop digital signal processors handling synchronous streams of sample values. 2 Basic Faust Example It is mostly targeted at audio and music appli- Faust works on sampled signals which are cations at this time. Faust has a formal se- thought of as functions mapping (discrete) time mantics (based on the lambda calculus and a to sample values. Signals can be constant or block diagram algebra) which means that it can time-varying, or they can be control signals em- be used as a specification language for describ- bodied by special elements which are typically ing signal proccessors in an implementation- implemented as GUI, MIDI and/or OSC con- independent way. These specifications are ex- trols, depending on the target architecture. The ecutable, however, and Faust provides sophisti- following listing shows a simple Faust program cated optimizations and compilation to C++ to which implements a sine tone generator: turn the specifications into efficient code which import("music.lib"); can compete with carefully hand-crafted rou- tines. Faust works with an abundance of differ- vol = nentry("vol", 0.3, 0, 10, 0.01); ent platforms and plugin environments such as pan = nentry("pan", 0.5, 0, 1, 0.01); Jack, LADSPA, VST, Pd, Max and several pro- freq = nentry("pitch", 440, 20, 20000, 0.01); gramming languages, just a recompile is enough process = osc(freq) vol : panner(pan); to create code for the various architectures. Last * but not least, the Faust compiler also provides The example features three control elements automatic documentation facilities which pro- for specifying volume, panning and frequency. duce block diagrams in SVG format and LATEX The sine signal created by the osc function documents. gets multiplied by the volume and then passed Faust has been discussed at the Linux audio through a panner which turns it into a stereo conference and elsewhere on various occasions output signal. The process function is the [3; 1] and is quickly gaining traction in the sig- “main” function of a Faust program; it denotes nal processing community. A description of the the signal processing function realized by the formal underpinnings can be found in [2]. 
Faust program. In this case, the process function has

no arguments and thus the implemented signal processor has no inputs. In general, any Faust function (including process) can have an arbitrary number of input and output signals.

3 The Term Rewriting Extension

Faust signal processors are essentially terms in the Faust block diagram algebra (BDA) which is described elsewhere [2]. Briefly, the BDA consists of algebraic operations (written as infix and postfix operators) which specify various combinations of signal processing functions, in particular:

• f' and f@n delay f by one and a given number n of samples, respectively.
• f:g and f,g specify the serial and parallel composition of two signal processing functions.
• f<:g and f:>g split and merge (mix) the outputs of f and route them to corresponding inputs of g.
• f~g combines f and g to a loop with implicit 1-sample delay.

In addition, the usual arithmetic, logical and comparison operators (+, *, etc.) as well as mathematical functions (exp, sin, etc.) work on signals in a pointwise fashion. Thus, e.g., f+g is the signal obtained by adding each sample of the signals f and g.

Term rewriting provides us with a means to manipulate these BDA terms in an algebraic fashion at compile time. For instance, the following two rewriting rules define a macro named fact which implements the factorial:

fact(0) = 1;
fact(n) = n*fact(n-1);
process = fact(3);

The last line in this program gives the usual process function of a Faust program. The resulting signal processor outputs the constant signal 3! = 6.

Note that Faust doesn't have a special keyword for denoting macro definitions; instead these are flagged by employing patterns on the left-hand side of such a rule, which can be constants (such as the constant 0 in this example), variables (such as n) and arbitrary expressions formed with these and the BDA operations. That is, a pattern is an arbitrarily complex BDA expression involving variables and constants. At least one of the macro arguments must be a non-trivial pattern (i.e., not simply a variable), otherwise the definition will be taken to be an ordinary function definition. Also note that macro definitions may involve multiple rewriting rules, as in the example above.

Just like Faust's functions are nothing but named lambdas, there are also anonymous macros which take the form of a case expression listing all the argument patterns and the corresponding substitutions. For instance, the above definition of the factorial macro is actually equivalent to:

fact = case { (0) => 1; (n) => n*fact(n-1); };

4 Macro Evaluation by Rewriting

Rewriting rules are applied by matching them to BDA terms on the right-hand side of function definitions. To these ends, the rules are considered in the order in which they are written in the Faust program, binding variables in the left-hand side of rules to the corresponding values. Evaluation proceeds from left to right, innermost expressions first. Thus, in the above example the term fact(3) in the definition of the process function will be rewritten to the constant 6, using the following reduction sequence:

fact(3) → 3*fact(3-1) → 3*fact(2)
→ 3*(2*fact(2-1)) → 3*(2*fact(1))
→ 3*(2*(1*fact(1-1))) → 3*(2*(1*fact(0)))
→ 3*(2*(1*1)) → 6

Note that in this process the Faust compiler also evaluates constant signal expressions such as 3-1 → 2 and 3*(2*(1*1)) → 6.

Another example, which employs pattern matching on BDA operations, is the following little macro serial which turns parallel compositions into serial ones:

serial((x,y)) = serial(x) : serial(y);
serial(x) = x;
process = serial((sin,cos,tan));

The result is the same as if you had written the serial composition sin:cos:tan, cf. Fig. 1.

As the above examples show, macro definitions can also be recursive, making it possible to analyze and build arbitrarily complicated BDA terms. Here is another example, which employs a variation of the fold operation (customary in functional programming libraries) to accumulate values. In this case the values are actually signals, produced by a function (or macro) x which maps a running index to a signal. This

#define F(x) { int y = x+1; return x+y; }

Given this definition, F(y) expands to { int y = y+1; return y+y; } which is usu- ally not what you want. Faust macros do not have any such pitfalls since they work on the internal representation of BDA terms rather than the program text. This means that variables on the left-hand side of a macro rule are always bound lexically, i.e., according to the block scoping rules defined by Figure 1: Parallel-serial conversion macro. the Faust language. Thus a Faust macro like F = case {(x) => x+y with { y = x+1; };} allows us to emulate Faust’s sum function which (which is roughly equivalent to the C macro adds up an arbitrary collection of signals: above, minus the name capture) will work correctly no matter what gets passed for the fold(1,f,x) = x(0); macro parameter x (even if it is a signal named fold(n,f,x) = f(fold(n-1,f,x),x(n-1)); y fsum(n) = fold(n,+); ). Macros with this desirable property are also called hygienic macros in the programming f0 = 440; a(0) = 1; a(1) = 0.5; a(2) = 0.3; language literature. h(i) = a(i)*osc((i+1)*f0); v = hslider("vol", 0.3, 0, 1, 0.01); 6 Example: A Systolic Array process = v*fsum(3,h); The following program illustrates how to use The resulting signal processor (a simple addi- macros in order to abbreviate complicated, pa- tive synthesizer which adds up three harmonics), rameterized block diagrams. The example we is shown in Fig. 2. Note that the three oscilla- consider is a kind of “systolic array”, a grid of tors and their amplitudes are also defined using binary operations organized in a mesh (Fig. 3). rewriting rules in this example.

Figure 3: Systolic array.

Figure 2: Sum macro example.

The Faust program used to produce this layout is given below.

g(1,f) = f;
g(m,f) = (f, r(m-1)) : (_, g(m-1,f));

h(1,m,f) = g(m,f);
h(n,m,f) = (r(n+m) <: (!,r(n-1),s(m),
            (_,s(n-1),r(m) : g(m,f)))) :
           (h(n-1,m,f), _);

r(1) = _;  r(n) = _,r(n-1);  // route through
s(1) = !;  s(n) = !,s(n-1);  // skip

f = + <: _,_;  // sample cell function
process = h(2,3,f);
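As a rough behavioural model of the resulting block diagram (in Python rather than Faust, and not part of the original paper), the n x m mesh of '+' cells computes something like the following: every cell adds the value arriving on its row to the value arriving on its column and passes the sum both right and down. Note that the actual Faust construction emits the row outputs in reverse order, as explained below.

    def mesh(row_in, col_in):
        """Model of an n x m mesh of '+' cells: returns (row outputs, column outputs)."""
        col = list(col_in)              # running values travelling down each column
        row_out = []
        for y in row_in:
            acc = y                     # running value travelling along this row
            for j in range(len(col)):
                acc = acc + col[j]      # the cell's sum...
                col[j] = acc            # ...goes down to the next row
            row_out.append(acc)         # ...and out to the right
        return row_out, col

    # For instance, a 2 x 3 mesh as in "process = h(2,3,f);":
    print(mesh([1, 2], [10, 20, 30]))   # -> ([61, 105], [13, 44, 105])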

119 Note that the macro g constructs a single row The signals y2, . . . , yn are then recombined with of the mesh for the given number m of grid cells the first m outputs x10 , . . . , xm0 of g, and the re- and the given function f which takes two input sult is passed to h to recursively construct the signals and produces two output signals. The remaining mesh of n 1 rows, yielding the out- − macro h applies g repeatedly in order to build an put signals x100, . . . , xm00 ; yn0 , . . . , y20 . Tacking on n m mesh from its rows. Two helper macros r the remaining output signal y of the call to g × 10 and s perform the necessary routing between the gives us the final result x100, . . . , xm00 ; yn0 , . . . , y10 . components. These employ two basic elements of the BDA, ‘_’ which simply routes through a single input to a single output (i.e., _(x) = x), and ‘!’ which denotes a “sink” for a single input (!(x) = ()). (A more detailed explanation of the construction is given below.) The given process function illustrates the use of the h macro to construct a 2 3 “accumulator” × mesh from the cell function f = + <: _,_ which just adds its two inputs and sends the computed sum to its two outputs. A Pd patch showing this signal processor in action is shown in Fig. 4.

Figure 5: Recursive construction in the systolic array macro. Figure 4: Systolic array example in a Pd patch. There are still some ways in which the h macro can be improved. For programming To see how the recursive construction works, convenience and simplicity, h takes its inputs note that there are two equations for the h as y1, . . . , yn; x1, . . . , xm, with the row inputs macro. The first equation deals with the case coming first, and produces the row outputs of one row. This is handled by just invoking yn0 , . . . , y10 in reverse order. In fact, this layout the macro g which in turn applies the function of arguments and results is quite convenient in f once (digesting the single row input y1 and the the Pd patch. However, as a programmer using first column input x1) and invokes itself recur- this macro in your Faust programs you’d prob- sively on the first output of f and the remain- ably prefer a macro which maps an m + n tuple ing inputs x2, . . . , xm (which are routed through of input signals x, y to a corresponding tuple with the r macro). The base case m = 1 is x0, y0 of output signals in the “right” order. For- treated in the first equation for g which just tunately, adding this functionality as separate yields f itself. pre- and postprocessing stages by making good The interesting case is the second equation use of the r and s macros is fairly easy. We leave for h in which we deal with more than one this as an exercise to the interested reader. row, cf. Fig. 5. Here the input signal, consist- ing of n row inputs y1, . . . , yn and m column 7 Conclusion inputs x1, . . . , xm, gets split up in two parts. The term rewriting extension sketched out in The expression !,r(n-1),s(m) gives y2, . . . , yn this paper equips Faust with a simple macro pro- which is simply routed through. The expression cessing facility which is useful to define abbrevi- _,s(n-1),r(m) yields y1, x1, . . . , xm which is ations for complicated, parameterized block di- piped into g, producing a single row of the mesh. agrams, and to perform arbitrary symbolic ma-

nipulations on block diagrams in the preprocessing stage of the Faust compiler. To these ends, terms in the Faust block diagram algebra (BDA) are rewritten using term rewriting rules. Evaluating macro invocations using the provided rules is performed by Faust at compilation time. Faust's term rewriting macros are structured (they operate on term structures rather than program text) and hygienic, i.e., all bindings of macro variables are performed lexically, and thus Faust macros are not susceptible to the "name capture" which makes less sophisticated macro facilities in languages such as C bug-ridden and hard to use. There are still some shortcomings in Faust's macro system which will hopefully be addressed in the future:

• Faust does its own normalizations of BDA terms "under the hood" and thus it can be hard to figure out exactly which patterns are needed to rewrite certain constructs of the Faust language.

• Only plain term rewriting rules are supported at this time. Adding conditional (i.e., guarded) rules would make the system more versatile.

• It would be useful to provide an interface to Faust's block diagram optimization pass so that custom optimization rules could be implemented using macros.

References

[1] A. Gräf. Interfacing Pure Data with Faust. In 5th International Linux Audio Conference, pages 24-31, Berlin, 2007. TU Berlin.

[2] Y. Orlarey, D. Fober, and S. Letz. Syntactical and semantical aspects of Faust. Soft Computing, 8(9):623-632, 2004.

[3] Y. Orlarey, A. Gräf, and S. Kersten. DSP programming with Faust, Q and SuperCollider. In 4th International Linux Audio Conference, pages 39-47, Karlsruhe, 2006. ZKM.

Openmixer: a routing mixer for multichannel studios

Fernando Lopez­Lezcano, Jason Sadural CCRMA, Stanford University Stanford, CA, USA [email protected], [email protected]

Abstract

The Listening Room at CCRMA, Stanford University is a 3D studio with 16 speakers (4 hang from the ceiling, 8 surround the listening area at ear level and 4 more are below an acoustically transparent grid floor). We found that a standard commercial digital mixer was not the best interface for using the studio. Digital mixers are complex, have an opaque interface and they are usually geared towards mix-down to stereo instead of efficiently routing many input and output channels. We have replaced the mixer with a dedicated computer running Openmixer, an open source custom program designed to mix and route many input channels into the multichannel speaker array available in the Listening Room. This paper will describe Openmixer, its motivations, current status and future planned development.

Keywords

Surround sound, studio mixer, spatialization, ambisonics.

1 Introduction

The Listening Room was created when The Knoll (the building that houses CCRMA) was completely remodelled in 2005. It is a very quiet studio with dry acoustics designed for research, composition and diffusion of sounds and music in a full 3D environment. Digital mixers proved to be less than optimal for using the room and the Openmixer project was started to write an application from scratch that would be a better match for our needs.
The need for an alternative solution to digital mixers was recognized by Fernando Lopez-Lezcano in 2006/2007 and he did some research into existing software based alternatives together with Carr Wilkerson at CCRMA. None of the options that were found quite met the goals we had in mind. It was only last year (2009) that the project really started taking off thanks to the enthusiasm and dedication of Jason Sadural (a student at CCRMA) and the support of Jonathan Berger and Chris Chafe. Both Jason and Fernando have been working on hardware and software interface design and coding since then.

[Photo: The Listening Room]

2 Digital Mixers and the Listening Room

Existing digital mixers are very versatile but are generally not a good fit for a situation that needs many input and output channels. Most of them are designed for mix-down to stereo or at most for mix-down to a fixed 5.1 or 7.1 surround environment. If more output channels are needed expansion slots have to be used and the physical size and price of the mixer goes up very fast. It is also difficult to find mixers with the capabilities we needed and without fans.
Another usability problem of digital mixers is preset management. While it is possible to save and load the mixer state to presets (a feature absolutely necessary in our multi-user shared environment), digital mixers don't have a user database or per-user password protection, at least in mixers that are in our price range. Presets can be

123 protected but there is nothing to prevent someone 4 A solution from changing other user's presets by mistake. One solution is to use a general purpose Digital mixers have opaque interfaces, with computer and off the shelf components as the heart options sometimes buried deeply inside a layered of the mixer, and write a program that orchestrates menu structure. It is sometimes very hard to know the process of routing, mixing and decoding the why a particular patch is not working. audio streams. A Linux based workstation running We originally installed a Tascam DM3200 in the Jack [1] and Planet CCRMA is the high Listening Room, using two expansion cards to performance, low latency and open platform in barely provide enough input/output channels (the which we based the design. 3200 has 16 buses and we would like to have a Other projects have implemented software maximum of 24 mixer outputs). In addition to the mixing and sound diffusion in a general purpose generic digital mixer problems we found out that computer, examples include the very capable switching presets in the 3200 would cause clicks in SuperCollider based software mixer written by the analog outputs of the expansion boards that Andre Bartetzki [2] at TU Berlin. Or the ICAST were driving the speakers. How this can happen in [3] sound diffusion system at the Louisiana State a professional mixer is beyond our understanding. University. Or the BEAST system [4] at 3 Requirements Birmingham. Our needs are much more modest and require a What would be the ideal sound diffusion system system that is based on a single computer that can for the Listening Room? directly boot into a fully working mixer system. The user interface should be very simple. 4.1 The hardware Nothing should be hidden inside hierarchical menus. All controls should directly affect the The Listening Room is a very quiet room with diffusion of sound. It should be easy to learn to use dry acoustics. It is essential it remains quiet so one it (intuitive). It should be possible to load and save of our fanless workstations [5] is used as the presets using the same authentication system that is Openmixer computer. It is a high performance used to login in all computers at CCRMA. It quad core computer with 8Gbytes of RAM and should also be possible to control all aspects of the should be able to cope with all reasonable future mixer remotely over the network using OSC (Open needs for mixing and processing (see Figure 1 for Sound Control). an overview of the hardware). The system should support a routing and level 4.1.1 Input / output requirements control matrix with many multichannel input streams (up to 24 channels wide) coming from This is what we need to have available: 16 different types of sources (analog, digital, network) channels of analog line level balanced input, 8 and many analog outputs (up to a maximum of 24 channels of microphone level input, 8 channels of channels). The system should be able to also deal line level audio input for a media player (DVD/Blu with multichannel input streams already encoded Ray), 3 ADAT lightpipes for direct connection of in Ambisonics, and give the user a choice of the audio workstation in the Listening Room predefined decoders calibrated to the speaker array (another custom made fanless computer), 3 ADAT installed in the room. 
lightpipes available for external computers, and The system does not need to have audio dedicated Gigabit Ethernet network jacks with a processing tools such as limiters, compressors, DHCP server and netjack or jacktrip software for equalization, effects, etc. It should be as multichannel network audio sources (up to 24 transparent as possible. Anything that complicates channels per source). All external connections the user interface should be trimmed down. (analog, ADAT, network) should be available in a It has to be a stand­alone system, always rack mounted patch panel. running, and users should not need to login into a 4.1.2 Audio input / output hardware computer to start a custom mixer application (for example they should be able to play multichannel The core of the audio interface is an RME MADI content directly from the dvd or blue ray player). PCI express soundcard connected through MADI to a Solid State Logic XLogic Alpha­Link MADI

124 AX box. This combination alone provides for 64 The RME MADI card is the master clock source i/o channels from the Openmixer computer for the system, everything else slaves to it through including 24 high quality analog inputs and outputs Word Clock or ADAT Sync. and 3 ADAT lightpipes. 4.1.3 User interface hardware Additional ADAT lightpipes for external computers are provided by an RME RayDAT The only (for now) interface to Openmixer is sound card. This card is currently not installed as through a couple of USB connected dedicated we are waiting for the newer implementation of the MIDI controllers. The current controllers are a RME driver for Linux ­ the existing driver does not Behringer BCF2000 for the main controller and a support the word clock daughter card that we need BCR2000 for the secondary routing controller. to use to synchronize the RayDAT with the main Those were selected based on having enough RME MADI card using Word Clock. functionality for our needs and being very cheap The 24 analog outputs are reserved for direct (and thus easy to replace if they break down). speaker feeds. 16 of the 24 analog inputs are Other more robust controllers might be used in the connected to a front panel analog patchbay and the future. As Openmixer is just software it is easy to other 8 are connected to the media player. switch to a slightly different controller with simple Currently 2 of the 3 ADAT lightpipes go to the modifications to the source code. audio workstation and 1 ADAT lightpipe is fed 4.2 The software from an 8 channel mic preamp with connections to the analog patchbay (this setup will change when 4.2.1 Choosing a language and environment the RayDAT card is again part of the system). Initial proposals included a simple implementation using Pd but we did not go that way because a system based on a text language seemed easier to be debugged and could be easily booted without needing monitors, keyboard or a mouse to interact with it. Another option that was explored was to program Openmixer in C and/or C++ and borrow code from other compatible open source projects or libraries (Openmixer is released under the GPL license). This would have given us complete control at a very low level and would probably be the most efficient implementation, but it would have been more complex to code compared with our final choice. This approach was discarded and simplicity won over efficiency. Another option briefly discussed was to actually use Ardour as the engine for the mixer (after all that is what Ardour does best!), but adapting such a complex project to our needs would have also been very time consuming. In the end we settled for the SuperCollider [6] language as it is text based and can handle most of the needs of the project by Figure 1: Hardware Overview itself without the need to use any other software. It can handle audio processing and routing very The computer has two ethernet interfaces, one easily and efficiently, it can send and receive MIDI connects to the Internet and the other provides and OSC messages, and even use the SwingOSC DHCP enabled dedicated Gigabit switch ports for SuperCollider classes for a future GUI display. Of laptops or other wired computer systems that course Jack is used as the audio engine, as well as connect to Openmixer through the network using several support programs that are started and either Jacktrip or Netjack. controlled from within SuperCollider (amixer,

125 jack_connect, jack_lsp, ambdec_cli, jconvolver, etc). 4.3 The user interface The current interface is layered on top of the capabilities of the BCF2000/BCR2000 controllers. See Figure 2 for an overview of the main controls.

Figure 3: Normal mode

3. Turn up individual Speaker Volume Knobs Figure 2: User interface {3} to adjust speakers levels for that channel. There are currently two modes of operation 4. Repeat steps 2 and 3 for each input already implemented and being used: channel. For input sources that have more 4.3.1 Normal Mode than 8 channels use the Channel Strip Normal Mode (see Figure 3) routes and controls buttons {4} in the BCR2000 to access volume levels of any channel of any input source additional input channel faders up to a to a single speaker or a set of speakers. Each input maximum of 24 input channels for each channel from a given input source is associated source (a wider control surface would not with a fader in the BCF2000, and can be routed to need this extra step). a single speaker or a set of speakers through the 5. Control overall volume for that source speaker volume knobs in the BCR2000. The levels with the Master Volume Control {5} for of a particular input channel can be controlled that source. through the fader {2}, the volume through each speaker can be controlled though the Two keys allow you to set all levels to zero corresponding speaker knob, {3} and a master (Reset) or load a preset for the selected source volume {5} can control all channels from a given (Preset) in which individual input channels are pre­ source at the same time. routed to individual output speakers. Two more keys also allow you to select the Diffusion of sources is simple to learn and can sampling rate of the whole system (currently only be done as follows: 44.1 KHz and 48 KHz are supported). It should be noted that input sources are not 1. Press the Source Button {1} for the desired mutually exclusive, they route audio all the time to input (audio workstation, analog inputs, the speakers. To use more than one at the same microphone inputs, media player, network time just select several sources separately and sources, etc). adjust the levels appropriately. All faders and 2. Press one of the Active Fader Buttons {2} controls will reflect the current state of a source to activate it, or move the selected fader after the corresponding source button is pressed. {2}. The Active Fader Button will become 4.3.2 Ambisonics Decoder Mode lit and the fader can be used to set the In “Ambisonics Decoder Mode” (see figure 4) overall volume for that channel. the assumption is made that the multi­channel

126 input source is an Ambisonics encoded audio 4.3.3 Ambisonics Decoders stream (arrangement of input channels and scaling Openmixer uses ambdec_cli [7] and jconvolver coefficients are fixed for each source). By pressing [8] as external Jack programs that perform the the Ambisonics Mode button while in a “normal Ambisonics decoding. Several decoders tuned to mode” source, all channels of audio from that input the speaker array are currently available: source will be directly routed to an Ambisonics decoder with the same level, and from there to the 1. 3D 2nd order horizontal, 1st order vertical speaker array. 2. 2D third order horizontal All faders for that source are disabled and set to 3. 2D 5.1 optimized decoder zero except for fader #1 which is the master 4. Stereo (UJH decoding done through volume control for all channels of the encoded jconvolver) input source (if any of the faders besides fader #1 are adjusted, it will automatically be adjusted back This enables the composer or researcher to to zero after a few seconds). compare the rendering of the same Ambisonics encoded stream when it is decoded through several different decoders with varying order and capabilites, all properly tuned to the room and speaker arrangement. When more speakers are added in the future the vertical order of the 3D decoder will be increased. 4.4 Network sources An extremely simple and efficient way to connect external multichannel sound sources to the mixer is through a network connection. All laptop and desktop computers have a wired interface for high speed network connectivity and it would be perfect to use that for supplying audio to the mixer through the network (only one cable to connect). Figure 4: Ambisonics mode There are currently two options: 4.4.1 Jacktrip Routing audio to the ambisonics decoders can be The Soundwire [9] project at CCRMA has done as follows: created a software package (Jacktrip [10]) that is used for multi­machine network performance over 1. Select an input source {1} as you would in the Internet. It supports any number of channels of normal mode. bidirectional, high quality, uncompressed audio 2. Press the Ambisonics decoder button {2} signal streaming. It would be ideal for our purposes to route that input source directly to the except that it currently does not do sample rate Ambisonics decoders. synchronization between the computers that are 3. Adjust fader #1 {3} as the master volume part of a networked performance, so that if there is control for the multichannel input source. any clock drift between them there will be periodic All other fader or volume knobs are clicks when the computers drift more than one deactivated. buffer apart. 4. Choose the decoder {4} you would like to This is not of much concern in a remote network use. performance with high quality sound cards in both 5. Master mode volume controls {5} are set ends, as the quality of the network will usually to unity by default and have exactly the create more significant problems, but it is crucial in same functionality as in Normal Mode. a studio environment where all sources have to be in sync at the sample level (i.e.: no clicks allowed).

127 Still, Jacktrip is going to be supported with the The script starts the SuperCollider sclang idea of providing a remote (i.e.: not within the language interpreter, which takes care of setting up Listening Room) connection for jam sessions and and monitoring the rest of the system. The perl small telepresence concerts. script waits for the interpreter to finish (it should never exit in normal conditions) and restarts it if 4.4.2 Netjack necessary. This can deal with unexpected problems Netjack [11] is an ideal solution for a local that could cause SuperCollider's sclang to stop network connection as the client computer does not prematurely and it also implements a “system use its own sound card at all (but it is possible to restart” function useful for development and do that locally in the client computer with debugging the system (a user can “touch” a file resampling being used to synchronize the two that triggers an orderly shutdown of the clocks), but just sends samples to an external Jack SuperCollider portion of Openmixer, and removing sink in the Openmixer computer. It is expected it that file makes the perl script start it again). The will quickly become the preferred way to connect perl script also logs the output of the SuperCollider laptops and other external computers to the program for debugging. Openmixer system (as long as they can run Jack Once the SuperCollider sclang interpreter starts, and Netjack). it takes control of the rest of the Openmixer startup The implementation of the Netjack connection in process. A number of external programs are used Openmixer was somewhat delayed due to the to control the hardware. The mixer of the RME changing landscape of Netjack. The Openmixer card is initialized using amixer, and jackd is started computer is currently running Jack2 which had its with the appropriate parameters (currently jack own separate Netjack code base ­ one that relied on runs with 2 periods of 128 frames – aproximately automatic discovery of the netjack server through 5.3 mSecs of latency ­ but with the current cpu and multicast (which we don't currently have enabled at load it should be possible to run at 64 frames if the CCRMA). But there is now a backport of the Jack1 lower latency is deemed necessary). After Jack is implementation of netjack to jack2 (netone) that up and running two instances of the SuperCollider would appear to be perfect for our needs. synthesis server scsynth are started (to spread the The second ethernet interface of the Openmixer audio processing load between different cores of computer is connected to a small gigabit switch the processor), the Ambisonics decoders that provides local network jacks. The computer is (ambdec_cli and jconvolver) are started and finally set up to provide four DHCP ip addresses on that the jack_netsource processes for netjack inputs are ethernet port and the firewall is structured to also spawned. Everything is connected together in the route traffic to the Internet through the primary Jack world using calls to jack_connect. Finally the ethernet interface (NAT). SuperCollider MIDI engine is initialized. Four jack_netsource processes are spawned by Once audio and midi are initialized the SuperCollider, each one waiting for its DHCP ip SuperCollider Synths that comprise the audio address to become active with a netone jack client. 
processing section of Openmixer are started and When that happens the audio connection becomes finally the SuperCollider MIDI and OSC active and Openmixer can control the volume and responders that tie user controls to actions are routing of all channels for that netjack source. For defined and started. At this point Openmixer is simplicity all potential network connected sources operational and the user can start interacting with are currently mixed in parallel with equal gains it. (otherwise the user interface side of Openmixer Changing sampling rates is done by shutting would start to get more complicated). down all audio processing and Jack, and restarting everything at the new sampling rate. 5 Structure of the software Periodic processes check for the presence of the The software is written almost exclusively in USB MIDI controllers and initializes and SuperCollider. A boot time startup script written in reconnects them if necessary (that makes it perl is triggered when the computer enters into possible to disconnect them and reconnect them to unix run level 5. Openmixer without affecting the system).

128 Openmixer is currently around 2000 lines of (except for the movement of the faders at the very SuperCollider code. end if their position is different from the default).

Much needs to be done, some parts of the code 7 Future need work and reorganizing, and code needs to be Saving and recalling presets is a very important added to make the system more reliable. All feature and is difficult to implement properly. As a external programs will be monitored periodically stopgap measure Jason has written a very simple by SuperCollider processes and restarted and OSC responder that enables a user to send OSC reconnected if necessary. Depending on which messages that save and restore named presets from program malfunctions the audio processing may be a common directory. For now the communication interrupted momentarily, but at least the system is unidirectional so it is not possible to list existing will recover automatically from malfunctions. The presets, and they can't be protected or locked, but it worst offender would be Jack, if it dies the whole enables users to save and restore a complex set­up system would need to be restarted (as it also which would otherwise need to be redone from happens when, for example, the sampling rate of scratch each time a user starts using Openmixer. the system is changed). If SuperCollider itself dies Without a display and a keyboard it is difficult to the perl script that starts up the system will restart design an interface that is easy to use (because it. The system is expected to be very stable as the ideally preset management should be tied to the computer in which Openmixer is running is LDAP user database for authentication). One dedicated to that purpose alone (ie: users are not possible option to explore would be to use a web logged in and running random software in it). server interface in the Openmixer computer so that preset management can be accessed through a web 6 Future Development browser in any computer connected to the network in the Listening Room. A user would need to login 6.1 Encoding to Ambisonics and VBAP with the CCRMA user name and password to be This is a different mode of operation in which able to access the web site. After that presets individual channels of a given source can be would be stored and recalled from a dot positioned in space through Openmixer instead of subdirectory of the home directory of the user. This being spatialized in the source through a separate would provide adequate functionality without program. In this case the second controller adding a keyboard, mouse and display to the (BCR2000) is used to set elevation and azimuth system and without relying of special software in angles in 3D space for each input channel instead the user's computer. of controlling the volume of individual speakers. A GUI with feedback for level metering, state of This has already been implemented for the case the mixer, etc., would also be very useful (and of Ambisonics but we have not yet made it could potentially make it easier to code a preset available to end users. management interface). An alternative implementation would do the Once the code stabilizes we think it would also same thing but using VBAP for spatialization be very useful to add an Openmixer system to our instead of Ambisonics. small concert hall, the Stage. All parameters of the spatialization can also be controlled though OSC from a computer connected 8 Additional Applications to the network. The fact that Openmixer is just a program 6.2 Start-up sequence enables it to be customized for projects that have very special needs. 
Here is an example: The start­up sequence of the SuperCollider code A study into archaeological acoustics has led needs to be changed so that the MIDI subsystem is Stanford students into researching and questioning initialized at the very beginning. That will enable the implications of acoustical environments in a us to use the control surfaces to give feedback to 3000­year old underground acoustic temple in the user while Openmixer starts. Currently there is Chavín de Huántar, Perú. A group of students no way to know how far in the start­up sequence explored the effects of reverberant acoustical the software is, and when it is available for use features of the temple and ran a series of traditional

social scientific experiments to see what the effects of a sound environment are on an individual's ability to complete different types of tasks. Students used a customized version of Openmixer to connect piezo microphones around an apparatus on which subjects were asked to perform tasks. The signals were convolved (using jconvolver as an external program run from within SuperCollider) with synthetic signals emulating certain 3D aspects of the reverberation and presented to subjects in different spatialized settings during the tasks.

9 Conclusions

So far Openmixer has improved the usability of the Listening Room and has opened the door to interesting ways of interacting with space. In particular it now enables very simple multichannel connection through a network jack. Hopefully it will keep making the Listening Room a productive environment for research and music production. Openmixer is released under the GPL license, and is available from: https://ccrma.stanford.edu/software/openmixer

10 Acknowledgements Many thanks for the support of Chris Chafe and Jonathan Berger at CCRMA for this project. Many many thanks also to the Linux Audio user community for the many very high quality programs that have made OpenMixer possible.

References

[1] Stephane Letz, Jack2, http://www.grame.fr/~letz/jackdmp.html

[2] Andre Bartetzki, LAC2007, "A Software-based Mixing Desk for Acousmatic Sound Diffusion" (http://www.kgw.tu-berlin.de/~lac2007/descriptions.shtml#bartetzki)

[3] Beck, Patrick, Willkie, Malveaux, SIGGRAPH2009, "The Immersive Computer-controlled Audio Sound Theater"

[4] BEAST (Birmingham ElectroAcoustic Sound Theatre), http://www.beast.bham.ac.uk/about/

[5] Fernando Lopez Lezcano, LAC2009, "The Quest for Noiseless Computers"

[6] SuperCollider, http://www.audiosynth.com/ and http://supercollider.sourceforge.net/

[7] Fons Adriaensen, AmbDec, An Ambisonics Decoder, http://www.kokkinizita.net/linuxaudio/

[8] Fons Adriaensen, Jconvolver, A Convolution Engine, http://www.kokkinizita.net/linuxaudio/

[9] The Soundwire Project, https://ccrma.stanford.edu/groups/soundwire/

[10] Jacktrip, http://code.google.com/p/jacktrip/

[11] Netjack, http://netjack.sourceforge.net/

Best Practices for Open Sound Control

Andrew Schmeder and Adrian Freed and David Wessel
Center for New Music and Audio Technologies (CNMAT), UC Berkeley
1750 Arch Street, Berkeley CA, 94720 USA
{andy,adrian,wessel}@cnmat.berkeley.edu

Abstract

The structure of the Open Sound Control (OSC) content format is introduced with historical context. The needs for temporal synchronization and dynamic range of audio control data are described in terms of accuracy, precision, bit-depth, bit-rate, and sampling frequency. Specific details are given for the case of instrumental gesture control, spatial audio control and synthesis algorithm control. The consideration of various transport mechanisms used with OSC is discussed for datagram, serial and isochronous modes. A summary of design approaches for describing audio control data is shown, and the case is argued that multi-layered information-rich representations that support multiple strategies for describing semantic structure are necessary.

Keywords

audio control data, signal quality assurance, best practices, open sound control

1 Introduction

1.1 Definition

Open Sound Control (OSC) is a digital media content format for streams of real-time audio control messages. By audio control we mean any time-based information related to an audio stream other than the audio component itself. This definition also separates control data from stream meta-data that is essentially not time-dependent (e.g. [Wright et al., 2000]). Naturally such a format has application outside of audio technology, and OSC has found use in domains such as show control and robotics.

1.2 What is Open

It should be noted that OSC is not a standard as it does not provide any test for certification of conformance. The openness of Open Sound Control is that it has no license requirements, does not require patented algorithms or protected intellectual property, and makes no strong assertions about how the format is to be used in applications.

1.3 Format Structure

OSC streams are sequences of frames defined with respect to a point in time called a timetag. The frames are called bundles. Inside a bundle are some number of messages, each of which represents the state of a sub-stream at the enclosing reference timetag. The sub-streams are labelled with a human-readable character string called an address. In a message, the address is associated with a vector of primitive data types that include common 32-bit binary encodings for integers, real numbers and text (Figure 1).
Unlike audio data, which is sampled regularly on a fixed temporal grid, bundles may contain mixtures of sub-streams that are sampled at different or variable rates. Therefore, the number of messages appearing in a bundle may vary depending on what data is being sampled in that moment.

Figure 1: Structure of the OSC content format (an OSC stream is a sequence of bundles; each bundle consists of the identifier #bundle, an NTP timestamp with seconds and seconds-fraction fields, and one or more length-prefixed encapsulated messages; each message consists of an address, typetags and arguments, e.g. /foo/bar ,ifs 1, 3.14, "baz").
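To make the binary layout concrete, the following Python fragment (standard library only) packs the example message from Figure 1, /foo/bar with typetags ,ifs and arguments 1, 3.14 and "baz", using the NUL-padded strings and big-endian 32-bit encodings described above. It is shown purely as an illustration of the format structure and is not a complete OSC implementation (no bundles or timetags).

    import struct

    def osc_string(s):
        """OSC strings are NUL-terminated and padded to a 4-byte boundary."""
        b = s.encode("ascii") + b"\x00"
        return b + b"\x00" * (-len(b) % 4)

    def osc_message(address, typetags, *args):
        """Encode one OSC message: address, ','-prefixed typetags, then the arguments."""
        out = osc_string(address) + osc_string("," + typetags)
        for tag, arg in zip(typetags, args):
            if tag == "i":
                out += struct.pack(">i", arg)    # 32-bit big-endian integer
            elif tag == "f":
                out += struct.pack(">f", arg)    # 32-bit big-endian float
            elif tag == "s":
                out += osc_string(arg)           # padded string
            else:
                raise ValueError("unsupported typetag: " + tag)
        return out

    # The example message from Figure 1
    packet = osc_message("/foo/bar", "ifs", 1, 3.14, "baz")
    assert len(packet) % 4 == 0                  # every OSC packet is 4-byte aligned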

1.4 History

OSC was invented in 1997 by Adrian Freed and Matt Wright at The Center for New Music and Audio Technologies (CNMAT), where its first use was to control sound synthesis algorithms in the CAST system using network messaging (The CNMAT Additive Synthesizer Toolkit) [Adrian Freed, 1997]. CAST was implemented for the SGI Irix platform, which was one of the first general purpose operating systems to provide reliable realtime performance to user-space applications [Freed, 1996] [Cortesi and Thomas, 2001]. This capability was a key influence in the design of the OSC format in particular with respect to the inclusion of timestamped bundles that enable high quality time synchronization of discrete events distributed over a network. Following the success of the CAST messaging system, the protocol was refined and published online as the OSC 1.0 Specification in 2002 [Wright, 2002].

1.5 Best Practices

Considering OSC as a content format alone does not account for how it can and should be used in applications. A larger picture exists around the needs and requirements of sound control in general with respect to the signal quality and description of control data. In the past these needs have been underestimated by hardware and software designs, leading to less than ideal results. Research into the needs of audio control data are summarized in this paper along with recommendations for how to best apply OSC features so that the requirements are satisfied.

1.6 Systems Integration

In the context of the Open System Interconnection Basic Reference Model (OSI Model), OSC is classified as a Layer 6 or Presentation Layer entity.
However in the larger scope of how OSC is used, other layers are considered as part of the practice. Related topics and their associated layer are listed in Table 1.

    OSI Layer         Topic in OSC Practice
    7 Application     Semantics, Choreography
    6 Presentation    OSC Format
    5 Session         Enumeration, Discovery
    4 Transport       Latency, Reliability
    3 Network         Stream Routing
    2 Frame           Hardware Clocks, Timing
    1 Bit             Cabling, Wireless, Power

Table 1: OSC related topics in the context of associated OSI Model layers

2 Temporal Audio Control

2.1 Instrumental Gestures

Musical instrumental gestures are actuations of a musical instrument by direct human control (typically by kinetic neuro-muscular actuation). The transduction of a physical gesture into a digital representation requires measurement of the temporal trajectory of all relevant dimensions of the physical system such as spatial position, applied force, damping and friction. An assessment of the performable dynamic range with respect to each physical dimension is beyond the scope of this document, however, in a somewhat general way it is possible to estimate the quantity of temporal information contained in an isolated sub-stream of a musical performance.

2.1.1 Temporal Information Rate

It is estimated that the smallest controllable temporal precision by human kinetic actuation is 1 millisecond (msec), based on an example of an instrumental gesture called the flam that is known to have a very fine temporal structure [Wessel and Wright, 2002]. This limit of 1 msec coincides with the threshold for just noticeable difference in onset time between two auditory events.
The flam technique in drumming is a method of striking the surface of a drum with two sticks so that the relative arrival time of each stick modulates the timbre (spectral quality) of the resulting sound. Because of the very close temporal proximity of the events, a human listener perceives them to be a single event where the spectral centroid of the timbre is correlated to the temporal fine structure. This inter-onset time can be reliably controlled by a trained performer between 1-10 msec.
The flam is an example of an instrumental gesture with temporal precision of 1 msec. However the temporal accuracy of instrumental gestures is at least an order of magnitude larger. The tolerable latency for performance of music requiring very tight rhythmic synchronization between two players has been measured to be 10 msec [Chafe et al., 2004]. Ignoring the complications of fixed versus variable delays, it seems reasonable to assume 10 msec as an estimate of temporal accuracy in an instrumental gesture event. And finally, it is also reasonable to suppose that trained musicians can perform event rates up to 10 events per second in a single sub-stream (polyphonic streams are not considered here).

The temporal information present in musical events can be estimated from these numbers. The information in bits, also called the index of difficulty in Fitts' Law, is calculated from the ratio between effective target distance ρ and standard error of the effective target width σ:

    I = \log_2\left(1 + \frac{\rho}{\sigma}\right) \text{ bits}    (1)

Suppose a musician performs the double-strike flam at a rate of 10 hz. This is equivalent to two tasks: 1) placement of events with an average separation of 100 msec and standard error of 10 msec, 2) placement of two sub-events with a separation of 10 msec and error of 1 msec. Assuming the dual-task is repeated at 10 hz then the total information rate is,

    I' = \log_2\left(1 + \frac{100}{10}\right) + \log_2\left(1 + \frac{10}{1}\right) \text{ bits},    (2)

    I' \times \frac{10}{\mathrm{sec}} = 68 \ \frac{\mathrm{bits}}{\mathrm{sec}}.    (3)

This estimate informs us that if the gesture as described was transformed to a digital representation without loss of information, the temporal dimension alone would require 68 bits/sec to encode. For sake of comparison the highest reported information transfer rates for target-selection with a mouse in the ISO 9241-9 test are around 3 bits/sec [MacKenzie et al., 2001].
A distinguishing feature of the musical context of gesture is that the human performer uses a combination of extensive training, anticipations of musical structure, and sensory feedback to continuously adjust and refine the gesture. Therefore, musical gestures cannot be directly compared to reaction-time studies or task-based assessments as they are used in the study of human-machine ergonomics. However it is clear from this example that musical gestures contain a far greater density of temporal information than is typical for human-computer interactions in other contexts.
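The arithmetic behind equations (1)-(3) is easy to reproduce; a short Python sketch follows. The task parameters (100/10 msec and 10/1 msec, repeated at 10 hz) are taken from the flam example above, the helper name is ours, and the small discrepancy with the quoted 68 bits/sec is only rounding.

    from math import log2

    def index_of_difficulty(rho, sigma):
        """Equation (1): information in bits for one placement task."""
        return log2(1 + rho / sigma)

    # Double-strike flam: 100 msec spacing with 10 msec error,
    # plus a 10 msec sub-event spacing with 1 msec error.
    bits_per_gesture = index_of_difficulty(100, 10) + index_of_difficulty(10, 1)
    print(round(bits_per_gesture, 2))       # about 6.92 bits per repetition
    print(round(bits_per_gesture * 10))     # about 69 bits/sec, i.e. the ~68 bits/sec estimate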
2.2 Spatial Audio Control

Spatial audio effects such as early reflections and reverberation are broad-band temporal-spectral transformations, however the maximum useful rate at which a spatial audio effect can be controlled is limited to the sub-audible frequency band between 0-50 hz. This is due to the simple fact that if a spatial parameter is modulated with a high frequency, perceptual fusion takes place yielding a transformation of the source in some way other than the intended outcome. For example a virtual source with rotating dipole directivity pattern ceases to be perceived as rotating for frequencies above 10 hz [Schmeder, 2009a]. Similarly if the location of a virtual source alternates between two positions at a high rate, the observer perceives a stationary source with a wider apparent source width.
However, spatial audio control data requires very high temporal precision to avoid phase artifacts in multi-loudspeaker arrays. AES11-2003 recommends a between-channel synchronization error of +/- 5% per sample frame [Audio Engineering Society, 2003] [Bouillot et al., 2009]. An audio signal stream at 96 khz requires that the temporal synchronization error does not exceed 0.5 microseconds.
The effect of synchronization error in the control stream for a spatial audio rendering engine may have an impact on the final reproduction quality. For example in a phase-mode beamforming array ([Rafaely, 2005b]), synchronization inaccuracy is similar to a positioning error and synchronization jitter is functionally similar to transducer noise [Rafaely, 2005a].

2.3 Sound Synthesis

In a pure signal processing context audio control data can be considered as the component of the signal that is non-stationary. The representation of control information is a topic specific to the design of any given algorithm, and so it is impossible to state a universal set of requirements for audio control. It is worth noting that some audio synthesis algorithms can have significant bandwidth requirements, especially the data driven methods such as sinusoidal additive synthesis and concatenative synthesis.

3 Temporal Quality Assurance

3.1 Event Synchronization

Assuming clock synchronization is available, the timetag can be used to schedule events with fixed delays that account for the network transport delay in communication between devices. The delays must be known and bounded. This is called forward synchronization and can be efficiently implemented with the priority queue data structure [Schmeder and Freed, 2008] [Brandt and Dannenberg, 1998] (Figure 2).

Figure 2: Forward synchronization scheduling for presentation of messages (for each input OSC bundle x: if x.Timestamp is now, execute; if it is in the future, defer; if it is in the past, fault).

Applications using OSC timestamps for synchronization should make clear in their documentation what type of clock synchronization is to be used, if any, as well as limits on tolerable network delay.
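The execute/defer/fault decision of Figure 2 maps naturally onto a priority queue ordered by timetag. The Python sketch below is only an illustration of that logic; the late_tolerance parameter and the dispatch callback are invented for the example and are not part of any OSC specification.

    import heapq, itertools, time

    class ForwardScheduler:
        """Priority-queue scheduler for timetagged bundles (cf. Figure 2)."""
        def __init__(self, late_tolerance=0.001):
            self.queue = []                        # min-heap ordered by timetag
            self.counter = itertools.count()       # tie-breaker so payloads are never compared
            self.late_tolerance = late_tolerance   # how late a bundle may be before faulting

        def submit(self, timetag, bundle):
            heapq.heappush(self.queue, (timetag, next(self.counter), bundle))

        def poll(self, dispatch, now=None):
            """Execute every bundle whose timetag is due; leave future bundles deferred."""
            now = time.time() if now is None else now
            while self.queue and self.queue[0][0] <= now:
                timetag, _, bundle = heapq.heappop(self.queue)
                if now - timetag > self.late_tolerance:
                    print("fault: bundle missed its timetag by", now - timetag, "s")
                else:
                    dispatch(bundle)
            # anything still queued has a future timetag and stays deferred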

3.2 Effect of Jitter

Jitter is randomness in time. This may be found in the transport delay, or in the clock synchronization error. Unless it is removed, the effect of jitter on a signal is to corrupt it with noise. The noise is temporal so its magnitude depends on the rate of change of the signal, or its frequency content. Even for relatively low-rate gesture signals, jitter noise can play a significant role. In Figure 3 we see that a 2 msec jitter causes a significant reduction in the channel headroom. The fact that temporal jitter has a strong influence on signal quality is well known in the audio engineering community where a typical sampling converter operating at 96 khz requires a clock with jitter measured in the picosecond range.

Figure 3: Effective channel headroom after jitter induced noise on a 10 hz carrier signal.

In Figure 4 we see what combinations of jitter and carrier frequency will degrade a gesture stream with 8-bit dynamic range.

              0.01 msec   0.1 msec    1. msec    2. msec    4. msec
    0.5 Hz    100.806     80.942      60.5853    54.4588    48.2834
    1. Hz      89.4672    69.2973     49.5129    42.7719    37.1899
    2 Hz       83.5256    64.1865     44.4936    37.811     32.166
    4 Hz       77.8606    58.3905     38.2024    32.4498    25.4497
    8 Hz       72.3401    52.0053     31.2989    25.7653    20.1786
    16 Hz      66.1133    45.8497     25.8291    19.7408    14.3312
    32 Hz      60.2471    39.6844     19.7202    13.546      8.26448
    64 Hz      53.9285    33.8882     13.9203     7.90135    1.7457

Figure 4: Signal headroom (dB) as a function of carrier frequency and standard deviation of delay error; entries below 48 dB fall short of 8-bit dynamic range (8 bits = 48 dB).

3.3 Jitter Attenuation

From a simple inspection of the table shown, it is apparent that in order to transmit without loss of information an instrumental gesture data stream with frequency content up to 10 hz and 8-bit dynamic range the jitter must be less than 1/10th of a millisecond. Greater dynamic range requires proportionally less jitter where an error reduction by 50% improves dynamic range by 6 dB or 1-bit. To transmit a 10 hz signal with 16-bit dynamic range requires jitter to be less than .5 microseconds.
On contemporary consumer operating systems typical random delays of 1-10 milliseconds between hardware and software interrupts are unacceptably large [Wright et al., 2004], and this source of temporal noise ultimately inhibits the information transmission-rates for real-time control streams. If the data is isochronously sampled it is possible to use filters to smooth jitter [Adriensen and Space, 2005]. However this is not a typical expectation in audio control so something else must be done. If the clock synchronization error is smaller than the transport jitter (which is often the case) and the data stream uses timestamps, then it is possible to use forward synchronization to remove jitter from a control signal (shown in Figure 5). This operation trades lower jitter for a longer fixed delay. Provided that total delays after rescheduling are less than 10 msec, a satisfactory music performance experience is possible.

3.4 Atomicity

Within each bundle exists a point-in-time sample of a collection of sub-stream messages.

The scope over which the data is valid is defined both with respect to the message addresses as well as some temporal window at the reference timetag.
In implementation practice for application design, message data needs to be double-buffered or queued in a FIFO that is updated according to the associated timetag. This prevents unintended temporal skew between sub-streams.

Figure 5: Typical transport jitter of 1-5 msec and its recovery by forward synchronization.

4 Transport Considerations

A common transport protocol used with the OSC format is UDP/IP, but OSC can be encapsulated in any digital communication protocol. The specific features of each transport can affect the quality and availability of stream data at the application layer.

4.1 Datagram Transports

A datagram transport (UDP being a canonical example) is a non-assured transport. Each packet is either delivered in its entirety or not delivered at all. Packets may be out of order, in which case OSC bundle timestamps can be used to recover the correct order. Datagram transports provide a natural encapsulation boundary for each packet. In the case of UDP if the packet exceeds the maximum transmission unit (MTU) the packet may be fragmented over multiple pieces. The fragmentation can introduce extra delay as the UDP/IP stack must then reassemble the pieces before delivering the packet to an application.

4.2 Serial Stream Transport

Serial transports (TCP/IP being an example) provide a continuous data stream between endpoints. The OSC content format does not define a means for representing the beginning and end of a packet. In particular the OSC bundle can contain any number of encapsulated messages and there is no way for a parser to determine the total number until the end of packet is reached. Therefore the major need for sending OSC on a serial transport is that the packets must be encoded with some extra data to indicate where the packet boundaries are, called a framing protocol.
Two options have been proposed for packet framing on serial links. The idea proposed in the OSC 1.0 specification is an integer length-count prefixed on the start of each packet that indicates how many bytes to expect. This encoding requires a totally assured transport such as TCP/IP or USB-Serial. A serial transport with possible errors (such as RS232) will be broken if there is any error in the encoded length.
The SLIP method for framing packets (RFC 1055 [Romkey, 1988]) is an alternative that is robust to transmission errors and stream interruption. In general it is preferred over the former for its simple error recovery.
Assured transports must be used for any mission critical application of OSC and generally this means TCP/IP or something with a similar feature set is needed.
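A minimal sketch of the SLIP framing recommended above follows; the END and ESC byte values are those defined in RFC 1055. It is meant only to illustrate why the method recovers so simply (a damaged frame loses itself at the next END byte, not the whole stream) and is not a hardened implementation.

    END, ESC, ESC_END, ESC_ESC = 0xC0, 0xDB, 0xDC, 0xDD   # byte values from RFC 1055

    def slip_encode(packet):
        out = bytearray([END])                  # leading END flushes any accumulated line noise
        for b in packet:
            if b == END:
                out += bytes([ESC, ESC_END])
            elif b == ESC:
                out += bytes([ESC, ESC_ESC])
            else:
                out.append(b)
        out.append(END)
        return bytes(out)

    def slip_decode(stream):
        """Yield each framed packet found in a byte stream."""
        packet, escaped = bytearray(), False
        for b in stream:
            if escaped:
                packet.append(END if b == ESC_END else ESC if b == ESC_ESC else b)
                escaped = False
            elif b == ESC:
                escaped = True
            elif b == END:
                if packet:
                    yield bytes(packet)
                    packet = bytearray()
            else:
                packet.append(b)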
4.3 Isochronous Stream Transport

Isochronous protocols have guaranteed bandwidth, in-order delivery of data, but are not assured (no retries are made on failure). They may or may not provide natural packet boundaries.
Ethernet AVB and the isochronous modes of USB and Firewire are examples of this transport type. Ethernet AVB has additional features in that it also provides a network clock for synchronization and its own timestamp for synchronization of events with total latency as low as 2 msec and synchronization error of less than 0.5 microseconds. Class A streams in the AVB framework are unique among all the transports discussed here in that they are guaranteed to meet or exceed all synchronization and latency requirements needed for audio control data as described in this paper [Marner, 2009].

4.4 File Streams and Databases

For the archival recording and recall of audio control data streams, file systems and databases can be treated as serial stream transports with high block-based jitter in the retrieval phase. By considering a file to be a type of serial stream, OSC can use the same framing protocol for serial stream encoding as a file format.

Again the SLIP method is recommended, and its error recovery features enable robustness against file corruption and truncation.
When a recorded stream of OSC messages is replayed, the original timestamps are rewritten according to a simple linear transformation, and can then be reconstructed temporally using the forward synchronization scheduler. The rewriting of timestamps does not require relative time encodings. Since relative time can always be extracted from absolute time but not the converse, recording of OSC streams for archival purposes should always use absolute time values.

Figure 6: Multi-stream recording and playback interface to a database (the application's real-time interface issues /play, /filter and /index commands to the OSC stream database through an informational interface with access control; stream query results are read back as #bundle data through the forward synchronization scheduler, and real-time gesture streams are written as #bundle data to record).

A multi-stream approach for interfacing OSC streams to a database and efficient queries over the recorded data is demonstrated in the OSCStreamDB project [Schmeder, 2009b] (Figure 6).
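The timestamp rewriting mentioned above amounts to an affine remapping of recorded absolute times onto the playback clock. A minimal sketch; the rate argument, for time-stretched replay, is an added convenience and not something the text prescribes.

    import time

    def rewrite_timetag(t, record_start, playback_start, rate=1.0):
        """Map an absolute recorded timetag onto the playback timeline."""
        return playback_start + (t - record_start) / rate

    # e.g. replaying a stream recorded from t0, starting two seconds from now:
    # new_t = rewrite_timetag(t, t0, time.time() + 2.0)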
4.5 Bandwidth Constrained Transports

When bandwidth constrained transports are required such as wireless radios, OSC may be modified with some effort to enable lower bit-rates. Depending on the nature of the data stream, adaptive sampling dependent on the information-rate can be used to reduce the number of messages on the network. Interpolation between frames can be used to recover a smooth control signal in reconstruction. This is likely to work well for instrumental gesture data as the actual effective number of new bits of information per second is relatively low.
Another major source of bit-sparsity in OSC streams is the message address field. Because it is typically a human-readable string, and English text has a bit rate of 1-1.5 bits per character, about 80% of the bits are redundant. A dictionary-type compression scheme could be used to compress the address strings if necessary.

4.6 Network Topology and Routing

The OSC 1.0 Specification included some language regarding OSC client and server endpoints. In fact this distinction is not necessary and OSC may be used on unidirectional transports, and more complex network topologies including multicast, broadcast and peer-to-peer. The OSCgroups project provides a method for achieving multicast routing and automatic NAT traversal on the internet [Bencina, 2009].
A shortcoming of many current OSC implementations using UDP/IP is missing support for bidirectional messaging. As a best practice implementations should try to leverage as much information as the transport layer can provide, as this simplifies the configuration of the endpoint addresses in applications as well as stateful inspection at lower layers.

5 Describing Control Data

The address field in an OSC message is where descriptive information is placed to identify the semantics of the message data. The set of all possible addresses within an application is called an address space.

5.1 Descriptive Styles

Existing OSC practice includes a wide variety of strategies for structuring address spaces. Here we intend to clarify the differences between the styles rather than to promote any particular method as preferred. Four common styles have emerged over decades of software engineering practice: RPC, REST, OOP and RDF. Examples of OSC messages in each style accomplishing the same task (setting the gain on a channel) are given here.

5.1.1 RPC

The RPC (Remote Procedure Call) style employs functional enumeration, and lends itself to small address spaces of fixed functions:

/setgain (channel number = 3) (gain value = x)

5.1.2 REST

The REST (Representational State Transfer) style encourages state-machine free representations by transmitting the entire state of an entity in each transaction. Web application programmers are familiar with this style wherein every time a page is loaded, the application has two phases: setup (recreating the entire application state from the transferred representation) and teardown (throwing it all away).

The state-free property is what enables web-browsers to load pages outside the context of a browsing session by recalling a bookmark.
OSC address spaces using the REST style of resource enumeration have a familiar appearance since it is the most common style used in the construction of hyperlink addresses on the web.

/channel/3/gain (x)

5.1.3 OOP

The OOP (Object Oriented Programming) style is based on an intuitive concept of objects that are self-contained entities containing both attributes and specialized functions called methods that are procedures transforming their own attributes.

/channel/3@gain (x)
/channel/3/setgain (x)

OOP enables abstraction and layering in large systems. It may also need notations beyond the basic '/' delimiter used in path-style addresses since the OOP structure requires differentiation of the object, attribute and method entity types. In the above example we have used '@' following the XPath notation to indicate an attribute. In Jamoma a ':' character is used to similar effect [Place et al., 2008].

5.1.4 RDF

The RDF (Resource Description Framework) style employs ontological tagging to describe data with arbitrarily complex grammars. This style of control is the most powerful of the alternatives shown here, however its use also requires greater verbosity since it makes no semantic assumptions about the data structure. The interpretation of the delimiter '/' in OSC as a hierarchical containment operator as it implies in the REST and OOP paradigms is not used in this style. Instead it is interpreted as an unordered delimiter between tags, and each tag is a comma-separated triple of subject, predicate, and object entities.

/channel,num,3
/op,is,set
/lvalue,is,gain
/rvalue,units,dB (x)

5.2 Leveraging Pattern Matching

In OSC there is a type of query operator called address pattern matching. Similar to the use of wildcard operators in command-line file system operations, patterns enable one-to-many mappings between patterns and groups of messages. However this technique is only useful if the target messages have a structure that enables the provided pattern syntax to make useful groupings. This structure is usually present when the address space follows the REST design paradigm. Designers of address spaces for applications should consider how the resulting addresses might make use of grouped-control by patterns.

/channel/*/gain (common gain value x)

Some effort is needed to retain efficiency of query operators in very large address spaces. This is possible using database structures such as the RDTree [Schmeder, 2009b].
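One way to apply such patterns in a receiver is to translate them into regular expressions. The sketch below handles '*', '?', character classes with '!' negation and '{}' alternation within one path segment; it is an approximation for illustration only, and the exact matching rules should be taken from the OSC 1.0 specification.

    import re

    def osc_pattern_to_regex(pattern):
        """Approximate translation of an OSC address pattern to a regular expression."""
        out, i = "", 0
        while i < len(pattern):
            c = pattern[i]
            if c == "*":
                out += "[^/]*"                 # wildcard within one path segment
            elif c == "?":
                out += "[^/]"
            elif c == "{":
                j = pattern.index("}", i)
                out += "(" + "|".join(map(re.escape, pattern[i + 1:j].split(","))) + ")"
                i = j
            elif c == "[":
                j = pattern.index("]", i)
                body = pattern[i + 1:j]
                out += "[" + ("^" + body[1:] if body.startswith("!") else body) + "]"
                i = j
            else:
                out += re.escape(c)
            i += 1
        return re.compile("^" + out + "$")

    # one pattern addressing a group of messages, as in the example above:
    rx = osc_pattern_to_regex("/channel/*/gain")
    assert rx.match("/channel/3/gain") and not rx.match("/channel/3/mute")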

5.3 Problems with Stateful Encodings

A stateful encoding of a control data stream is one where the meaning of a message has some dependence on a previously transmitted message. The interpretation of the message by the receiver requires some memory of previously received messages in addition to the necessary logic to correctly fuse the information. This logic is typically a finite state machine, although in general it can be more complex.
Suppose that there is a switch, called /button, with two possible states, off or on, represented by the numbers 0 and 1 respectively. A designer wishing to conserve network bandwidth decides only to transmit a message when the switch changes from one state to the other and so sends the number +1 to indicate a transition from 0 to 1 and the number -1 to indicate a transition from 1 to 0.

Figure 7: Finite state machine for parsing the transitions between two states (states 0 and 1 with transitions +1 and -1).

The following is a valid sequence of messages that can be verified by a receiver using the finite state machine shown in Figure 7.

/button +1
/button -1
/button +1
/button -1

A potential problem with this representation is that it is not robust to any errors in the transmission of the data.

Suppose that a message is lost due to the use of a non-assured transport such as UDP/IP, the sequence is then:

/button +1
/button +1
/button -1

In this example it is possible to make a more complex state machine that is capable of recovering from the missing data, but we can immediately see that the complexity of the program has doubled (Figure 8).

Figure 8: Finite state machine for parsing the transitions between two states with error recovery logic.

An alternative solution eliminates the need for a parsing engine entirely, by simply transferring the entire state of the switch in every message.

/button 0
/button 0
/button 0
/button 1
/button 1
/button 0
...

Furthermore, if these messages are continuously transmitted even when the state does not change, then the receiver needs to make no special effort to recover from a missed message. While this example is contrived to the point of being trivial, it does demonstrate that stateful encodings require more complex programs especially in the case of error handling. On modern network transports with typical bandwidth capacity of 1 Gbit/sec, the simplicity of state-free representations is often more valuable than saving network bandwidth.
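To make the comparison concrete, here is a sketch of both receivers for the /button example. The stateful one needs the transition logic of Figures 7 and 8 and still has to pick a recovery policy when a transition is invalid (the policy shown is just one possibility), while the state-free receiver is a single assignment.

    class StatefulButton:
        """Receiver for the +1/-1 transition encoding (cf. Figures 7 and 8)."""
        def __init__(self):
            self.state = 0

        def receive(self, delta):
            expected = +1 if self.state == 0 else -1
            if delta == expected:
                self.state += delta
            else:
                # invalid transition: a message was lost or duplicated;
                # one possible recovery policy is to trust the direction of the last message
                self.state = 1 if delta == +1 else 0

    class StatelessButton:
        """Receiver for the state-free encoding: every message carries the whole state."""
        def __init__(self):
            self.state = 0

        def receive(self, value):
            self.state = value      # a missed message is repaired by the next one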

5.4 Layering Control Data

Between the source user action performed on a human input device to the high level application control stream, there are several intermediate layers of representation [Follmer et al., 2009] (Figure 9). The OSC format can be used at every layer where a digital representation of the data stream is present. Even though such streams may ultimately be not used in high level abstractions it is useful to retain the OSC address labels at each step.

Figure 9: Intermediate layers of representation between user interface controller and an application (Device Abstraction, Mapping Transforms, Signal Processing, Electronic Sensing, Physical Materials, User Action).

5.4.1 Complications of Mapping

The use of transformational mapping of gesture data is an important aspect in the design of interactive musical systems [Hunt et al., 2003]. In many cases useful mapping transformations carry out some type of information fusion that is a non-linear transformation of the data. However non-linear transformations require extra attention because they transform uniform noise into a non-uniform noise with a complicated spatial structure. It is possible to design adaptive filters that are optimal for a non-linear transformation, however this requires a more complex processing graph than what is shown in Figure 9.
Consider the non-linear transformation function,

    f(x, y) = \frac{x - y}{x + y}.    (4)

If x and y are corrupted with any noise (which is inevitable) then the transformed variable f(x, y) will greatly amplify that noise when x and y approach zero. This is evident from the fact that its derivative is unbounded as (x, y) \to (0, 0):

    \partial_{x,y} f(x, y) = -\frac{x - y}{(x + y)^2} \pm \frac{1}{x + y}    (5)
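A quick numerical illustration of this amplification; the +/- 0.01 uniform noise level and the two test points are chosen only for the example.

    import random

    def f(x, y):
        return (x - y) / (x + y)          # equation (4)

    def spread(x, y, noise=0.01, trials=1000):
        """Range of f(x, y) when x and y each carry uniform noise of +/- noise."""
        vals = [f(x + random.uniform(-noise, noise), y + random.uniform(-noise, noise))
                for _ in range(trials)]
        return max(vals) - min(vals)

    print(spread(1.0, 1.0))    # small spread: the noise stays roughly +/- 1%
    print(spread(0.02, 0.02))  # near (0, 0) the same sensor noise is amplified by roughly 100x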

To enable out-of-order processing across layers, designers should retain versions of data streams before and after mapping transformations under different address labels.

[Figure 10: Cross-layer dependencies exist in the processing chain for input control data (outlier rejection and noise filtering are applied after the mapping transforms and the non-linear function, which sit above the signal processing and electronic sensing of raw sensor data).]

6 Conclusion

Here we present a brief summary of each point made in the document.

6.1 Transport and Synchronization

6.1.1 Instrumental Gesture Control

For data streams from measurement of human kinetic gestures, the temporal synchronization error should be not more than 0.1 milliseconds to ensure lossless transmission of periodic signals up to 10 Hz with 8 bit dynamic range, with each extra bit of dynamic range requiring half the temporal error (50 usec for 9 bits, 25 usec for 10 bits, etc.). To resolve events with 1 msec relative temporal precision the sampling frequency of measurement should be 2000 Hz.
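This rule of thumb is simple enough to evaluate mechanically; the following small sketch (an illustration, not part of the original paper) prints the allowed synchronization error for a few bit depths, starting from 100 usec at 8 bits and halving per additional bit.

// Illustration only: allowed synchronization error per the rule of thumb above.
#include <cmath>
#include <cstdio>

static double max_sync_error_usec(int bits)
{
    return 100.0 / std::pow(2.0, bits - 8);   // 100 usec at 8 bits, halved per extra bit
}

int main()
{
    for (int bits = 8; bits <= 12; ++bits)
        std::printf("%2d bits: %6.2f usec\n", bits, max_sync_error_usec(bits));
    return 0;
}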

6.1.2 Spatial Audio Control

The transport should be capable of updating the spatial audio parameters at rates of 100 Hz in order to resolve the full range of perceivable spatial effects, which may extend up to 50 Hz. For spatial audio control data used in beamforming or wavefield synthesis rendering, following the AES recommended limits, the synchronization error should be less than 5% of a sample frame for the highest controlled frequency. For example, control over coefficients in a phase-mode beamforming array operating up to 10 kHz requires a synchronization accuracy of 5 microseconds.

6.1.3 Digital Audio Synthesis Control

For audio synthesis algorithms in general, the needs of control data are dependent on the nature of the algorithm and may vary widely depending on the level of detail and control bandwidth. Except for perhaps the most esoteric applications, a 0.5 microsecond temporal precision is sufficient for control of any audio synthesis algorithm.

6.1.4 Atomicity

In all cases careful use of double-buffering, lock-free queues and local memory-barrier operations should be used to ensure a best effort is made for minimizing the synchronization skew between bundle-encapsulated sub-stream messages.

6.1.5 Latency and Jitter

The latency and jitter of secondary software interrupts typical for human-input device streams are detrimental to the quality of control data. Bounded latency and minimal jitter should be ensured for audio control data.

6.2 Control Meta-data

6.2.1 Interface Design Patterns

Many styles of meta-data description are possible, including procedural (RPC), resource-oriented (REST), object-oriented (OOP) and ontology-oriented (RDF). Application designers should feel free to choose the most appropriate style; however, designers creating generic tools for OSC processing should support as many styles as possible.

6.2.2 Stateful Representation

When stateful representations of control data streams are used (with care), then assured transports should also be employed to reduce software errors that may be triggered by transmission errors.

6.2.3 Multi-Layered Representation

Retaining data stream representations at multiple levels of abstraction is useful as some operations on control data streams cannot be performed in a strictly sequential order. The general process structure for transformation of control data is a directed graph.

7 Acknowledgements

We are grateful to Meyer Sound Laboratories Inc. of Berkeley CA for financial support of this work. The anonymous reviewers provided helpful suggestions to improve this document. The users of OSC in the community, including computer music application designers and researchers, have played an important role in bringing to light important issues related to audio control.

References

Matthew Wright and Adrian Freed. 1997. CAST: CNMAT Additive Synthesis Tools. http://archive.cnmat.berkeley.edu/CAST/.

Fons Adriaensen. 2005. Using a DLL to Filter Time. In Proceedings of the Linux Audio Conference.

Audio Engineering Society. 2003. AES Recommended Practice for Digital Audio Engineering - Synchronization of Digital Audio Equipment in Studio Operations. Technical Report 11, AES.

Ross Bencina. 2009. OSCgroups. http://www.audiomulch.com/~rossb/code/oscgroups/.

Nicolas Bouillot et al. 2009. AES White Paper: Best Practices in Network Audio, volume 57 of JAES. AES.

Eli Brandt and Roger Dannenberg. 1998. Time in Distributed Real-Time Systems. In Proceedings of the ICMC, pages 523-526, San Francisco, CA.

Chris Chafe, Michael Gurevich, Grace Leslie, and Sean Tyan. 2004. Effect of time delay on ensemble accuracy. In Proceedings of the International Symposium on Musical Acoustics.

David Cortesi and Susan Thomas. 2001. REACT Real-Time Programmer's Guide. Document Number 007-2499-011. Silicon Graphics, Inc.

Sean Follmer, Björn Hartmann, and Pat Hanrahan. 2009. Input Devices are like Onions: A Layered Framework for Guiding Device Designers. In Workshop of CHI.

Adrian Freed. 1996. Audio I/O Programming on SGI Irix. http://cnmat.berkeley.edu/node/8775.

Andy Hunt, Marcelo M. Wanderley, and Matthew Paradis. 2003. The Importance of Parameter Mapping in Electronic Instrument Design. Journal of New Music Research, 32(4):429-440, December.

I. Scott MacKenzie, Tatu Kauppinen, and Miika Silfverberg. 2001. Accuracy measures for evaluating computer pointing devices. In CHI '01: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 9-16, New York, NY, USA. ACM.

Geoffrey M Marner. 2009. Time Stamp Accuracy needed by IEEE 802.1AS. Technical report, IEEE 802.1 AVB TG.

Timothy Place, Trond Lossius, Alexander Jensenius, Nils Peters, and Pascal Baltazar. 2008. Addressing Classes by Differentiating Values and Properties in OSC. In NIME.

B. Rafaely. 2005a. Analysis and design of spherical microphone arrays. Speech and Audio Processing, IEEE Transactions on, 13(1):135-143, Jan.

B. Rafaely. 2005b. Phase-mode versus delay-and-sum spherical microphone array processing. Signal Processing Letters, IEEE, 12(10):713-716, Oct.

J. Romkey. 1988. RFC 1055 - Nonstandard for transmission of IP datagrams over serial lines: SLIP. Technical report, IETF.

Andy Schmeder and Adrian Freed. 2008. Implementation and Applications of Open Sound Control Timestamps. In Proceedings of the ICMC, pages 655-658, Belfast, UK. ICMA.

Andrew Schmeder. 2009a. An Exploration of Design Parameters for Human-Interactive Systems with Compact Spherical Loudspeaker Arrays. In Ambisonics Symposium.

Andrew Schmeder. 2009b. Efficient Gesture Storage and Retrieval for Multiple Applications using a Relational Data Model of Open Sound Control. In Proceedings of the ICMC.

David Wessel and Matthew Wright. 2002. Problems and Prospects for Intimate Musical Control of Computers. Computer Music Journal, 26:11-22.

Matthew Wright, Amar Chaudhary, Adrian Freed, Sami Khoury, David Wessel, and Ali Momeni. 2000. An XML-based SDIF stream relationships language. In International Computer Music Conference, pages 186-189, Berlin, Germany. International Computer Music Association.

Matthew Wright, Ryan Cassidy, and Michael Zbyszynski. 2004. Audio and Gesture Latency Measurements on Linux and OSX. In Proceedings of the ICMC, pages 423-429.

Matthew Wright. 2002. Open Sound Control 1.0 Specification. http://opensoundcontrol.org/spec-1_0.

supernova, a multiprocessor-aware synthesis server for SuperCollider

Tim BLECHMANN Vienna, Austria [email protected]

Abstract

SuperCollider [McCartney, 1996] is a modular computer music system, based on an object-oriented real-time scripting language and a standalone synthesis server. supernova is a new implementation of the SuperCollider synthesis server, providing an extension for multi-threaded signal processing. By adding one class to the SuperCollider class library, the parallel signal processing capabilities are exposed to the user.

Keywords

SuperCollider, multi-processor, real-time

1 Introduction

In the last few years, multi- and many-core computer architectures have nearly replaced single-core CPUs. Increasing the single-core performance requires a higher power consumption, which would lower the performance per watt. The computer industry therefore concentrated on increasing the number of CPU cores instead of single-core performance. These days, most mobile computers use dual-core processors, while workstations use up to quad-core processors with support for Simultaneous Multithreading.

Most computer-music applications still use a single-threaded programming model for the signal processing. Although multi-processor aware signal processing was pioneered in the late 1980s, most notably with IRCAM's ISPW [Puckette, 1991], most systems that are commonly used these days have limited support for parallel signal processing. Recently, some systems introduced limited multi-processor capabilities, e.g. PureData, using a static pipelining technique similar to the ISPW [Puckette, 2008], adding a delay of one block size (usually 64 samples) to transfer data between processors. Jack2 [Letz et al., 2005] can execute different clients in parallel, so one can manually distribute the signal processing load to different applications. It can also be used to control multiple SuperCollider server instances from the same language process.

This paper is divided into the following sections. Section 2 gives an introduction to the difficulties when parallelizing computer music applications, Section 3 describes the general architecture of SuperCollider and the programming model of a SuperCollider signal graph, Section 4 introduces the concept of 'parallel groups' to provide multi-processor support for SuperCollider signal graphs, Section 5 gives an overview of the supernova architecture, and Section 6 describes the current limitations.

2 Parallelizing Computer Music Systems

Parallelizing computer music systems is not a trivial task, because of several constraints.

2.1 Signal Graphs

In general, signal graphs are data-flow graphs, a form of directed acyclic graphs (DAGs), where each node does some signal processing based on its predecessors. When trying to parallelize a signal graph, two aspects have to be considered. Signal graphs may have a huge number of nodes, easily reaching several thousand nodes. Traversing a huge graph in parallel is possible, but since the nodes are usually small building blocks ('unit generators') processing only a few samples, the synchronization overhead would outweigh the speedup of parallelizing the traversal. To cope with the scheduling overhead, Ulrich Reiter and Andreas Partzsch developed an algorithm for clustering a huge node graph for a certain number of threads [Reiter and Partzsch, 2007].

While in theory the only ordering constraint between graph nodes is the graph order (explicit order), many computer music systems introduce an implicit order, which is defined by the order in which nodes access shared resources. This implicit order changes with the algorithm that is used for traversing the graph and is undefined for a parallel graph traversal. Implicit ordering constraints make it especially difficult to parallelize signal graphs of max-like [Puckette, 2002] computer music systems.

2.2 Realtime Constraints

Computer music systems are realtime systems with low-latency constraints. The most commonly used audio driver APIs use a 'Pull Model' (e.g. Jack, CoreAudio, ASIO and Portaudio use a pull model): the driver calls a certain callback function at a monotonic rate, whenever the audio device provides and needs new data. The audio callback is a realtime function. If it doesn't meet the deadline and the audio data is delivered too late, a buffer under-run occurs, resulting in an audio dropout. The deadline itself depends on the driver settings, especially on the block size, which is typically between 64 and 2048 samples (roughly between 1 and 80 milliseconds). For low-latency applications like computer-based instruments, playback latencies below 20 ms are desired [Magnusson, 2006]; for processing percussion instruments, round-trip latencies below 10 ms are required to ensure that the sound is perceived as a single event. Additional latencies may be introduced by buffering in hardware and digital/analog conversion.

In order to match the low-latency real-time constraints, a number of issues has to be dealt with. Most importantly, it is not allowed to block the realtime thread, which could happen when using blocking data structures for synchronization or allocating memory from the operating system.

3 SuperCollider

supernova is designed to be integrated in the SuperCollider system. This section gives an overview of the parts of SuperCollider's architecture that are relevant for parallel signal processing.

3.1 SuperCollider Architecture

SuperCollider in its current version 3 [McCartney, 2002] is designed as a very modular system, consisting of the following parts (compare Figure 1):

[Figure 1: SuperCollider architecture. The IDE and GUI communicate with sclang (pipe / C API and C API / OSC), sclang controls scsynth / supernova via OSC, and the server loads Unit Generators through a C API.]

Language (sclang): SuperCollider is based on an object-oriented scripting language with a real-time garbage collector, which is inspired by Smalltalk. The language comes with a huge class library, specifically designed for computer music applications. It includes classes to control the synthesis server from the language.

Synthesis Server (scsynth): The SuperCollider server is the signal processing engine. It is controlled from the language using a simple OSC-based network interface.

Unit Generators: Unit generators (building blocks for the signal processing like oscillators, filters etc.) are provided as plugins. These plugins are shared libraries with a C-based API, that are loaded into the server at boot time.

IDE: There are a number of integrated developer environments for SuperCollider source files. The original IDE is an OSX-only application, solely built as an IDE for SuperCollider. Since it is OSX only, several alternatives exist for other operating systems, including editor modes for Emacs, Vim and gedit. Beside that, there is a plugin for Eclipse and a Windows client called Psycollider.

GUI: To build a graphical user interface for SuperCollider, two solutions are widely used. On OSX, the application provides Cocoa-based widgets. For other operating systems, one can use SwingOSC, which is based on Java's Swing widget toolkit.

This modularity makes it easy to change some parts of the system, while keeping the functionality of other parts. It is possible to use the server from other systems or languages, or control multiple server instances from one language process. Users can extend the language by adding new classes or class methods or write new unit generators (ugens).

3.2 SuperCollider Node Graph

The synthesis graph of the SuperCollider server is directly exposed to the language as a node graph. A node can be either a synth, i.e. a synthesis entity performing the signal processing, or a group, representing a list of nodes. So the node graph is internally represented as a tree, with groups at its root and inner leaves. Inside a group, nodes are executed sequentially, and the node ordering is exposed to the user. Nodes can be instantiated at a certain position inside a group or moved to a different location.

4 Multi-core support with supernova

supernova is designed as a replacement for the SuperCollider server scsynth. It implements the OSC interface of scsynth and can dynamically load unit generators. Unit generators for scsynth and supernova are not completely source compatible, though the API changes are very limited, so porting SuperCollider ugens to supernova is trivial. The API difference is described in detail in Section 5.3.

4.1 Parallel Groups

To make use of supernova's parallel execution engine, the concept of parallel groups has been introduced. The idea is that all nodes inside such a parallel group can be executed concurrently. Parallel groups can be nested, and elements can be other nodes (synths, groups and parallel groups).

The concept of parallel groups has been implemented in the SuperCollider class library with a new PGroup class. Its interface is close to the interface of the Group class, but it uses parallel groups on the server. Parallel groups can be safely emulated with traditional groups, making it easy to run code using the PGroup extension on the SuperCollider server.

4.2 Shared Resources

When writing SuperCollider code with the PGroup extension, the access to shared resources (i.e. buses and buffers) needs some attention to avoid data races. While with nodes in sequential groups the access to shared resources is ordered, this ordering constraint does not exist for parallel groups. E.g. if a sequential group G contains the synths S1 and S2, which both write to the bus B, S1 writes to B before S2 if and only if S1 is located before S2. If they are placed in a parallel group P, it is undetermined whether S1 writes to B before or after S2. This may be an issue if the order of execution affects the result signal. Summing buses would be immune to this issue, though.

Shared resources are synchronized at ugen level in order to assure data consistency. It is possible to read the same resource from different unit generators concurrently, but only one unit generator can write to a resource at the same time. If the order of resource access matters, the user is responsible to take care of the correct execution order.

5 supernova Architecture

The architecture of supernova is not too different from scsynth, except for the node graph interpreter. It is reimplemented from scratch in the C++ programming language, making heavy use of the STL, the Boost libraries and template programming techniques. This section covers the design aspects that differ between the SuperCollider server and supernova.

5.1 DSP Threads

The multi-threaded dsp engine of supernova consists of the main audio callback thread and a number of worker threads, running with a real-time scheduling policy similar to the main audio thread. To synchronize these worker threads, the synchronization constraints that are usually imposed (see Section 2.2) have to be relaxed. Instead of not using any blocking synchronization primitives at all, the dsp threads are allowed to use spin locks, which are blocking but don't suspend the calling thread for synchronizing with other realtime threads. Since this reduces the parts of the program that can run in parallel, this should be avoided whenever it is possible to use lock-free data structures for synchronization.
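The paper does not show the spin lock itself; as a rough illustration of the idea, a lock that blocks by busy-waiting instead of suspending the calling thread can be sketched in a few lines of C++ (this is a generic sketch, not supernova's implementation):

// Minimal illustration only: a spin lock busy-waits, so a real-time thread
// waiting on it is never suspended by the operating system scheduler.
#include <atomic>

class SpinLock {
public:
    void lock()
    {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the holder releases the lock
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }

private:
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};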

5.2 Node Scheduling

At startup supernova creates a configurable number of worker threads. When the main audio callback is triggered, it wakes the worker threads. All audio threads process jobs from a queue of runnable items. With each queue item, an activation count is associated, denoting the number of its predecessors. After an item has been executed, the activation counts of its successors are decremented, and those that drop to zero are placed in the ready queue. The concept of using an activation count is similar to the implementation of Jack2 [Letz et al., 2005]. Items do not directly map to dsp graph nodes, since sequential graph nodes are combined to reduce the overhead for scheduling items. E.g. groups that only contain synths would be scheduled as one item.
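The activation-count mechanism can be sketched as follows; the types and names below are invented for illustration and are not supernova's actual data structures:

// Illustration of activation counting (hypothetical types, not supernova code):
// when an item finishes, each successor's counter is decremented atomically and
// a successor that reaches zero becomes runnable.
#include <atomic>
#include <vector>

struct Item {
    std::atomic<int> activation_count;   // number of unfinished predecessors
    std::vector<Item*> successors;
};

template <typename ReadyQueue>
void finish_item(Item& item, ReadyQueue& ready)
{
    for (Item* succ : item.successors)
        if (succ->activation_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
            ready.push(succ);            // last predecessor done: item is runnable
}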
5.3 Unit Generator Interface

The plugin interface of SuperCollider is based on a simple C-style API. The API doesn't provide any support for resource synchronization, which has to be added to ensure the consistency of buffers and buses. supernova is therefore shipped with a patched version of the SuperCollider source tree, providing an API extension to cope with this limitation. With this extension, reader-writer locks for buses and buffers are introduced, which need to be acquired exclusively for write access. The impact on the ugen code is limited, as the extension is implemented with C preprocessor macros that can easily be disabled. So the adapted unit generator code for supernova can easily be compiled for scsynth.

In order to port a SuperCollider unit generator to supernova, the following rules have to be followed:

- ugens that do not access any shared resources are binary compatible.

- ugens that are accessing buses have to guard write access with a pair of ACQUIRE_BUS_AUDIO and RELEASE_BUS_AUDIO statements and read access with ACQUIRE_BUS_AUDIO_SHARED and RELEASE_BUS_AUDIO_SHARED. Ugens like In, Out and their variations (e.g. ReplaceOut) fall into this category.

- ugens that are accessing buffers (e.g. sampling or delay ugens) have to use pairs of ACQUIRE_SNDBUF or ACQUIRE_SNDBUF_SHARED and RELEASE_SNDBUF or RELEASE_SNDBUF_SHARED. In addition to that, LOCK_SNDBUF and LOCK_SNDBUF_SHARED can be used to create a RAII-style lock that automatically unlocks the buffer when running out of scope.

- To prevent deadlocks, one should not lock more than one resource type at the same time. When locking multiple resources of the same type, only resources with adjacent indices are allowed to be locked. The locking order has to be from low indices to high indices.
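The RAII-style variant mentioned above can be pictured as a small guard object. The sketch below is generic C++ and not the actual LOCK_SNDBUF implementation; it only illustrates why the buffer is unlocked automatically when the guard runs out of scope, and how shared (read) and exclusive (write) access differ.

// Generic illustration of the RAII idea behind LOCK_SNDBUF / LOCK_SNDBUF_SHARED
// (not supernova's code): the lock is released in the destructor, i.e. when the
// guard goes out of scope, even on early returns.
#include <shared_mutex>

struct BufferLockGuard {
    BufferLockGuard(std::shared_mutex& m, bool shared)
        : mutex(m), shared_mode(shared)
    {
        if (shared_mode) mutex.lock_shared();   // many concurrent readers
        else             mutex.lock();          // exactly one writer
    }
    ~BufferLockGuard()
    {
        if (shared_mode) mutex.unlock_shared();
        else             mutex.unlock();
    }

    std::shared_mutex& mutex;
    bool shared_mode;
};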

6 Limitations

At the time of writing, the implementation of supernova still imposed a few limitations.

6.1 Missing Features

The SuperCollider server provides some features that haven't been implemented in supernova yet. The most important feature that is still missing is non-realtime synthesis, which is implemented by scsynth. In contrast to the realtime mode, OSC commands are not read from a network socket, but from a binary file that is usually generated from the language; audio IO is done via soundfiles instead of physical audio devices. Other missing features are ugen commands for specific commands to specific unit generators and plugin commands to add user-defined OSC handlers. While these features exist, they are neither used in the standard distribution nor in the sc3-plugins repository (the sc3-plugins repository, http://sc3-plugins.sourceforge.net/, is an svn repository of third-party unit generators that is maintained independently from SuperCollider).

6.2 Synchronization and Timing

As explained in Section 5.2, the audio callback wakes up the worker threads before working on the job queue. The worker threads are not put to sleep until all jobs for the current signal block have been run, in order to avoid some scheduling overhead. While this is reasonable if there are enough parallel jobs for the worker threads, it leads to busy-waiting for sequential parts of the signal graph. If no parallel groups are used, the worker threads wouldn't do any useful work, or even worse, they would prevent the operating system from running threads with a lower priority on this specific CPU. This is neither very friendly to other tasks nor to the environment (power consumption). The user can make sure to parallelize the signal processing as much as possible and adapt the number of worker threads to match the demands of the application and the number of available CPU cores.

On systems with a very low worst-case scheduling latency, it would be possible to put the worker threads to sleep instead of busy-waiting for the remaining jobs to finish. On a highly tweaked Linux system with the RT Preemption patches enabled, one should be able to get worst-case scheduling latencies below 20 microseconds [Rostedt and Hart, 2007], which would be acceptable, especially if a block size of more than 128 samples can be used. On my personal reference machines, worst-case scheduling latencies of 100 µs (Intel Core2 laptop) and 12 µs (Intel i7 workstation, power management, frequency scaling and SMT disabled) can be achieved.

7 Conclusions

supernova is a scalable solution for real-time computer music. It is tightly integrated with the SuperCollider computer music system, since it just replaces one of its components and adds the simple but powerful concept of parallel groups, exposing the parallelism explicitly to the user. While existing code doesn't make use of multiprocessor machines, it can easily be adapted. The advantage over other multiprocessor-aware computer music systems is its scalability. An application doesn't need to be optimized for a certain number of CPU cores, but can use as many cores as desired. It does not rely on pipelining techniques, so no latency is added to the signal. Resource access is not ordered implicitly, but has to be dealt with explicitly by the user.

8 Acknowledgements

I would like to thank James McCartney for developing SuperCollider and publishing it as open source software, Anton Ertl for the valuable feedback about this paper, and all my friends who supported me during the development of supernova.

References

Stéphane Letz, Yann Orlarey, and Dominique Fober. 2005. Jack audio server for multi-processor machines. In Proceedings of the International Computer Music Conference.

Thor Magnusson. 2006. Affordances and constraints in screen-based musical instruments. In NordiCHI '06: Proceedings of the 4th Nordic conference on Human-computer interaction, pages 441-444, New York, NY, USA. ACM.

James McCartney. 1996. SuperCollider, a new real time synthesis language. In Proceedings of the International Computer Music Conference.

James McCartney. 2002. Rethinking the Computer Music Language: SuperCollider. Computer Music Journal, 26(4):61-68.

Miller Puckette. 1991. FTS: A Real-time Monitor for Multiprocessor Music Synthesis. Computer Music Journal, 15(3):58-67.

Miller Puckette. 2002. Max at Seventeen. Computer Music Journal, 26(4):31-43.

Miller Puckette. 2008. Thoughts on Parallel Computing for Music. In Proceedings of the International Computer Music Conference.

Ulrich Reiter and Andreas Partzsch. 2007. Multi Core / Multi Thread Processing in Object Based Real Time Audio Rendering: Approaches and Solutions for an Optimization Problem. In Audio Engineering Society 122nd Convention.

Steven Rostedt and Darren V. Hart. 2007. Internals of the RT Patch. In Proceedings of the Linux Symposium, pages 161-172.

Work Stealing Scheduler for Automatic Parallelization in Faust

Stephane Letz, Yann Orlarey and Dominique Fober
GRAME
9 rue du Garet, BP 1185, 69202 Lyon Cedex 01, France
{letz, orlarey, fober}@grame.fr

Abstract

Faust 0.9.10 introduces an alternative to OpenMP based parallel code generation, using a Work Stealing Scheduler and explicit management of worker threads. This paper explains the new option and presents some benchmarks. (This work was partially supported by the ANR project ASTREE, ANR-08-CORD-003.)

Keywords

FAUST, Work Stealing Scheduler, Parallelism

1 Introduction

Multi/many core machines are becoming common. There is a challenge for the software community to develop adequate tools to take advantage of the new hardware platforms [3] [8]. Various libraries like Intel Thread Building Blocks (http://www.threadingbuildingblocks.org/) or "extended C like" Cilk [5] can possibly help, but still require the programmer to precisely define the sequential and parallel sub-parts of the computation and use the appropriate library calls or building blocks to implement the solution.

In the audio domain, work has been done to parallelize well known algorithms [4], but very few systems aim to help in automatic parallelization. The problem is usually tricky with stateful systems and imperative approaches where data has to be shared between several threads, and concurrent accesses have to be accurately controlled.

On the contrary, the functional approach used in high level specification languages generally helps in this area. It basically allows the programmer to define the problem in an abstract way, completely unconnected to implementation details, and lets the compiler and various backends do the hard job. By exactly controlling when and how state is managed during the computation, the compiler can decide what multi-threading or parallel generation techniques can be used. More sophisticated methods to reorganize the code to fit specific architectures can then be tried and analyzed.

1.1 The Faust approach

Faust [2] is a programming language for real-time signal processing and synthesis designed from scratch to be a compiled language. Being efficiently compiled allows Faust to provide a viable high-level alternative to C/C++ to develop high-performance signal processing applications, libraries or audio plug-ins.

The Faust compiler is built as a stack of code generators. The scalar code generator produces a single computation loop. The vector generator rearranges the C++ code in a way that facilitates the autovectorization job of the C++ compiler. Instead of generating a single sample computation loop, it splits the computation into several simpler loops that communicate by vectors. The result is a Directed Acyclic Graph (DAG) in which each node is a computation loop.

Starting from the DAG of computation loops (called "tasks"), Faust was already able to generate parallel code using OpenMP directives [1]. This model has been completed with a new dynamic scheduler based on a Work Stealing model.
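As a rough C++ illustration of the loop splitting described above (this is not actual Faust compiler output; names and the computation are invented for the example), a single scalar loop can be rewritten as several simpler loops communicating through an intermediate vector:

// Illustration only (not Faust-generated code): the same computation written
// as one scalar loop, and as two simpler loops communicating through a vector.
#include <cmath>
#include <vector>

void compute_scalar(const float* in, float* out, int count)
{
    for (int i = 0; i < count; ++i)
        out[i] = 0.5f * std::sin(in[i]);          // everything in one loop
}

void compute_split(const float* in, float* out, int count)
{
    std::vector<float> tmp(count);                // intermediate vector between loops
    for (int i = 0; i < count; ++i)               // loop 1: one simple operation
        tmp[i] = std::sin(in[i]);
    for (int i = 0; i < count; ++i)               // loop 2: easy to autovectorize
        out[i] = 0.5f * tmp[i];
}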

2 Scheduling strategies for parallel code

The role of scheduling is to decide how to organize the computation, in particular how to assign tasks to processors.

2.1 Static versus dynamic scheduling

The scheduling problem exists in two forms: static and dynamic. In static scheduling, usually done at compile time, the characteristics of a parallel program (such as task processing times, communication, data dependencies, and synchronization requirements) are known before program execution. Temporal and spatial assignment of tasks to processors is done by the scheduler to optimize the execution time. In dynamic scheduling on the contrary, only a few assumptions about the parallel program can be made before execution; thus scheduling decisions have to be realized on-the-fly, and assignment of tasks to processors is made at run time.

2.2 DSP specific context

We can list several specific elements of the problem:
- the graph describes a DSP processor which is going to read n input buffers containing p frames to produce m output buffers containing the same p number of frames
- the graph is completely known in advance, but the precise cost of each task and the communication times (memory access, cache effects...) are not known
- the graph can be computed in a single step: each task is going to process p frames in a single step, or buffers can be cut in slices and the graph can be executed several times on sub-slices to fill the output buffers (see the "pipelining" section).

2.3 OpenMP mode

The OpenMP based parallel code generation is activated by passing the --openMP (or --omp) option to the Faust compiler. It implies the --vec option, as the parallel code generation is built on top of the vector code generation.

In OpenMP mode, a topological sort of the graph is done. Starting from the inputs, tasks are organized as a sequence of groups of parallel tasks. Then appropriate OpenMP directives are added to describe a fork/join model. Each group of tasks is executed in parallel and synchronization points are placed between the groups (Figure 1). This model gives good results with the Intel icc compiler. Unfortunately, until recently the OpenMP implementation in g++ was quite weak and inefficient, and even unusable in a real-time context on OSX.

[Figure 1: Tasks graph with forward activations and explicit synchronization points]

Moreover, the overall approach of organizing tasks as a sequence of groups of parallel tasks is not optimal, in the sense that synchronization points are required for all threads when part of the graph could continue to run. In some sense the synchronization strategy is too strict, and a data-flow model is more appropriate.

2.3.1 Activating Work Stealing Scheduler mode

The scheduler based parallel code generation is activated by passing the --scheduler (or --sch) option to the Faust compiler. It implies the --vec option, as the parallel code generation is built on top of the vector code generation.

With the --scheduler option, the Faust compiler uses a very different approach. A data-flow model for graph execution is used, to be executed by a dynamic scheduler. Parallel C++ code embedding a Work Stealing Scheduler and using a pool of worker threads is generated. Threads are created when the application starts and all participate in the computation of the graph of tasks. Since the graph topology is known at compilation time, the generated C++ code can precisely describe the execution flow (which tasks have to be executed when a given task is finished...). Ready tasks are activated at the beginning of the compute method and are executed by available threads in the pool. The control flow then circulates in the graph from input tasks to output tasks in the form of activations (ready task index called tasknum) until all output tasks have been executed.

2.4 Work Stealing Scheduler

In a Work Stealing Scheduler [7], idle threads take the initiative: they attempt to steal tasks from other threads. This is possible by having each thread own a Work Stealing Queue (WSQ), a special double-ended queue with a Push operation, a private LIFO Pop operation (which does not need to be multi-thread aware) and a public FIFO Pop operation (which has to be multi-thread aware using lock-free techniques and is thus more costly).
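To make this asymmetric interface concrete, a minimal sketch in C++ could look as follows. This is an illustration only, not the code generated by Faust: the names are invented, and a real Work Stealing Queue would use a lock-free algorithm for the public end instead of a mutex.

// Illustrative sketch of a Work Stealing Queue interface (not Faust's code).
#include <deque>
#include <mutex>

class WorkStealingQueue {
public:
    void push(int tasknum) {                     // owner thread: newest work at the back
        std::lock_guard<std::mutex> g(m);
        tasks.push_back(tasknum);
    }
    bool pop_private(int& tasknum) {             // owner thread: LIFO, good cache locality
        std::lock_guard<std::mutex> g(m);
        if (tasks.empty()) return false;
        tasknum = tasks.back(); tasks.pop_back();
        return true;
    }
    bool steal(int& tasknum) {                   // other threads: FIFO, takes the oldest task
        std::lock_guard<std::mutex> g(m);
        if (tasks.empty()) return false;
        tasknum = tasks.front(); tasks.pop_front();
        return true;
    }
private:
    std::deque<int> tasks;
    std::mutex m;
};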

The basic idea of work stealing is for each processor to place work when it is discovered in its local WSQ, greedily perform that work from its local WSQ, and steal work from the WSQ of other threads when the local WSQ is empty.

Starting from a ready task, each thread executes it and follows the data dependencies, possibly pushing ready output tasks into its own local WSQ. When no more tasks can be executed on a given computation path, the thread pops a task from its local WSQ using its private LIFO Pop operation. If the WSQ is empty, the thread is allowed to steal tasks from other threads' WSQs using their public FIFO Pop operation.

The local LIFO Pop operation allows better cache locality, and the FIFO steal Pop allows larger chunks of work to be done. The reason for this is that many work stealing workloads are divide-and-conquer in nature: stealing one of the oldest tasks implicitly also steals a (potentially) large subtree of computations that will unfold once that piece of work is stolen and run.

For a given cycle, the whole number of frames is used in one graph execution cycle. So when finished, a given task will (possibly) activate its output tasks only, and activation goes forward.

2.4.1 Code generation

The compiler produces a computeThread method called by all threads:
- tasks are numbered and compiled as a big switch/case block
- a work stealing task, whose aim is to find the next ready task, is created
- an additional last task is created

2.5 Compilation of different types of nodes

For a given task in the graph, the compiled code will depend on the topology of the graph at this stage.

2.5.1 One output and direct link

If the task has one output only and this output has one input only (so basically there is a single link between the two tasks), then a direct activation is compiled, that is the tasknum of the next task is the tasknum of the output task, and there is no additional step required to find out the next task to run.

2.5.2 Several outputs

If the task has several outputs, the code has to:
- init tasknum with the WORK_STEALING value
- if there is a direct link between the given task and one of the output tasks, then this output task will be the next to execute; all other tasks with a direct link are pushed on the current thread's WSQ (the chosen task here is the first in the task output list; more sophisticated choice heuristics could be tested at this stage)
- otherwise, for output tasks with more than one input, the activation counter is atomically decremented (possibly returning the tasknum of the next task to execute)
- after execution of the activation code, tasknum will either contain the actual value of the next task to run or WORK_STEALING, so that the next ready task is found by running the work stealing task.

2.5.3 Work Stealing task

The special work stealing task is executed when the current thread has no next task to run in its computation path and its WSQ is empty. The GetNextTask function aims to find a ready task by possibly stealing a task to run from any of the other threads except the current one. If no task is ready then GetNextTask returns the WORK_STEALING value and the thread loops until it finally finds a task or the whole computation ends.

2.5.4 Last task

Output tasks of the DAG are connected to and activate the special last task, which in turn quits the thread.

void computeThread(int thread)
{
    TaskQueue taskqueue;
    int tasknum = -1;
    int count = fFullCount;

    // Init input and output
    FAUSTFLOAT* input0 = &input[0][fIndex];
    FAUSTFLOAT* input1 = &input[1][fIndex];
    FAUSTFLOAT* output0 = &output[0][fIndex];

    // Init graph
    int task_list_size = 2;
    int task_list[2] = {2,3};
    taskqueue.InitTaskList(
        task_list_size, task_list,
        fDynamicNumThreads,

        cur_thread,
        tasknum);

    // Graph execution code
    while (!fIsFinished) {
        switch (tasknum) {
            case WORK_STEALING: {
                tasknum = GetNextTask(thread);
                break;
            }
            case LAST_TASK: {
                fIsFinished = true;
                break;
            }
            case 2: {
                // DSP code
                PushHead(4);
                tasknum = 3;
                break;
            }
            case 3: {
                // DSP code
                PushHead(5);
                tasknum = 4;
                break;
            }
            case 4: {
                // DSP code
                tasknum = ActivateOutput(LAST_TASK);
                break;
            }
            case 5: {
                // DSP code
                tasknum = ActivateOutput(LAST_TASK);
                break;
            }
        }
    }
}

Listing 1: example of computeThread method

2.6 Activation at each cycle

The following steps are done at each cycle:
- n-1 threads are resumed and start the work
- the calling thread also participates
- ready tasks (tasks that depend on the input audio buffers or tasks with no inputs) are dispatched between all threads, that is each thread pushes sub-tasks in its private WSQ and takes one of them to be directly executed
- after having done its part of the work, the main thread waits for all worker threads to finish

This is done in the following compute method called by the master thread:

void compute(int fullcount,
             FAUSTFLOAT** input,
             FAUSTFLOAT** output)
{
    this->input = input;
    this->output = output;

    for (fIndex = 0; fIndex < fullcount; fIndex += 1024) {
        fFullCount = min(1024, fullcount - fIndex);
        TaskQueue::Init();
        // Initialize end task
        fGraph.InitTask(1,1);
        // Only initialize tasks with inputs
        fGraph.InitTask(3,1);
        // Activate worker threads
        fIsFinished = false;
        fThreadPool.SignalAll(fDynamicNumThreads - 1);
        // Master thread participates
        computeThread(0);
        // Wait for end
        while (!fThreadPool.IsFinished()) {}
    }
}

Listing 2: master thread compute method

2.7 Start time

At start time, n worker threads are created and put in a sleep state; thread scheduling properties and priorities are set according to calling thread parameters.

3 Pipelining

Some graphs are sequential by nature. Pipelining techniques aim to create and exploit parallelism in those kinds of graphs to better use multi-core machines. The general idea is that for a sequence of two tasks A and B, instead of running A on the entire buffer size and then running B, we want to split the computation in n sub-buffers and run A on the first sub-buffer; then B can be run on A's first sub-buffer output while A runs on the second sub-buffer, and so on.

[Figure 2: Very simple graph rewritten to express pipelining; A is a recursive task, B is a non recursive one]

Different code generation techniques can be used to obtain the desired result. We choose to rewrite the initial graph of tasks, by duplicating n times each task to be run on a sub part of the buffer. This way the previously described code

150 generator can be directly used without major changes. Karplus32 120 3.1 Code generation 100 The pipelining parallel code generation is ac- ) s tivated by passing -sch option as well as the / 80 B

--pipelining option (or -pipe) with a value, the M Scalar ( factor the initial buffer will be divided in. t 60 Vector u

p OpenMP Previously described code generation strat- h g 40 Scheduler u

egy has to be completed. The initial tasks graph o r is rewritten and each task is split in several sub- T 20 tasks: - recursive tasks 6 are rewritten as a list of 0 n connected sub-tasks (that is activation has to 1 2 3 4 go from the first sub-task to the second one and Run so on). Each sub-task is then going to run on buffer-size/n number of frames. There is no real Figure 3: Compared performances of the 4 gain for the recursive task itself since it will still compilation schemes on karplus32.dsp be computed in a sequential manner. But out- put sub-tasks can be activated more rapidly and possibly executed immediately is they are part As we can see, in both cases the paralleliza- of a non recursive task (Figure 2). tion introduces a real gain of performances. The - non recursive tasks are rewritten as a list of speedup for Karplus32 was x2.1 for OpenMP n non connected sub-tasks, so that all sub-tasks and x2.08 for the WS scheduler. For Sonik Cube can possibly be activated in parallel and thus the speedup with 8 threads was of x4.17 for run on several cores at the same time. Each OpenMP and x5.29 for the WS scheduler. It sub-task is then going to run on buffer-size/n is obviously not always the case. Simple appli- number of frames. cations, with limited demands in terms of com- This strategy is used for all tasks in the DAG, puting power, tend to perform usually better the rewritten graph enters the previously de- in scalar mode. More demanding applications scribed code generator and the complete code usually benefit from parallelization. is generated.

4 Benchmarks To compare the performances of these various compilation schemes in a realistic situation, we have modified an existing architecture file (Alsa- GTK on Linux and CoreAudio-GTK on OSX) to continuously measure the duration of the compute method (600 measures per run). We give here the results for three real-life applications: Karplus32, a 32 strings simulator based on the Karplus-Strong algorithm (figure 3), Sonik Cube (figure 4), the audio part software of an audio- visual installation and Mixer (figure 5), a multi- voices mixer with pan and gain. Instead of time, the results of the tests are expressed in MB/s of processed samples because memory bandwidth is a strong limiting factor for today’s processors (an audio application can never go faster than the memory bandwidth). Figure 4: Compared performances of different 6those where the computation of the frame n depends generation mode from 1 to 8 threads of the computation of previous frames

151 The efficiency of OpenMP and WS scheduler tasks. are quite comparable, with an advantage to WS - there is no special strategy to deal with scheduler with complex applications and more thread affinity, thus the performances can de- CPUs. Please note that not all implementations grade if tasks are switched between cores. of OpenMP are equivalent. Unfortunately the - the effect of memory bandwidth limits and GCC 4.4.1 version is still unusable for real time cache effects have to be better understood audio application. In this case the WS scheduler - composing several pieces of parallel code is the only choice. The efficiency is also depen- without sacrificing performance can be quite dent of the vector size used. Vector sizes of 512 tricky. Performance can severally degrade as or 1024 samples usually give the best results. soon as too much threads are competing for The pipelining mode does not show a clear the limited number of physical cores. Using ab- speedup in most cases, but Mixer is an example straction layers like Lithe [6]or OSX libdispatch when this new mode helps. 7 could be of interest. References [1] Y. Orlarey, D. Fober, and S. Letz. Adding Automatic Parallelization to Faust. Linux Audio Conference 2009. [2] Yann Orlarey, Dominique Fober, and Stephane Letz. Syntactical and seman- tical aspects of faust. Soft Computing, 8(9):623632, 2004. [3] J.Ffitch, R.Dobson, and R.Bradford. The Imperative for High-Performance Audio Computing. Linux Audio Conference 2009. [4] Fons Adriaensen. Design of a Convolution Engine optimised for Reverb. Linux Audio Figure 5: Compared performances of different Conference 2006. generation mode from 1 to 4 threads on OSX [5] Robert D.Blumofe, Christopher F. Joerg, with a buffer of 4096 frames Bradley C. Kuszmaul, Charles E. Leiser- son, Keith H. Randall, and Yuli Zhou. Cilk: an efficient multithreaded runtime system. 5 Open questions and conclusion In PPOPP 95: Proceedings of the fifth ACM SIGPLAN symposium on Principles In general, the result can greatly depends on and practice of parallel programming, pages the DSP code going to be parallelized, the cho- 207216, New York, NY, USA, 1995. ACM. sen compiler (icc or g++) and the number of threads to be activated when running. Simple [6] Heidi Pan, Benjamin Hindman, Krste DSP effects are still faster in scalar of vector- Asanovic. Lithe: Enabling Efficient Compo- ized mode and parallelization is of no use. But sition of Parallel Libraries. USENIX Work- with some more complex code like Sonik Cube shop on Hot Topics in Parallelism (Hot- the speedup is quite impressive as more threads Par’09), Berkeley, CA. March 2009. are added. But they are still a lot of opened [7] Blumofe, Robert D. and Leiserson, Charles questions that will need more investigation: E. Scheduling multithreaded computations by - improving dynamic scheduling, since right work stealing. Journal of ACM, volume 46, now there is no special strategy to choose the number 5, 1999, New York NY USA. task to wake up in case a given task has several output to activate. Some ideas of static schedul- [8] David Wessel et al. Reinventing Audio and ing algorithms could be used in this context. Music Computation for Many-Core Proces- - the current compilation technique for sors. International Computer Music Con- pipelining actually duplicates each task code n ference 2008. times. This simple strategy will probably show its limit for more complex effects with a lot of 7http://developer.apple.com/

A MusicXML Test Suite and a Discussion of Issues in MusicXML 2.0

Reinhold Kainhofer, http://reinhold.kainhofer.com, [email protected]
Vienna University of Technology, Austria
and GNU LilyPond, http://www.lilypond.org
and Edition Kainhofer, http://www.edition-kainhofer.com, Austria

Abstract

MusicXML [Recordare LLC, 2010] has become one of the standard interchange formats for music data. While a "specification" in the form of some DTD files with comments for each element, and equivalently in the form of XML Schemas, is available, no representative archive of MusicXML unit test files has been available for testing purposes. Here, we present such an extensive suite of MusicXML unit tests [Kainhofer, 2009]. Although originally intended for regression-testing the musicxml2ly converter, it has turned into a general MusicXML test suite consisting of more than 120 MusicXML test files, each checking one particular aspect of the MusicXML specification.

During the creation of the test suite, several shortcomings in the MusicXML specification were detected and are discussed in the second part of this article. We also discuss the obstacles encountered when trying to convert MusicXML data files to the LilyPond [Nienhuys and et al., 2010] format.

Keywords

MusicXML, Test suite, Software development, Music notation

1 About the MusicXML test suite

MusicXML has become the de-facto exchange format for visual music data, supported by dozens of software applications. It has even been proposed as a tool for musicological analysis [Vigliante, 2007; Ganseman et al., 2008], online-music editing [Cunningham et al., 2006] or evaluating OMR systems [Szwoch, 2008], among others.

Recently, the first version of the Open Score Format specification [Yamaha Corporation, 2009] was published together with a PVG (Piano-Voice-Guitar) profile, which employs MusicXML as its data format. Thus most problems with MusicXML will automatically carry over to this new specification, too.

Despite a full syntactic definition of the MusicXML format [Recordare LLC, 2010], no official suite of representative MusicXML test files has been available for developers implementing MusicXML support in their applications. The only help were the comments and explanations given in the specification to create test cases manually. The predominant advice is to use the Dolet plugin for Finale, which is a proprietary Windows and MacOS application that is not easily available for many Open Source developers employing Linux. Also, this approach invariably will lead to MusicXML being interpreted as behaving like the Dolet plugin instead of being an application-independent specification. Furthermore, the lack of a test suite means that a lot of work is duplicated creating MusicXML test cases.

This lack of a complete MusicXML test suite for testing purposes was our main incentive for creating such a semi-official test suite [Kainhofer, 2009]. Due to the complexity of musical notation, a complete set of test cases, covering every possible combination of notation and all possible combinations of XML attributes and elements, is apparently out of reach. However, we attempted to create representative samples to catch as many common combinations as possible. Our main goal was to create small unit test cases, covering not only the most common features of MusicXML, but also some less used musical notation elements, like complex time signatures, instrument specific markup, microtones, etc.

The test suite together with sample renderings is available for download at its homepage: http://kainhofer.com/musicxml/

2 Structure of the test suite

We identified twelve different feature categories, each dealing with separate aspects of the MusicXML specification (like basic musical notation, staff attributes, note-related elements, page layout, etc.). Each of these categories was further split into more specific aspects, for which we created several test cases each.

153 01-09 ... Basics .mxl for (ZIP-) compressed MusicXML archives 01 Pitches as defined in container.dtd. 02 Rests Every test case is supposed to test one par- 03 Rhythm ticular feature or feature combination of Mu- 10-19 ... Staff attributes sicXML. If a feature has multiple possible at- 11 Time signatures 12 Clefs tribute values or different uses within a score, 13 Key signatures the corresponding test file contains several sub- 20-29 ... Note-related elements tests, separated as much as possible by using 21 Chorded notes different notes or even different measures or 22 Note settings, heads, etc. staves for each of the values or combinations. 23 Triplets, Tuplets For example, parenthesized notes or rests use 24 Grace notes the parentheses attribute of a notehead XML 30-39 ... Notations, articul., spanners element. However, for an application it might 31 Dynamics and other single symbols make a difference if that note is a note on its 32 Notations and Articulations 33 Spanners own, a note with a non-standard note head or 40-44 ... Parts part of a chord. Similarly, parenthesized rests 41 Multiple parts (staves) can have a default position in the staff or an ex- 42 Multiple voices per staff plicit position given in the MusicXML file (using 43 One part on multiple staves the pitch child element of the note describing 45-49 ... Measure issues and repeats the rest). The test case for parenthesizing cov- 45 Repeats ers all these cases: 46 Barlines, Measures 50-54 ... Page-related issues 6   51 Header information 4       52 Page layout       55-59 ... Exact positioning of items 60-69 ... Vocal music 61 Lyrics This choice of combining closely related com- 70-75 ... Instrument-specific notation binations or aspects of the same feature into 71 Guitar notation one test case provides a nice balance between 72 Transposing instruments clearly separating test cases for different fea- 73 Percussion tures to avoid influences of bugs in one feature 74 Figured bass 75 Other instrumental notation on another feature, while still keeping the num- 80-89 ... MIDI and sound generation ber of test files relatively low, which is relevant 90-99 ... Other aspects if running a test suite cannot be automated. 90 Compressed MusicXML files 99 Compatibility with broken MusicXML 3 An example of a unit test The unit test files are kept as simple as possi- Table 1: Structure of the files, categorized by ble, so that they can best fulfill their purpose of file name checking only one particular aspect. For exam- ple, the unit test file 33b-Spanners-Tie.xml to The test suite currently consists of more than check the processing of simple ties – a trivial fea- 120 test cases, where each file represents one ture, which still needs to be tested in coverage particular aspect of the MusicXML format and and regression tests – reads: is named accordingly. The file name starts with − − 1 bal description of the test case is given in the file name. The file extension follows the stan- dard of .xml for normal MusicXML files and Two simple tied whole notes 1 A more detailed description is given in a description − element inside the XML file.

154 1.5 − − [...] 1  2 0  4 file is kept as simple as possible to avoid cross- 1 < clef number=”1”> interactions of bugs in different areas of an ap- G plication. 2 4 Sample renderings Originally, the files of the test suite were gener- F ated as test cases for the implementation of mu- 4 sicxml2ly, a utility to convert MusicXML files to the LilyPond [Nienhuys and et al., 2010] for- 4 mat. Using this utility, sample renderings of all 1 the test cases can be created automatically and whole are made available on the homepage of the test However, these renderings cannot be regarded as official reference renderings, either, since they represent only one particular interpretation (the one of the musicxml2ly converter), while the F MusicXML specification leaves several aspects 4 unclear, as detailed below. So several aspects 4 of the semantic interpretation of MusicXML are left to the importing application. Furthermore, 1 the musicxml2ly converter does not yet fully whole these sample renderings should be understood as an indication how one particular application understands the files. − 5 Shortcomings of the MusicXML 4 format 4 While generating the MusicXML test suite, we   encountered several problems or inconsistencies Other test files check for more exotic fea- with the MusicXML specification as given in the tures, like for example the test case for non- DTDs or XML Schemas. Those issues involve traditional key signatures with microtone alter- semantic ambiguities, suboptimal XML design ations (excerpt of 13d-KeySignatures-Micro- choices and missing features. tones.xml): 5.1 Semantic ambiguities of MusicXML First, while the MusicXML specification gives a 1 precise syntactic specification of the format, the 4 complexities of music notation inevitably lead − − 1.5 to additional semantic restrictions, that can not 6<−/key step−> 0.5<−/key a l t e r > properly be expressed in a DTD or an XML 0<−/key step−> Schema. Additionally, MusicXML is also de- − − 0 signed to cover the features of several commer- 1 0.5 cially available music typesetting applications, 3 where each application has implemented some − −

aspects differently. Unless the MusicXML specification clearly describes how certain combinations of attributes and/or elements are to be understood, there cannot be a unique interpretation of the exact musical or layout content of some MusicXML snippet.
Thus, the first type of issues we identified are semantic ambiguities, which might or rather should be clarified in the specification itself. Several of these were answered or explained by Michael Good in email threads on the MusicXML mailing list, but the official specification remains ambiguous to new developers.

5.1.1 Only a syntactical definition

The biggest problem with MusicXML is that it is a purely syntactical definition, while the musical content requires additional semantic restrictions. For example, spanners like tuplets, ties, crescendi, analysis brackets etc. are simply marked with a start, possibly some continuation and an end element. Several spanners can be arbitrarily overlapping; however, it is not possible to properly specify that each of these spanners must be closed (at a position where it makes sense musically). Furthermore, it does not make sense from a musical standpoint to have e.g. a crescendo and a decrescendo overlapping in the same voice. However, a part can have several different voices, each with different hairpins, so this is not a restriction for a part, only for a voice in the conventional sense. Adding such restrictions at the voice level is simply not possible in a pure syntax specification like the DTD or XSD.

5.1.2 Voice-Basedness

Many music notation and sequencer applications are based on the concept of voices, which was also introduced in MusicXML through the voice element of note. Notes with the same voice element value are assumed to be in the same voice, but the voice element is not required (in the OSF PVG profile, a voice element is finally required). It is not clear whether a missing voice element implicitly means voice one or whether notes without a voice element should be placed in a voice of their own. In any case, an importing application needs to check whether the imported assignment of notes to voices is possible in the application at all, and thus the voice element can only be used as a strong indication, but not as a definitive assignment to voices. In particular, many applications don't allow overlapping notes in the same voice. Such notes would need to be assigned to different voices upon import.

5.1.3 Attributes

Another unclear part of the MusicXML specification regards the display of the settings of the attributes element. Some applications export the time and key signature for every measure, even if it hasn't changed. The specification is silent on whether the presence of an attributes element is supposed to imply the presence of a graphical indication of these settings (e.g. explicitly displaying the key or time signature) or only to assert specific values and leave the decision to implementations. In the latter case (which is apparently favored by Michael Good), the MusicXML file does not specify the layout uniquely and different applications will produce different output.

5.1.4 Chords

Chords are another unclear area in the specification: a sequence of notes with the chord element can only appear after a note that does not have the chord element set and serves as the base note for the chord. This restriction cannot be handled in a DTD, but was introduced in the PVG profile of OSF. Even worse, the grammar allows e.g. forward or backward elements before a note with chord, so that the note no longer appears at the same time as the chord's base note. This additional restriction, that no forward or backward elements appear between the notes of a chord, needs to be clarified. Also, theoretically the notes of a chord can belong to different voices in the file, while this is not supported by most applications.

5.1.5 Lyrics

Lyrics in MusicXML can have both a stanza number and a name attribute to distinguish different lyrics lines. But the specification is silent on whether the number, the name alone or the combination of number and name should be used to determine which syllables belong together. Even worse, it seems that for the page display, the vertical position of the lyrics is the main factor to associate lyrics syllables into words. This is a violation of the otherwise rather strict separation of content and display.

5.1.6 Figured Bass

Figured bass numbers in MusicXML are always assigned to the "first regular note that follows" per specification, but it does not say if this is meant in XML order or in time order. In particular, there might be a forward or backward

156 element immediately after the figured bass. In element appears. With an XML Schema this this case, the XML-order and time-wise or- would be possible to model, but the XSD for der is clearly different. Also, the slash value MusicXML still enforces a fixed ordering to pro- of the suffix child element does not distin- vide backwards compatibility. guish forward slashes and back ticks (usually 5.2.2 Names of XML Elements through a ”6” to indicate a diminished chord). Containing Pitch Information The DTD only says ”The orientation and dis- Another suboptimal design choice regards all el- play of the slash usually depends on the figure ements that contain some kind of pitch infor- number.” While forward and backward slashes mation or alteration. Their names are chosen might mean the same from a musical point of to include the name of the enclosing XML ele- view, the display of the figure will thus vary ment, even though this can already be deduced from application to application. In most other from the enclosing context. For example, the cases, in contrast, MusicXML tries to be as pre- alteration of a normal note uses the alter el- cise possible as far as the display is concerned. ement inside a pitch, while the alteration of a 5.1.7 Harp Pedal Diagrams chord root uses the root-alter element inside Finally, the harp-pedals element for harp pedal a root and the alteration of a chord note uses diagrams lists the pedal states for the D, C, degree-alter inside degree. As all of them B, E, F, G, and A harp pedals, but only rec- simply indicate an alteration of the enclosing ommends to give the pedals in the usual order. element, it would be cleaner to use the same el- For different orders, it is not clear whether the ement name alter for all these uses and instead harp pedal should be displayed in the usual or- use the enclosing context, too. This would also der or in the order given. Also, there is no way make implementations cleaner, as they can base to indicate the usual vertical delimiter for other all pitch and alteration information on one com- orders. mon base class, using the same names for the children. This issue appears not only with the 5.2 Sub-optimal XML design alter and (key|tuning|root|degree)-alter A further type of issues with the MusicXML elements, but also with the step and (key| format concerns the general design of the XML tuning|root)-step as well as with the octave format. As the MusicXML format is supposed and(key|tuning)-octave elements. to be backward compatible, these design choices 5.2.3 Metronome Marks and cannot be undone. However, for completeness, Non-Standard Key Signatures we will still discuss how they could have been done better. Contrast this over-correctness in naming and ignoring the context in the XML tree to the 5.2.1 Strict Order of XML Child Nodes metronome element. Tempo changes of the form The MusicXML DTDs specify that the children “old value = new value” using the metronome of a note element (and of several similar ele- element are defined in the DTD as ments) have to be in a fixed order. In particu- lar, the duration, the optional voice and the (beat-unit, beat-unit-dot*, type elements have to be in this exact order, beat-unit, beat-unit-dot*) although one would intuitively place the dura- tion (the length in time units) and the type of where the two beat units indicate the value the note (the visual representation of the dura- before and after the tempo change. As a re- tion) together. 
Also, from a theoretical point of sult of the optional dots, one cannot simply view, there is no need to force a fixed ordering, access the old and the new tempo unit sepa- as all child elements simply specify properties of rately by their child index of the metronome ele- that note. The only reason for this fixed order- ment, but has to iterate through the children se- ing is – according to Michael Good2 – a tech- quentially, checking for existing dots. For most nical one, namely that the DTD does not pro- other purposes MusicXML tries to give each vide a proper way to allow arbitrary ordering of item with even the slightest different function the XML child elements while at the same time its own name. Similarly, custom key signatures adding restrictions on the number of times an are defined as

2Mail message to the MusicXML mailing list on ((...|((key-step, key-alter)*)), March 11, 2008 key-octave*)

157 where steps and alterations alternate. This de- positioning and offsets, there are several settings sign has the additional problem that the (op- for professional music typesetting, which cannot tional) octave definitions for each of the key el- be encoded in MusicXML. For these issues, we ements are given after the step/alter pairs in give some suggestions for a possible inclusion in left-to-right order. As pointed out by Tulgan3, MusicXML 2.1 or 3.0. this would be better represented by enclosing 5.3.1 Document-wide Header and each of the key element information (step, alter Footer Lines and optional octave) in a separate key-element child element. First, while MusicXML allows one to completely describe the page layout of the music sheet, 5.2.4 Shortcomings of DTD and XSD there is no way to specify a document-wide Formats header or footer line. The credit element al- In the original DTD for MusicXML, enumerated lows to place arbitrary text on any page, but element values cannot be properly modeled, but it refers to only one page (page 1 by default). have to use the #PCDATA type and explain the Thus, page headers and footers need to be de- possible values in the comments. Thus, all enu- fined for each page separately. We propose to merations are badly defined in the DTD, as only add values ”all”, ”even” and ”odd” to the data some possible values are mentioned, which are type of the page attribute for credit (currently unaccessible to any syntax checker. In the XSD xsd:positiveInteger): specification for MusicXML, the possible val- ues are mostly properly specified using an enu- Even− f o o t e r <−/c r e d i t words> MusicXML to non-western music notation very − hard (for example, displaying microtone acci- dentals used in Turkish music). Additionally, the possible values are not further explained 5.3.2 No Information about Purpose of (for example sharp-sharp vs. double-sharp Credit Elements for accidentals). Another problem with the credits elements is A similar problem are attributes, where the that they are simply text labels, but do not DTD allows a text value, which is then under- store its function (i.e. whether such an element stood as an integer or a real number. These gives the composer, poet, title, etc.). For a pure attributes are mainly fixed to integer values in layout-based application this might suffice, but the OSF PVG profile. any application, that tries to extract metadata 5.2.5 Staff-assigned Elements from the MusicXML file or that has a different handling for score titles and contributor names, Markup text in MusicXML is mostly assigned needs to extract that information. We propose to a whole staff. In reality, however, a text to add an enumerated type attribute to the might be inherently assigned to one particular credit element with possible values title, sub- note or voice. For example, when two instru- title, composer, poet or lyricist, arranger, pub- ments are combined as two voices on one staff, lisher, page number, header, footer, instrument, a markup “dolce” or “pizz.” might apply only copyright, etc. to one of the two instruments. Similar cases are instrument cue names given in a piano reduc- tion, as can be seen for example in the Tele- Composer− dare. Unfortunately, the voice child element − of direction is optional and thus most Mu- sicXML exporters ignore it, causing problems 5.3.3 System Delimiters for voice-based applications. 
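To make the staff-assignment problem concrete, the following snippet is an illustrative sketch (not taken from the test suite; it only uses the MusicXML 2.0 element names discussed above) of how a "dolce" marking could be tied to one voice by means of the optional voice child of direction:

<direction placement="above">
  <direction-type>
    <words>dolce</words>
  </direction-type>
  <voice>2</voice>
  <staff>1</staff>
</direction>

If the exporter omits the voice child, as most currently do, an importer can only guess from the vertical position or from the nearest note which of the two instruments sharing the staff the marking belongs to.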
In orchestral scores the systems are typically 5.3 Missing features in MusicXML 2.0 separated with a system delimiter consisting of two adjacent thick slashes. This is cur- While MusicXML 2.0 already added several im- rently also not possible to express in Mu- portant features over MusicXML 1.1, like exact sicXML. One possible solution would be to add 3Mail message to the MusicXML mailing list on Oc- it as a system-separator element inside the tober 7, 2006. defaults -> system-layout block of a score.
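The proposals sketched in the garbled excerpts above can be summarised in the following illustrative reconstruction; the type attribute on credit, the "even"/"odd"/"all" values for its page attribute, and the system-separator element are the extensions proposed in this section and are not part of MusicXML 2.0, and the credit text is a placeholder:

<!-- proposed: a document-wide footer via an extended page attribute -->
<credit page="even">
  <credit-words>Even footer</credit-words>
</credit>

<!-- proposed: a typed credit so that metadata can be extracted -->
<credit type="composer" page="1">
  <credit-words>Composer name</credit-words>
</credit>

<!-- proposed: a system separator in the score-wide defaults -->
<defaults>
  <system-layout>
    <system-separator>double slash</system-separator>
  </system-layout>
</defaults>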

158 many notation elements like dynamic markings or text markup is assigned to a staff, while in double s l a s h LilyPond they must be assigned to a particular − − note. If a staff contains two instruments (i.e. two voices, one for each instrument), one has to determine to which note to assign an encoun- 5.3.4 Cadenzas tered dynamic marking or text markup. In most case assigning it to the nearest note will produce Finally, there is no proper way to encode a ca- the desired output, but there are many cases denza in MusicXML. While a measure can have where it is not possible to determine whether a an arbitrary number of beats in MusicXML, ir- marking belongs to both instruments or just to respective of the time signature, the information one of them. As an example, a dynamic sign like about where a cadenza starts cannot be repre- ”p” or ”ff” might either apply to both instru- sented. This will cause problems with applica- ments at the same time, or (e.g. if one of them tions and utilities that check MusicXML files has a short solo) only to one of them. Similarly, for correctness, as they have no way to distin- each instrument cue name provided as a help for guish incorrectly encoded files from files with a the conductor in a piano reduction apply only cadenza. to one particular voice in the (multi-voice) pi- 6 Issues in MusicXML translation to ano part, while other text markups will apply to all voices simultaneously. the LilyPond If one is only interested in generating the ex- In [Good, 2002] Michael Good discusses issues act layout as provided in the MusicXML file, that appear in the translation of MusicXML a misassignment will still lead to correct visual files from and to the MuseData, NIFF, MIDI output, although the extracted music informa- and Finale file formats. In this section, we will tion is not entirely correct. However, if one also discuss issues that appear in the conversion of wants to create separate instrumental parts for the MusicXML format to the LilyPond file for- the two instruments in the first case mentioned mat. above, then it is crucial to correctly assign each 6.1 Musical Content vs. Graphical staff-assigned element to either one or both in- Representation strumental voices. On the other hand, assigning an element to both voices will then lead to du- As LilyPond [Nienhuys and et al., 2010] is a plicated items in the LilyPond output. WYSIWYM (what you see is what you mean) While it is true that MusicXML defined application, its data files describe the musical the direction element to optionally contain a contents of a score, rather than its graphical voice element for that exact purpose, in real- description. Thus a converter from MusicXML ity most GUI applications for music notation to LilyPond needs to extract the exact musi- will not cater for this functionality and thus cal contents from a file. In MusicXML, several produce MusicXML files without proper voice- elements – most notably dynamic signs like p, assignment. f, crescendi etc. – are mainly tied to a posi- tion on the staff and in some cases its onset and 6.3 Voice-Based vs. Measure-Based end can only be deduced by the horizontal off- set of the dynamic sign in the MusicXML file. 
A further problem is that LilyPond is voice- In LilyPond, however, almost all notation ele- based, where the measure length is determined ments are attached to a note or rest, so these by the current time signature, while MusicXML elements need to be quantized and correctly as- is measure-based, where a measure can contain signed to a note or rest in LilyPond. This is an arbitrary number of beats, irrespective of the often not easily possible, due to explicit offsets time signature. In LilyPond, voices are inde- on the page, effectively assigning the element pendently split into measures during processing to a different position than the position in the rather than in the input. Only later are those MusicXML file. measures synchronized. As a consequence, if one voice has more beats in a measure than an- 6.2 Staff-Assigned Items other voice, LilyPond will not be able to prop- Another problem in the conversion to LilyPond erly synchronize them, so some skip elements is caused by the same fact that in MusicXML need to be inserted to line the voices up. The

voice-basedness of LilyPond also causes problems with MusicXML's optional voice element, which allows a voice to have overlapping notes, which is not allowed in LilyPond, so that a MusicXML voice will possibly need to be split into multiple voices.

6.4 Page Layout

Concerning the page layout, LilyPond needs metadata about the score title and the contributors and will create its own labels on the title page and the header and footer bars. As discussed above, the credit element does not contain that information, so that the title page from the MusicXML file cannot be reproduced. Even worse, page headers and other markup are placed at absolute positions in the MusicXML score, which is not possible in LilyPond.

6.5 Workarounds in MusicXML Files

Finally, even some sample files provided by Recordare on the MusicXML homepage mix the graphical display with the musical information. For example, in the Chant.xml sample file a divisio minima (a short tick through the topmost staff line) is not encoded as a barline with the proper tick bar-style attribute, but as a words direction element with the text "|", which is then shifted manually so that it appears at the correct position. Such hacks cannot work with any application that tries to extract the musical contents and not the exact page layout.

7 Conclusion

The MusicXML test suite presented in this article finally provides software developers in the area of music notation with an extensive set of representative test cases to check conformance to the MusicXML specification and to perform regression and coverage tests.
Even though MusicXML has established itself as an industry standard for exchanging music notation, it is still encumbered with several minor issues. Our discussion in the second part of the paper attempts to provide implementors of MusicXML import and export features with some hints about possible pitfalls and ambiguities in the format.
Nonetheless, MusicXML is a very useful file format for the extremely hard and complex task of music notation exchange. As the OSF specification has already shown, one can expect that future versions of MusicXML will clarify, solve or at least soften most of the issues we discuss here.

References

Stuart Cunningham, Nicole Gebert, Rich Picking, and Vic Grout. 2006. Web-based music notation editing. In Proc. IADIS Int. Conf. on WWW/Internet 2006. Murcia, Spain, October 5-8, 2006.

Joachim Ganseman, Paul Scheunders, and Wim D'haes. 2008. Using XQuery on MusicXML databases for musicological analysis. In Proc. ISMIR 2008, pages 427–432. 9th Int. Conf. on Music Information Retrieval. Philadelphia, September 14-18, 2008.

Michael Good. 2002. MusicXML in practice: Issues in translation and analysis. In Proc. First Int. Conf. MAX 2002: Musical Application Using XML, pages 47–54. Milan, September 19-20, 2002.

Michael Good. 2006. Lessons from the adoption of MusicXML as an interchange standard. In Proc. XML 2006.

Reinhold Kainhofer. 2009. Unofficial MusicXML test suite. http://kainhofer.com/musicxml/. Representative set of MusicXML test cases.

Han-Wen Nienhuys and Jan Nieuwenhuizen et al. 2010. GNU LilyPond. http://www.lilypond.org/. The music typesetter of the GNU project.

Recordare LLC. 2010. MusicXML 2.0. Document Type Definition (DTD): http://musicxml.org/dtds, W3C XML Schema Definition (XSD): http://musicxml.org/xsd.

Mariusz Szwoch. 2008. Using MusicXML to evaluate accuracy of OMR systems. In Diagrammatic Representation and Inference: Proc. Diagrams 2008, volume 5223 of Lecture Notes in Computer Science, pages 419–422. Springer Verlag, Berlin. Herrsching, Germany, September 19-21, 2008.

Raffaele Vigliante. 2007. MusicXML: An XML based approach to automatic musicological analysis. In Proc. Digital Humanities 2007. Urbana-Champaign, IL, June 4-8, 2007.

Yamaha Corporation. 2009. Open Score Format (OSF, ver. 1.0), packaging specification. http://openscoreformat.sf.net/.

"3DEV: A tool for the control of multiple directional sound source trajectories in a 3D space"

Oscar Pablo Di Liscia
Carrera en Composición con Medios Electroacústicos
Universidad Nacional de Quilmes
Roque Saenz Peña 352, Bernal, Argentina, B1876BXD
[email protected]

Esteban Calcagno
Carrera en Composición con Medios Electroacústicos
Universidad Nacional de Quilmes
Roque Saenz Peña 352, Bernal, Argentina, B1876BXD
[email protected]

Abstract control of the spatial quality of the sound through This Paper presents a GNU software (3DEV) technological tools that allow accurate temporal developed for the creation, transformation and coordination between multiple audio signals and temporal coordination of multiple directional their trajectories or positions in a virtual space. sound source trajectories in a three- dimensional space. 3DEV was conceived as a general tool to be 1.2 Sound spatialisation software development used in electroacoustic music composition, and the data generated by it may be transmitted on a simple Sound spatialisation is developed in several and effective way to several spatialisation technologic and scientific areas mutually related. programs. Among the most important can be mentioned: Acoustics (particularly room acoustics), Keywords Psychoacoustics (specifically Spatial Listening), Sound Spatialisation, Electroacoustic Music, GNU Digital Signal Processing and Audio Engineering. Software Particularly, and together with the development of computer music, many of the results of the 1 General scientific research on this field were applied in the creation of software applications with the purpose 1.1 The spatial sound in electroacoustic of providing concrete tools for musical and sonic composition production to the composers and audio engineers. From its very beginning, avant-garde music The works of Chowning [3] and Moore ([4], [5] (both instrumental and electroacoustic) favoured a and [6]) must be regarded as a valuable and significant development of timbre, texture and unavoidable initial reference which was followed sound spatial quality together with the by other significant developments. Among of organisation of pitch and duration which are, by them, the ones by Kendall [7], Furse [8][9], López- the other hand, structural features of traditional Lezcano [10], Karpen [11], Pulkki [12] and Varga classical music. In the electroacoustic music, the [13] can be mentioned. At present, a growing spatial quality of sound plays a very relevant role amount of sound spatialisation programs exist, (if only because of the fact that this music must be most of them as high level environments achieved reproduced by loudspeakers). Because of this, the by the combination of general purpose audio analysis of the aesthetic influence of the spatial programs software units (such as Csound, Super quality of sound and its relations with other sonic Collider or Pure Data, to mention only some of organisations become an essential part of them). Two of them are AudioScape (Wozniewski electroacoustic music theory (See, as an example et al [14]), and Bin-Ambi (Musil, [15]). of this, Wishart [1], Cap. X). From a general point of view, the spatial treatment A sound trajectory may determine clearly the end of sound involves three basic features: of a musical motto, or a formal articulation of a 1-Environment: the physic space into which the musical composition, just changing its direction sound sources are located. suddenly or stopping abruptly its movement and 2-Localization - movement: the points of the developing a particular sonority from a steady physic space where the sound sources are located, position (See a classification of different kind of or the trajectories that they follow in this space movements in Stockhausen, 1955 [2]). It arises when moving. then in the composer the necessity of gaining 3-Directivity: the pattern of the acoustic energy spreading of the sound sources according their

161 nature and orientation with respect to a listening 3DEV is its capacity of drawing a precise point (see, for example, Causse [16]). temporal relation between any of the generated All the already mentioned programs were designed trajectories and the audio signals with which these for working on the features 1 and 2 (Environment are associated. Additionally, it is also possible to and Localization - Movement). Only some of create and modify a rotation vector which may be them, however, (see Moore [4], [5] and Furse [9]) also accurately coordinated with the temporal can perform sound source directivity simulation. evolution of the audio signal. If the Audio Engine By the other hand, from a design point of view, to be further used allows the work with directional three basic sections can be distinguished among all sound sources, it will then be possible transmit to the different spatialisation programs: it this data indicating how the virtual sound source 1-A general setting of configuration parameters eventually changes its orientation. usually related to the virtual room or environment, The control of the time between the different such as: size, shape, listener position, amount of spatial nodes (indicated as a percent of the total echoes to be computed, dense reverberation duration of the signals with which the trajectories quality, etc. are associated) is at present the only way of 2-A sound trajectory and/or sound position handling movement velocity. generator for the virtual acoustic sources. 3-A D.S.P. block (Audio Engine) dedicated to the 3 Description of the 3DEV program processing of the digital audio signals to be 3DEV uses the GTK+ C Library spatialised accordingly with the data of 1 and 2 (http://www.gtk.org/) for its graphic interface and the techniques which are to be used design, as well as the Pthread (Ambisonics, Intensity Panning, VBAP, etc.) (https://computing.llnl.gov/tutorials/pthreads/) and The work presented here address the item 2 (sound Sndlib (Schottstaedt,[18]) C Libraries for the trajectories generation) and discuss the design of a Audio I/O handling. This program is a self- general purpose Graphic Interface for the shaping sufficient executable and, at present, cannot be a of multiple three-dimensional trajectories, module of any other external program nor can independently of whatever spatialisation audio support external modules. engine may be chosen to be used. This separation 3DEV is divided in three different areas: the will allow the composer to further select the Spatial Trajectories window, the Audio Files technique and software environment most window and the Audio Engine connection appropriate to process the audio signals to be window. The two first parts took the major part of spatialised and, furthermore, even to use several the work, whilst the Audio Engine is still in different Audio Engines with the same data, if development, using at present the Csound desired. program. However, as already mentioned, the independence between movement design and 2 Basic features of 3DEV audio processing will allow the use of several 3DEV is conceived to bring assistance to other programs for sound spatialisation. electroacoustic music composers in the creation, transformation and temporal coordination of 3.1.1 Spatial trajectories window. multiple directional sound source trajectories in a The graphic interface for the trajectories, whose three-dimensional space. 
It is being developed by basic functions for 3D graphics were taken and Esteban Calcagno under the supervision of Oscar adapted from (Harlow [17]), was conceived as a Pablo Di Liscia, as a part of the Research Project main window with four menus which are briefly “Desarrollo de software para espacialización de described below: sonido y música” (Teatro Acústico Research 1-The “General View” window is divided in four Program, Universidad Nacional de Quilmes, equal size sections. One of them (the top leftmost Argentina). The binaries for Linux as well as the one) shows (with 3D rotation and zoom options) a source code and documentation can be obtained at: 3D plot of the trajectory that is actually been http://lapso.org/EstebanCalcagno/3dev.html . designed. The other three sections show the 3DEV allows to generate any amount of trajectory projection in the XY, XZ and YZ planes trajectories and to work on them using several in two dimensions, and allow creating and/or editing tools. The trajectories are delimited by editing new nodes. points (nodes) that can be visualised individually 2-The “Full 3D View” shows a 3D plot of all the and globally. An especially valuable feature of trajectories created or a selected group of them.

162 3-The “Point List” menu shows a list of the of the selected audio signal (in seconds and Cartesian coordinates of a selected trajectory as samples) with zoom and reproduction controls. well as allows editing them numerically. 4-The “Play List” shows a list of all the 3.1.3 Audio Engine Connection Window (Fig. 3) trajectories, and allows the association of any of At present, 3DEV uses the Csound program them with an Audio File. Through this menu it is (Barry Vercoe et al) as Spatialisation Engine. In possible to access the Audio File Window, which order to do this, 3DEV generates a text file with will be described in the next section. the format of a Csound score. This text file is a kind of script which may be further used by Csound to synthesize a spatialised signal based on a pre-designed orchestra which constitutes the basic DSP module. In this case, the Csound´s orchestra uses basically the spat3d Unit Generator (ItzvanVarga [14]), that implements the Ambisonic spatialisation technique (see Malham[19]). The designed orchestra provides 3D spatialisation of both the direct signal and the early reflections of it into a virtual room whose features are provided by the user. The Fig. 1: 3DEV Main Window following figure shows the data entries needed for specifying the characteristics of such virtual room. 3.1.2 Audio Files Window (Figure 2) The connection of the sound trajectories is very From top to bottom, this window is made with: easily achieved by means of triplets of Tables 1-A plot of the waveform of the selected audio (numerical vectors readable by Csound) containing signal. each one the Cartesian coordinates (X, Y, Z). Each 2-A horizontal line that relates the audio file sound trajectory is represented in the Csound score duration with each one of the nodes (the 3D as a “note” with its corresponding start time, duration, amplitude scaling and tables indicating points between which the virtual source will its movement. This way, it is possible to spatialise travel). The horizontal position of each node and mix multiple virtual sound sources moving in can be modified to change the duration of each different ways. segment of the trajectory. This way, accurate time synchronization between movement and audio may be achieved. 3-A system of two editable envelopes indicating the orientation of the virtual sound source (azimuth and elevation angles). By this system, the orientation of the sound source can be changed dynamically also accurately synchronized with the audio signal. Fig. 3: Audio File Connection Window

4 Conclusions 3DEV is still under development but already constitutes a very useful tool for the electroacoustic music composers. Further developments will address the design of more connections with other Spatial Audio Fig. 2: Audio Files Window Software, as the one shown in the point 3.1.3. Also, other development possibilities would be to 4-Lastly, a section with a temporal reference include several drawing and transformation

163 routines of 3D trajectories as well as the storing of [7] Kendall et al. 1989 . Spatial reverberation. the data in diverse formats for their use with other discussion and demonstration. In: M. Mattews and spatialisation programs. Having the data accurately J. Pierce: Current Directions in Computer Music generated and stored, it would be a simple matter Research , MIT Press, Cambridge, Mass. to convert it to different coordenate systems, as [8] Richard Furse. 1995). Spatialisation Tools . well as to adapt it to the different syntax (http://www.muse.demon.co.uk/csound.html ) requirements and data types used by other spatialisation software. Finally, as the visualization of many complex [9] Richard Furse. 1999). Vspace. spatial trajectories operating together may be (http://www.muse.demon.co.uk/vspace/vspace.htm difficult to achieve, a real-time animation plot of l) them sinchronized with their audio files associated is also a feature to be further developed. [10] F. Lezcano. 1992. Quad sound playback Acknowledgements hardware and software. C.C.R.M.A., Stanford To the Universidad Nacional de Quilmes and the University. National Agency of Scientific and Technological Promotion of Argentina, for granting financial support of this research. [11] Richard Karpen. 1998. Space and Locsig Ugs. In The Csound Manual, http://www.csounds.com/manual/html/locsig.html , http://www.csounds.com/manual/html/space.html . References [1] Trevor Wishart. 1996. On Sonic Art. [12] Ville Pulkki. 2001. Spatial sound generation Harwood academic publishers, UK and perception by amplitude panning techniques , Technological University of Helsinki, Report N°62, Finland. [2] Karlheinz Stockhausen. 1955. Kontakte , text published in the CD Nº3 Electronische Musik 1952-1960, Stockhausen Verlag, Alemania. [13] Itzvan Varga. 2000. Spat3d Unit Generator , In The Csound Manual , http://www.csounds.com/manual/html/spat3d.html [3] John Chowning. 1971. The simulation of Moving Sound Sources. Journal of the Audio Engineering Society, 19. [14] M. Wozniewski, Et al. 2007. AudioScape: A Pure Data library for management of virtual environment and spatial audio . Pure Data [4] F.R. Moore. 1983. A General Model for Spatial Convention, Montreal, Canada. Processing of Sounds. Computer Music Journal, http://www.audioscape.org/twiki/pub/Audioscape/ Vol. 7, No.3. AudioscapePublications/audioscape_pdconv07_fin al.pdf [5] F.R. Moore. 1989. Spatialisation of sounds over loudspeakers. In: M. Mattews and J. Pierce: [15] Thomas Musil. 2007. Binaural-Ambisonic Current Directions in Computer Music Research , 4.Ordnung 3D-Raumsimulationsmodell mit MIT Press, Cambridge, Mass. ortsvarianten Quellen und Hörerin bzw. Hörer für PD , IEM Report Nº38/07 [6] F.R. Moore. 1990. Elements of Computer Music. Prentice Hall., New Jersey. http://iem.at/projekte/publications/iem_report/repo rt38_07/report38_07

[16] Rene Causse, et al. 2002. Directivity of [17] Eric Harlow. 1999. Desarrollo de musical instruments in a real performance Aplicaciones Linux con GTK+ y GDK , Prentice situation , ISMA Proceedings. Hall, Madrid.

164

[18] Bill Schottstaedt. (last accessed 01-01-2010): SNDLIB , http://www.ccrma.stanford.edu/software/snd/sndli b/

[19] Dave Malham. 2009 (Spanish translation by Oscar Pablo Di Liscia). El espacio acústico en tres dimensiones y su simulación por medio de Ambisonics , In the book: Música y espacio: Ciencia, tecnología y estética, Colección Música y Ciencia, Editorial UNQ, Buenos Aires, Argentina.

Sense/Stage — low cost, open source wireless sensor and data sharing infrastructure for live performance and interactive realtime environments

Marije A.J. BAALMAN, Joseph MALLOCH, Harry C. SMOAK, Joseph THIBODEAU, Vincent DE BELLEVAL, Marcelo M. WANDERLEY, Brett BERGMANN, Christopher L. SALTER

Input Devices and Music Interaction Lab, McGill University, Montréal, Québec, Canada
Design and Computation Arts, Concordia University, Montréal, Québec, Canada

[email protected], [email protected], [email protected]

Abstract an open source software environment that • SenseStage is a research-creation project to develop enables the real-time sharing of such sensor a wireless sensor network infrastructure for live per- data among designers and formance and interactive, real-time environments. plug in modules that enable the analysis of The project is motivated by the economic and tech- • such sensor data streams in order to pro- nical constraints of live performance contexts and vide building blocks for the generation of the lack of existing tools for artistic work with wire- complex dynamics for output media. less sensing platforms. The development is situated within professional artistic contexts and tested in The project emerged from a desire to address real world scenarios. a novel, emerging research field: distributed, In this paper we discuss our choice of wireless plat- wireless sensing networks for real-time composi- form, the design of the hardware and firmware for tion using many forms of output media includ- the wireless nodes, and the software integration of ing sound, video, lighting, mechatronic and ac- the wireless platform with popular media program- tuation devices and similar. ming environments by means of a data sharing net- work, as well as evaluation and dissemination of the Three specific factors have motivated the technology through workshops. Finally, we elabo- SenseStage project: rate on the application of the hardware and soft- 1) Economic and technical constraints of live ware infrastructure in professional artistic projects: performance: There is an increasing interest two dance performances, two media projects involv- in the use of sensing technology in live per- ing environmental data and an interactive, multi- formance contexts. The economic and cultural sensory installation. constraints of live performance, however, make the integration and use of such technologies dif- Keywords ficult. It is seldom possible to have long re- Wireless sensing, mesh networks, data sharing, real- hearsal periods with full access to a technical time performance, interactive environments. setup that is equivalent to the eventual perfor- 1 Introduction mance space due to the industrial model of cul- tural production — show in, show out – leav- SenseStage is a research-creation project to de- ing no room for exploration of new technological velop small, low cost and low power wireless sen- possibilities and the artistic impact. By provid- sor hardware together with software infrastruc- ing a solution for easy application of the tech- ture specifically for use in live theater, dance nology, the short rehearsal periods and “tech- and music performance as well as for the design weeks” can be spent more on the artistic ex- of interactive, real-time environments involving ploration of the technology, rather than solving distributed, heterogeneous sensing modalities. technological problems. The project consists of three components: 2) Lack of tools for artistic use: There are a series of small, battery powered wireless many groups currently researching the appli- • PCBs that can acquire and transmit input cations of wireless sensing networks, but their from a range of analog and digital sensors, design decisions are normally motivated by en-

167 gineering innovations, thus leading to efficient Long battery life yet, prohibitively expensive and complex sys- • Ease of use tems. Also, despite the large number of research • Programmable, so that the board can take projects in the field, there are few wireless sens- • care of more logic and processing of data, ing platforms that are actually available for real if desired by the user world use, or that are affordable for artists. In We decided to use the XBee in combination addition these wireless sensing platforms rarely with the Arduino5 platform, as the XBee al- integrate with the software tools that artists use ready provided us with the needed ad-hoc net- for making music, sound and media. work structure. Additionally, several other de- 3) Real world testing scenarios: Much of the velopers have documented their experience us- research agenda for the project was driven by ing XBees in conjunction with Arduinos6 allow- many years of artistic work and technological ing us to skip some common development pit- development of tools to facilitate the creation falls. of interactive performances and installations. We based our board design on the Arduino These events employed distributed sensing and Mini Pro, being able to tap into many available mapped such input data to complex parameter firmware libraries, as well as the development spaces for the control of sound and other me- and programming environment. Furthermore, dia in real-time (e.g. Schwelle [Baalman et al., Arduino is widely used in open source, artistic, 2007] and TGarden[Ryan and Salter, 2003]). A physical computing contexts, so our board will key design element of the SenseStage project be easy to use for this community. is thus to deploy SenseStage technologies into The main focus then was to design a PCB real world, professionally driven testing envi- that was small, and to develop standard ronments to see how such tools function “in the firmware that makes it easy to setup and use wild” and outside of the standard lab, demo- the boards, as well as exploring the use of and driven mode normally given to the presentation integration with the XBee wireless chips. of new technologies. The PCB layout of the SenseStage MiniBee is shown in figure 1. The first board revision came 2 Hardware to a unit cost price of about 32 CAD, exclud- For the hardware wireless sensing node for ing the XBee chip, for a manufacturing run of Sense/Stage, our main requirements were cost 100 boards (PCB creation, assembly and parts). per unit — since low costs allow experimenta- With a larger manufacturing run and allowing tion with large numbers of nodes — and im- for a longer assembly time, this per unit cost mediate availability. We investigated several will drop considerably. options for wireless nodes developed by other groups, such as the µParts1 [Beigl et al., 2005], 2.1 Firmware the EcoMote2 [Park and Chou, ], the Tyndall The firmware is a collection of functions to Motes3 and the Intel Motes4[Nachman et al., handle wireless transmission and communica- 2005], but none of these satisfied our needs, or tion, sensor reading and basic read/write op- more importantly they were simply not avail- eration on any available pin of the MiniBee. able. The firmware is built using our own library Our design goals were: for the Arduino environment which allows users to quickly build their own application for the Low cost MiniBee. 
• Small form factor • Flexible sensor configuration Currently the following sensors/actuators are • Usable for control of motors, LEDs, and supported by the firmware: • other actuators. Analog sensors (connected to the analog in- • Operable in large groups (10+ nodes) put pins, e.g. resistive sensors, analog ac- • celerometers, infrared distance sensors) 1http://particle.teco.edu/upart/ 2http://www.ecomote.net/ 5http://www.arduino.cc 3http://www.tyndall.ie/mai/ 6e.g. the ArduinoXbeeShieldhttp://www.arduino. WirelessSensorNetworksPrototypingPlatform_25mm. cc/en/Main/ArduinoXbeeShield, the Arduino Xbee htm Interface Circuit (AXIC) http://132.208.118.245/ 4 http://techresearch.intel.com/articles/ ~vitamin/tof/AXIC/ and the Blushing Boy MicroBee Exploratory/1503.htm R3 http://blushingboy.org/content/microbee-r3

168 in the EEPROM of the MiniBee. Each time the MiniBee boots up, it reads the serial number of the attached XBee11 and re- lays this information to the coordinator node connected to the host computer. The host then assigns a unique node ID to the MiniBee and optionally sends a new configuration to the MiniBee. If no new configuration is re- ceived, the MiniBee can access its configuration through its EEPROM. The host computer soft- ware remembers the known boards and XBee serial numbers so node IDs and configurations are maintained for a project. Using an 8 MHz clock on the board, the max- imum baud rate that can be achieved is 19200 baud. In a next revision we will include a faster Figure 1: The SenseStage MiniBee PCB, rev. (up to 20MHz) crystal, which will allow the use A. The XBee is mounted on the other side of the of higher baudrates. board. Changes in the second revision include Future work also includes writing a wireless smaller board-size and the footprint for a coin- bootloader to fully reprogram the SenseStage cell. MiniBee without the need to manipulate the boards (see for example LadyAda12). The extra Digital sensors (on/off, e.g. buttons and components needed to do this will be added in • switches) the next revision of the board. LIS302DL accelerometer7, using I2C • Relative humidity and temperature sensor8 3 Software • Ultrasound sensors9 In order to make the data from the wireless sen- • PWM output (e.g. dimmable LEDs, mo- sor nodes available to several collaborators on a • tors) project simultaneously, we developed the Sense- Digital output (on/off) World DataNetwork. It is intended to facilitate • the creation, rehearsal and performance of col- The serial protocol is loosely based on the Se- laborative interactive media art works, by mak- rial Line Internet Protocol (SLIP)10, and is set ing the sharing of data (from sensors or internal up as simply as possible to ensure that data processes) between collaborators easy, fast and packets are small. A regular package is built up flexible. Our aim is to support multiple media as follows: practices and allow different practitioners to use escape character, followed by message type the software to which they are accustomed. The • message ID framework is intended to support coordinated • node ID collaboration with real-time data and multiple • data bytes media types within a live interactive perfor- • delimiter character mance context. • A full reference of the different messages sup- The framework is different from the Key- 13 ported is given in table 1. Worx framework [Doruff, 2005], which The firmware is configured through a host emphasizes net-based art and collaborative computer which allows to quickly change its op- projects between different locations, and the 14 eration without having to physically reprogram McGill Digital Orchestra Tools [Malloch et the microcontroller. This approach is not un- 11using the AT command mode. Other people like Firmata [Steiner, 2009] with the difference have made an Arduino library for communicating with that our firmware stores its latest configuration XBees in API mode. See http://code.google.com/p/ xbee-arduino/ 7There is a footprint on the board for this sensor. 12http://www.ladyada.net/make/xbee/arduino. 8Sensirion SHT1x series html 9The “Ultrasonic Ranger”, http://www. 13http://www.keyworx.org robot-electronics.co.uk/htm/srf05tech.htm 14There is a Max/MSP bridge between the SenseWorld 10http://tools.ietf.org/html/rfc1055 DataNetwork and the Digital Orchestra Tools so they

Description        Type   Data                                          Sender
Announce           'A'    -                                             server
Quit               'Q'    -                                             server
Serial number      's'    Serial High (SH) + Serial Low (SL)            node
ID assignment      'I'    msg ID + SH + SL + node ID + (*config ID*)    server
Configuration      'C'    configuration bytes                           server
PWM                'P'    node ID + msg ID + 6 values                   server
Digital out        'D'    node ID + msg ID + 11 values                  server
Data               'd'    node ID + msg ID + N values                   node

Table 1: Message protocol between host and MiniBee nodes. Type is preceded by the escape character (92), and messages are delimited with the delimiter character (10). The escape character is used to escape to the message type, and whenever the delimiter, the escape character, or the carriage return character (13) occurs in any of the other bytes. General convention for the message type: server message in upper case, node message lower case. (*...*) indicates an optional byte. al., 2008], which primarily focus on the map- in programming SC), so long as their software ping and performance of monolithic digital mu- environment supports OSC and is able to com- sical instruments. ply with the OSC namespace conventions in or- The final design criteria were to: der to communicate with the network. Thus, Tight integration with the wireless sensing the framework is designed to allow for ease of • platform use within a designer’s own creative practice. Allow reception of data from any node by A central host receives all data messages and • any client (subscription) manages the client connections (see figure 2). Allow transmission of data to any node by Each client can subscribe to one or more data • any client15 (publication) nodes in order to use that node’s data in its own Restore network and node configuration internal processes. Furthermore, each client can • quickly publish data onto the network by creating a Be usable within heterogeneous media soft- node. A new client can query the network con- • ware environments cerning which nodes are present and is informed Enable collaboration between heteroge- when new nodes appear after the client has been • neous design practices registered. Enable efficiency of collaboration within A data node can be understood as a collection • the limited timeframe of rehearsals of data that belongs together, e.g., data coming from the same sensor device or the output of 3.1 The SenseWorld DataNetwork a particular device such as the DMX control framework stream for theatrical light. Within each node The framework’s core is implemented in Super- there are slots which represent single data val- Collider (SC), which is available as open source ues, for example, a data node representing a software and runs on several platforms (Linux, 3-axis accelerometer has three slots, with each OSX, Windows and FreeBSD). Clients have slot corresponding to one axis. If a client is only been implemented in SC, PureData, Max/MSP, interested in one slot of a node, he can subscribe Processing, Java, and C++ so far. The OSC specifically to that slot. namespace is well defined, so it should be trivial to implement clients for other software environ- 3.2 Integration with MiniBee mesh ments. network What follows is a technical description of the We have specifically integrated the connection implementation of the framework. Users of the to the SenseStage MiniBee network, by provid- framework need not be familiar with the inner ing options to configure, send and receive data workings of the framework (or have experience from the sensor network, through the host of the DataNetwork. can be used together. 15only one client can set data to a specific node at a The host communicates with the wireless net- time. work through a serial protocol. It manages all

170 have a reply, which is either the requested info, a confirmation message or a warning or error. In comparison, the Digital Orchestra tools provide a decentralized network using one gen- eral multicast port on the network to settle the ports and namespaces of clients. Since not all interactive media software environments sup- port listening to multicast channels, we elected not to use this approach. Rather our assump- tion is that each client will settle its listening Figure 2: Diagram of the SenseWorld DataNet- port itself within the operating system of the work structure. computer on which it runs. The host can then the incoming and outgoing data from and to the distinguish each client by its IP address (auto- wireless nodes. The data from a MiniBee will matically included in the OSC message) and a appear onto the network as a DataNode, while port which is an argument of each OSC mes- each attached sensor is a data slot in this data sage in the namespace. The latter is necessary node. Clients can then subscribe to this node as several clients (Max/MSP among them) do to receive its data. Furthermore, a client can not send OSC messages from the same port as send a message to map the data that is on a they are listening to, or cannot configure the data node the client has created to the digital port they are sending from. outputs or pulse width modulation outputs of a 3.4 Auto-recovery MiniBee, in order to control, for example, lights Practical lessons derived from rehearsal and or motors. performance experiences remind us that soft- 3.3 OSC implementation ware applications and processes can be unex- pectedly and fatally interrupted (i.e., crash). The network is accessed through an OSC in- 16 For this reason, a fast and automatic recov- terface , which allows a client to join the data ery of all previously instantiated connections network, access its data and also create its own is critical. The following methods are imple- data nodes on the network. mented to enable fast and automated recov- The general setup is as follows: a client first ery in such situations. Following (re)start of sends a registration message to the data net- the host server and (re)establishment with the work server. The client will immediately begin network, an announce message is broadcast on receiving ping messages to which it must re- several ports. In addition, the server updates ply with pong messages confirming the client’s a publicly readable file with the current ac- presence. Following the initial registration, the tive listening port. Moreover, the host can re- client can submit a query message in order to re- store previously connected clients from informa- ceive a complete list of nodes and slots currently tion stored in a server readable file. In turn, available from the network. The client can then the client has read access over the network to subscribe to selected nodes and slots, and sub- the host configuration file and automatically sequently will receive data from the nodes and retrieves the port information to which it has slots it is subscribed to via data messages cor- to register. Further, the client implements an responding to the subscribed data sources. auto-configuration process triggered upon re- The client can supply a new node to the net- ceipt of a host announce message and a response work by using the /set/data message (which from the host indicating the client has success- is also used subsequently to set new data). A fully registered. 
client can also label the nodes and slots it has created. Whenever a new node or slot is added 3.5 SuperCollider implementation (by any client) or changed (e.g., when it gets The SuperCollider (host) implementation is a label), the client will receive a new info mes- done via a set of custom written classes: sage automatically. All messages to the server SWDataNetwork base class for the network. SWDataNode base class for a data node. 16assumed to be used via the UDP layer. The OSC protocol in itself is not dependent on the underlying pro- SWDataSlot base class for a data slot. tocol, but is implemented on top of TCP and/or UDP SWDataNetworkSpec implements the la- in most software systems belling of the nodes and slots of the net-

171 work. SWDataNetworkOSC implements the OSC interface SWDataNetworkOSCClient keeps track of a connected client Data nodes have both unique IDs (integer numbers) and human readable labels (e.g., node “3” has the label “accelerometer”). Data slots are automatically numbered, according to the order in which they appear, as they are set in the network; they can also be given a label. The labelling is not done automatically, so that the naming becomes a conscious and integral part of establishing shared nomenclatures for the col- laboration. This encourages consideration by the users of what the data represents and its po- Figure 3: The DataNetwork SC Client interface. tential use. The label specification (the “spec”) can be stored between sessions so it can be re- the data it receives, and thus has the same inter- called again upon startup. face in its use within SC, with some additional Each data node and slot has methods to print methods for interaction with the host. debugging messages, set an action to be per- In a setup with several collaborators using formed upon new incoming data, scale and/or SC, one of them will act as a host for the net- remap the data and create a control bus on the work, while the other SC participants register audio server17 with the data. Each data node as clients to that host. Since the code interface also can specify an action to be taken in case for both is almost the same, apart from the ini- there has been no input to the node for a cer- tialisation, it is very easy for SC participants to tain amount of time (e.g., trying to reconnect prepare and test their own inputs and outputs to an external device). to the DataNetwork individually, and switch to If a client creates a node, that node is linked the shared network during collective rehearsals. to the client (the client becomes the “setter” In figure 4 an example is given of the code in- of the node), and no other client can set data terface to both the host and the client. to that node. Client configuration can also be 3.5.2 Derived data stored to a file and be used for recovery on Deriving data from existing nodes can be done startup. for example by combining data from various All data from the network can be written to nodes, calculating statistical properties of data, a log-file in a text format containing lines for or smoothing data. each time step with tab-separated data values. To facilitate this, we have developed methods The log can be opened and played back with the to do this either making use of the language class SWDataNetworkLog. features of SC, or by making use of the server’s Finally, a graphical user interface has been unit generators to perform these calculations. implemented to monitor the status of the data network and the state of each node (see figure 3.6 PureData and Max/MSP clients 3 for the client’s version of this GUI). Two abstractions provide access to the Sense- 3.5.1 The client World DataNetwork within a Pd or Max patch: dn.node and dn.makenode. Both implemen- The class SWDataNetworkClient imple- tations require a private variable for the host ments an OSC client to the SenseWorld IP address and for the client name (shared be- DataNetwork so that an external SC client can tween all instances on a client), but otherwise also be part of the network. It implements the take care of the details of talking to the network. client side of the OSC interface. 
It inherits dn.node subscribes to an existing node, out- from the class SWDataNetwork to manage putting the received data as a list. In Pure 17SC consists of two parts: sclang, which is the pro- Data each instance subscribes to a single node, gramming language, and scsynth, which is a dedicated whereas in Max it is possible to receive data audio server. from multiple nodes using a single dn.node ob-

172 Example of setting up the host in SuperCollider: import datanetwork.*; DNConnection dn; // DNConnection instance DNNode node; // DNNode instance // create an instance of the DataNetwork x = SWDataNetwork.new; void setup() { // adds the host OSC interface dn = new DNConnection(this, "192.168.0.104", x.addOSCInterface; dn.getServerPort("192.168.0.104"), // create an instance of the class 6009, "p5Client"); // managing the MiniBees: node = new DNNode(2000, 5, 0, "p5Node"); q = SWMiniHive.new( x ); } // make GUIs for the DataNetwork and the hive: void stop() { x.makeGui; dn.unsubscribeAll(); q.makeGui; dn.removeAll(); dn.close(); An example of using the client in SuperCollider: } void keyPressed() { // create a DataNetwork client with the name ‘‘sc’’: if(key == ’r’) dn.register(); x = SWDataNetworkClient.new(‘‘192.168.0.104’’,‘‘sc’’); else if(key == ’q’) dn.queryAll(); // see what is in the network else if(key == ’f’) dn.subscribeNode(401); x.queryAll; else if(key == ’d’) dn.setData(node, // we know that node 2 has interesting data coming new float[] { 4.0, 2.0, 1.0, 2.3, 4.4 } ); // from a floor pressure sensor, so we subscribe: } x.subscribeNode( 2 ); // receive and print data: // we want to create a node with a measure of the void dnEvent(String addr, float[] args) { // sample variance on node 102 and give a label print("Float: " + addr); x.addExpected( 102, \floorVar ); for(int i = 0;i < args.length;i++) // node 2 has the label \floor to access it print(" "+args[i]); // we create a bus for the data (s is default server): println(); x[\floor].createBus( s ); } // we create the sample variance node: ~floorVar = StdDevNode.new( 102, x, x[\floor].bus, s ); ~floorVar.start; Figure 6: An example of using the Processing // now we have raw data available on node 2, x[\floor] client. // and sample variance data on node 102, x[\floorVar] Figure 4: The DataNetwork SC code interface. 3.7 Processing and Java The Processing client is based on the JavaOSC library18, and is based around two classes: DNConnection, dealing with the communi- cation to and from the host, and DNNode, dealing with the properties of a DataNode. As the implementation is basically a Java class, the client can not only be used for Processing, but also for any Java program. An example of use of this library in a program is shown in figure 6. 3.8 C++ library The C++ library is based upon liblo19 for han- dling OSC communication and consists of a set of classes to deal with the various components of the DataNetwork: DataNetwork base class for the network. DataNode base class for a data node. DataSlot base class for a data slot. Figure 5: Screenshot of the PureData client. DataNetworkOSC implements the OSC in- terface. ject (in which case each node’s data list is pre- OSCServer C++ class to implement an OSC ceded by its node ID). Server (taken from swonder 20 and slightly dn.makenode does the opposite, publishing expanded) data to the network using a unique node ID. The 18http://www.illposed.com/software/javaosc. data must be formatted as a list with a number html of elements equal to the number of data slots 19http://liblo.sourceforge.net given as a creation argument to the instance. 20http://swonder.sourceforge.net

An example of use of this library in a program is shown in figure 7.

DataNode * node;
DataNetwork * dn;

// create a data network:
dn = new DataNetwork();
// create an osc client interface for it:
// arguments (host IP, client udp port, client name)
dn->createOSC( "127.0.0.1", 7000, "libdn" );
// register with the host:
dn->registerMe();
// query all there is to know about the network
dn->query();
// subscribe to a node:
dn->subscribeNode( 5, true );
// create a node:
dn->createNode( 4, "world", 5, 0, true );
// label one of its slots:
dn->labelSlot( 4, 2, "hithere", true );

float dummydata[] = {0.1, 0.3, 0.4, 0.5, 0.6};
// get a reference to the node:
node = dn->getNode( 4 );
// set data to the node:
node->setData( 5, dummydata );
// send the data to the network:
node->send( true );

Figure 7: An example of using the C++ library.

4 SenseStage Workshop

The first SenseStage workshop [21] was held in May 2009 at Concordia University, as a test case for using many sensor nodes in one space, as well as for having a number of artists, unfamiliar with the specifics of the technology and coming from diverse artistic and technical backgrounds, use the DataNetwork simultaneously. Participants were able to employ all available data in various projects, which were developed over the course of one week. While this workshop served as a test case for evaluating the use of our hardware and software, we were also interested in how participants would make artistic use of the potentials of the system.

The workshop resulted in five group projects, with groups consisting of 3 to 5 collaborators, using light, sound, animation and video as output media. The groups were able to go very quickly from concept to experimenting with various sensor modalities and using the data to drive the output media. Encouragingly, the participants seemed to be primarily concerned with "what to do with the data" rather than "how to get the data". The results of this workshop also highlighted the need for more sophisticated ways of dealing with sensor data, for combining, conditioning, and processing multiple data streams.

Several more SenseStage workshops are planned for 2010 and 2011 as a means to familiarize people with the SenseStage infrastructure, as well as to further explore issues in mapping and using large numbers of data streams.

[21] http://sensestage.hexagram.ca/workshop/

5 Usage cases

In this section, we will discuss several projects in which the SenseStage technology has been used. All of these projects have been, or are being, shown at international festivals and multiple venues, thus fulfilling the criterion that the technology has to be usable in a real-world, professional artistic environment.

5.1 Dance: Schwelle and Chronotopia

The dance performance Schwelle [Baalman et al., 2007] had been developed before the SenseStage project began and has informed many of the design decisions made during the SenseStage project, so we are currently discussing how the infrastructure can be adapted and improved to use our new technology for future performances. The performance involves 3 accelerometers on the body (originally wireless, based upon a Create USB interface with MicroChip RF chips), 1 accelerometer in an object (originally a WiiMote), and 3 light sensing boards (originally wired Create USB interfaces) placed at various places in the room. Furthermore, there is activation of custom lights and motors in one part of the stage. Using the SenseStage MiniBees, we will now be able to use 3 separate sensing boards on the body, saving us the problems of wiring along the body; instead of using the WiiMote for the accelerometer in the object, we can now use a SenseStage MiniBee, which we can wake up from a sleep mode once we need the sensing in the box; this saves us the trouble of having to make the WiiMote set up a Bluetooth connection by pushing buttons while it is packed inside a box. For the sensing boards inside the room, we will also be spared the wiring. Where we were previously using a custom OSC namespace for this piece, we can now use the DataNetwork clients instead, and gain much more robustness with regard to reconnecting; it will also be much faster to add new data to exchange between the interactive light controls and the sound control, should we feel the need to do so.

In Chronotopia, a dance performance by the Bangalore (India) based Attakkalari Centre for Movement, created in collaboration with visual artist Chris Ziegler, we used the wireless technology for controlling a matrix of 6 by 6 cold cathode fluorescent lights (CCFL), and 3 handheld CCFL lights. Since the power required for the light matrix is quite high, it cannot be battery-powered, but the use of wireless technology freed us from running cable between the light matrix and the computer controlling it. Given the short setup time in theaters (usually just one day), and especially in technically challenging environments such as India, this was a considerable advantage. For the 3 handheld objects, wireless control was critical, as the objects are carried across the stage by the performers during the show as part of the dramaturgy of the piece. Within the light control setup itself, the DataNetwork was used extensively to exchange data between different portions of the setup, such as the motion tracking data (from a camera looking down at the stage), and pitch and beat tracking data extracted in real-time from the soundtrack. We also exchanged data between the light control and the interactive video, both for synchronisation of cues with the soundtrack (using the frametime of the playback, published as data on the network), and for connecting the intensity of the lights to the video image (the light control was publishing the maximum output value of all the lights in the matrix onto the network, which was used to control the brightness of the video image).

Figure 8: A still from the dance performance Chronotopia with the Attakkalari Centre for Movement.

5.2 Environmental: MARIN and Arctic Perspective

In two artistic projects dealing with environmental data, MARIN [22] and the Arctic Perspective Initiative [23], the SenseStage MiniBees were used to gather environmental data, such as temperature, humidity, light and air quality, as well as 3-axis acceleration. The DataNetwork was used to access the data for real-time use, and to gather and log all the data to file for artists to use at a later time for visualisation and sonification.

[22] "M.A.R.I.N. (Media Art Research Interdisciplinary Network) is a networked residency and research initiative, integrating artistic and scientific research on ecology of the marine and cultural ecosystems." (from http://marin.cc/)
[23] "The Arctic Perspective Initiative (API) is a non-profit, international group of individuals and organizations whose goal is to promote the creation of open authoring, communications and dissemination infrastructures for the circumpolar region." (from http://www.arcticperspective.org)

From an expedition to Nunavut in Northern Canada in Summer 2009, as part of the Arctic Perspective Initiative project, we learned that the range of the XBee and XBeePro is very much dependent on the environmental conditions. In the outside conditions there, the achieved range of transmission was only a few hundred meters, only about a fifth of the range specified on the datasheet of the XBeePro. At ranges larger than about 50 m they are extremely affected by blockage, e.g. from the body. In indoor situations, the radio waves will be reflected by objects and walls, and such blockage may be mitigated. Additionally, the batteries lost charge faster in colder conditions (about 6 °C), resulting in shorter battery life.

Another issue was that it was difficult to power the host computer, receiving the data from the MiniBees, using solar power, because of foggy and cloudy days. For this reason a kind of datalogger approach for the MiniBees, which stores data locally and sends it to a host when the host is online, could be useful for this kind of application. Other challenges include finding waterproof housing to protect the electronics from extremely wet weather conditions at sea.

5.3 Installations: JND/Semblance

JND/Semblance is an interactive installation that explores the phenomenon of cross-modal

perception: the ways in which one sense impression affects our perception of another sense. The installation comprises a modular, portable environment, which is outfitted with devices that produce subtle levels of tactile, auditory, visual and olfactory feedback for the visitors, including a floor of vibrotactile actuators that participants lie on, and peripheral levels of light, scent and audio sources, which generate frequencies on the thresholds of seeing, hearing and smelling.

In JND/Semblance the SenseStage MiniBees are used for gathering floor pressure sensing data. The SenseWorld DataNetwork is used to gather the raw sensor data, to extract features from it, and to establish flexible mappings to light, sound, and vibration on a platform on which the visitor is lying down.

6 Conclusions

We have presented SenseStage, an integrated hardware and software infrastructure for wireless mesh-networked sensing, actuating, data sharing and composition within interactive media contexts. The infrastructure is unique in that it integrates hardware and software, and makes sensor and other data easily available to all collaborators in a heterogeneous media project, within each collaborator's preferred software environment.

We are currently revising the hardware and firmware design, including options to configure and program the boards wirelessly. We plan to have the board available for sale in the second half of 2010. Our future research will focus on techniques for composing and creating with the many streams of realtime data available from such a dense network of sensors.

7 Acknowledgements

This work was supported by grants from the Social Sciences and Humanities Research Council of Canada and the Hexagram Institute for Research/Creation in Media Arts and Sciences, Montréal, QC, Canada.

Thanks to Matt Biedermann for his feedback on the use of the SenseStage MiniBees in the Arctic Perspective Initiative project. Thanks to Elio Bidinost, as well as all the SenseStage Workshop participants, for their input and feedback.

7.1 Download

The SenseWorld DataNetwork is available from http://sensestage.hexagram.ca. It is released as open source software under the GNU General Public License.

References

Marije A.J. Baalman, Daniel Moody-Grigsby, and Christopher L. Salter. 2007. Schwelle: Sensor augmented, adaptive sound design for live theater performance. In Proceedings of NIME 2007, New Interfaces for Musical Expression, New York, NY, USA.

M. Beigl, C. Decker, A. Krohn, T. Riedel, and T. Zimmer. 2005. uParts: Low cost sensor networks at scale. In Ubicomp 2005.

Sher Doruff. 2005. Collaborative praxis: The making of the KeyWorx platform. In Joke Brouwer, Arjen Mulder, and Anne Nigten, editors, aRt&D: Research and Development in the Arts. V2/NAI Publishers, Rotterdam.

Joseph Malloch, Stephen Sinclair, and Marcelo M. Wanderley. 2008. A network-based framework for collaborative development and performance of digital musical instruments. In Richard Kronland-Martinet, Solvi Ystad, and Kristoffer Jensen, editors, Computer Music Modeling and Retrieval. Sense of Sounds: 4th International Symposium, CMMR 2007, Copenhagen, Denmark, August 2007, Revised Papers, Lecture Notes in Computer Science (ISBN 978-3540850342). Springer.

L. Nachman, R. Kling, R. Adler, J. Huang, and V. Hummel. 2005. The Intel Mote platform: a Bluetooth-based sensor network for industrial monitoring. In Proceedings of the 4th International Symposium on Information Processing in Sensor Networks (IPSN 2005), Los Angeles, CA, April 2005.

Chulsung Park and Pai H. Chou. 2006. Eco: Ultra-wearable and expandable wireless sensor platform. In Third International Workshop on Body Sensor Networks (BSN 2006).

J. Ryan and Christopher L. Salter. 2003. TGarden: Wearable instruments and augmented physicality. In Proceedings of the 2003 Conference on New Instruments for Musical Expression (NIME-03), Montreal, CA.

Hans-Christoph Steiner. 2009. Firmata: Towards making microcontrollers act like extensions of the computer. In Proceedings of NIME 2009, New Interfaces for Musical Expression, Pittsburgh, PA, USA.

EDUCATION ON MUSIC AND TECHNOLOGY,

A PROGRAM FOR A PROFESSIONAL EDUCATION

Hans Timmermans, Jan IJzermans, Rens Machielse, Gerard van Wolferen, Jeroen van Iterson. Utrecht School of the Arts, Utrecht School of Music and Technology, PO-BOX 2471, 1200 CL Hilversum, The Netherlands.

ABSTRACT

We describe the development and the maintenance of a program for a professional education at the Utrecht School of Music and Technology. The program covers most of the field and offers various degrees up to PhD level. The program was developed over the last 23 years and is updated on a yearly basis. We deliver about 80 graduates a year who work, survive and keep up with developments; 92% of our students develop a healthy career after graduation. Music technology as a field of studies is in constant and rapid development, and because of that the characteristics of 'the professional' in the field are changing very rapidly too. For this reason we have built in mechanisms to enforce regular updates of the program and to develop the knowledge and skills of the teaching staff.

1. INTRODUCTION

Utrecht School of Music and Technology is a Dutch professional education in the broad field of Music and Technology. Our School is part of the Utrecht School of the Arts, a university with 5 faculties (Visual Arts, Music, Theatre & Drama, Art & Economy and our faculty of Art, Media & Technology), around 3600 students and 600 lecturers. Within the School of Music and Technology we offer several tracks in the fields of:

• Sound Design
• Composition for the Media
• Composition for non-linear Media
• Composition of Electronic Music
• Composition & Music Production
• Composition & Music Technology
• Music Production & Performance
• Music Technology & Performance
• Audio Design, Music System Design
• Audio Design, Recording and Production

Some of these programs have a strong emphasis on the Arts; some have a more technical focus. The programs partially overlap. Currently the School has over 350 students and a teaching staff of 35; we offer bachelor, master and research degree programs. The mission of the School is to educate students up to a level of skills and knowledge and to a level of professional attitude good enough to work and survive within the Dutch context and the international context. The emphasis of the different programs is on the personal development of the student: to help her to earn a living and to establish a professional career in which she is able to continue her professional development and keep up with the constant and rapid developments in the field of Music and Technology.

1.1. History

Our School started as a department of the Utrecht Conservatory of Music in 1985. Our current program started in 1987. From then on we have been updating it to synchronize with the outside world on a very regular basis, to obtain a good match between the educational program and the demands of real existing professional careers. For the last 10 years we have seen figures like the 92% of our graduates that have a healthy career within music technology within two years after their master graduation. Per year the School of Music and Technology admits around 90 students, of whom around 85 will graduate in one of the many tracks we have in our program. At ICMC 2003 we presented a paper on our program, on our School and on our educational philosophy [1].

1.2. Developing curricula and maintenance

In the past six years some major changes have occurred, which made us adapt the curricula for Composition and the curriculum for Music Production. We are developing a curriculum for Composition for non-linear Media (Composition for Games fits in that track). Research within our School got a strong focus on applications of music technology for adults as well as children with special needs; research was integrated as a topic in most curricula.

Music technology as a field of studies is in constant and rapid development and because of that the characteristics of 'the professional' in the field are changing very rapidly too. We have developed a very active alumni policy: we track all of our graduates, we make sure we know what types of work they do and in what type of context they work, and we invite most of our alumni to conferences and as guest lecturers and advisors. We follow developments in the different contexts we educate for, and we redesign parts of our curricula if changes in the actual contexts urge us to do so. We have a small group of very experienced professionals acting as a taskforce to deal with misfits between our curricula and actual contexts; the total of curricula and teaching methods is adapted once every four years (every year we adapt parts of each degree program). Every member of our teaching staff is involved in the implementation and evaluation of newly designed curricula and methods.

2. VISION AND MISSION

In our vision the most important thing to teach a student is how to keep up with the constant developments of the field. We teach our students how to study and how to develop as a professional after graduation. To establish this attitude and to develop the skills to do so, the teaching staff developed some formats for parts of the program:

• Lectures and classes, as they have been there for ages.
• Workshops in which a student learns from exercises that accompany the lectures.
• Learning-projects in which the focus is on some aspects of music technology.
• Hands-on sessions and practical assistance by older students.
• Study-groups in which students work together on theoretical and research issues.
• Study-groups in which students discuss each other's work in progress and reflect on their own work in progress.
• Industrial placement in companies or research institutions.
• Real-world projects with (paid) assignments from outside the School.
• Interdisciplinary projects with a strong emphasis on the production processes that are typical for the multidisciplinary setting of the specific project. (Some examples: Music Composition & Contemporary Dance, Interactive Composition & Theatre, Sound Design & Games.)

The overall characteristics of our program:
• Students learn by doing,
• Students learn to be their own teacher.

We educate the student to professional independence, with self-reflection as an important tool to keep up with new developments, to gain new insights and to develop new concepts.

3. COMPOSITION FOR THE MEDIA

3.1. Contexts within the media industry

Looking at different contexts within the media industry such as television, advertisement, film (fiction, documentary, animation) and games, they all have some specific characteristics in common, i.e. the speed and the unpredictability of developments. Both are caused by the availability and accessibility of digital means and distribution channels. It is quite easy, using free software, to make your own radio or TV program and broadcast it through the internet. Consumers are becoming producers or – as they are called nowadays – prosumers. Another element of importance is the individualization of consumers. The traditional approach of the media industry through target groups is not valuable anymore, because the average consumer has developed a very individual and therefore unpredictable behaviour. In addition, media landscapes are constantly a subject of discussion in politics on a national and on a European level. Parties concerned, such as the advertisement industry and the television industry, try to find solutions for the problems described above in different ways. On the one hand one uses a very individual approach to the potential consumer (narrowcasting). On the other hand a 'product' is published through all possible media channels and forms so the consumer cannot escape from it (cross-media publishing). In addition, all kinds of mechanisms and moments for measuring the effectiveness of a product are developed so the product can be re-adjusted: the chance of a disappointing commercial result should be minimal.

3.2. Role and function of music in the media industry

Taking a closer look at the role and function of music in the media industry, it is precisely this effectiveness that has been a problem in the past. There was no way to hear and check a final score for a film or TV commercial in advance, because usually only a piano version existed to decide upon (the composer would play the score on the piano and would explain the orchestrations verbally to the director or advertisement agency). Along with digital technology came the possibility for the composer to use a 'virtual' ensemble or orchestra to produce a demo version of the final score in close detail and perfection, and to allow changes until the very last minute. Apart from using this technology in the actual composition process, a way to obtain more control for the 'people in charge' on the concept of music for audiovisual media and its outcomes is the use of so-called temp track or temp music (the expression 'temp track' stands for the use of existing music during postproduction to create a temporary score

for a scene, a film or any other audiovisual production) [1]. The process of developing a concept for a score through the use of temp track can be done with or without the composer. Both possibilities, however, end up with a final temp track that is the model for the appropriate music for this specific scene or film or TV commercial, because 'it works'. In applying this model, the outcome is often an assignment for the composer to create a so-called sound-alike, which is a copy of the original temp track (within the limitations set by the copyright). Such an assignment doesn't exactly appeal to the creativity of the composer and is one of the reasons for composers [2] to reject the use of temp track in the scoring process. Despite this rejection, the use of temp track has become a regular 'tool' in media production.

[1] According to Karlin & Wright [2], filmmakers use temp track, among other things: to help them screen their film for the producer(s), studio, and/or network executives and preview audiences during various stages of postproduction; to establish a concept for the score; and to demonstrate that concept to the composer.

[2] Film composers in general are not very keen on temp track. A lot of them have encountered the drawbacks of temp track: "If it works, it might be difficult to imagine a better way. And if it doesn't work, then the chances are it will have spoiled your first emotional reactions to the film. And what's worse, you may still have trouble getting it out of your head" [2].

Another important element with regard to the role and function of music in the media industry is the emancipation of sound. Ever since Debussy taught us that timbre is as important as pitch, the actual 'sound' of film music became more important as digital technology allowed a far better reproduction of film sound. And when the same technology opened up the world of electronic sounds and electronic music, the possibilities for sound to be used in and as film music have become endless.

3.3. Consequences for education

Despite these developments, professional education in composition for media usually focuses on music for film using the traditional symphony orchestra. Other musical genres and/or other contexts that combine visuals, sound and music, like the advertisement industry or the game industry, are most of the time not taken into account. Frequently the perspective of these courses and curricula is that of the composer as an autonomous artist.

The Utrecht School of Music and Technology regards the soundtrack in audiovisuals not as a combination of separate disciplines but as a coherent object that is made up of the ingredients called dialogue, music, sound effects and sound atmospheres. The curriculum composition for media therefore aims at compositional skills and knowledge that are not restricted to certain musical genres but refer to specific principles and methods that are related to the role and function of music and sound in audiovisual media.

Another important item in the curriculum is contextual awareness, which refers to the understanding of specific contexts, their participants and the related design and production processes. Such awareness is needed to keep up with the ongoing developments as described in the first paragraphs of this article. In addition, cognitive skills are needed to understand design and production processes (not in the least one's own design and production process) and to be able to reflect on them, to adapt them if needed and to use them as a possible means for innovation.

By designing the curriculum composition for media around these core elements, we aim to provide the student with tools that help them succeed in building a professional practice in a rapidly changing media industry.

4. MUSIC PRODUCTION

4.1. Production and distribution

During the past two decades several major technological innovations have fundamentally changed the music industry in general and the record industry in particular. Disruptive technologies, in particular those in the areas of networking and digital audio, have caused fundamental shifts in the working methods of music production and the distribution of music. Professional music production has rapidly changed from being based on more centralised, industrial models of organisation towards decentralised, networked models of organisation. The democratisation of music production tools has caused a shift from technology towards design strategy as a distinctive aspect for music production professionals.

4.2. The physical product and the virtual

At the same time, the move from physical product (e.g. CD) towards virtual product (e.g. mp3) has caused major changes in the music industry, changes that are still ongoing. These range, among others, from new business models in music production and music distribution, the shifting emphasis in commercial sectors from recorded music to live performance, and the ongoing discourse on intellectual property, to the rise of new products such as in-game music and other forms of non-linear music. All these developments pose challenges to education in music production.

4.3. Consequences for education

Traditional professional education on music production tends to focus on the technological aspects of music production, especially studio engineering. It has difficulties in dealing with theoretical and artistic aspects of popular music, with its complex web of subgenres, all having a high
rate of circulation, while at the same time maintaining a high standard of academic integrity. Consequently, traditional curricula on music production tend to display implicit or explicit preferences for certain musical genres. Traditional professional education has difficulties in dealing with creative aspects of music production relevant for the contexts it is aiming at. It also has difficulties in dealing with the current high rate of change in the music industry, and in organising a continuing and significant interaction of these changes with its curricula. It has difficulties with assessing, and subsequently incorporating into its curricula, innovative developments in products, tools and strategies in music production.

The curriculum Sound & Music Production is designed to reflect and respond to the changes in the music industry and their consequences for the music professional. Furthermore, it is designed to balance the technological aspects of music production, design strategies and its creative aspects, the ongoing development in various sectors of the music industry and emerging new contexts of music production, and innovative developments in product and distribution. We do this by approaching music production from three angles, and directing the overlap of these strands: the engineering producer, the organising producer and the composing producer. The curriculum is aimed at the student, and takes as its main point the potential of each individual student as a reflective practitioner in the field of music production. We aim at a curriculum which exceeds the purely technological aspects of music production, exceeds specific musical genres and contexts, and provides students with tools which help them succeed in building their professional practice in a rapidly changing music industry.

5. RESEARCH

5.1. Software development and system design

In the curriculum for Audio Design one of the tracks is focused on music system design and music software development. Most of the students graduating in this track build a professional career in research at well-known research centres in Europe (e.g. IRCAM Paris, MTG Barcelona) or elsewhere. A growing number of graduates find their way into system design for theatre, games, etc. Because of these developments we are adapting the curriculum in this track on a yearly basis. In the recent past, programming in a hardcore language like C++ and DSP programming were important topics; the focus is now widening to system design using SuperCollider and/or Max/MSP as tools, working on adaptive music systems. The type of research students are educated in is shifting towards R&D.

5.2. Research & music production / design

In the field of music production and music design the importance of research is developing. The field is in rapid development, driven by technological, commercial and social developments. Research was not a topic dealt with by individuals or small companies until recently. We see a growing need for research into music design processes, methods, and ways of co-operation and business models, both mono-disciplinary and multi-disciplinary. This type of research is one of the factors that can make a music designer stand out in the crowd of users of the democratized production means. Also important is the development of methods of research through music design.

Another important aspect of research is that building up experience, however necessary, is not enough anymore in the scattered field that music design is, with average company sizes that are the lowest in all creative industries. A research attitude and the willingness and, especially, the capability of music designers to share results will be one of the major forces for further development of the field. The results of the research into and through music design have a direct relevance for the professional development of students and teaching staff. Through this research we augment and develop our knowledge of music design, to be able to contribute to the development of important new fields like composition and sound design for adaptive systems, the improvement of music design processes in the creative industries, the improvement of music (design) education and the support for various social themes like inclusion, interculturality, and the application of design, game and play concepts in new contexts like culture, tradition, education, care and the like.

6. CONCLUSIONS

In the period from 1985 up to the present we have developed a continuously adapting educational program. This program is successful in preparing students for the existing and future professional practice. Both the program and (the knowledge and skills of) the teaching staff are being adapted on a yearly basis to the developments in the field. We have learned, and gathered evidence, that it is necessary to build in mechanisms to guarantee such a development, to guarantee the quality of the programs, and to monitor the field of studies and practice.

7. REFERENCES

[1] Timmermans et al., "Education in Music and Technology, a Program for a Professional Education", in Proceedings of the International Computer Music Conference, Singapore, 2003, pp. 119-126.

[2] F. Karlin and R. Wright, On the track: a guide to contemporary film scoring. London (UK): Routledge, 2004, pp. 29-31.

Concerts

Evening Concert Saturday May 1st 2010 Location: SETUP

ZLB Miguel Negro

In this performance a spatial array of strings is simulated using physical modeling in SuperCollider and perturbed using several digital raw signals.

Flowerpulse Marc Sciglimpaglia

Flowerpulse is a short algorithmic composition that strives to maximize musical depth within a minimalist score structure. Composed in SuperCollider, this piece is rendered as a code poem, a block of code that defines an evolving, kaleidoscopic cloud of sonic agents, each performing variable-speed trills over selected harmonic overtones that rise, overlap and die out like falling stars. It has a simple but dense code structure which defines a generalized pattern, that of a "Routine" of musical agents released one after another into the soundscape, each with its own voicing and lifespan. The resulting texture is a hybrid of computer decision-making and performativity: each voice has a randomly determined element in its trill, while the rate and shape of the trill is performed in real time on the mouse, or by MIDI control. In this sense, Flowerpulse strikes a balance between control and freedom, the smooth and striated, order and disarray. By coupling selective or performative compositional elements at each stage with algorithmic and stochastic processes, it seeks out the center where interplay between both spheres becomes deeply entangled, and where the result is greater than the sum of its parts.

Code Livecode Live Marije Baalman

Livecoding the manipulation of the sound of livecoding, causing side effects, which are live coded to manipulate the sound of livecoding, causing side effects, which are live coded to manipulate the sound of livecoding, which are live coded to manipulate the sound of livecoding, causing side effects, which are live coded to manipulate the sound of livecoding, which are live coded to manipulate the sound of livecoding, causing side effects, which are live coded to manipulate the sound of livecoding, ...

Pd live coding Pieter Suurmond, IOhannes m zmölnig

PD live coding and/or competition

Lunch concert Sunday May 2nd 2010 Location: SETUP

Chuck Demo Kassen Oud

Evening Concert Sunday May 2nd 2010 Location: Theater Kikker

Inverso Cosmico Massimo Carlentini

The composition comes from one of the hypotheses regarding the creation of the universe, dividing it into three phases: the "Era Adronica" (the cosmic Big Bang), the "Era Fotonica" (the processes of cooling, dominated by radiation) and the "Era Barionica" (the present one, of the gravitational type). "Inverso Cosmico" traces its way backwards through these hypotheses: in the "Era Barionica", a pseudo-melodic line, or lifeline, mixes itself with sounds of the senses or sensations (hot, cold, dark, light, evanescence, ever present); in the "Era Fotonica", rhythms with varying timbres, or radiations, mix into sound bands or fluctuating materials; in the "Era Adronica", fragments of previously used materials are reduced and expanded, or the Big Bang, repeated through time. The sounds of this piece have been created through frequency modulation and granular synthesis.

Tombak Madjid Tahriri

The tombak is an Iranian percussion instrument, capable of performing very complex rhythms. My piece tries to discover new playing techniques for the tombak which usually aren't playable on this instrument (such as pitch changes, sustained notes and chords, to name a few). The basic idea of the composition is a conversation between 3 different tombaks (low, middle and high register). Each of the 3 instruments has its own character. Tombak was realized with CSound and Ardour.

GR Frances-Marie Uitti, Yota Morimoto

GR by cellist/composer Frances-Marie Uitti and composer Yota Morimoto is a followup to their collaborative DVD 13AL. The piece is the first of a series of works made by the authors for the 192 loudspeakers in Leiden. In this condensed transient piece, F.M. Uitti explores the metalo-klang of the novel aluminum cello, while Y. Morimoto’s electronics disintegrates the sound of the instrument into numerous sonic cells whose dynamics and constellation are governed by a cellular automaton agent. Acamar is a star at ’the end of the river,’ 120 light-years away from the Earth.

Transmutation Jonas Foerster

I put myself in the place of an alchemist and start searching for ”musical gold”. There is no need for a specific definition of this term, because it’s all about the search and not about the discovery of some ”gold synthesis”. Thus I can be sure to stay searching and never to find what I don’t know. It is a playful and experimental handling of musical substances to see how they can conglomerate, split or regroup and in which way they do so. There exist, of course, a lot of methods to achieve this, which may vary in many aspects, e.g. in speed or complexity. The properties of the educts and products themselves do play an important role, but it is first and foremost the transformations from one aggregate state or material to another, that is brought into focus: the ”transmutation of substance”, to refer to the alchemists again.

A Very Fractal Cat, Somewhat T[h]rilled (2008-2010) Fernando Lopez-Lezcano

This is the latest version of the second piece (after ”Cat Walk”) of a series of algorithmic performance pieces for pianos, computer and ”cat” that I started working on at the end of 2007 (the proverbial cat walking on a piano keyboard). The performer connects through a keyboard controller, four pedals and two modulation wheels to four virtual pianos both directly and through algorithms. Through the piece different note and phrase generation algorithms are triggered by the performer’s actions, including markov chains that the virtual cat uses to learn from the performer, fractal melodies, plain scales and trills and other even simpler algorithms. The sound of the pianos is heard directly, and is also processed using spectral, granular and other synthesis techniques at different points in the performance, creating spaces through which the performer moves. A surround environment is created with Ambisonics spatialization, and everything in the piece (algorithms, sound generation and processing and graphical user interface) was written in the SuperCollider language. ”A Very Fractal Cat, Somewhat T[h]rilled” is a piece for performer (a classically trained pianist), four virtual pianos, an optional midi controlled player piano (Disklavier or similar), and computer-based sound processing and synthesis. The performer uses a weighted 88 note keyboard controller, four pedals and two modulation wheels to connect directly and through a program to four virtual pianos and a player piano that render both the notes the performer plays and derived materials created in real-time by the program. The computer processing involves algorithmic generation of additional note and control events and sound synthesis, processing and spatialization of the resulting audio streams. The virtual pianos used include a Yamaha C7, Steinway, Bosendorfer and a Cage prepared piano. Sound processing includes granular synthesis and frequency domain transformations that affect (in certain sections of the piece) the sound of the virtual pianos. Additive synthesis is sometimes used to augment and emphasize the pianos sounds. The piece has been (and still is) evolving since the end of 2008 when the first versions of the program and score were created. It has been performed in concert several times and a previous (older) version was submitted and performed in the LAC2009 concerts.

Landschappen Augusto Meijer

'Landschappen' ('Landscapes') is a spacious tape piece, meant to be a sonic representation of Dutch landscapes. There is a lot of inspiration and admiration to be found in these endlessly flat, empty spaces, and out of this came a need to express these landscapes in music. It is these typical impressions of emptiness and endlessness in Dutch landscapes which form the basic concept of the piece. http://augustomeijer.wordpress.com

The Cake Laurens van der Wee

The Cake is a sound-based improvisation machine.

Lunch concert Monday May 3rd 2010 Location: SETUP

Playing music lazily on the bank of a river while listening to the bells of a far away city Louigi Verona

Using loops and a midi controller with an Irish low whistle played live.

CSound Demo Andrés Cabrera

Demonstration of CSound.

Evening Concert Monday May 3rd 2010 Location: SJU Jazzpodium

Playing music lazily on the bank of a river while listening to the bells of a far away city Louigi Verona

Using loops and a midi controller with an Irish low whistle played live.

Frozen Images Wanda & Nova deViator

The process of freezing is a break between movements. As such, the structure of the concert, perforated with suspensions - breaks between points, is an emergence of a performance of frozen images, full of texts, contemporary electronic rhythms, exasperated guitars, noisy oscillations, and hypnotic bass lines. Along with the performers’ actions, tactile interfaces, and moving pictures, the performance raises questions about hypersexualisation and pornification, fetishisation in consumerism, mechanisms of the image and visual culture, idealisation of love, and the meaning of art and culture. Movement, text, and frozen images open up in their primary way precisely based on contemporary organised noise. The music, ambivalently contextualised through the video image and movement, stretches out to the body and its vibration, rational and affective. Frozen images get broken by the vibration of the word and the moving of the actual flesh/body in all its resistance.

Elektronengehirn: Geburt Der Kunst (Satz 1 und 2) Malte Steiner

The piece Geburt Der Kunst (Birth Of Art) was composed in 2010 for the birthday of art, which was declared by French artist Robert Filliou to be on the 17th of January (http://www.artsbirthday.net). The concert was performed at Esc Im Labor in Graz, Austria, conducted remotely via the internet from Steiner's studio in Hamburg, Germany. The composition was created with Pure Data and separates the control part from the synthesis part, both communicating via OSC. Steiner only sent control data via the internet, no streaming data. The concert at LAC 2010 happens on location, without a network. The second part, Satz 2, will be performed for the first time. Elektronengehirn is an electroacoustic project by Malte Steiner, founded in 1996. For sound creation only software-based synthesis and processing is used; Steiner started with CSound, custom software and soon Pure Data. So far this project has released 2 CDs, a netlabel release and several contributions to compilations and collaborations. Steiner is involved in several open source projects, like the 'Minicomputer' synthesizer for Linux. http://www.block4.com - http://www.elektronengehirn.de

SuperCollider live coding jam session Marije Baalman

Supercollider Live Coding Event. The format is under construction and will be somewhere between competitive and communal livecoding.

DS-10 Denominator Rutger Muller

Lovis
Myrthe van de Weetering: Electric violin, Melodica, Piano
Mark IJzerman: Laptop, Monome, Piano
Joep Vermolen: Cello, Metallophone
Robin Koek: Melodica, Metallophone, various small objects

The 150-Years-of-Music-Technology Composition Competition Than van Nispen tot Pannerden

Workshops

A bird’s-eye view on Linux Audio Lieven Moors

A comprehensive overview of Linux Audio and other Open Source Music Applications with several live music demos. Interesting for beginners as well as advanced users.

Developing parallel audio applications with FAUST Yann Orlarey

While the number of cores of our CPUs is expected to double every new generation, writing efficient parallel applications that can benefit from all these cores, remain a complex and highly technical task. The problem is even more complex for real-time audio applications that require low latencies and thus relatively fine-grained parallelism. In order to facilitate the development of parallel audio application, the FAUST compiler developed at GRAME provides two powerful options to automatically produce parallel code. The first one is based on the OpenMP standard, while the second one uses Posix pthreads directly. This hands on demo will give the audience the opportunity to discover these parallelization facilities, their limits and their benefits, on concrete examples of audio applications.

Pro Audio vs. Consumer Audio: Discussion & Work Group Lennart Poettering

Supercollider for starters Marije Baalman, Jan Trutzschler, Henry Vega, Jan-Kees van Kampen

SuperCollider has been up and running on Linux since shortly after it was released as open source software. In the past two years, it has become really easy to install on Linux, and is packaged in distributions such as PlanetCCRMA and Pure:Dyne. On Linux there is a choice between three editors: emacs, vim and gedit.

Supercollider advanced Marije Baalman, Jan Trutzschler, Henry Vega, Jan-Kees van Kampen

For more advanced users of SuperCollider we offer another workshop covering specific advanced topics. Topics could include:

• Input and output to devices or other programs
• Patterns (a different paradigm of composing based on musical sequences; see the short sketch below)
• Extending SuperCollider (writing your own classes)
• Livecoding with SuperCollider
• Using the JITLib
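As a small taste of two of these topics, here is a minimal SuperCollider sketch. It is illustrative only and not taken from the workshop materials: a Pbind pattern that describes a musical sequence declaratively, and a JITLib Ndef proxy whose definition can be replaced while it is playing.

(
// Patterns: a musical sequence described declaratively
Pbind(
    \degree, Pseq([0, 2, 4, 7, 9], inf),  // cycle through scale degrees
    \dur, Prand([0.25, 0.5], inf),        // pick note durations at random
    \amp, 0.2
).play;
)

// JITLib: a named proxy that can be redefined while it sounds
Ndef(\drone, { SinOsc.ar([220, 221], 0, 0.1) }).play;
Ndef(\drone, { Pulse.ar([110, 111], 0.3, 0.05) }); // re-evaluating crossfades to the new sound

Re-evaluating either snippet while the server is running is also the basic gesture behind the livecoding topic listed above.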

Chuck Kassen Oud

A session on the ChucK programming language. How a more intimate familiarity with the workings behind synthesis can increase our options and how a DIY approach to instruments may force us to think more deeply about what we ourselves want, musically (as opposed to what’s offered commercially).

Using ambisonics as a production format in ardour Jörn Nettingsmeier

The paper describes a pop production in mixed-order Ambisonics, with an ambient sound field recording augmented with spot microphones panned in third order. After a brief introduction to the hardware and software toolchain, a number of miking and blending techniques will be discussed, geared towards the capturing (or faking) of subtle natural ambience and good imaging. I will then describe the expected struggle to make the resulting mix loud enough for commercial use while retaining a natural and pleasant sound stage and as much dynamics as possible.

Making Waves Micz Flor, Adam Thomas

A workshop introducing Campcaster, an open source radio management application for use by both small and large radio stations (yes, real radio stations, not internet radio) to schedule radio shows. Learn the first steps of operating a radio station, including scheduling, live studio broadcast, play-out and even remote automation and networking via the web, all using free software. The workshop will take participants through hardware set-ups, software installations and studio configurations, resulting in the creation of a fully-functioning Linux Audio Conference station.

Creative Commons Björn Wijers

Be informed about the options and implications for licensing your work using Creative Commons.

Live Coding with QuteCsound Andrés Cabrera

This workshop will explore the Live Coding capabilities of QuteCsound for generating score events for a running instance of Csound, through the use of the Live Event Panel and the QuteSheet python API. It will show how the QuteCsound frontend can serve as a python IDE for the processing of note events to be generated, transformed, sent live, or looped for a running Csound process.
