SoundPaint – Painting Music

Jürgen Reuter, Karlsruhe, Germany, http://www.ipd.uka.de/~reuter/

Abstract

We present a paradigm for synthesizing electronic music by graphical composing. The problem of mapping colors to sounds is studied in detail from a mathematical as well as a pragmatic point of view. We show how to map colors to sounds in a user-definable, topology-preserving manner. We demonstrate the usefulness of our approach on our prototype implementation of a graphical composing tool.

Keywords

electronic music, sound collages, graphical composing, color-to-sound mapping

1 Introduction

Before the advent of electronic music, the western music production process was clearly divided into three stages: Instrument craftsmen designed musical instruments, thereby playing a key role in sound engineering. Composers provided music in notational form. Performers realized the music by applying the notational form on instruments. The diatonic or chromatic scale served as the commonly agreed interface between all participants. The separation of the production process into smaller stages clearly has the advantage of reducing the overall complexity of music creation. Having a standard set of instruments also enhances the efficiency of composing, since experience from previous compositions can be reused.

The introduction of electro-acoustic instruments widened the spectrum of available instruments and sounds, but in principle did not change the production process. With the introduction of electronic music in the middle of the 20th century, however, the process changed fundamentally. Emphasis shifted from note-level composing and harmonics towards sound engineering and the creation of sound collages. As a result, composers started becoming sound engineers, taking over the instrument craftsmen's job. Often, a composition could not be notated with traditional notation, or, even worse, the composition was strongly bound to a very particular technical setup of electronic devices. Consequently, the composer easily became the only person capable of performing the composition, thereby often eliminating the traditional distinction of production stages. At least, new notational concepts were developed to alleviate the problem of notating such music.

The introduction of MIDI in the early 80s was in some sense a step back towards electro-acoustic, keyed-instrument music, since MIDI is based on a chromatic scale and a simple note on/off paradigm. Basically, MIDI supports any instrument that can produce a stream of note on/off events on a chromatic scale, like keyed instruments, wind instruments, and others. It also supports many expressive features of non-keyed instruments like vibrato, portamento, or breath control. Still, in practice, mostly keyboards with their limited expressive capabilities are used for note entry.

The idea of our work is to break these limitations in expressivity and tonality. With our approach, the composer creates sound collages by visually arranging graphical components into an image, closely following basic principles of graphical notation. While the graphical shapes in the image determine the musical content of the sound collage, the sound itself is controlled by color. Since in our approach the mapping from colors to actual sounds is user-definable for each image, the sound engineering process is independent of the musical content of the collage. Thus, we resurrect the traditional separation of sound engineering and composing. The performance itself, though, is carried out mechanically by computation. Still, the expressive power of graphics is directly translated into musical expression.
The remainder of this paper is organized as follows: Section 2 gives a short sketch of the image-to-audio transformation. To understand the role of colors in a musical environment, Section 3 presents a short survey of the traditional use of color in music history. Next, we present and discuss in detail our approach of mapping colors to sounds (Section 4). Then, we extend our mapping to aspects beyond pure sound creation (Section 5). A prototype implementation of our approach is presented in Section 6. We already gained first experience with our prototype, as described in Section 7. Our generic approach is open to multiple extensions and enhancements, as discussed in Section 8. In Section 9, we compare our approach with recent work in related fields, and we finally summarize the results of our work (Section 10).

2 Graphical Notation Framework

In order to respect the experience of traditionally trained musicians, our approach tries to stick to traditional notation as far as possible. This means that, when interpreting an image as a sound collage, the horizontal axis represents time, running from the left edge of the image to the right, while the vertical axis denotes the pitch (frequency) of sounds, with the highest pitch located at the top of the image. The vertical pitch ordinate is exponential with respect to frequency, such that equidistant pitches result in equidistant musical intervals. Each pixel row represents a (generally changing) sound of a particular frequency. Both axes can be scaled by the user with a positive linear factor. The color of each pixel is used to select a sound. The problem of how to map colors to sounds is discussed later on.
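For illustration, the exponential pitch ordinate can be written as a simple row-to-frequency conversion. The following C++ sketch is not taken from the SoundPaint sources; the function name, the image height, and the frequency bounds are assumptions chosen for this example.

    #include <cmath>

    // Illustrative sketch: map a pixel row (0 = top of the image) to a
    // frequency, so that equal row distances give equal musical intervals.
    double rowToFrequency(int row, int imageHeight,
                          double freqBottom = 55.0, double freqTop = 14080.0)
    {
        // Normalized position: 1.0 at the top row, 0.0 at the bottom row.
        double t = 1.0 - static_cast<double>(row) / (imageHeight - 1);
        // Exponential interpolation between the bottom and top frequencies.
        return freqBottom * std::pow(freqTop / freqBottom, t);
    }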
3 Color in Musical Notation History

The use of color in musical notation has a long tradition. We give a short historical survey in order to show the manifold applications of color and to provide a sense for the effect of using colors. Color was perhaps first applied as a purely notational feature by Guido von Arezzo, who invented colored staff lines in the 11th century, using yellow and red for the do and fa lines, respectively. During the Ars Nova period (14th century), note heads were printed in black and red to indicate changes between binary and ternary meters (Apel, 1962). While in medieval manuscripts color had been widely applied in complex, colored ornaments, with the new printing techniques rising up in the early 16th century (most notably Petrucci's Odhecaton in 1501), extensive use of colors in printed music was hardly feasible or simply too expensive and thus became rare. Mozart wrote a manuscript of his horn concerto K495 with colored note heads, serving as a joke to irritate the hornist Leutgeb, a good friend of his (Wiese, 2002). In liturgical music, red as contrasted to black retained an extraordinary role, marking sections performed by the priest as opposed to those performed by the community, or simply serving readability (black notes on red staff lines). A slightly more deliberate application of color in music printing emerged in the 20th century with technological advances in printing techniques: the advent of electronic music stimulated the development of graphical notation (cf. e.g. Stockhausen's Studie II (Stockhausen, 1956), the first electronic music to be published (Simeone, 2001)), and Wehinger uses colors in an aural score (Wehinger, 1970) of Ligeti's Articulation to differentiate between several classes of sounds. For educational purposes, some authors use colored note heads in introductory courses on musical notation (Neuhäuser et al., 1974). There is even a method for training absolute hearing based on colored notes (Taneda and Taneda, 1993). Only very recently, the use of computer graphics in conjunction with electronic music has led to efforts in formally mapping colors to sounds (for a more detailed discussion, see the Related Work Section 9).

While Wehinger's aural score is one of the very few notational examples of mapping colors to sounds, music researchers started much earlier to study relationships between musical and visual content. Especially with the upcoming psychological research of the late 19th century, the synesthetic relationship between hearing and viewing was studied more extensively. Wellek gives a comprehensive overview of this field of research (Wellek, 1954), including systems of mapping colors to keys and pitches. Painters started trying to embed musical structures into their work (e.g. Klee's Fugue in Red). Similarly, composers tried to paint images, as in Mussorgsky's Pictures at an Exhibition. In jazz, synesthesis appears as the coincidence of emotional mood from acoustic and visual stimuli, known as the blue notes in blues music.

4 Mapping Colors to Sounds

We now discuss how colors are mapped to sounds in our approach.

For the remainder of this discussion, we define a sound to be a 2π-periodic, continuous function s : R → R, t ↦ s(t). This definition meets the real-world characteristic of oscillators as the most common natural generators of sounds and the fact that our ear is trained to recognize periodic signals. Non-periodic natural sources of sound such as bells are out of the scope of this discussion. We assume normalization of the periodic function to 2π periodicity in order to abstract from any particular frequency. According to this definition, the set of all possible sounds – the sound space – is represented by the set of all 2π-periodic functions.

Next, we define the color space C following the standard RGB (red, green, blue) model: the set of colors is defined by a three-dimensional real vector space R^3, or, more precisely, a subset thereof: assuming that the valid range of the red, green, and blue color components is [0.0, 1.0], the color space is the subset of R^3 given by the cube spanned by the corners (0, 0, 0), (1, 0, 0), (0, 1, 0), and (0, 0, 1). Note that the color space is not a vector space, since it is not closed with respect to addition and multiplication by scalars. However, this is not an issue as long as we do not apply operations that result in vectors outside of the cube. Also note that there are other possibilities to model the color space, such as the HSB (hue, saturation, brightness) model, which we will discuss later.
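Before turning to the requirements on the mapping, two concrete examples may help to picture the normalized sounds defined above. The following C++ fragment is purely illustrative and not part of the formal model or of the SoundPaint sources; it writes a sine and a triangle wave as functions of the phase t, normalized to a period of 2π.

    #include <cmath>

    const double kTwoPi = 6.283185307179586;

    // A 2π-periodic sound: one period of a sine wave.
    double sineSound(double t)
    {
        return std::sin(t);
    }

    // A 2π-periodic sound: one period of a triangle wave,
    // rising from -1 to 1 and falling back to -1 (for t >= 0).
    double triangleSound(double t)
    {
        double phase = std::fmod(t, kTwoPi) / kTwoPi;  // in [0, 1)
        return phase < 0.5 ? 4.0 * phase - 1.0 : 3.0 - 4.0 * phase;
    }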
Ideally, for a useful mapping of colors to sounds, we would like to fulfill the following constraints:

• Injectivity. Different colors should map to different sounds in order to utilize the color space as much as possible.

• Surjectivity. With a painting, we want to be able to address as many different sounds as possible – ideally, all sounds.

• Topology preservation. Most importantly, similar colors should map to similar sounds. For example, when there is a color gradation in the painting, it should result in a sound gradation. There should be no discontinuity effects in the mapping. Also, we want to avoid noticeable hysteresis effects in order to preserve the reproducibility of the mapping across the painting.

• User-definable mapping. The actual mapping should be user-definable, as research has shown that there is no general mapping that applies equally well to all individual humans.

Unfortunately, there is no mapping between the function space of 2π-periodic functions and R^3 that fulfills all of the first three constraints. Pragmatically, we drop surjectivity in order to find a mapping that fulfills the other constraints. Indeed, dropping the surjectivity constraint does not hurt too much if we assume that the mapping is user-definable individually for each painting and that a single painting does not need to address all possible sounds: rather than mapping colors to the full sound space, we let the user select a three-dimensional subspace S of the full sound space. This approach also alleviates the limitation of our mapping not being surjective: since for each painting a different sound subspace can be defined by the composer, effectively the whole space of sounds is still addressable, thus retaining surjectivity in a limited sense.

Dropping the surjectivity constraint, we now focus on finding a proper mapping from the color space to a three-dimensional subset of the sound space. Since we do not want to bother the composer with mathematics, we just require a basis of a three-dimensional sound space to be defined. This can be achieved by the user simply defining three different sounds that span a three-dimensional sound space. Given the three-dimensional color space C and a three-dimensional subspace S of the full sound space, a bijective, topology-preserving mapping can easily be achieved by a linear mapping via a matrix multiplication,

M : C → S,   x ↦ y = Ax,   x ∈ C, y ∈ S                    (1)

with A being a 3 × 3 matrix specifying the actual mapping. In practice, the composer would not need to specify this vector space homomorphism M by explicitly entering some matrix A. Rather, given the three basis vectors of the color space C, i.e. the colors red, green, and blue, the composer just defines a sound individually for each of these three basis colors. Since each other color can be expressed as a linear combination of the three basis colors, the scalars of this linear combination can be used to linearly combine the three basis sounds that the user has defined.
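To make this construction concrete, the following sketch shows how a pixel color could select a sound as a linear combination of three user-defined basis sounds. It is an illustration only: the wavetable representation and all identifiers are assumptions made for this example and are not taken from the SoundPaint sources.

    #include <array>
    #include <cstddef>

    // One period of a sound, sampled into a fixed-size wavetable.
    constexpr std::size_t kTableSize = 1024;
    using Wavetable = std::array<double, kTableSize>;

    struct Rgb { double r, g, b; };  // each component assumed to lie in [0.0, 1.0]

    // Linear color-to-sound mapping: the color components act as the scalars
    // that weight the basis sounds defined for red, green, and blue.
    Wavetable mapColorToSound(const Rgb& color,
                              const Wavetable& soundForRed,
                              const Wavetable& soundForGreen,
                              const Wavetable& soundForBlue)
    {
        Wavetable result{};
        for (std::size_t i = 0; i < kTableSize; ++i) {
            result[i] = color.r * soundForRed[i]
                      + color.g * soundForGreen[i]
                      + color.b * soundForBlue[i];
        }
        return result;
    }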
5 Generalizing the Mapping

As exciting as this approach may sound at first, reality is disillusioning: a pure linear combination of sounds results in nothing else but cross-fading waveforms, which quickly turns out to be too limited for serious composing. However, what we can still do is extend the linear combination from the sounds themselves to further parameters that influence the sound in a non-linear manner. Most notably, we can apply non-linear features to sounds, such as vibrato, noise content, resonance, reverb, echo, hall, detune, disharmonic content, and others. Still, linear aspects such as panning or frequency-dependent filtering may also improve the overall capabilities of the color-to-sound mapping. In general, any scalar parameter that represents some operation applicable to arbitrary sounds can be used to add new capabilities. Of course, with respect to our topology preservation constraint, all such parameters should respect continuity of their effect, i.e. no noticeable discontinuity should arise when slowly changing such a parameter.

Again, we do not want to burden the composer with explicitly defining a mapping function. Instead, we extend the possibilities of defining the three basis sounds by adding scalar parameters, e.g. in a graphical user interface by providing sliders in a widget for sound definition.

So far, we assumed the colors red, green, and blue to serve as basis vectors for our color space. More generally, one could accept any three colors, as long as they form a basis of the color space. Changing the basis of the color space can be compensated by adding a basis change matrix to our mapping M:

M′ : C′ → S,   x ↦ y = A φ_{C′→C} x = A′ x,                    (2)

assuming that φ_{C′→C} is the basis change matrix that converts x from space C′ to space C.

Specifically, composers may want to prefer the HSB model over the RGB model: traditionally, music is notated with black or colored notes on white paper. An empty, white sheet of paper is therefore naturally associated with silence, while a sheet of paper heavily filled with numerous musical symbols is typically reminiscent of dense music. Probably more important, when mixing colors, most people think in terms of subtractive rather than additive mixing. Conversion between HSB and RGB is just another basis change of the color space.

When changing the basis of the color space, care must be taken with respect to the range of the vector components. As previously mentioned, the subset of R^3 that forms the color space is not a vector space, since the subset is not closed with respect to addition and multiplication by scalars. By changing the basis in R^3, the cubic shape of the RGB color space in the first octant generally transforms into a different shape that possibly covers different octants, thereby changing the valid range of the vector components. Therefore, when operating with a different basis, vectors must be carefully checked for correct range.
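Equation (2) could be realized by multiplying the two 3 × 3 matrices once and applying the combined matrix to every pixel color, with colors checked against the valid range after conversion to the RGB basis. The following fragment is only a sketch of that idea; the minimal matrix type and function names are chosen for this example and do not mirror SoundPaint's implementation.

    #include <array>

    using Vec3 = std::array<double, 3>;
    using Mat3 = std::array<Vec3, 3>;

    // A' = A * phi, where phi converts coordinates from the alternative
    // color basis C' into the RGB basis C (cf. equation (2)).
    Mat3 combineWithBasisChange(const Mat3& A, const Mat3& phi)
    {
        Mat3 combined{};
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j)
                for (int k = 0; k < 3; ++k)
                    combined[i][j] += A[i][k] * phi[k][j];
        return combined;
    }

    // Convert a color given in the alternative basis C' to RGB coordinates
    // and check that it still lies inside the RGB cube [0, 1]^3, i.e. that
    // the mapping may safely be applied to it.
    bool toRgbChecked(const Mat3& phi, const Vec3& xPrime, Vec3& rgb)
    {
        for (int i = 0; i < 3; ++i) {
            rgb[i] = phi[i][0] * xPrime[0] + phi[i][1] * xPrime[1] + phi[i][2] * xPrime[2];
            if (rgb[i] < 0.0 || rgb[i] > 1.0)
                return false;
        }
        return true;
    }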
Also, some users started with a dark gray rather than black image back- Figure 2: Parameterizing a Triangle Wave ground, such that SoundPaint’s optimization code for silence regions could not be applied, re- sulting in much slower conversion. These obser- resulting audio file will typically contain many vations strongly suggest to introduce HSB color crackling sounds. These annoying noises arise space in SoundPaint. from sudden color or brightness changes at pixel borders: a sudden change in sound produces 8 Future Work high-frequency peaks. To alleviate these noises, Originally stemming from a command-line tool, pixel borders have to be smoothened along the SoundPaint still focuses on converting image time axis. As a very simple method of anti- files into audio files. SoundPaint’s GUI mostly aliasing, SoundPaint horizontally divides each serves as a convenient interface for specifying image pixel into sub-pixels down to audio reso- conversion parameters. This approach is, from lution and applies a deep path filter along the a software engineering point of view, a good ba- sub-pixels. The filter characteristics can be con- sis for a clean software architecture, and can trolled by the user via the Synthesize Options be easily extended e.g. with scripting purposes widget, ranging from a plain overall sound with in mind. A composer, however, may prefer a clearly noticeable clicks to a smoothened, al- sound collage in a more interactive way rather most reverb-like sound. than creating a painting in an external appli- Best results are achieved when painting only cation and repeatedly converting it into an au- a few colored structures onto the image and dio file in a batch-style manner. Hence, Sound- leaving the keeping the remaining pixels in the Paint undoubtedly would benefit from integrat- color that will produce silence (i.e., in the RGB ing painting facilities into the application itself. model, black). For performance optimization, Going a step further, with embedded paint- it is therefore useful to handle these silent pix- ing facilities, SoundPaint could be extended els separately, rather than computing a complex to support live performances. The performer sound with an amplitude of 0. Since, as an effect would simply paint objects ahead of the cursor of the before mentioned pixel smoothing, often of SoundPaint’s built-in player, assuming that only very few pixels are exactly 0, SoundPaint the image-to-audio conversion can be performed simply assumes an amplitude of 0, if the am- in real-time. For Minimal Music like perfor- mances, the player could be extended to play each of the RGB layers of the image as basis for in loop mode, with integrated painting facili- a transformation. Unfortunately, the relation ties allowing for modifying the painting for the between the image and the resulting sound is next loop. Inserting or deleting multiple objects not at all obvious. following predefined rhythmical patterns with a Coagula(Ekman, 2003) uses a synthesis single action could be a challenging feature. method that can be viewed as a special case Assembling audio files generated from multi- of SoundPaint’s synthesis with a particular set ple images files into a single sound collage is de- of color to sound mappings. Coagula uses a si- sired when the surjectivity of our mapping is an nusoidal synthesis, using x and y coordinates issue. Adding this feature to SoundPaint would as time and frequency axis, respectively. 
Noise ultimately turn the software into a multi-track content is controlled by the image’s blue color composing tool. Having a multi-track tool, in- layer. Red and green control stereo sound tegration with other notation approaches seems panning. Following Coagula’s documentation, nearby. For example, recent development of SoundPaint should show a very similar behav- LilyPond’s(Nienhuys and Nieuwenhuizen, 2005) ior when assigning 100% noise to blue, and pure GNOME back-end suggests to integrate tradi- sine waves to colors red and green, with setting tional notation in separate tracks into Sound- red color’s pan to left and green color’s pan to Paint. The overall user interface of such a multi- right. track tool finally could look similar to the ar- Just like Coagula, MetaSynth(Wenger and range view of standard sequencer software, but Spiegel, 2005) maps red and green to stereo pan- augmented by graphical notation tracks. ning, while blue is ignored. Small Fish(Furukawa et al., 1999), presented 9 Related Work by the ZKM(ZKM, 2005), is an illustrated book- Graphical notation of music has a rather long let and a CD with 15 art games for control- history. While the idea of graphical compos- ling animated objects on the computer screen. ing as the reverse process is near at hand, prac- Interaction of the objects creates polytonal se- tically usable tools for off-the-shelf computers quences of tones in real-time. Each game de- emerged only recently. The most notably tools fines its own particular rules for creating the are presented below. tone sequences from object interaction. The Maybe was the first one who tone sequences are created as MIDI events and started designing a system for converting im- can be played on any MIDI compliant tone gen- ages into sounds in the 1950’s, but it took him erator. Small Fish focuses on the conversion of decades to present the first implementation of movements of objects into polytonal sequences his UPIC system in 1978(Xenakis, 1978). Like of tones rather than on graphical notation; still, SoundPaint, Xenakis uses the coordinate axes shape and color of the animated objects in some following the metaphor of scores. While Sound- of the games map to particular sounds, thereby Paint uses a pixel-based conversion that can be translating basic concepts of graphical notation applied on any image data, the UPIC system into an animated real-time environment. assumes line drawings with each graphical line The PDP(Schouten, 2004) extension for the being converted into a melody line. (Puckette, 2005) real-time system fol- Makesound(Burrell, 2001) uses the following lows a different approach in that it provides a mapping for a sinusoidal synthesis with noise framework for general image or video data pro- content and optional phase shift: cessing and producing data streams by serializa- x position phase tion of visual data. The resulting data stream y position temporal position can be used as input source for audio process- hue frequency ing. saturation clarity (inv. noise content) Finally, it is worth mentioning that the vi- luminosity intensity (amplitude) sualization of acoustic signals, i.e. the op- In Makesound, each pixel represents a section posite conversion from audio to image or of a sine wave, thereby somewhat following the video, is frequently used in many systems, idea of a spectrogram rather than graphical no- among them Winamp(Nullsoft, 2004) and tation. Color has no effect on the wave shape /MSP/Jitter(Cycling ’74, 2005). Still, itself. 
10 Conclusions

We presented SoundPaint, a tool for creating sound collages based on transforming image data into audio data. The transformation follows to some extent the idea of graphical notation, using the x and y axes for time and pitch, respectively. We showed how to deal with the color-to-sound mapping problem by introducing a vector space homomorphism between the color space and a sound subspace. Our tool mostly hides the mathematical details of the transformation from the user without imposing restrictions on the choice of the transformation's parameters. First experience with random users during the city's birthday celebrations demonstrated the usefulness of our tool. The result of our work is available as open source at http://www.ipd.uka.de/~reuter/soundpaint/.

11 Acknowledgments

The author would like to thank the Faculty of Computer Science of the University of Karlsruhe for providing the infrastructure for developing the SoundPaint software, and the department for technical infrastructure (ATIS) and Tatjana Rauch for their valuable help in organizing and conducting the workshop at the city's birthday celebrations.

References

Willi Apel. 1962. Die Notation der polyphonen Musik 900–1600. Breitkopf & Härtel, Wiesbaden.

Michael Burrell. 2001. Makesound, June. URL: ftp://mikpos.dyndns.org/pub/src/.

Cycling '74. 2005. Max/MSP/Jitter. URL: http://www.cycling74.com/.

Rasmus Ekman. 2003. Coagula. URL: http://hem.passagen.se/rasmuse/Coagula.htm.

Kiyoshi Furukawa, Masaki Fujihata, and Wolfgang Münch. 1999. Small Fish: Kammermusik mit Bildern für Computer und Spieler, volume 3 of Digital arts edition. Cantz, Ostfildern, Germany. 56 pp., ill., with CD-ROM.
Meinolf Neuhäuser, Hans Sabel, and Richard Rudolf Klein. 1974. Bunte Zaubernoten. Schulwerk für den ganzheitlichen Musikunterricht in der Grundschule. Diesterweg, Frankfurt am Main, Germany.

Han-Wen Nienhuys and Jan Nieuwenhuizen. 2005. LilyPond, music notation for everyone. URL: http://lilypond.org/.

Nullsoft. 2004. Winamp. URL: http://www.winamp.com/.

Miller Puckette. 2005. Pure Data. URL: http://www.puredata.org/.

Robert Roebling, Vadim Zeitlin, Stefan Csomor, Julian Smart, Vaclav Slavik, and Robin Dunn. 2005. wxWidgets. URL: http://www.wxwidgets.org/.

Tom Schouten. 2004. Pure Data Packet. URL: http://zwizwa.fartit.com/pd/pdp/overview.html.

Nigel Simeone. 2001. Universal Edition History.

Stadtgeburtstag Karlsruhe. 2004, June. URL: http://www.stadtgeburtstag.de/.

Karlheinz Stockhausen. 1956. Studie II.

Jessie Suen. 2004. EE/CS 107b. URL: http://www.its.caltech.edu/~chia/EE107/.

Naoyuki Taneda and Ruth Taneda. 1993. Erziehung zum absoluten Gehör. Ein neuer Weg am Klavier. Edition Schott, 7894. B. Schott's Söhne, Mainz, Germany.

Rainer Wehinger. 1970. Ligeti, György: Articulation. An aural score by Rainer Wehinger. Edition Schott, 6378. B. Schott's Söhne, Mainz, Germany.

Albert Wellek. 1954. Farbenhören. MGG – Musik in Geschichte und Gegenwart, 4:1804–1811.

Eric Wenger and Edward Spiegel. 2005. MetaSynth 4, January. URL: http://www.uisoftware.com/DOCS PUBLIC/MS4 Tutorials.pdf.

Henrik Wiese. 2002. Preface to Concert for Horn and Orchestra No. 4, E flat major, K495. Edition Henle, HN 704. G. Henle Verlag, München, Germany. URL: http://www.henle.de/katalog/Vorwort/0704.pdf.

Iannis Xenakis. 1978. The UPIC system. URL: http://membres.lycos.fr/musicand/INSTRUMENT/DIGITAL/UPIC/UPIC.htm.

ZKM. 2005. Zentrum für Kunst und Medientechnologie. URL: http://www.zkm.de/.