Computer Graphics Technical Reports

CG-2007-4

Improving the Aesthetic Quality of Realtime Motion Data Sonification

Christoph Henkelmann, Computer Science Dept. II, University of Bonn, Römerstr. 164, D-53117 Bonn, Germany, [email protected]

Institut für Informatik II, Universität Bonn, D-53117 Bonn, Germany

© Universität Bonn 2007, ISSN 1610-8892

CONTENTS

1. Introduction
   1.1 Previous work
   1.2 Overview
   1.3 Used software

2. The Basics
   2.1 Definitions and Human Hearing
       2.1.1 Data and Audio Streams
       2.1.2 Frequency and Pitch
       2.1.3 Measures of Amplitude
       2.1.4 Timbre
       2.1.5 Masking and Psychoacoustics
   2.2 Sonification
       2.2.1 Audification
       2.2.2 Earcons and Auditory Icons
       2.2.3 Audio Beacons
       2.2.4 Model-based Sonification
       2.2.5 Parameter Mapping Sonification
   2.3 Summary

3. Implementation of Realtime Sonification
   3.1 Sonification and MIDI
   3.2 Sonification and OSC
   3.3 Choosing an Appropriate Sonification Tool
   3.4 How Pd Works
   3.5 Using Pd as a Sonification Tool
       3.5.1 Loading and Saving Settings
       3.5.2 Scaling of Motion Data
       3.5.3 Audio Utilities
   3.6 Making Motion Data Available in Pd
       3.6.1 Input from Files
       3.6.2 Input via TCP/IP
       3.6.3 Input from Rowing Machine Sensors
   3.7 Summary

4. Continuous Parameter Mapping Sonification
   4.1 Audio Artifacts
       4.1.1 Zipper Noise
       4.1.2 Foldover
   4.2 Modulation of Pitch
       4.2.1 Formant Shift
       4.2.2 Problems with Musical Perception
   4.3 Modulation of Amplitude
   4.4 Modulation of Timbre
       4.4.1 Subtractive Synthesis
       4.4.2 Waveshaping and Frequency Modulation
       4.4.3 Formants & Vocal Sounds
       4.4.4 The Tristimulus Model
   4.5 Spatial Positioning
   4.6 Maintaining a Constant Amplitude
   4.7 Summary

5. Applications
   5.1 Rowing Motion Sonifications
   5.2 Walking Motion Sonifications

6. A More Musical Approach
   6.1 A “Musical” Sonification?
   6.2 Applying Sonification to Paradigms of Western European Music
   6.3 A Melodic Sonification
   6.4 Re-introducing Fine Grained Parameter Mapping
   6.5 Creating Harmonic Progress
   6.6 Summary

7. Further Prospects
   7.1 MotionLab and OSC
   7.2 More Sound Based Methods
   7.3 Sophisticated Sound Design
   7.4 More General Methods for Audio Rendering
   7.5 Need for Psychoacoustic Evaluation
   7.6 Conclusion

Appendix

A. List of Pd abstractions and externals
   A.1 arpeggiator
   A.2 arpeggiator scale
   A.3 channel
   A.4 clipping
   A.5 crossfading loop sampler
   A.6 derivative
   A.7 ergometer
   A.8 ergometer input
   A.9 file input
   A.10 floatmap
   A.11 fm1
   A.12 hold note
   A.13 master
   A.14 median
   A.15 midi channel
   A.16 norm mapping
   A.17 paf
   A.18 paf vowel
   A.19 pink noise
   A.20 reverb
   A.21 sampleloop
   A.22 sampleloop filter
   A.23 settings
   A.24 sine pitch
   A.25 sonify bend
   A.26 sonify control
   A.27 sonify note cont
   A.28 sonify note dis
   A.29 sonify scale
   A.30 ssymbol
   A.31 subtractive1
   A.32 subtractive2
   A.33 subtractive3
   A.34 subtractive4
   A.35 subtractive5
   A.36 sv
   A.37 tristimulus
   A.38 tristimulus model
   A.39 waveshaping1

B. Pitch Ranges

C. Audio Examples

Bibliography

1. INTRODUCTION

Everybody is familiar with the subject of visualization. We have all seen numerous charts, diagrams, icons and other kinds of abstract graphic representations of information. If we see a chart relating, e.g., income to taxes, or age to cancer risk, we immediately have an understanding of the structure of the corresponding dataset. We learn to read such charts and graphs as early as elementary school. As humans are mainly visual beings, a visual representation is the first idea that springs to mind when faced with the task of conveying information (or datasets from which the recipient is to deduce said information). Representing data visually, however, has its drawbacks. For some applications, a sense other than vision is far more useful for monitoring data: hearing. If we use sound instead of images as our medium for representing data, we speak of sonification instead of visualization. In that sense, sonification can be thought of as the audio equivalent of visualization. Instead of creating graphs and charts of our data, we map the data to audio sample streams. This has certain advantages over a visual display:

• Free movement: When we watch data, we always have to keep our eyes on the screen (or another output device). If data is presented to us as sound, we can move freely about.

• Background monitoring: When some quantity is permanently supervised aurally, we can focus our attention on other tasks, as long as the data creates a monotonous feedback. But as soon as the data changes drastically (which should be reflected in a drastic change in sound), the sonification automatically draws our attention.

• Temporal resolution: The temporal resolution of hearing is about twice as high (20-30 ms) as the temporal resolution of vision (50-60 ms). When we take spatial location into account, human hearing can differentiate time intervals of down to 1 ms! [Warren, 1993]

• High dynamic ranges: We hear over a large range of amplitudes and pitches which allows for a high resolution of presentation.

Fig. 1.1: Realtime sonification of motion data illustrated. Certain quantities of a motion (in this case rowing) are constantly measured. This constant stream of data is fed into a tool (in this case Pure Data). This tool creates constant audio feedback to which the subject performing the motion listens. This sonification should improve his perception of the movement, which in turn influences the way he executes that movement.

• Supplementary information: Audio feedback can be used in combina- tion with classical visualization.

• New impulses: As hearing is simply different from vision, a sonification may convey new structures in the data that were not detected by visualizing the dataset.

Especially the aspects of free movement and better temporal resolution make sonification interesting when the dataset in question is created by measurements of human movement. These measurements could be positions, velocities, forces and quantities derived from them. The sonifications could provide additional feedback for improving motion sequences in professional sports or medical therapy. Obviously, realtime feedback in these applications must not hinder the subject in his or her mobility. The better temporal resolution of hearing over vision is useful, as exact timing of movements is crucial in sports. Also, we are used to getting audio feedback from our movements anyway and to reacting to it (the sound of footsteps, the rustle of clothes, the impact of a ball...).

1.1 Previous work

Sonification, though still not a widespread method, has already been applied to a number of data exploration tasks. Jameson [1994] uses sonification to gain insight into the workflow of a program and to debug it. In the field of geology, sonification is already a common tool [Hayward, 1994]. Ekdale and Tripp [2005] use sonification for the classification of fossils. Other applications include the sonification of meteorological data [Bylund and Cole, 2001], the aural supervision of health measurements [Fitch and Kramer, 1994] and the navigation of maps for the visually impaired [Zhao et al., 2005]. Effenberg [1996] describes in detail the motivation for using sonification in the field of human movement. Taking into account various results from perceptual psychology and sports science, he assumes a positive effect on motion learning using sonifications derived from motion data. Especially with respect to temporal perception, he considers audio representations of movements promising. This assumption is further supported by a comparative analysis of the results of numerous studies using acoustic feedback for sports movements. In [Effenberg, 2004], Effenberg analyses, amongst other things, the effectiveness of identifying motion patterns according to their sonifications. Effenberg [2005] gives empirical results for identifying and reproducing the height of a jump using only visual and combined audiovisual feedback.

The use of sonification in sports led to a cooperation between the Institut für Sportwissenschaft und Sport and department II of the Institut für Informatik at the University of Bonn. One of the results of this cooperation is described in Melzer [2005]. Melzer presents an extension to the MotionLab software1 that is able to sonify various parameters of a motion sequence via parameter mapping sonification (see section 2.2.5). The data streams are turned into streams of MIDI messages that are sent to the built-in General MIDI synthesizer. Data streams can be mapped either to the pitch or to the amplitude of a certain MIDI channel; arbitrary controller data cannot be sent. A short summary of this module can be found in [Effenberg et al., 2005]. This extension module was then implemented as a standalone application that received the data streams via TCP/IP. This enabled the sonification module to be used with arbitrary data sources, not only the ones made available by the MotionLab software. In a further experiment a rowing machine equipped with sensors was used as the data source for this standalone module. In order to achieve this, the controller software that read the data created by the rowing machine's sensors from the USB port was adapted to transmit this data via TCP/IP. The four measured quantities of a subject's rowing motion were the position of the lever, the traction applied to the lever, the position of the seat and the force on the footrest. These sonifications, though conveying details of the rowing motion not easily discerned by just watching the person using the rowing machine, were far from aesthetically pleasing. As all four parameters were mapped to continuous pitch, the resulting mix was unaesthetic and straining to perceive. Annoying sound quality is a common problem in sonification. After analyzing and comparing several previous results, Effenberg [1996] infers a number of demands for further studies, of which the sixth is of major importance for this work:

“Es muß eine Soundqualität realisiert werden, die individuell als angenehm empfunden wird. Wenn möglich, ist dem Musikgeschmack der Zielpersonen Rechnung zu tragen.”2 [Effenberg, 1996, chap. II, p. 113]

1 MotionLab is a tool for the analysis and processing of motion capture data used internally at the University of Bonn.
2 “A sound quality must be realized which is individually perceived as pleasant. If possible, the targeted person's musical taste has to be accommodated for.” (Translation by the author)

A common problem indeed. Kramer [1994, chap. 11.3, p. 52] reports:

“..., it is a familiar experience of people working in AD [note by the author: Auditory Display] that a sonification will be running and it becomes sufficiently annoying that we just turn it off to take a break.”

He elaborates this further:

“Gaver relates that SonicFinder was frequently disabled, Mynatt reports that poorly designed sounds degraded Mercator, and Kramer considers some of his sonification experiments downright ugly.”

Obviously, more aesthetically acceptable audio results are a necessity if sonification is to gain a wide acceptance as a common tool in data exploration. In the context of sonifying motion data this is even more pressing. If sonifications are to be used as regular tools in rehabilitation for physically impaired persons, or as an additional training aid for athletes, audio results have to be at least bearable if not enjoyable on a regular basis. The aim of this work is to research methods for the sonification of realtime motion data streams that are less annoying and yet deliver the same amount of detail as the preceding results. Ideally they should be “pleasant” to listen to.

1.2 Overview

In an attempt to achieve said improved results, we will first compile some background on human hearing and the formulas needed to deal formally with quantities like pitch and amplitude. Following that we will have a closer look at the field of sonification. We will review various methods for sonification and examine their applicability to the realtime sonification of motion data. Chapter 3 contains the description of the technical aspects of this work. The MIDI standard will be reviewed critically and the OpenSound Control (OSC) protocol will be presented as an alternative to MIDI. We will have a look at various frameworks for audio generation and discuss the reasons for choosing Pure Data (Pd) for the present work. The following introduction to Pd will help in understanding later sections. The problems and limitations encountered when working with Pd, and their solutions, are also discussed. The review of the various sound generation methods tested with Pd will be the focus of chapter 4. First, improvements to the pitch based approach are

presented. Following that, we give an application of amplitude modulation. Methods that create varying timbres are the main focus of that chapter. In chapter 5 we will use the previously described methods to create sonifications of a rowing and a walking motion. The motivation for the combinations of audio generation methods and their settings, as well as a brief subjective evaluation of the audible result, will be given. The applicability of melodic and harmonic structures to sonification is the subject of chapter 6. After discussing the problems of trying to create “true music” out of arbitrary data sets, and motion data in particular, we will describe a method for creating sonifications with harmonic and melodic content. These will be accompanied by simple reference implementations to give a first idea of the possibilities of this approach. The final chapter gives a summary of the overall results. This summary is accompanied by suggestions for future work based on those results. Appendix A lists the Pd modules created for the previous chapters. It serves as a technical manual for people who wish to implement their own sonifications with the presented toolset. For a better understanding of pitch ranges, appendix B summarizes the pitch ranges of often used singing voices and instruments with respect to note names, MIDI pitch numbers and frequencies. This comes in handy when trying to determine the appropriate pitch for a certain orchestra sample or choosing the intervals between multiple audio generation methods with a discernible pitch. Many arguments in this work are illustrated with audio examples that can be found on the accompanying CD. The audio examples are referenced by an icon and track number on the margin of the page. [Example 99: References to audio examples look like this.] A more detailed description of each of the examples can be found in appendix C. The examples are also available online. The URL for each example is given in the respective section in the appendix. The online versions of the audio examples are encoded in the Ogg Vorbis format. This codec is shipped with almost every GNU/Linux distribution. For other platforms (Mac OS, Windows, OS/2 and PocketPC) please go to http://www.vorbis.com/ to download the codec, if necessary. Many popular players such as WinAmp (http://www.winamp.com/) also support Ogg Vorbis out of the box.

1.3 Used software

A number of programs were used for the present work. The main software with which the sonifications were realized was of course Pd. The spectrograms and oscilloscope graphics were done with Baudline (http://www.baudline.com/), an excellent and very powerful spectrum analyzer. This software in combination with the JACK realtime audio server (http://jackaudio.org/) under Linux made an excellent testing and bug tracking framework. A realtime spectrum analyzer is highly recommended when developing sonifications. Post processing of audio examples and samples for sonifications was done using Audacity (http://audacity.sourceforge.net/). All this software is available free of charge. Except for Baudline, which is closed source, they are all available under some kind of open-source license.

2. THE BASICS

In this chapter we will discuss fundamentals needed for understanding the following chapters. We will first define the terms needed to discuss the audible results of this work. Then we will have a second look at the field of sonification to place our results in a greater context. We will see that the kind of sonification that applies to the problem at hand is parameter mapping sonification.

2.1 Definitions and Human Hearing

The way the human ear perceives sound is complex; the perceived quantities do not map exactly to their scientific counterparts. In order to arrive at usable definitions for the phenomena of pitch, amplitude and timbre, we will have to simplify matters. We now give the formulas used throughout this work to describe those quantities.

2.1.1 Data and Audio Streams

As we want to turn data into sound, we first need two basic notions to talk about sonification: data streams and audio streams. A data stream x contains the data we want to turn into sound; an audio stream s is the result of our sonification. We use the term stream to emphasize the realtime nature of the process. Each of our data samples x[t] can be a vector (our rowing motion data, for example, consists of four values per sample), that is x[t] ∈ K^d, where d is the number of measurement values per sample and K is the underlying set from which these measurements are taken.1 Our stream x can be regarded as a mapping x : N → K^d, i.e. a discrete time index is mapped to a tuple of measurement values. Individual measurements of a tuple are indicated by an index: x_i[t] is the i-th value of the t-th sample.

The resulting audio stream s[t] of our sonification is an audio signal ready to be sent to a Digital-to-Analogue Converter (DAC). The stream s is also parameterized by a discrete time index t. Each audio sample s[t] lies in R^c, where c is the number of output channels. We mainly consider c = 1, i.e. monaural audio streams. A sonification f is then a function mapping a data stream to an audio stream:

    f : (N → K^d) → (N → R^c)

Up to now we have omitted an important detail: we silently assumed that our data stream x is sampled with the same frequency as our sample stream s. This means that the measurement x[t1] corresponds to the same point in time as the audio sample s[t1]. In most cases, however, the audio stream has a much higher sampling frequency (often 44100 Hz or higher) than our motion data streams (the rowing data is measured at a frequency of 100 Hz). For all following considerations we will assume that our motion data stream was upsampled by sample repetition, so indexing x and s with t1 will give results corresponding to the same point in time. More formally speaking, we assume the samples for x and s are taken at time intervals δ_x and δ_s, with δ_x ≥ δ_s. Assume further that x̃ is the original data stream without sample repetition. Then we define the data stream used for sonification as follows:

    x[t] := x̃[ max{ t̃ ∈ N : t̃ δ_x ≤ t δ_s } ]

Sometimes we will regard data and audio streams as continuous functions over R instead of N for simplicity. We will denote this with round brackets, e.g. s(t) instead of s[t].

1 Throughout this work K will be the real numbers R, except for the short overview of Auditory Icons and Earcons in section 2.2.2.
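In code, this sample-repetition upsampling amounts to holding the most recent measurement until a new one arrives. The following minimal C sketch shows the index mapping defined above; the function and variable names are illustrative assumptions and not part of the Pd implementation described later.

    #include <stddef.h>

    /* Map an audio-rate sample index to the index of the most recent
     * data-rate sample, i.e. the largest t_data with
     * t_data * delta_x <= t_audio * delta_s.                          */
    static size_t data_index(size_t t_audio, double data_rate, double audio_rate)
    {
        return (size_t)((double)t_audio * data_rate / audio_rate);
    }

    /* Upsample a data stream x_tilde (len values at data_rate) to n_audio
     * samples at audio_rate by sample repetition.                        */
    static void upsample_hold(const double *x_tilde, size_t len, double data_rate,
                              double *out, size_t n_audio, double audio_rate)
    {
        for (size_t t = 0; t < n_audio; ++t) {
            size_t i = data_index(t, data_rate, audio_rate);
            if (i >= len) i = len - 1;   /* clamp at the last measurement */
            out[t] = x_tilde[i];
        }
    }

With the rowing data (100 Hz) and a typical audio rate of 44100 Hz, each measurement is simply held for about 441 audio samples.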

2.1.2 Frequency and Pitch

The human ear can hear periodic signals with a frequency between 20 Hz and 20,000 Hz. Frequency is perceived on a logarithmic scale: the change from 100 Hz to 200 Hz is perceived as equally large as the change from 200 Hz to 400 Hz. So we will mostly talk about pitch, not frequency, which is measured on a logarithmic scale. [Example 1: Linearly rising pitch] [Example 2: Exponentially rising pitch]

We will measure pitch in MIDI note values, similar to [Puckette, 2006]. MIDI note value 69 is mapped to concert pitch A4 (440 Hz).2 As MIDI is rooted in the western European musical tradition, note values are measured in semitones and an equally tempered scale is assumed. Thus, a rise in pitch of one octave (which consists of twelve semitones) leads to a note value twelve MIDI units higher. A rise of pitch by one octave means multiplying the frequency by two. So, if we want to raise pitch by one semitone, we have to multiply the frequency by 2^(1/12). Assuming the aforementioned concert pitch of 440 Hz for note value 69, this leads to the following formula for converting pitch to frequency:

    mtof(p) = 440 · 2^((p − 69)/12) Hz

Inverting this function gives the transformation of frequency back to MIDI note values:

    ftom(f) = 69 + 12 · log2(f / 440 Hz)

In the case of a non-periodic signal the notion of frequency makes no sense. However, we are still able to recognize a pitch in many non-periodic signals. Imagine for example a slowly frequency modulated periodic signal. The result of the modulation is not periodic anymore, yet we can still recognize a pitch at every instant. Thus the definition of an instantaneous frequency inst is very helpful:

    inst(f(t)) = (1 / 2π) · dp(t)/dt

where p(t) denotes the instantaneous phase of the signal. Note that the result of inst is not a constant like ω but a function itself. This function gives us the instantaneous frequency of f at each instant t. It should be noted that the perceived pitch of a sound does not only depend on the instantaneous frequency of the sound but also on its timbre [Gockel, 1996, chp. 3.3]. For simplicity's sake we will ignore this property of the human ear in our further considerations.

2 For an overview of note names, MIDI note values and frequencies, see appendix B.
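The two conversion formulas translate directly into code. The following C sketch mirrors them one to one (Pd offers the same conversions through its built-in mtof and ftom objects); the names here are just the obvious ones, not taken from the toolset of appendix A.

    #include <math.h>

    /* MIDI note value -> frequency in Hz (A4 = 440 Hz at note 69,
     * equal temperament).                                          */
    static double mtof(double p)
    {
        return 440.0 * pow(2.0, (p - 69.0) / 12.0);
    }

    /* Frequency in Hz -> (fractional) MIDI note value. */
    static double ftom(double f)
    {
        return 69.0 + 12.0 * log2(f / 440.0);
    }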

2.1.3 Measures of Amplitude

In the context of hearing and sound we will need to talk about the loudness of an audio signal. This is closely related to the physical measure of amplitude. [Example 3: Comparison of peak and RMS amplitude]

The peak amplitude of a discrete signal s[t] is simply the maximum of its absolute values:

    A_peak{s[t]} = max{ |s[t]| }

For the continuous case, the peak amplitude of a p-periodic function s : R → R is simply:

    A_peak(s(x)) = max_{x ∈ [0,p]} |s(x)|

More important from a perceptual standpoint is the so-called Root Mean Square (RMS) amplitude:

    A_RMS{s[M]} = sqrt( (1/N) · Σ_{i=M}^{M+N} s[i]^2 )    (2.1)

where M is the starting point of a sample window of size N. For the continuous case of a p-periodic function s : R → R the RMS amplitude is defined as:

    A_RMS(s(x)) = sqrt( (1/p) · ∫_0^p s(x)^2 dx )    (2.2)

The RMS amplitude closely resembles the way loudness is perceived by the human ear. Again this is somewhat of a simplification: loudness also varies depending on the energy distribution of the spectrum of the sound, as hearing is more sensitive in some frequency ranges than in others. The measure for this is sone, which takes an average hearing curve3 deduced from psychoacoustic experiments into account. Again we omit this detail, as this curve has relatively little variation over the frequency range used by most instruments. As this curve varies from person to person, even amplitudes in sone are not perfect quantities for measuring perceived loudness. [Example 4: Comparison of signals with constant amplitude but different pitch]
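Both amplitude measures are easy to compute over a sample window. A minimal C sketch following equation (2.1); the function names are illustrative assumptions:

    #include <math.h>
    #include <stddef.h>

    /* Peak amplitude of a sample window: the maximum absolute value. */
    static double peak_amplitude(const double *s, size_t n)
    {
        double peak = 0.0;
        for (size_t i = 0; i < n; ++i) {
            double a = fabs(s[i]);
            if (a > peak) peak = a;
        }
        return peak;
    }

    /* RMS amplitude of a sample window, following equation (2.1). */
    static double rms_amplitude(const double *s, size_t n)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i)
            sum += s[i] * s[i];
        return sqrt(sum / (double)n);
    }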

2.1.4 Timbre

Timbre is often described as that quality of a sound which is not captured by pitch and loudness. This is of course not a definition with which one can work formally, so we will adopt the common practice of equating the timbre of a sound with its spectrum. As usual, we will describe this spectrum using Fourier analysis. This will be sufficient for our purposes, though it must be noted that defining the timbre of a sound by its spectrum is not perfectly accurate. In practice, the spectrum of a sound must be calculated using a Fast Fourier Transform (FFT), which may give different results depending on the window size used in the transformation. Also, very fast changes in sound, as they often occur in the attack phase (the initial phase of a sound), are hard to describe in terms of an FFT. They are sometimes shorter than the window size of the transformation, which means that the FFT would report a constant "timbre" during that phase, yet we distinctly hear a change in the sound quality. But as the data streams and the mappings applied to them throughout this work do not create changes that fast, we still arrive at usable results using the spectrum of a sound as its definition of timbre. One must keep in mind, however, that those fast changes in the attack phase of a sound, the transients, can have a strong impact on the perceived timbre.

3 Mapping of frequency to perceived loudness of a constant amplitude test signal, mostly a sinusoid.
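To make the window-size dependence of the spectrum concrete, the following sketch computes the magnitude spectrum of one analysis window with a plain DFT (an FFT computes the same result more efficiently). The Hann window and all names are illustrative assumptions:

    #include <math.h>
    #include <stddef.h>

    #define PI 3.14159265358979323846

    /* Magnitude spectrum of one analysis window of length n (plain DFT).
     * mag must hold n/2 + 1 bins; bin k corresponds to frequency
     * k * sample_rate / n, so a larger window gives finer frequency
     * resolution but smears fast temporal changes.                      */
    static void magnitude_spectrum(const double *s, size_t n, double *mag)
    {
        for (size_t k = 0; k <= n / 2; ++k) {
            double re = 0.0, im = 0.0;
            for (size_t t = 0; t < n; ++t) {
                /* Hann window to reduce leakage at the window borders */
                double w = 0.5 - 0.5 * cos(2.0 * PI * (double)t / (double)n);
                double phi = 2.0 * PI * (double)k * (double)t / (double)n;
                re += w * s[t] * cos(phi);
                im -= w * s[t] * sin(phi);
            }
            mag[k] = sqrt(re * re + im * im);
        }
    }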

2.1.5 Masking and Psychoacoustics

The human ear is also subject to a number of psychoacoustic effects. The most prominent is masking. Masking is the effect of one sound being "covered" by another sound. The most obvious masking effect is that of a very loud sound causing a simultaneous quiet sound not to be heard. This effect also depends on the spectra of the two sounds. [Example 5: Frequency masking] It is not even necessary for the loud sound to be played simultaneously with the quiet sound: loud sounds may even mask quiet sounds directly following them. [Example 6: Temporal masking] The subject of psychoacoustics is a complex one. Numerous effects are not completely understood and are still subject to intensive study. When creating sonifications one must however be aware that such effects exist. Masking and other psychoacoustic effects did not prove to be a serious issue for the sonifications described in this work. They were mainly encountered when using amplitude as the target of the sonification mappings. General rules of thumb to limit these effects are given in section 4.3.

2.2 Sonification

Kramer et al. define sonification as "the use of nonspeech audio to convey information" [Kramer et al., 1999, p. 1]. We will now have a look at various general methods for sonification. This will help to define the task at hand more precisely and put it into relation to other works in the field of sonification. This differentiation into the various techniques is loosely based on Kramer [1994], where it can be found in greater detail. In this work we are mainly interested in the applicability of the various techniques to the realtime sonification of motion data and review them under that aspect.

2.2.1 Audification

In audification, the values of a data stream are directly interpreted as audio samples. A good example of this can be found in [Hayward, 1994]. Hayward describes the use of audification in geology, where geophysicists play back the results of seismic measurements as audio data "to infer the characteristics of the rocks along the energy's travel path or to learn the characteristics of the energy source" [Hayward, 1994, p. 373]. More formally speaking, if a data stream x[t] is given, then s[t] := x[t], that is, each data measurement is interpreted directly as an audio sample. This of course places a constraint on the data streams that can be used for audification: the data stream must be pseudo-periodic at frequencies between 20 and 20,000 Hz, otherwise the human ear simply cannot perceive anything. To bypass this problem, the data can of course simply be played back at a higher speed, to move the frequencies present in the data into the audible spectrum. To adjust amplitude, the data stream may also be scaled. The data we are dealing with is available at sampling rates of about 100 Hz, which is far too low to be used directly as audio data. Furthermore, the frequencies present in the data are created by human motion and are far below the lower frequency threshold of human hearing (a rowing motion has a frequency of roughly 1 Hz). Playback at higher speed is obviously not an option as we want realtime feedback. This makes audification unusable for our needs.
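For completeness, a minimal sketch of the technique itself, assuming the data values have already been brought into a suitable range; all names are illustrative:

    #include <stddef.h>

    /* Audification: interpret data values directly as audio samples,
     * s[t] := gain * x[t].  Played back at the audio sampling rate, a data
     * stream recorded at data_rate is sped up by audio_rate / data_rate,
     * which shifts its frequency content up by the same factor.           */
    static void audify(const double *x, size_t n, double gain, float *s)
    {
        for (size_t t = 0; t < n; ++t)
            s[t] = (float)(gain * x[t]);
    }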

2.2.2 Earcons and Auditory Icons

Earcons and Auditory Icons are sound events that are used to present information about the state of a computer, mainly as feedback to user actions. Brewster et al. [1994, p. 473] define Earcons as "abstract, synthetic tones that can be used in structured combinations to create sound messages to represent parts of an interface". Basically, Earcons map events in a computer system (file operations, user notifications, etc.) to acoustic events. The difference between Earcons and Auditory Icons is mainly that Earcons are synthetic sounds that form certain hierarchies and use melodic and rhythmical patterns. The structure of Earcons follows rules that allow information to be deduced from the sound, but the understanding of these sounds has to be learned. Auditory Icons, in contrast, are supposed to resemble natural sounds, so that the meaning of the event triggering the sound is reflected in the sound structure.

“Auditory icons are everyday sounds that convey information about events in the computer or in remote environments by analogy with everyday sound-producing events.” [Gaver, 1994, p. 417]

Earcons and Auditory Icons are not bound to a certain audio generation method. The main idea is the organization and mapping of sound events. The sound creation methods used can also be applied in other sonification contexts. Auditory Icons and Earcons - as their intended application is computer interfaces - are discrete in nature and are not suitable for the continuous sonifications necessary for human motion. Though Earcons and Auditory Icons could be used to mark certain singular events (see next section), they too were not considered in the present context.

2.2.3 Audio Beacons

Audio Beacons are similar to Earcons and Auditory Icons in that they too are discrete, singular events. An Audio Beacon is an acoustic signal added to the sonification result to provide the user with a reference point for the audio events created by the data stream. A metronome, for example, could be viewed as an Audio Beacon providing a time reference. Audio Beacons are not a sonification technique per se; they rather serve as references for the actual sonification. Beacons can also be used to signal a certain feature in the data, like a zero crossing or an extremum. We will not employ Audio Beacons in this work. In chapter 6 we will do something similar by emphasizing a change in direction of the rowing motion through a change in the underlying harmonics. This is, however, a special case of parameter mapping sonification.

2.2.4 Model-based Sonification

Hermann [2002] introduces model-based sonification. Model-based sonifications are sound renderings derived from dynamic systems which are parameterized by the data set to be sonified. The user then interacts with the parameterized model, which emits sounds according to the user's input and the underlying dataset, thus creating audible feedback. Its main purpose is the exploration of high dimensional data sets which are difficult to represent graphically. By interacting with the model parameterized by the data set, the user is to gain insight into the structure of the underlying data. Assuming an n-dimensional dataset of m points in K^n, the elements of the data stream x[i] are said points ∈ K^n. We now do not have a mapping x[n] → s[n] but a mapping M : K^{n·m} × I → R^d, where M is said model, K is the set underlying the data points and I is the set of possible user input to the model. The models are processes like the iterative clustering of data points into virtual "crystals" which are used as a basis for sound synthesis. Though model-based sonification is an interesting and powerful technique, it is not applicable to realtime sonification, as it works on a complete, unchanging dataset and needs interaction with the user to create sound. As the user is already busy performing the motion he wants to sonify, he cannot interact with the model he creates with his motion data.

2.2.5 Parameter Mapping Sonification

Parameter Mapping Sonification is the most general of the sonification categories presented here. Kramer [1994] uses this term for sonifications where a data stream is continuously mapped to a parameter of a sound generation method. Audification can be regarded as a simple case of parameter mapping sonification where the sound generation method is simply the identity mapping. A similar interpretation can be constructed for Earcons and Auditory Icons. The main idea is, however, that a "true" Parameter Mapping Sonification is able to map a (piecewise) continuous stream to a parameter of an audio creation method such that the change in sound (be it pitch, amplitude or timbre) reflects the change in the data. The intuition is that of "painting" an "audio graph". The challenge here is finding a suitable method and mapping for a given data stream. Obviously this approach is a good choice for our objective. Parameter Mapping Sonification can be applied in real time (though this is not a necessity). We want to give a recipient feedback about his motion; ideally the change in sound should be analogous to the change in the data stream. Rhythms and patterns in the sound should stem from rhythms and patterns in the data. A minimal sketch of this idea is given below.
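The following C sketch illustrates parameter mapping in its simplest form: one data value modulates the pitch of a sine oscillator for one audio block. It is not the Pd implementation used in this work; the pitch range, gain and block structure are illustrative assumptions.

    #include <math.h>
    #include <stddef.h>

    #define TWO_PI 6.283185307179586

    /* Map a normalized data value in [0, 1] to a MIDI pitch range and
     * render one block of a sine tone at the corresponding frequency.  */
    static void sonify_block(double value,      /* current data value, 0..1 */
                             double *phase,     /* oscillator phase, kept between calls */
                             float *out, size_t n, double sample_rate)
    {
        double pitch = 48.0 + value * 24.0;                      /* C3 .. C5 */
        double freq  = 440.0 * pow(2.0, (pitch - 69.0) / 12.0);  /* mtof     */
        for (size_t t = 0; t < n; ++t) {
            out[t] = 0.2f * (float)sin(*phase);
            *phase += TWO_PI * freq / sample_rate;
            if (*phase > TWO_PI) *phase -= TWO_PI;
        }
    }

Called once per incoming data sample (with the upsampling from section 2.1.1), this already "paints" a continuous audio graph of the data.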

2.3 Summary

We now have a small set of useful definitions at hand for our discussions in the following chapters. We have seen that quantifying human perception is not a trivial task. The given definitions for pitch, loudness and timbre are only approximations of human perception. More accurate measures (like sone) have been hinted at. We will nevertheless stick with the given definitions, as they provide a fair compromise between usability and accuracy. The discussion of general paradigms of sonification helped us single out a general sonification approach to start with. We will see that simple parameter mapping approaches give us promising results. By developing more sophisticated mappings and modulation targets we will expand this approach to even create structures of a simple musical nature (chapter 6).

3. IMPLEMENTATION OF REALTIME SONIFICATION

This chapter explains the implementation of the sonifications presented here. The MIDI standard is evaluated with respect to sonification. Then the reasons for choosing Pd as a sonification tool are reviewed, followed by a brief overview of Pd and Pd programming. Finally, the most important Pd modules that were built to make creating sonifications convenient are presented.1

1 For a complete list of the Pd objects created, see appendix A.

3.1 Sonification and MIDI

MIDI (the “Musical Instrument Digital Interface”) is a protocol and hardware standard for communicating the input of a musical controller (e.g. a keyboard) to a sound generator. As such, it can of course be used to transport the results of a sonification in the form of control data for a sound rendering device (some software or hardware synthesizer). The pros and cons of the MIDI interface and protocol standard are a constant topic of discussion and evaluation in a multitude of contexts; it is nevertheless necessary to have a closer look at possible issues when using MIDI as a medium for realtime sonification. MIDI uses so-called messages to transport data. Every message is a short package of data that can be sent in realtime or stored in a MIDI file (together with a timestamp) for later playback. MIDI messages are split into three groups: Channel Voice Messages, System Common Messages and System Realtime Messages. Only the first group is of interest for the present discussion, as it transports the actual “musical” information. The two other groups deal with “administrative” tasks like communicating timecodes, sequencer information and vendor specific information. The following table gives an overview of the Channel Voice Messages and their properties:


    Name              Description                                     Data resolution
    Note On           Begin of a note                                 7 bits for the note number, 7 bits for the velocity of the key press
    Note Off          End of a note                                   7 bits for the note number
    Control Change    A sound parameter is changed                    7 bits for the controller number, 7 bits for the controller value
    Pitch Bend        Change of pitch                                 14 bits for the intensity of the pitch change
    Poly Pressure     Key pressure changed (affects a single note)    7 bits for the key number, 7 bits for the pressure intensity
    Channel Pressure  Key pressure changed (affects all notes)        7 bits for the pressure intensity
    Program Change    The sound program is changed                    7 bits for the program number

We will only discuss the first four, because the Program Change message is not meant for sound modulation and the Pressure messages are ignored by most devices anyway and otherwise have the same properties as control messages. The Note On message causes a MIDI instrument to play a note. The note number and velocity, however, do not denote an absolute pitch and an absolute loudness. The Note On message simply communicates that a certain key has been pressed with a certain intensity. The interpretation of this message is completely up to the recipient. It may result in the desired pitch and loudness, but it need not. Often the velocity is mapped by the instrument to a sound parameter other than loudness, or the keys are mapped to different pitches (by the use of different scales or a pitch offset of a sound). It must be noted that the result is strongly dependent on the attached instrument and its settings. The Note Off event causes the playback of the note to stop. This too need not mean an immediate end of sound rendering; often sounds take some time to fade out. Note that MIDI is inherently keyboard oriented: a continuous creation of sound is not provided for. To create such an effect, a Note On without a Note Off must be sent.

The Control Change message is even more ambiguous. It communicates that a sound controller (like a slider or a knob) was used and reports its new value. Which sound parameter the respective control number is mapped to (if any) is unclear and largely depends on the attached device. Certain numbers are officially reserved for certain parameters like volume or filter cutoff, but these recommendations are as often adhered to as not. Even if a device implements the recommended mapping, the exact amount by which the sound parameter is changed is not defined. If a device indeed uses controller number 7 for volume control, a controller value of, let's say, 80 has no definite meaning. Does the volume change in dB or linearly? How many decibels are mapped to a controller value of 80? This is different for every device and, even worse, often for every sound of a device. The last message of interest is the Pitch Bend message, which affects the pitch of an already playing note. If no note is playing, this message has no effect. Again, the meaning of the pitch controller is ambiguous. The amount by which the pitch is changed depends on the attached device and its pitch bend settings. Some devices allow setting the pitch bend range by a combination of control messages, others do not. Except for the pitch bend message, which uses up to fourteen bits, the data resolution of MIDI messages is severely limited. Seven bits give a range of [0, 127], which is often not enough to represent fine nuances in a data stream, especially if some headroom is added to compensate for unforeseen peaks in the data. Many devices even use only seven or twelve bits for the pitch bend message and simply ignore the least significant bits. We now see the two major drawbacks of using MIDI as a medium for sonification: ambiguity and low resolution. So why do any sonification with MIDI messages at all? The reason is that there is an enormous number of instruments that understand MIDI, and though one cannot rely on a certain interpretation of MIDI data, one can fine tune and adapt the data to a certain attached device. That way, a large number of sound generation methods can be tested. If a certain sound generation method is deemed useful, it can be implemented using different tools. Sometimes, one may want a certain sound that is only available on a certain instrument. Often, the only way to create a sonification with this instrument will be via MIDI. So it is useful to have a MIDI based fallback implementation in addition to a more sophisticated one. This is why a number of modules were built that provide a fallback MIDI implementation to send pitch bend and controller data to create a sonification. For a description of these modules, please refer to appendix A. A final word on ambiguity: the introduction of the General MIDI (GM) and Extended General MIDI (XM) standards was aimed at removing much of this ambiguity. It must however be noted that these - while being useful for tasks like playing ringtones on mobile phones or background music for annoying web pages - are of minor importance in a professional context. Instruments supporting these standards are mostly aimed at the consumer market and often do not provide any innovative sound generation methods or high sound quality.
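To illustrate the limited data resolution discussed above, the following C sketch assembles the raw bytes of three channel voice messages; note how the 14-bit pitch bend value is split into two 7-bit data bytes. The helper names are illustrative assumptions, not part of the modules from appendix A.

    #include <stdint.h>

    /* Assemble raw MIDI channel voice messages (status byte plus 7-bit
     * data bytes).  Channels are numbered 0-15 here; buf must hold 3 bytes. */

    static void midi_note_on(uint8_t *buf, uint8_t channel, uint8_t note, uint8_t velocity)
    {
        buf[0] = (uint8_t)(0x90 | (channel & 0x0F));
        buf[1] = (uint8_t)(note & 0x7F);
        buf[2] = (uint8_t)(velocity & 0x7F);
    }

    static void midi_control_change(uint8_t *buf, uint8_t channel, uint8_t controller, uint8_t value)
    {
        buf[0] = (uint8_t)(0xB0 | (channel & 0x0F));
        buf[1] = (uint8_t)(controller & 0x7F);
        buf[2] = (uint8_t)(value & 0x7F);
    }

    /* Pitch bend: a 14-bit value (0..16383, 8192 = no bend) is split into
     * a 7-bit least significant and a 7-bit most significant data byte.   */
    static void midi_pitch_bend(uint8_t *buf, uint8_t channel, uint16_t bend14)
    {
        buf[0] = (uint8_t)(0xE0 | (channel & 0x0F));
        buf[1] = (uint8_t)(bend14 & 0x7F);         /* LSB */
        buf[2] = (uint8_t)((bend14 >> 7) & 0x7F);  /* MSB */
    }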

3.2 Sonification and OSC

OpenSound Control (OSC), defined in Wright [2002], is a message based protocol that is aimed at replacing the outdated MIDI protocol. For a first introduction, please refer to Wright et al. [2003]. Like MIDI, it is a message driven protocol. Unlike MIDI, however, no underlying transport protocol is specified for OSC. OSC can theoretically be implemented on top of a number of transport protocols, though all known clients so far only use UDP based implementations. By using a variety of data types, among them floating point types with up to 64 bits (though the 32 bit type is standard), OSC does not suffer from the low resolution of the MIDI protocol. By introducing a querying mechanism with which an OSC client (e.g. a sequencer or sonification application) can query information about OSC servers (e.g. a synthesizer or sound rendering framework), the problem of ambiguity is partly remedied: it is at least clear which targets exist. Yet OSC does not define semantics for messages. Basically, servers are free to decide which modulation targets to offer, how to name them and how to interpret messages sent to them. Considering the large number of sound rendering methods and the different approaches to controlling them, a specification defining the semantics of all messages precisely seems hardly feasible. With every introduction of a novel sound generation approach, this protocol would need to be updated, and new devices could not claim to be “OSC-compliant” from the start. Yielding to that fact, OSC defines no message semantics but allows every server to define its own, organized in a tree-like structure, and only provides the mentioned querying mechanism to allow a more comfortable interaction with the protocol. With a constantly growing user base and number of supporting applications, OSC must definitely be considered in further sonification work. By consistently using OSC as an intermediate medium for sonification, the actual sound rendering can be decoupled from the sonification source and can thus be replaced and handled more transparently. That way, an existing sonification OSC client can easily be combined with a new OSC server offering a novel sound rendering solution.

The sonification solution presented here does not make use of OSC yet. This has a number of reasons:

• OSC support for Pd under Windows was broken at the time of writing.

• The realtime sonification of rowing machine data required the implementation of a custom protocol anyway (see section 3.6.3). The C++ code of that implementation can, however, easily be reused to implement a mapping from said custom protocol to OSC.

• The second case study was provided in the form of motion data saved to files. As these files could be read directly with the chosen framework, there was no need to use OSC additionally.

Nevertheless, reuse of the presented modules and sound rendering solutions in an OSC context is easily possible. The naming scheme used for addressing modulation and configuration targets as described in section 3.5.1 is OSC compliant. As soon as OSC support is stable with Pd, the OSC support of Pd can simply be used as another means of input. In section 7.1 a proposed OSC extension to the MotionLab software is described.
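To sketch what sending a single motion parameter over OSC could look like, the following C fragment packs an OSC message with one 32-bit float argument according to the OSC 1.0 encoding (strings null-terminated and padded to multiples of four bytes, arguments in big-endian byte order). The address pattern and all names are illustrative assumptions, not the naming scheme from section 3.5.1.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Append an OSC string: null-terminated, padded with zeros to a
     * multiple of four bytes.                                        */
    static size_t osc_put_string(uint8_t *buf, const char *s)
    {
        size_t len = strlen(s) + 1;               /* include terminating null */
        size_t padded = (len + 3) & ~(size_t)3;
        memcpy(buf, s, len);
        memset(buf + len, 0, padded - len);
        return padded;
    }

    /* Append a 32-bit float in big-endian byte order. */
    static size_t osc_put_float(uint8_t *buf, float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        buf[0] = (uint8_t)(bits >> 24);
        buf[1] = (uint8_t)(bits >> 16);
        buf[2] = (uint8_t)(bits >> 8);
        buf[3] = (uint8_t)(bits);
        return 4;
    }

    /* Build a complete OSC message with one float argument, e.g.
     * "/sonification/pitch 61.5", ready to be sent as a UDP datagram.
     * Returns the message length in bytes.                            */
    static size_t osc_message_float(uint8_t *buf, const char *address, float value)
    {
        size_t n = 0;
        n += osc_put_string(buf + n, address);   /* address pattern       */
        n += osc_put_string(buf + n, ",f");      /* type tag: one float32 */
        n += osc_put_float(buf + n, value);      /* big-endian argument   */
        return n;
    }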

3.3 Choosing an Appropriate Sonification Tool

This section reviews frameworks that were considered for realizing the task of realtime sonification. Instead of building a tool from scratch in a general purpose language like C++, a language designed for the task of audio rendering was sought that would shorten the time needed to arrive at a usable implementation. Tasks like converting sample formats, communicating with audio and MIDI drivers, creating basic waveforms, filter implementations etc. should be left to the framework, so the main focus of the programmer could be the sonification itself. Prefabricated synthesizers - though often providing powerful means for creating individual sound rendering solutions - were not considered, as the flexibility and advantages of a programming language were still expected. Four languages for audio programming were considered:

• CSound
CSound [Vercoe, 2006] is the oldest of the languages discussed here. It is a successor of the MUSIC-N2 family of languages and released under the GNU Lesser General Public License (LGPL). It runs on a variety of platforms (Mac OS X, Microsoft Windows, GNU/Linux and other Unices) and has a large user base. A Csound program is divided into the definition of a score (the input to the sound generation) and an orchestra (the sound rendering methods). Though Csound is basically run from the command line, additional applications (foremost CsoundVST) exist that allow the control of Csound programs with a graphical user interface.

2 The MUSIC program developed in 1957 at Bell Laboratories was the first software that created music with computers. It had a number of successors named MUSIC II, MUSIC III, MUSIC IV, MUSIC V and MUSIC 360 - hence the name MUSIC-N for this family of audio generation languages.

• Super Collider
Super Collider (http://swiki.hfbk-hamburg.de:8888/MusicTechnology/6) uses an object oriented language developed by James McCartney in the style of Smalltalk. It runs on Mac OS and GNU/Linux; a Microsoft Windows port is in development. It is released under the GNU General Public License (GPL) and has a strong focus on realtime performance. The modification of code in a running program is possible.

• Nyquist
Nyquist [Dannenberg, 2007] is a functional audio programming language derived from XLISP. It was written by Roger Dannenberg at Carnegie Mellon University and can be run under Mac OS, Microsoft Windows and GNU/Linux. It is distributed under a custom license which allows free copying and the creation of products based on Nyquist. As the source code is available, modifications are possible. However, modified versions of Nyquist may not be redistributed without the author's permission.

• Pure Data (Pd)
Pd [Puckette, 2007] is a graphical programming language developed by Miller Puckette and released under a BSD-style license. It can be regarded as an open source version of Max/MSP, which was also developed by Puckette at the Institut de Recherche et Coordination Acoustique/Musique (IRCAM). Its graphical, dataflow oriented approach makes it easy for beginners to start working with Pd. Pd programs can easily be modified while running, which encourages fast debugging and experimentation.

There were a number of demands on the audio programming language used for the sonification experiments:


• Realtime: We want immediate feedback for movements. Technically, all of the alternatives allow for realtime sound generation. Under Unix systems, however, Nyquist only provides writing sound files to stdout. To create a realtime sonification, this output would have to be redirected to an additional audio player. This rather uncomfortable setup ruled out the use of Nyquist under GNU/Linux for the moment. CSound, though able to respond in realtime, has a stronger focus on offline rendering. Realtime input is generally only possible via MIDI - which presents the drawbacks discussed above - or realtime audio input, which would make it necessary to transform motion data into an audio sample stream first.

• Ease of use: Normal users should be able to work with the resulting sonifications. Both SuperCollider and Pd make it possible to create user interfaces for custom modules quickly. With both CSound and Nyquist this interface would have to be created using an additional general purpose language or additional toolsets and connected to the sound rendering module.

• Fast development: Implementing new ideas should lead to prototypes fast, to weed out less favorable approaches. All of the languages have this characteristic. The possibility to change a sound rendering program while it is running is, however, only provided by Pd and SuperCollider, which gives them an advantage in this category.

• Flexibility: We want to be able to implement a wide range of audio generation techniques and be able to expand the platform to our needs. All of the candidates provide libraries with a large variety of sound generation algorithms. They are all either powerful languages in themselves or provide the possibility to add modules written in a general purpose language where the built-in language features prove unfit.

• Platform independence: Platform independence is an indicator of good software design and poses fewer restraints when one wants to combine technologies. The most important platforms for these considerations were Microsoft Windows and GNU/Linux, though Mac OS X support is of course an additional advantage. This is where SuperCollider was ruled out, as it does not provide Windows support (yet). Windows compatibility was necessary for the cooperation with the Institut für Sportwissenschaft und Sport at the University of Bonn described in section 1.1.

• Not too pricey: Using low priced software enables more people to reproduce the presented results. All languages presented fulfill this criterion; they are all available free of charge.

• Open source: Ideally the software used should be open source to enable us to make changes to the platform itself, should it become necessary. All of the above alternatives use some kind of license that allows modification of the source code.

Basically, all of the alternatives presented are powerful tools for sound generation in general and realtime sonification in particular. Pd was finally chosen as it performs well in all of the above categories and has a shallow learning curve: the graphical programming paradigm Pd implements is very intuitive and allows for very fast results. But depending on where one's main focus lies, another tool might prove more adequate. Nyquist definitely provides the most powerful language, as it features convenient specializations for audio rendering while still maintaining the flexibility of a general purpose language. As it is released under a license that allows modifications of the source, the missing direct output to the soundcard under Unix-style systems can be added if necessary. At some point the flexibility of the language might outweigh this drawback. If controllability of the final sound rendering methods via a GUI is less important, both CSound and Nyquist become serious alternatives. If embedding the final result into another application is an issue, CSound and Nyquist become especially interesting, as algorithms implemented in these two languages can easily be embedded into other applications written in different languages.4 Without the need for Windows support, SuperCollider might well prove a better alternative to Pd. To summarize, there is no “best” audio programming language for sonification tasks. Pd provided all the demanded abilities and had the additional advantages that programs can be edited while they are running and that - albeit simple - GUIs can be created for the resulting modules. So it was chosen for the task at hand.

4 A good example of this is the excellent open source audio editor Audacity (http://audacity.sourceforge.net/), which is shipped with a large number of plugins written in Nyquist.

3.4 How Pd Works

The purpose of this section is to give a general overview of Pd, as a starting point for using the finished sonification objects and to aid understanding of the following sections. It is by no means a complete overview of Pd or Pd programming. For a thorough discussion of Pd see Puckette [2007] or Zimmer [2006].

Fig. 3.1: A Pd session with some patches.

To use Pd one starts the main Pd application. A running instance of Pd is called a session. Within a session, an arbitrary number of patches can be opened (such a patch can be regarded as a Pd “program”). A patch is a graph whose vertices are called boxes and whose edges are called connections. The boxes generate and process data, the connections transport it. All connections are directed, i.e. data only flows one way. Though saved as a text file, a Pd patch is strongly tied to its graphical arrangement. Editing and using patches is done exclusively through the graphical representation. A big advantage of Pd is that there is no compilation process; patches can be edited while they are running, which greatly speeds up development because changes become audible immediately. A drawback of this approach is a performance hit, as the whole patch is basically evaluated by an interpreter. Nevertheless, Pd performs quite well because the most computationally expensive actions are performed by single objects which are written in C and compiled into native machine code.

There are three kinds of boxes: objects, messages and GUI controls, which can be regarded as objects with a special graphical representation. Technically speaking, comments are boxes too, but they have no functionality and cannot be connected. All these boxes can be interconnected by two types of edges: audio connections and message connections. Audio connections are distinguished graphically from message connections by thicker lines. As all connections are directed, it must be possible to determine the direction of a connection. To do so, boxes have inlets and outlets. A connection can only go from an outlet to an inlet, and an inlet or outlet is either able to handle audio or control data.

Fig. 3.2: An example of Pd objects with some connections. The audio connections are painted fatter to distinguish them from the message connections.

Audio data can be thought of as a stream of samples which is evaluated constantly. The moment an audio outlet is connected to an audio inlet, the data flows from the outlet to the inlet, as in a physical audio cable used in a hi-fi system or modular synthesizer. For nearly all purposes it is sufficient to think of audio processing on a per-sample basis, though it is actually done on blocks of data for performance reasons. Audio data is internally represented as 32-bit float values, which gives an extremely high dynamic range and practically eliminates clipping problems. Only when data is output to the hardware should it be scaled to a range of [−1, 1] to avoid clipping of the output signal.

Fig. 3.3: A collection of GUI objects. They can be used for creating GUIs for new objects like the channel object.

Fig. 3.4: Examples for Pd messages. Note how the shape of the boxes distinguishes them from object boxes.

The processing of control data works differently, as it is not constantly flowing. Control data needs to be created at some place and then flows through the connections, causing objects to react to it. As long as no data is sent, nothing is computed. This can be compared to a MIDI connection, which also only transports data if input is created by a musician (or MIDI software); otherwise the connection is “silent”. Control data consists of either numbers or symbols (strings), see Fig. 3.4. Pd does not distinguish between integer and floating point values; all numbers are floating point numbers. Numbers and symbols can be concatenated into lists which are treated as single messages. Object5 boxes are the building blocks of Pd6. They do the actual work, like arithmetic operations on numbers or audio signals, reading data from a file, writing symbols to the main window and much more. Message boxes can be thought of as constants for messages. Every time a message enters the inlet of a message box, the message stored in the box is emitted by the outlet. A simple shell-script-like replacement mechanism can be used to reference entries in incoming lists to build the outgoing messages dynamically. Message boxes do not play an important role for the presented sonifications from the end-user's point of view. The sonifications are built out of object boxes with high-level functionality which only need to be connected directly. Object instances are created by adding a new object in the patch view and typing the name of the desired object type into the initially empty box. The name typed into the empty box is also referred to as the constructor of the object, as it tells Pd which object to create. Many objects can or must also be supplied with constructor arguments which influence the behavior of the resulting objects. These may be mandatory or optional (with appropriate default values).

5 We will often refer to object boxes simply as objects for short. The notion of an object is similar but not identical to the notion of an object in object-oriented programming. In object-oriented programming we would call the object type (e.g. unpack ) a class and an object box present in a patch an object (or instance). We will refer to both simply as “objects”. 6 Pd objects are highlighted like this: an object

The median object presented in section 3.6.3, for example, needs the width of the filter window as initial argument. A large number of objects is already shipped with the standard version of Pd; it is however possible to add new objects. This can be done either by creating abstractions or by programming objects directly in C or C++ using the API provided by the header files shipped with Pd.7 In the latter case these new objects are referred to as externals. An abstraction is basically a Pd patch that can be included in another Pd patch and used like an ordinary object. Special objects exist which equip a patch with inlets and outlets that can be connected in the embedding patch. For the objects presented here both approaches were taken, depending on the respective necessities. Implementing an abstraction has the benefit of fast development and the possibility to use already existing objects to build the new one: a new object is built simply by connecting existing ones. It is also possible to nest abstractions, which allows reuse of often-used Pd constructs. Another advantage is that using already existing objects immediately makes the abstraction executable on all operating systems supported by Pd, while the C/C++ based objects need at least to be recompiled, if not rewritten. Using the existing GUI control objects, a graphical interface for a new module can be built easily, in a platform-independent manner. There are of course cases where an external is the better choice: when functionality needed to realize a task is not yet available in Pd, when the C/C++ code is easier to write because of language-specific capabilities (the graphical, data-flow oriented paradigm of Pd is not well suited for building complex data types), or simply because the C/C++ implementation is faster. There are a large number of externals and abstractions available for Pd as additional libraries. Apart from the standard Pd version provided by Miller Puckette there is an extended version of Pd named “pd-extended”. This extended version includes most of the additional libraries available. The use of these libraries offers a multitude of additional possibilities. Regrettably it turned out that many of these do not work properly under Microsoft Windows. In the end it was decided to use a combination of the standard release of Pd coupled with the iemlib library by the Institute for Electronic Music and Acoustics Graz.8 This combination turned out to work reliably on Microsoft Windows as well as GNU/Linux.

7 For a thorough discussion of programming externals, see Zmölnig [2001]. 8 http://pd.iem.at/iemlib/

3.5 Using Pd as a Sonification Tool

In this section the basic objects written to facilitate the creation of sonifications with Pd are presented. The goal was to enable people not familiar with Pd to create a sonification from a small number of intuitive objects without worrying too much about the inner workings of Pd.

3.5.1 Loading and Saving Settings

One of the paradigms of Pd is that everything that is saved with a patch must have been entered by the user as a “constructor”. That means that only the boxes and their connections are saved with a patch, not the state the boxes are in. Imagine creating an osc∼ object (a simple cosine oscillator) and setting its frequency to 440 Hz with a slider that is attached to it. Saving the patch will only store the information that there is an oscillator and a slider, the fact that they are connected, and their positions on the screen. After opening the stored patch, the oscillator's frequency will be back at 0 Hz and the slider's position will be at the far left. Of course Pd provides a solution to this. The “normal” way to save state in Pd is to store the information in message boxes, as all information in Pd needs to be a message. To store the state of the oscillator and slider mentioned before, one would have to create a message box containing the number “440” and connect it to the slider. To reload this information, one would have to click the message box after loading the patch or connect it to another object ( loadbang ) that triggers the message the moment the patch is loaded. There are good reasons for this kind of behavior in the everyday usage of Pd:

• No ambiguity: All information contained in a patch is visible in the graph. It can always be deduced where a value comes from.

• Simplicity: The standard message system, which is very robust, is also used for saving state. There was no need to implement additional functionality. This keeps the software slim and elegant. The state information and the program structure are one.

• Ease of implementation: When writing externals, the programmer need not worry about an implementation for persistence of his new object, as long as it cooperates with the standard message system (which it needs to do anyway).

• Scope: Pd has no notion of scope. Which objects can communicate with each other is determined by their connections; variables ( var objects) are always global.

That way it would be hard to determine to which object a saved state belongs if there is more than one instance of the same type. As the messages must be connected to the object whose value they hold, this problem is automatically resolved.

As can be seen from the above summary, there are good reasons for the way Pd normally handles state. For sonification purposes, however, this approach presents us with a number of drawbacks:

• Overview: Because all information is visible in the graph, the screen gets cluttered.

• Not easy for beginners: To use the standard approach, one has to become familiar with the Pd message system. This should be unnecessary for people who just want to use the sonifications.

• Error prone: If one object needs more than one value (the state of the scaling object used for motion data streams, for example, consists of seven values), one needs to prepend selectors to the messages. One needs to know all the selectors for the values, which quickly leads to typographic errors.

• Uncomfortable: Every value to save needs to be typed in manually. If one wants to “play around” with the settings until the correct one is found, this interrupts the workflow. Settings should be stored with the press of a button.

To decouple the saving of state information from the saving of patch information, the sonifications were first built using the RRADical library.9 “RRADical” stands for “Reusable and Rapid Application Development” and is a library of abstractions that, amongst others, contains abstractions for saving and loading the state of an object. To do so, RRADical maintains a hashtable (provided by another library) in which all data is stored as key/value pairs. This data can either be set to the appropriate objects or stored to a file. To make this work, the RRADical objects use send / receive objects 10 with a single name. Every object whose state is to be made “savable” is connected to a RRADical commun object. Additionally, each subpatch containing commun objects needs an originator object that in turn communicates with the global pool .

9 http://footils.org/cms/show/1 10 send and receive objects work like a connection. They both get a name as first parameter. All data sent to the inlet of a send foo arrives at the outlet of every receive foo .

Fig. 3.5: The standard way to save settings in Pd: a message box containing the settings is connected to the object whose settings are to be saved. The strings in front of the values are so-called selectors that tell the object which parameter the values are for. The loadbang object triggers the messages on loading of the patch.

By wiring all the sonification objects like this “under the hood”, their states became savable. This required additional work at the implementation stage but made working with these objects much more convenient for the end user. It turned out, however, that these objects depended on libraries that were among those not working reliably under Microsoft Windows (see 3.4). As the sonifications were supposed to run on GNU/Linux as well as Microsoft Windows, the RRADical approach had to be discarded. Instead a new set of abstractions was created that depends only on the standard objects shipped with Pd. The main idea of RRADical was adopted. A new object named sv (appendix A.36), for “savable variable”,11 was created that is wired to an object holding a number value. On receiving a signal from the global send it emits its name and value to the global receive. A similar object was created for objects holding a symbol instead of a number ( ssymbol , appendix A.30). Finally a settings object (appendix A.23) that provides a simple GUI and the mechanism for file IO was added. The implementation, however, was much simplified: no global hashtable is maintained and no

11 The normal variable object of Pd is generally used in its abbreviated form v .

OSC (see 3.2) path parsing12 is performed. This makes the implementation independent of external libraries while still providing the needed load/save functionality, and in a more robust fashion.13

Fig. 3.6: Diagram of the newly created saving functionality. The settings object loads/saves values to a text file using the names of the sv and ssymbol objects as keys. When loading, the names and the values are used to construct “global” messages (designated by the leading “;”) which are automatically sent to the receive objects inside the sv and ssymbol objects. When saving, the sv and ssymbol objects communicate their settings to a global receive which is inside the settings object.

This approach, apart from being easier to use, also proved much more robust and reliable and was used with all objects presented here that have state information. All connections to the sv and ssymbol objects are made within the abstractions, so all that has to be done to make the state of a sonification patch savable is to add a settings object. The only drawback (which this approach has in common with the RRADical patches) is that each of these savable values needs a unique name, as this is how the data is fed back to them when the settings are loaded. To make this work, each sonification object has to be given a unique name when built. The unique name of a value is then built by prepending the parent name to the value name, which only needs to be unique within the patch. By using only names that start with a slash ‘/’ and otherwise consist only of US-ASCII characters, the user can make sure the names are all OSC compatible.

12 The OSC path parsing is one of the broken features of the Pd-extended version for Windows. Once this problem is fixed, OSC capabilities can easily be added to the presented custom settings method. 13 By using a hashtable, RRADical was able to maintain a number of settings at once which could be switched. This functionality was not duplicated, as the sturdy implementation was deemed more important than this extra feature.
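To make the mechanism concrete, the following C++ sketch shows the underlying idea of a flat key/value store serialized to a text file, as the settings object does for the sv and ssymbol values. This is an illustration only, not the Pd implementation; the function names are made up.

```cpp
#include <fstream>
#include <map>
#include <string>

// Flat key/value store: every savable value has a globally unique name
// (the key) and a numeric value, mirroring the sv objects described above.
using Settings = std::map<std::string, double>;

// Save all pairs to a whitespace-separated text file.
bool save_settings(const Settings& s, const std::string& path) {
    std::ofstream out(path);
    for (const auto& kv : s)
        out << kv.first << ' ' << kv.second << '\n';
    return static_cast<bool>(out);
}

// Load them back; in Pd the names are then used to address the matching
// sv / ssymbol objects via global messages.
Settings load_settings(const std::string& path) {
    Settings s;
    std::ifstream in(path);
    std::string name;
    double value;
    while (in >> name >> value)
        s[name] = value;
    return s;
}
```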

3.5.2 Scaling of Motion Data

Often, incoming data streams have to be scaled and transformed to the domain needed for the respective sonification method. In the case of the rowing motion, the force on the footrest usually lies somewhere between 0 and 800 Newton. If the force is to be mapped to, say, the pitch of a sine wave, the values have to be scaled properly. An interval of two octaves starting at C3 would correspond to the MIDI note interval [48, 72]. This would result in the following transformation: f(x) = (x/800) · 24 + 48. Pd has a large collection of objects that perform arithmetic operations, so the required subpatch could be built for every necessary transformation. To make this job easier for the user, an abstraction was built that provides a user interface to the most common scaling operations (appendix A.29). This abstraction performs the following tasks:

• The incoming data is scaled to the interval [0, 1]. A tolerance for the low and high interval borders is added to the total input interval.

• On the downscaled data a gamma function is applied.

• The result is scaled to the desired output interval.

The user can supply the input interval I := [in_low, in_high], the tolerances tol_low and tol_high for that interval in percent, a gamma value γ, and the output interval O := [out_low, out_high]. The incoming data is constantly measured for minima and maxima, which can be set as the input interval. This facilitates finding the correct values for the mapping. The tolerance values provide some headroom for unforeseen extrema in the motion data, which can easily occur when measuring input from human test subjects. With the gamma function, the ascent of the motion data can be made better discernible in the audio result. This is especially useful for movements that jump to their extrema fast, like the force on the footrest in the rowing machine case. Here is the transformation applied by the scaling object in full detail:

f : I → O,   f(x) = ((x − in_low) / size_in)^γ · |O| + out_low

The size size_in is the distance between the low and high borders of I with the low and high tolerances applied:

size_in := (in_high − in_low) · (100 + tol_low + tol_high) / 100
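For illustration, here is a minimal C++ sketch of this scaling, following the formula above. This is not the Pd abstraction; the clamping of out-of-range values to [0, 1] is an extra guard added here, and the function name is made up.

```cpp
#include <algorithm>
#include <cmath>

// Sketch of the scaling transformation described above; parameter names
// follow the formula in the text.  Clamping to [0,1] is an added guard
// (not part of the formula) against values outside the widened interval.
double scale_motion_value(double x,
                          double in_low, double in_high,
                          double tol_low, double tol_high,   // in percent
                          double gamma,
                          double out_low, double out_high) {
    // Input interval size, widened by the tolerance headroom.
    double size_in = (in_high - in_low) * (100.0 + tol_low + tol_high) / 100.0;

    // Normalize, clamp, apply the gamma curve, scale to the output interval.
    double t = std::clamp((x - in_low) / size_in, 0.0, 1.0);
    return std::pow(t, gamma) * (out_high - out_low) + out_low;
}
```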

3.5.3 Audio Utilities

Pd provides objects to perform mathematical operations on audio streams. With these, the loudness of the audio results of the various data streams as well as their position in the stereo panorama can be adjusted. Again, to speed up the creation of sonification patches, some abstractions were built that facilitate the fine tuning of the patches and enable the user to save the settings conveniently. The channel object (see appendix A.3) was built for adjusting the volume and stereo position of an audio stream. It works like a channel in a physical mixing console and provides a level meter and clip indicator for monitoring volume. A master version of this channel was built that outputs the incoming audio to the soundcard and provides a global volume control. Both the channel and the master can be muted. Additionally, the master provides the possibility to record audio to the hard disk (see appendix A.13). To improve sound quality, the freeverb object was added to many patches, as adding artificial reverberation makes the audio results less fatiguing to the ear. It was wrapped in an abstraction that provides a GUI for the parameters and makes the settings savable (appendix A.20).

3.6 Making Motion Data Available in Pd

The moment motion data is available as a stream of messages within Pd, all the possibilities of Pd are available for creating audio feedback from this data. This section examines the methods used for making motion data available as messages in Pd.

3.6.1 Input from Files

The first and easiest approach taken was to write the motion data to a text file. Pd has built-in objects for reading and writing text files. Whitespace is used as a delimiter between successive values; a newline separates individual measurements. When reading the file, each line is converted into a message containing a list of numbers. The sample rate of the measurements can be set in the file input object (appendix A.9), which contains an internal metronome. With every tick of the metronome, one line is output. The list message can then easily be split up into individual numbers, which are then scaled and used as control data for audio or MIDI generation modules.

Fig. 3.7: The easiest way to get motion data into Pd is by whitespace separated values in text files.

3.6.2 Input via TCP/IP

Pd offers a convenient way of communicating messages over TCP/IP: the built-in netsend and netreceive objects. The netreceive object can easily be used to get motion data into Pd: it listens on the port given as the first constructor argument for incoming TCP/IP packets. The data in these packets is interpreted as an ASCII string; the format conventions are the same as for the text file input described in section 3.6.1. The end of a line (i.e. the end of a message) is designated by a semicolon ‘;’. To send a single measurement consisting of the floating point values 3.14, 42, 0.001, 123, for example, the string “3.14 42 0.001 123;” must be sent in a TCP packet to the port netreceive listens on, on the machine where the targeted Pd session is running. That way, computationally very expensive sonifications could be divided among different machines. This, however, was not necessary with the sonifications presented here. For GNU/Linux there are the command line utilities “pdsend” and “pdreceive”, shipped with Pd, which handle the communication with Pd over TCP/IP. They read from/write to stdin/stdout and communicate with Pd in appropriately formatted packets. A program written in C/C++ wishing to send motion data to Pd could simply start these utilities with the exec family of C functions and send the data to them. These utilities have not been ported to Microsoft Windows yet, but as they are open source they can serve as a reference implementation.
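A minimal sketch of sending one such measurement to a listening netreceive object, using POSIX sockets on GNU/Linux (the port number 3000 is an arbitrary example and must match the netreceive argument in the patch):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Send one measurement line to a Pd netreceive object on localhost.
// Port 3000 is only an example; it must match the argument given to
// netreceive in the patch.
int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(3000);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0)
        return 1;

    // Whitespace-separated values terminated by a semicolon, the same
    // conventions as for the text file input.
    std::string msg = "3.14 42 0.001 123;\n";
    send(fd, msg.c_str(), msg.size(), 0);

    close(fd);
    return 0;
}
```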

3.6.3 Input from Rowing Machine Sensors

As described in section 1.1, a case study for the sonification methods presented here is the sonification of data gained from a sensor-equipped rowing machine. This rowing machine was developed by the Institut für Forschung und Entwicklung von Sportgeräten (FES).14 Here the external developed to use that data in Pd is described (see also appendix A.7 and A.8).

Fig. 3.8: The old setup for rowing sonification without Pd.

The rowing setup consists of the sensor-equipped rowing machine, one computer to receive and display the sensor readings and a second computer to play back reference video, record new video and create the realtime sonifications. The rowing machine sends its measurements to the first computer via USB (the USB controller chip and driver actually simulate a virtual serial port). The software that handles communication with the rowing machine, sensor calibration and correct normalization of the sensor data according to the calibration was also developed by FES and is written in LabView, a graphical programming language for measurement and control applications by National Instruments.15 As this rowing machine software is rather resource-hungry, it is run on a separate computer. It can either save measured data to a file, send it via TCP/IP in binary form, or send it via a serial port in a frame-based binary format specified by FES.

14 http://www.fes-sport.de/ 15 http://www.ni.com/

The original setup using the MotionLab Sonify Plugin had another application written in LabView running on the second computer that read the frame-based format from the serial port and sent the same TCP/IP packets as the main application to localhost. These were received by a stand-alone version of the MotionLab Plugin, which created a sonification with the General MIDI synthesizer shipped with Microsoft Windows.

Fig. 3.9: The new setup for rowing sonification using Pd with the parallel port plugin.

One goal was to remove the unnecessary intermediate step of reading the data from the serial port with the second LabView application and sending it to localhost, as this setup was bloated, inconvenient to use and increased the latency of sound processing as well as costing valuable resources. The preferred solution would have been to use TCP/IP packets, as Pd already has built-in support for them. But as there was no access to the source of the main LabView application, neither the format of the sent packets nor the destination IP address (which was hardcoded to 127.0.0.1) could be changed. So an object was written that reads the sensor data from the serial port and parses the FES frame format. The implementation runs on both Microsoft Windows and Linux, with only a minor part of the code (handling of serial port IO) being platform dependent. This module was wrapped in an abstraction to add a simple GUI for convenience. The abstraction outputs a list of numbers just like the file and TCP/IP input methods. That way it is possible to switch between input methods easily, just by drawing new connections with the mouse. One technical problem with the sonification of the rowing machine data was that there were occasional glitches in the rendered sound due to occasional frame skips or defective frames sent by the FES software. These errors presented themselves as extreme deviations from the previous and following measurements, mostly for only one sample (“salt and pepper noise”). They were easily filtered out by applying a median filter with a small window size (3-5 samples) to the sensor data (see appendix A.14), which removed the glitches without noticeable changes to the sonification itself (a sketch of the filtering core is given below). The median filter was also written in C++ as an external, as the implementation using the STL was much faster to write than trying to sort with built-in Pd objects. This is one of the rare cases where the implementation as an external is faster to do, as the API available for C++ provides methods better suited to the problem.
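For illustration, here is a minimal C++ sketch of the core of such a sliding-window median filter, without the Pd external glue code (the class name is made up; the actual external may differ in detail):

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Sliding-window median filter for removing single-sample outliers
// ("salt and pepper noise") from a sensor stream.  The window size
// should be small and odd, e.g. 3 or 5.
class MedianFilter {
public:
    explicit MedianFilter(std::size_t window) : window_(window) {}

    double process(double x) {
        buf_.push_back(x);
        if (buf_.size() > window_)
            buf_.pop_front();

        // Copy the current window and pick its middle element.
        std::vector<double> sorted(buf_.begin(), buf_.end());
        std::nth_element(sorted.begin(),
                         sorted.begin() + sorted.size() / 2,
                         sorted.end());
        return sorted[sorted.size() / 2];
    }

private:
    std::size_t window_;
    std::deque<double> buf_;
};
```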

3.7 Summary

We have seen that MIDI is not the only medium with which to turn data into sound. Yet, outdated as it may be, we have also made clear that a default MIDI implementation is useful, as there is such a large number of sound sources supporting it. The OSC protocol, which does away with many drawbacks of the MIDI protocol, has been discussed as a promising alternative. Pure Data was chosen as the tool for implementing the sonification experiments, and a first introduction to it was given. With the presented implementation for storing session settings, the usability was improved. Finally the methods used for making the motion data available in Pd were reviewed.

4. CONTINUOUS PARAMETER MAPPING SONIFICATION

Having discussed the infrastructure that is used to create our sonifications, we will now move on to describing various methods for creating continuous parameter mapping sonifications from streams of motion data. The main focus will lie on modulation targets other than pitch, in order to create more pleasant results. Still, we will start with the discussion of pitch modulation, because it is so often used for sonification. As we will see, if certain aspects are considered, the perceived quality of continuous pitch modulation can still be greatly improved. As we are mainly interested in sonifying continuous data streams and giving the user permanent feedback on his actions, we create continuous, uninterrupted sounds. This means that the methods as they are used here do not create a designated attack phase, as many real instruments and synthesizers do. We do not create distinct notes with a certain interval of the sound that is clearly its “beginning”. Yet such attack phases, which are often characterized by extremely fast timbre and pitch changes (often containing partials that are not multiples of the fundamental), are important for the recognition of real-world instruments. These attack phases are called transients. Even with synthetic sounds, transients are integral to the overall perception of the sound and are an important part of sound design. Often, the transients of a sound determine how it is received by the listener. There even exists specialized hardware for modulating and shaping transients, e.g. the SPL Transient Designer.1 Though we have concentrated on continuous sound mappings in this work, transients are a promising aspect of sound design neglected in the field of sonification so far. A possibility to use transients to carry information in motion data sonification could be the approach in chapter 6.

1 http://www.spl-usa.com/Transient Designer/in short.html

4.1 Audio Artifacts

This section deals briefly with the audio artifacts that can occur when using parameter mapping sonifications. We will discuss the two classes of artifacts encountered while implementing the methods described later in this chapter and how they were removed or suppressed. We will then silently ignore the handling of possible artifacts in the discussions in the later sections and assume that the appropriate techniques were applied where necessary.

4.1.1 Zipper Noise

In section 2.1.1 we stated that the sonification data is usually available at a much slower rate than the rate at which the audio data is created (100 Hz vs. 44100 Hz for the rowing data). To make the sonification data available at the audio sample rate we decided to use sample repetition. This was done for two reasons:

• Sample repetition introduces the lowest latency. A change in value is immediately available. If we had some interpolation scheme by default, we would always introduce a certain additional latency into the sonification process. It would also commit us to a certain interpolation/filtering method. If we always assume the piecewise constant signal created by sample repetition, we can add a custom interpolation scheme if and where necessary.

• Sample repetition is the way Pd handles this conversion for us. Sonification data is made available in Pd as messages (see section 3.6), while our sonifications create audio data. At some point, a stream of messages is fed into an object requiring control data at the audio sample rate. At this point, Pd implicitly handles the messages as if a sig∼ object were inserted into the message path. This object simply emits a constant audio signal that has the value of the last number message it received as input.

Depending on the modulation target, the sudden jumps introduced by the sample repetition can cause unpleasant audio artifacts known as “zipper noise”. The term stems from the sound these artifacts create: a slight crackling moving with the direction of the control data, like a zipper being opened or closed (Example 7: Zipper noise). We will illustrate this effect using one of the simplest of the sonification processes used in this work: lowpass filtered white noise. The input stream used is that of the force on the handle of the rowing machine. Using only sample repetition, the zipper noise is clearly audible.

Fig. 4.1: Left: The control data at audio sample rate with sample repetition: The steep flanks caused by every new sample value are clearly visible. Right: The smoothed result created by inserting a line∼ object.

Fig. 4.2: The spectrogram of the filtered white noise created by the control data on the left of fig. 4.1. The zipper noise created by the sudden jumps in the unsmoothed control data is clearly visible as artifacts over the whole frequency range.

Fig. 4.3: The spectrogram of the sound created by the smoothed control data. The artifacts are completely removed, the desired audio result remains practically unchanged.

To remedy this, the approach described in [Puckette, 2006, chp. 1.9] is used: a line∼ object is inserted into the message path. This object works like the sig∼ object in that it emits a constant audio stream with the last value it received as input. Contrary to the sig∼ object, however, it does not immediately jump to the next value when input arrives, but ramps to it over a specifiable number of milliseconds; this is basically linear interpolation (Example 8: Zipper noise removed by interpolation). In all the cases where this was necessary, 10 ms proved to be slow enough to suppress audible artifacts.
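To make the difference concrete, the following C++ sketch generates the linear ramp that line∼ produces between two control values; sample repetition would simply fill the block with the target value instead. This is an illustration only, not Pd's implementation; the 10 ms default mirrors the value mentioned above.

```cpp
#include <vector>

// Generate the audio-rate samples for one control value change.
// Linear ramp over ramp_ms milliseconds (what line~ does); plain sample
// repetition would fill the whole block with 'target' immediately.
std::vector<float> ramp_to(float current, float target,
                           float sample_rate, float ramp_ms = 10.0f) {
    int n = static_cast<int>(ramp_ms * 0.001f * sample_rate);
    if (n < 1) n = 1;                              // guard against empty ramps
    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) {
        float t = static_cast<float>(i + 1) / n;   // 0..1 along the ramp
        out[i] = current + (target - current) * t; // linear interpolation
    }
    return out;
}
```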

4.1.2 Foldover

The Nyquist-Shannon sampling theorem states that a band-limited signal f containing frequencies ≤ ωmax needs to be sampled with a frequency of at least 2ωmax so that all original frequencies can be reconstructed. If we now sample the signal at ωsample < 2ωmax, a partial with frequency φ above ωsample/2 is reconstructed wrongly from the samples at the frequency ωsample − φ (Example 9: Foldover created by the tristimulus model). That is, it is folded over half the sampling frequency onto a wrong frequency. This is also known as aliasing. How this is prevented depends on the process with which the original signal f is created.

Fig. 4.4: The tristimulus model with no foldover correction. The partials above 22050 Hz “fold over” from the left and land on dissonant positions, which gives the result an undesired, metallic sound.

In the present work, this problem occurred with the tristimulus model in section 4.4.4 and with the sawtooth oscillators used in section 4.4.1. As the tristimulus model is implemented using additive synthesis (summing of sinusoidal partials), the solution was straightforward: the frequency input to the individual partials was clipped at half the audio sampling frequency, so no partials above that frequency were created (Example 10: The tristimulus model with foldover correction; a sketch of this idea is given after the figure below). This is the approach taken in the original tristimulus implementation by Riley [2004]. The foldover effect created by the sawtooth wave in section 4.4.1 was not found to be so severe that the resulting sawtooth wave was unusable. The higher partials in a sawtooth wave are already quite weak, so they only become noticeable in signals with a high base frequency. A solution to remove it would have been to create the sawtooth wave at a higher audio sampling rate, lowpass filter it, and then reduce the sampling rate to the real audio sampling frequency. This is the approach suggested by Puckette [2006, chp. 10.4]. Another technique is to use wavetable-based oscillators. This and other approaches are described in detail in Stilson and Smith [1996]. Though the present implementation is acceptable to work with, it must be noted that the quality of the sonifications using the simple sawtooth oscillator can be further improved using one of the mentioned techniques.

Fig. 4.5: The tristimulus model with the changes suggested by Riley. Clipping the partial frequency at 22050 Hz prevents the foldover.
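The partial-clipping idea can be sketched as follows: an additive (sum-of-sines) oscillator bank that silences every partial at or above the Nyquist frequency. This is illustrative C++ only, not the Pd tristimulus abstraction; the names and the block-based structure are assumptions.

```cpp
#include <cmath>
#include <vector>

// One block of additive synthesis: sum the first num_partials harmonics of
// base_hz, but skip every partial at or above the Nyquist frequency (half
// the sampling rate) so that no foldover can occur.  'amps' must hold at
// least num_partials amplitudes; 'time_s' is carried between blocks.
std::vector<float> additive_block(float base_hz, int num_partials,
                                  const std::vector<float>& amps,
                                  float sample_rate, int block_size,
                                  double& time_s) {
    const float kPi = 3.14159265358979f;
    const float nyquist = sample_rate * 0.5f;
    std::vector<float> out(block_size, 0.0f);
    for (int i = 0; i < block_size; ++i) {
        for (int k = 1; k <= num_partials; ++k) {
            float f = base_hz * k;
            if (f >= nyquist) break;           // clip partials at Nyquist
            out[i] += amps[k - 1] * std::sin(2.0f * kPi * f * time_s);
        }
        time_s += 1.0 / sample_rate;           // advance time in seconds
    }
    return out;
}
```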

4.2 Modulation of Pitch

Pitch modulation can be applied to any audio generation method whose output has a perceivable pitch, i.e. signals with mostly harmonic partials. This applies to almost all methods used in electronic music and to most real instruments. Pitch detection by the human ear is very accurate over a large frequency range. As far as resolution is concerned, pitch seems to be an ideal parameter for transporting information. But why do many pitch-based sonifications sound so bad?

4.2.1 Formant Shift

The perceived (aesthetic) quality of a pitch-based sonification depends much on the modulated sound. Nearly all natural instruments have so-called formants, peaks in their spectrum that are independent of the pitch played by the instrument. These formants are generally created by the physical properties of the instrument itself. Depending on the shape and material of the instrument, certain frequencies are emphasized and others are suppressed. A good example of this is the human voice: vowels are distinguished by their formant structure (see section 21). To speak a certain vowel, we need to shape our mouth in a certain way. The shape of our oral cavity when pronouncing, say, an “Ah” lets certain frequencies resonate, which emphasizes

them. No matter at which pitch we speak (or sing) the “Ah”, those frequencies will always be more prominent than others. Oboes, for example, have a formant around 1000 Hz, which means that whether one plays a B1 or a B3 on an oboe, the spectra of the respective tones have a visible emphasis at about 1000 Hz. If one now uses sampled oboe sounds to imitate the sound of a real oboe and pitches it by playing the sample2 back at higher speed3, the result is perceived as artificial, even annoying, because the formant is moved along with the base pitch (Example 11: Formant shift of an oboe sample). We are used to hearing these instruments and simply expect the sound to be different from what we hear by just playing a sample at higher speed.4 Because of this, good sample-based synthesizer sounds use multisampling, that is, a collection of samples, each for a small pitch range (sometimes down to one sample per note). Together with multisampling for different amplitude levels, there may be a multitude of samples for one sound. Big sampled sounds can use up to a gigabyte of sample data! So how do we avoid unnatural artifacts when modulating pitch? The problem of formant shift generally does not occur when methods other than sampling are used to create sound. If we have direct access to sound creation, we just modulate the pitch when the sound is created in the first place (strictly speaking, sampling does not create sound, it just repeats it). The result of the PAF algorithm presented later in section 4.4.3, for example, also has strong formants (basically, that is all it has). But there the formants are created, not played back, when we hear the sound. We just modulate the pitch input and get a different pitch but still the fixed formant positions. But what if we do not want an “artificial” sound like the ones presented later in the chapter, but the realistic sound of, let's say, an oboe? Of course we could always use more sophisticated algorithms to reproduce a certain physical instrument.5 These approaches are known as physical modelling. A model of the physical properties of the instrument is used to calculate the acoustic result when it is played. With the ever growing power of today's

2 In the context of audio synthesis, a “sample” is often used as an abbreviation for “a sampled sound”. Technically speaking, such a sampled oboe sound consists of a large number of single data samples. 3 This is generally achieved by an interpolating lookup; the actual rate at which samples are emitted stays constant, see [Puckette, 2006, chp. 2] for details. 4 The formant shift created by faster sample playback is often referred to as the “chipmunk effect”, named after the fictional music group “Alvin and the Chipmunks” from the 1960s. The songs for this group were sung much slower and lower than the final song was played. The tape with the recorded voice was then played back at higher speed, which made the singers sound like chipmunks (due to the differently placed formants). 5 The reproduction of realistic sounds is of course one of the key interests when developing synthesizers and artificial sound generation methods in general.

Fig. 4.6: The sonogram and one stationary spectrum of an oboe playing a B3 . The formant between about 900 and 1500 Hz is clearly visible. It is even slightly louder than the fundamental.

Fig. 4.7: The same oboe sound shown in fig. 4.6, but pitched up two octaves. The formant has now moved to the frequency range between 3600 and 6000 Hz. It is much too high and too broad, which sounds unnatural.

Fig. 4.8: An oboe playing the note B5. This is the correct sound of an oboe playing that pitch; the sample is not pitched. As can be seen, the partials are spaced as in fig. 4.7, but the formant still has the same, correct position as in fig. 4.6.

computers, such methods can already be found in commercial synthesizers and programs. The physical modelling approach was not pursued further in this work, as it requires sophisticated algorithms whose development and implementation are rather time consuming. But what if we want to use an already available set of samples? These are available in abundance, often at excellent quality. Such sample collections are used to produce music at a professional level, so they should be good enough for sonification. Sadly, we cannot use them in combination with the samplers6 they are designed for. If a sample is modulated continuously with such a sampler (using MIDI Pitch Bend events, see 3.1), the sampler chooses from the large set of samples in a patch the one assigned to the note played by the Note On event triggering the sound. If it then receives Pitch Bend events, it will just change the playback speed of that already triggered sample. We would want it to switch smoothly to a different sample if the pitch is changed by more than a certain interval. To switch to a new sample, however, we would have to trigger a new note and stop the old one, which would result in an audible interruption (this is actually done by the sonify note cont object that is briefly explained in appendix A.27). There is a good reason for this behavior in commercial sample-based synthesizers: they are musical instruments, not sonification tools. Even if we fade smoothly from one sample to the other while it is played back, we will still hear a change in timbre. How noticeable this effect is depends largely on the samples used, but a manufacturer of such devices cannot make assumptions about the nature of the samples. As the pitch bend effect in music is seldom used over a very large range, the formant shift is generally not very noticeable when pitch bending is used as intended. For the task at hand, however, we would like to have constant sample playback combined with multisampling. This is why such a crossfading sampling module, the crossfading loop sampler (appendix A.5), was implemented in Pd (Example 12: Crossfading a set of oboe samples; a sketch of the selection logic is given after the figures below). It always chooses from a (possibly large) set of samples the one closest to the current input pitch. If the incoming pitch causes a change of the current sample, the old sample is faded out while the new sample is faded in.7 This has the desired effect that, if a sufficiently dense set of samples is provided, formants are not shifted over a large range. The drawback is that the switch from one sample to the next is still audible. Yet this change is not unpleasant or annoying if the sampled sounds are similar enough in character.

6 “Sampler” in this context means sample-based synthesizer. 7 This is called “crossfading”, as the two samples have crossing amplitude envelopes.

Fig. 4.9: A sonogram of a pitch sweep over two octaves using just one sample. The formant around 1000 Hz is “spread out”.

Fig. 4.10: The same sweep using the crossfading loop sampler . The formants are still slightly shifted, but the sample switching (noticeable by short horizontal “spikes”) limits the effect significantly.
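The selection logic of the crossfading loop sampler can be sketched roughly as follows (illustrative C++; sample playback, looping and the actual envelope timing are omitted, and the names are made up):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Each entry of the sample set stores the pitch (as a MIDI note number)
// at which the sample was recorded.
struct SampleInfo { float root_pitch; /* plus sample data, loop points, ... */ };

// Pick the sample whose root pitch is closest to the requested pitch.
std::size_t nearest_sample(const std::vector<SampleInfo>& set, float pitch) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < set.size(); ++i)
        if (std::fabs(set[i].root_pitch - pitch) <
            std::fabs(set[best].root_pitch - pitch))
            best = i;
    return best;
}

// When the nearest sample changes, crossfade: the gain of the old sample
// ramps from 1 to 0 while the gain of the new one ramps from 0 to 1.
// 't' runs from 0 to 1 over the crossfade time.
void crossfade_gains(float t, float& old_gain, float& new_gain) {
    old_gain = 1.0f - t;
    new_gain = t;
}
```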

After having implemented a sample module with the desired functionality, the problem of making professional collections of samples available in Pd remained. Pd has built-in support for loading samples in wav and aiff format. Professional sample collections are available in different formats, though, which often differ from manufacturer to manufacturer. In the present case, a large collection of samples was available in the E-III format used by E-Mu samplers.8 This is a closed standard; even the CDs on which these sounds are shipped have their own file system format. Luckily, samplers which can read these CDs also understand SMDI ([Pea, 1991]), a protocol designed for transmitting sample data over SCSI (Small Computer Systems Interface). SMDI has the advantage that the loop points9 are also communicated. Using OpenSMDI by Christian Nowak, an open source implementation of the SMDI protocol, and the SMDI tools by Olivier Doaré,10 a simple command line application was created that could download a sample bank from an attached E-Mu Esi 32 sampler. The banks were automatically converted into wav samples and a text file containing the names and loop points of the individual samples. After manually adding the original pitches these samples were assigned to, the whole sample bank could be loaded with the crossfading loop sampler object. Simply using these professional samples instead of the consumer-grade ones provided by the GM synthesizer used in [Melzer, 2005] improved sound quality tremendously. Combined with the crossfading of large sample sets to limit formant shift, the results are much more realistic.

4.2.2 Problems with Musical Perception

Even with the better sound quality of the aforementioned improved sample playback and high quality samples, extreme continuous pitch change is not really pleasant to listen to. We will now try to give a possible explanation for this based on the way we are normally used to perceiving pitch in music. Following that, we give some suggestions to improve the aesthetic impression according to that argumentation. A problem with modulating pitch is that pitch is the dominating quantity in many styles of music, especially western music. We have learned from childhood that there are “good” and “bad” sequences of pitches.11 Certain intervals between pitches are considered pleasant, others annoying. This classification is of course heavily dependent on the style, epoch and cultural background of the music in question. Yet this classification into aesthetic categories is more extreme with pitch than with other sound parameters. Imagine a progression of n notes at m random amplitudes. While some would be associated with an explicit musical gesture, like a crescendo or decrescendo, others would perhaps be perceived as random and meaningless. Still, even the meaningless progression of amplitude changes would probably not be considered “ugly” or explicitly unpleasant as long as all the amplitudes stay below the pain threshold of about 130 dB. Now choose a progression of n notes at m arbitrary pitches. The result will most likely be considered unnerving and unaesthetic, unless you hit a “nice”12 progression of notes by chance. When we have a polyphonic mixture of sounds, each at different, constantly changing pitches, the negative effect on the listener is hardly surprising. The whole sound mixture will be perceived as out of tune most of the time, as it is hardly probable that the voices will form harmonic intervals during their progression.

8 http://www.emu.com/ 9 The indices between which a sample can be looped without audible clicks. 10 Both software packages including documentation can be downloaded here: http://nolv.free.fr/SMDITools/. Work on these two tools seems to be discontinued; there is no official site for the OpenSMDI library. 11 To what extent the characterization of certain musical progressions is the result of social influences is beyond the scope of this paper. Nevertheless we all know the feeling of hearing a wrong note in a piece of music that makes us grind our teeth.

But how do we remedy that? There are a number of approaches, apart from simply improving sound quality as stated in the previous section, to make pitch-based sonification more “bearable”:

• The pitch ranges of the individual voices can be chosen so that they do not lie too close to each other. Dissonant intervals are perceived more strongly as such if they are closer together (Example 13: Dissonant voices with different ranges).

• Even when spreading the pitch ranges far over the audible frequency spectrum, care should be taken not to use too high pitches, as they are often perceived as piercing and painful (Example 14: Sonification with extremely high pitch ranges). The ranges should be roughly limited to the pitch range of a concert piano.

• Use sounds with a less defined pitch. These can be sounds with noisy components, distorted sounds with inharmonic partials, sounds with extremely rich spectra (like the waveshaping sounds presented later in the chapter) or ambiguous harmonic structure (Shepard tones, highpass filtered tones). Sounds with less pronounced pitch have less chance of dissonating with other sounds and yet can still be modulated in a way that a progression of pitch is clearly noticeable. Sounds sent through effects like flanger, chorus, exciter etc. are also good candidates, as these also introduce detuning and ambiguity to a sound.

12 The highly subjective nature of such an assessment cannot be overstated.

• Simply do not use too many pitch-based sonifications! Some parameters can be mapped to pitch and others to other sound properties. The fewer voices are modulated in pitch, the less chance for dissonance there is.

A combination of these techniques can often lessen the level of annoyance caused by heavy use of pitch modulation. Finally, one can of course map the data stream to discrete pitch scales following accepted musical progressions. The drawback here is that we lose the fine resolution provided by a continuous pitch change. We will examine methods in chapter 6 to circumvent this loss of precision while gaining at least a small degree of musical quality.

4.3 Modulation of Amplitude

After improving the quality of pitch modulation, we were looking for alternatives to it. The obvious next choice is modulation of amplitude. Of course, any kind of signal can be modulated in amplitude. But we found a simple yet strikingly effective and pleasant target for amplitude modulation: 1/f noise, often also called pink noise (see appendix A.19 for the module description). Pink noise has a spectral power distribution determined by 1/f, where f is the frequency. This has the effect that the energy contained within the range of an octave is constant over the whole spectrum, which of course is useful in the context of human hearing, as frequency detection is logarithmic (see 2.1.2). The term pink noise is an analogy from color perception. White noise, which has a constant energy distribution over the whole spectrum, is named in analogy to white light, which also has an equal distribution over the (visible) spectrum of electromagnetic waves. Red noise (sometimes also referred to as brown noise), which has a 1/f² energy distribution, has a spectrum similar to that of red light. Pink noise, having a distribution between those two, is named accordingly. Pink noise resembles many (acoustic) noise sources found in nature, for example that of a waterfall, waves or leaves in the wind. This kind of modulation creates soothing and pleasant results. The pink noise itself is not annoying to listen to, and the amplitude modulation does not create the unpleasant impressions of pitch modulation. Depending on the data stream modulating the amplitude, the results are indeed reminiscent of natural sounds. Further experiments with sounds mimicking environmental noises could lead to widely accepted results. Perception of noises found in nature could be less affected by individual preferences and sociocultural influences, which would make them robust carriers for sonification.13

13 Though of course we enter the field of musique concrète and ambient music, in which case we can again benefit from the field of musicology.
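A common way to approximate 1/f noise in software is the Voss-McCartney algorithm, which sums several white-noise rows updated at octave-spaced rates. The following C++ sketch is such an approximation for illustration only (it is not the pink noise abstraction used here, which is built from Pd objects); it also shows how a scaled motion value would modulate the amplitude.

```cpp
#include <bit>      // std::countr_zero (C++20)
#include <cstdint>
#include <random>

// Approximate 1/f (pink) noise with the Voss-McCartney algorithm: several
// white-noise rows, row k being updated every 2^k samples; the output is
// their scaled sum plus one per-sample white component.
class PinkNoise {
public:
    float next() {
        ++counter_;
        int row = std::countr_zero(counter_);   // which row changes this sample
        if (row < kRows) rows_[row] = white();
        float sum = white();
        for (int i = 0; i < kRows; ++i) sum += rows_[i];
        return sum / (kRows + 1);
    }

private:
    float white() { return dist_(rng_); }

    static constexpr int kRows = 12;
    float rows_[kRows] = {};
    std::uint32_t counter_ = 0;
    std::mt19937 rng_{42};
    std::uniform_real_distribution<float> dist_{-1.0f, 1.0f};
};

// Amplitude modulation: a smoothed motion value in [0,1] simply scales
// the noise sample.
float sonify_sample(PinkNoise& pink, float motion_value) {
    return motion_value * pink.next();
}
```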

Amplitude modulation nevertheless has a problem: if an amplitude-modulated signal is mixed with a number of other signals (be they amplitude modulated or modulated by other means), masking effects (see section 2.1.5) become a bigger problem than with other sonification methods. When the incoming data creates a relatively quiet signal, it is easily covered by other, louder signals present in the mix. While these masking effects also depend on a number of other parameters (spectrum of the signal, temporal relationship), the most obvious and most often encountered masking is that of quiet signals by loud ones. Because of this, when creating sonifications using amplitude modulation, special care must be taken that the signal can still be detected in the mix, even at its minimum amplitude. This can be achieved by spatial positioning (making sure the amplitude-modulated signal has a reserved space in the panorama), different timbres and pitches (if applicable): basically all the design approaches available to make signals distinguishable from each other.

4.4 Modulation of Timbre

The aesthetic quality of the previous, amplitude-modulating approach encourages the use of alternatives to pitch modulation. The next and most complex sound parameter is timbre. In order to change the timbre, we can either modulate parameters of the process generating the sound or apply transformations, like filters, to a given sound. It must be noted, however, that the present selection of methods is far from complete. The creation of musical sound is a vast field. Some methods that are also promising for sonification, like granular synthesis, are omitted. A complete coverage of audio synthesis is beyond the scope of this work. The methods chosen here represent certain classes of generation methods. Many could be regarded as the “Hello World” versions of their particular paradigm of sound generation. There is also no clear line between one generation method and another. One seldom finds sounds stemming from just one method; most electronic musical instruments, be they implemented only in software or by actual electronic circuits, combine many of the approaches presented here to create a special sound quality. Filters can be applied to any sound source, as can nonlinear distortion. A signal created with additive synthesis can be used as a modulator for FM synthesis, a sampled signal can be ring modulated, and so on.

4.4.1 Subtractive Synthesis

A simple way to achieve a change in timbre, which is also used in a large variety of synthesizers, is the filtering of an audio signal. Depending on the original signal and the kind of filter used, a wide array of sounds can be achieved. We used three classes of signals to filter: white noise (appendix A.32-A.34), a classical synthesizer sound (two slightly detuned sawtooth waves, appendix A.31) and looped samples of real instruments (appendix A.22). The filters used were resonant lowpass, highpass and bandpass filters. We chose white noise for the filtering experiments instead of pink noise, as its much brighter spectrum created better14 results.

Fig. 4.11: The characteristic curve for the lowpass (left), highpass (middle) and bandpass (right) filters used. The resonance of the high- and lowpass was set to 4 as an example, the cutoff frequency to 500 Hz. The amplification of frequencies around the cutoff frequency due to the high resonance value is clearly visible.

The mapping to the cutoff frequency was chosen to be exponential, like the frequency mapping used for pitch modulation (Example 15: Linear and exponential filter sweep). For the cutoff/center frequency of the filters we wanted the same characteristic of perceived change over the whole frequency range. The highpass filter proved to be the least effective of the filters. Though definitely an interesting tool for sound design, it proved less effective for sonification. With the harmonic sounds (the sawtooth and sampled sounds), the highpass has the drawback that it removes the fundamental (Example 16: Highpass filtered string sample). Though human hearing has the remarkable ability to reconstruct the original frequency of a highpass filtered sound, the first partials contain a lot of information about the sound; removing them removes important characteristics.

14 This is of course a subjective claim that must be further substantiated by perceptual studies.

The white noise in combination with the high- or lowpass filters had the drawback that the non-filtered parts of the spectrum of the white noise are very dense and tend to mask other audio streams. In combination with harmonic sounds, the lowpass filter was very effective. Lowpass filters can be thought of as a way to control the “brightness” of a sound (Example 17: Filter sweep with and without resonance). When material with a bright spectrum was used, the changes in timbre were very well traceable, which is no surprise, as lowpass filters are among the most common means of sound manipulation found in synthesizers. As was also expected from common practice in sound design, high resonance values helped increase the clarity of the mapping. The most effective results were created by the bandpass filtered white noise. Again, due to its noisy character the perceived result is rather “neutral”, as with the amplitude modulated pink noise. As the bandpass filtering removes a very large part of the audible spectrum, the bandpass filtered noises can be effectively combined with other sonifications: they are easily detected in the mix but do not disturb other signals much (Example 18: Harmonic and noisy sounds bandpass filtered). For the same reasons as for the highpass filter, the bandpass filter is often ineffective for sonification purposes when applied to harmonic material. Finally, it must again be remarked that these examples using subtractive synthesis are only the tip of the iceberg. Subtractive synthesis (combined with a huge number of different sound sources like oscillators, waveshaping, sampling, etc.) is the most common synthesis method found in software or hardware synthesizers. There are literally millions of sounds created with subtractive synthesis, most of them using one of the filter types discussed here (of which the lowpass is clearly the most common). So it must be pointed out that there probably is an effective sonification mapping using highpass filters, and there are many other filter types that can be used to arrive at usable results. The discussion given here serves rather as a hint as to the means by which one can arrive at usable results most easily, and why we think this is so.
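Returning to the exponential cutoff mapping mentioned at the beginning of this subsection, a minimal C++ sketch (the frequency range of 50 Hz to 15 kHz is only an example, not the range used in the abstractions):

```cpp
#include <cmath>

// Map a normalized motion value x in [0,1] exponentially onto a filter
// cutoff frequency, so that equal steps in x correspond to equal musical
// intervals rather than equal numbers of Hz.  The range is only an example.
double cutoff_from_motion(double x, double f_low = 50.0, double f_high = 15000.0) {
    return f_low * std::pow(f_high / f_low, x);
}

// With a linear mapping, most of the perceived change would be crammed
// into the lower end of x; the exponential version spreads it evenly.
```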

4.4.2 Waveshaping and Frequency Modulation

In the previous section we discussed ways to change the timbre of a given sound source by filtering it. This section deals with common sound creation methods that allow altering the resulting spectra by modulating parameters of the underlying algorithm. These methods, waveshaping and frequency modulation (FM), are also commonly used in electronic music. The overview given here is loosely based on Puckette [2004, chap. 5], where a much larger number of methods is presented. As Puckette notes, FM can be regarded as a special case of waveshaping. We will nevertheless maintain the distinction, as it is common practice in electronic music. Waveshaping, also known as non-linear synthesis or non-linear distortion, describes algorithms that distort a (pseudo-)periodic function g by a non-linear function f:

s(t) = f(a · g(t))

The factor a is used to control the amplitude of the signal fed into the distortion; it gives a simple control over the resulting spectrum. This is the parameter we will map a (one-dimensional) data stream to, so the resulting sonification mapping is:

s(t) = f(x(t) · g(t))

The non-linear function f is commonly referred to as the transfer function, the parameter a as the waveshaping index or just index for short.

Clipping

The first application of waveshaping was that of simply clipping the signal. Hence, the transfer function is:

f(x): R → [−1, 1],   f(x) = 1 for x ≥ 1;   f(x) = −1 for x ≤ −1;   f(x) = x otherwise

The clipping of a signal is normally an undesired effect. Yet, this method of sound manipulation was chosen, as an expression of "force" was sought. The effect of clipping on an arbitrary signal is hard to describe in the frequency domain, though its effect is easily described in the time domain: the sharp clipping for all values > 1 or < −1 introduces new partials. Similar effects to clipping are commonly used in popular music. Especially electric guitars are often played with amplifier settings that distort the signal. As those amplifiers are analogue circuits, they do not simply clip the signal sharply; nevertheless the principle is similar.

Clipping of course needs another signal to work on. A general purpose clipping module was built (see appendix A.4) that could be applied to any signal, and the amount of clipping could be modulated smoothly. In a first test, clipping was applied to the lowpass-filtered sawtooth used in section 4.4.1 (subtractive5, appendix A.35). The incoming data stream was mapped to both cutoff and clipping.15 That way, more force resulted in a bright, aggressive sound.

15 The combination of filtering a sawtooth with a resonant lowpass and clipping was chosen as it is a typical sound used in popular electronic music. It is advisable to use sound structures possibly known to the recipient, as this aids orientation.

Fig. 4.12: The effects of clipping on a signal consisting of two lowpass filtered saw waves at 220 Hz and 444 Hz. The spectrogram on the top shows the change of the spectrum subject to rising amplification of the signal and consecutive clipping. It also shows that the additional partials created by the clipping appear abruptly once the signal is amplified enough to be above or below the threshold of +/- 1.

This method is useful to emphasize the effect of other sonifications by additionally applying clipping to an already modulated signal. Using it as the sole method for sound manipulation is problematic due to the hard to predict nature of the resulting sound (Example 19: clipping). Creating a mapping that results in a precise rendering of the incoming data's features solely with clipping is hard, due to the non-linear nature of the results.
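A minimal sketch of this mapping in Python/numpy follows; the sawtooth source, the mapping range and all names are illustrative assumptions, not the Pd patch from appendix A.4. The data stream scales the signal before the clipping transfer function is applied, so larger values introduce more partials:

    import numpy as np

    def f(x):
        return np.clip(x, -1.0, 1.0)          # the clipping transfer function

    sr = 44100
    t = np.arange(sr) / sr
    g = 2.0 * ((220.0 * t) % 1.0) - 1.0       # naive sawtooth at 220 Hz (aliasing ignored)
    force = np.linspace(0.0, 1.0, len(t))     # stand-in for the incoming data stream x(t)
    a = 1.0 + 9.0 * force                     # map the data to an amplification factor in [1, 10]
    s = f(a * g)                              # s(t) = f(x(t) * g(t)): more "force", more partials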

Ring Modulation

The multiplication of two signals is called ring modulation. We did not use ring modulation by itself as a sonification method, but we need the properties of the resulting signal for the discussion of frequency modulation and the Phase Aligned Formant algorithm in the next sections. For these methods it is sufficient to limit our discussion to a special case of ring modulation where one of the two signals is a sinusoid with frequency ωc (called the carrier frequency) and the other is an arbitrary periodic signal with frequency ωm (called the modulator frequency). To make the formula easier to read, we will assume f to be 2π-periodic and write f(ωmt). This results in the following formulation:

s(t): R → R = cos(ωct) · f(ωmt)

We are interested in two properties of the resulting signal s: its frequency and its spectrum.

We will first determine the frequency of the resulting signal. The resulting signal is periodic if ωm and ωc are integer multiples of a common base frequency ω [Puckette, 2004, chap. 5.5], i.e. ωc = kω and ωm = mω with k and m relatively prime. If this is not the case, the resulting signal is not periodic. It is easy to see why. Assume that ωc = kω and ωm = mω with k and m relatively prime. As cos(ωct) is 2π/ωc-periodic, it follows that cos(ωct) = cos(ωc(t + k · 2π/ωc)) = cos(ωc(t + 2π/ω)). That means that cos(ωct) is also 2π/ω-periodic. The same follows in an analogous manner for f, so cos(ωct) · f(ωmt) must have frequency ω. If the two have no such common base frequency, the result is not periodic. If this is the case, or if the base frequency is below the human hearing range, the result sounds dissonant and "metallic", as it has no harmonic spectrum.

But how is the spectrum affected by ring modulation? Assume we have a harmonic signal f at base frequency ωm:

f(t): R → R = Σ_{n=0}^{∞} an cos(nωmt + φn)

Fig. 4.13: The modulator and carrier before the ring modulation. The modulator has a frequency of 50 Hz, the carrier a frequency of 400 Hz.

Fig. 4.14: The result of the ring modulation. The modulator's spectrum is shifted by the carrier frequency (green) and "mirrored around" the carrier frequency (blue). The carrier is just given as a reference; it is not present as a partial in the final signal.

For simplicity's sake, we will assume the phase offset φn of each partial to be zero. We will see that the harmonic signal we need for our discussion later on has that property.

f(t) = Σ_{n=0}^{∞} an cos(nωmt)

When we multiply such a harmonic signal with a sinusoid at frequency ωc, we get:

cos(ωct) · Σ_{n=0}^{∞} an cos(nωmt) = Σ_{n=0}^{∞} (an/2) cos((ωc − nωm)t) + (an/2) cos((ωc + nωm)t)

This signal basically consists of the original spectrum at half the original amplitude and offset by +ωc and a copy of it “mirrored” around ωc, also at half the original amplitude. Due to the symmetry of the cosine function one can also think of the resulting spectrum as two spectra with the same shape as in the original signal, each at half the amplitude of the original spectrum and offset by +ωc and −ωc respectively. It is important to remember that cos(−x) = cos(x), so partials with a “negative” frequency “wrap around” at DC. That is why, depending on the values for ωc and ωm it is sometimes helpful to think of the resulting spectrum as the sum of two slightly offset spectra and sometimes as the sum of the original spectrum and its “mirrored” version. We now see where the carrier and the modulator got their names: the carrier “carries” a signal to a certain frequency (the spectrum is now centered around ωc) and the modulator “modulates” the carrier to create a certain spectral shape.
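This statement can be checked numerically. The following sketch (a toy four-partial signal, not one of the sonification patches) ring-modulates a harmonic signal at 50 Hz with a 400 Hz carrier and lists the FFT bins that carry energy; they lie at 400 ± n·50 Hz, while the carrier frequency itself is absent:

    import numpy as np

    sr = 8000
    t = np.arange(sr) / sr                        # exactly one second, so FFT bins are whole Hz
    fm, fc = 50.0, 400.0                          # modulator and carrier as in fig. 4.13
    f = sum(np.cos(2 * np.pi * n * fm * t) / n for n in range(1, 5))   # partials at 50..200 Hz
    s = np.cos(2 * np.pi * fc * t) * f
    spectrum = np.abs(np.fft.rfft(s)) * 2 / len(s)
    print(np.flatnonzero(spectrum > 1e-3))        # -> [200 250 300 350 450 500 550 600]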

Frequency Modulation

The idea of frequency modulation (FM) is to modulate the frequency of a waveform at a speed that causes a distortion of the original waveform, creating new spectra. When implementing FM digitally, usually the phase and not the frequency is modulated [Puckette, 2006, chap. 5.4]:

s(t) = cos(a cos(ωmt) + ωct)   (4.1)

This leads to similar results as modulating the frequency, yet the implementation and mathematical formulation are much simpler. This is the formula used for the present examples (see appendix A.11). Though we actually modulate the phase, not the frequency, we will nevertheless talk about FM, as it is the name generally used for this technique. There are much more complex cases of FM that use more complex substitutions and also multiplication of signals. Probably the best known implementation of FM is the Yamaha DX7 synthesizer. In figure 4.15 we see the effect of the modulation index a on the resulting waveform. Higher values cause a more intensive modulation of the original cosine wave, which creates a more complex waveform and thus a more complex spectrum.

Fig. 4.15: The effect of different modulation indices a on the basic cosine wave. Higher values lead to a stronger distortion of the waveform.

We will limit our discussion of FM to the simple case given in formula 4.1. As stated previously, frequency modulation (FM) can be regarded as a special case of waveshaping. To do so, Puckette rewrites formula 4.1 as follows:

s(t) = cos(ωct) · cos(a cos(ωmt)) − sin(ωct) · sin(a cos(ωmt))

That way, we can regard our original formula as the sum16 of two ring modulated waveshaping functions. What frequency does the resulting signal have? Obviously, the frequency of the first term is the same as that of the second, so all we need to find out is the frequency created by the ring modulation. From the previous section we know that if ωc and ωm have a common base frequency, i.e. they are both integer multiples of some frequency ω, the resulting signal has frequency ω. That places the same constraints on FM as on ring modulation: if the result is to be a harmonic signal, ωc and ωm should be chosen so that ωc = kω and ωm = mω with k, m ∈ N. This is why the FM module (appendix A.11) was built in a way that a base frequency and multipliers for ωc and ωm are chosen.

We now know how to create a harmonic signal whose spectrum's intensity we can control by a single variable a using FM. This leaves the discussion of the resulting spectrum. Obviously it is the sum of the two ring modulated signals. Each of these two spectra is in turn the sum of two offset copies of the spectrum created by cos(a cos(ωmt)) and sin(a cos(ωmt)) respectively. The formula for the partials of these two waveshaping signals is rather complex. For a general overview it suffices to say that cos(a cos(ωmt)) contains only even partials and sin(a cos(ωmt)) only odd partials. The strength of the n-th partial is defined by the Bessel function Jn(a) [Puckette, 2006, chap. 5.5.6]. After ring modulation, the resulting total signal has partials at ωc + nωm and at ωc − nωm, where the intensity of the partials decreases with higher values of n.

16 The minus simply inverts the phase of the second signal.

Fig. 4.16: The intensity of the first 20 partials depending on the modulation index a. The frequencies were chosen so that ωm = 3 · ωc. The green partials are those positioned at ωc + nωm, the blue partials are placed at ωc − nωm. The red partials are constantly zero. The graphs of the partials are the Bessel functions of the first kind.

Figure 4.16 shows the development of the partials subject to higher values of a. As can be seen, a higher value of a introduces more partials. In that way the effect of this simple case of FM can be compared to that of a lowpass filter (Example 20: harmonic and non-harmonic FM sound). But the spectrum plot also shows that the intensity of the partials oscillates, unlike in a simple lowpass filtered signal. This is because the partials' amplitudes are controlled by the Bessel functions of the first kind. So the sound not only becomes constantly brighter but also constantly changes in character. While this is one of the desired aspects of FM when used in electronic music (the ability to create complex, changing spectra), it makes the use of FM for sonification harder. The difficulty lies in creating mappings of data streams to a in a way that the change in timbre is perceived as a 1:1 mapping of the data stream. Nevertheless, FM created results that were subjectively pleasant and seemed well traceable.

When using FM one must also be aware of aliasing artifacts. As the spectrum contains many high partials even for "modest" values of a, one runs the risk of really unpleasant foldover artifacts. Unlike with additive synthesis, this is harder to prevent here. The only reliable method would be to create the signal at a high sampling rate and then apply a lowpass filter before downsampling it to the output sampling frequency. In our experiments, we simply took care to limit the values of a appropriately. This however must be done independently for each setting of ωc and ωm.
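A minimal rendering of this FM mapping is sketched below (the base frequency, the multipliers and the index limit are illustrative assumptions, not the settings of the fm1 abstraction in appendix A.11): ωc and ωm are derived from a common base frequency so the result stays harmonic, and the data stream drives the modulation index a, which is capped to keep foldover in check.

    import numpy as np

    sr = 44100
    t = np.arange(2 * sr) / sr
    base = 110.0                                      # common base frequency ω in Hz
    k, m = 1, 3                                       # carrier and modulator multipliers
    wc, wm = 2 * np.pi * k * base, 2 * np.pi * m * base

    data = 0.5 - 0.5 * np.cos(2 * np.pi * 0.25 * t)   # stand-in for a normalized data stream
    a = 6.0 * data                                    # capped index; larger values risk aliasing
    s = np.cos(a * np.cos(wm * t) + wc * t)           # formula 4.1 (phase modulation)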

4.4.3 Formants & Vocal Sounds

In section 4.2 we discussed the importance of formants when perceiving the timbre of a sound. As they are so well detected by the human ear (speech recognition relies heavily on recognizing formants), they are an obvious target for sonifications modulating the timbre of a sound.

The PAF Algorithm

Puckette [1995] describes a computationally cheap method to create sounds with user definable formants, the PAF (Phase Aligned Formant) algorithm. This in-depth presentation of the PAF algorithm explains in detail the computational efficiency, the numerical robustness and the mathematical background of the method. We will however base our discussion of the PAF algorithm on Puckette [2006, chap. 6.4], as it is more intuitive.

We saw in section 4.4.2 that we can shift a spectrum using ring modulation. If we want to do this continuously however, we will create a dissonant sound as long as the frequency of the sinusoid we use to shift the spectrum and the spectrum's base frequency do not share a common base frequency. Puckette suggests a simple yet efficient solution to this. Instead of multiplying with a sinusoid, we multiply with a weighted sum c (the carrier signal) of two sinusoids that are each at an integer multiple of ωm:

c(x): R → R = p cos(kωmx) + q cos((k + 1)ωmx)

The integer multiple k and the weights p and q are chosen so that (k + q) · ωm = ωc and p + q = 1. That way the spectral center of mass of the two sinusoids is still placed at the carrier frequency ωc, but the ring modulation result only contains partials that are integer multiples of the base frequency:

(p cos(kωmx) + q cos((k + 1)ωmx)) · Σ_{n=0}^{∞} an cos(nωmx) =
   Σ_{n=0}^{∞} (p·an/2) (cos((k + n)ωmx) + cos((k − n)ωmx)) + (q·an/2) (cos((k + 1 + n)ωmx) + cos((k + 1 − n)ωmx))

This can be interpreted as four superimposed spectra which are all harmonic with respect to the fundamental frequency ωm. The sum (p/2) Σ_{n=0}^{∞} an cos((k + n)ωmx) is the original spectrum shifted to the k-th harmonic and weighted with p/2. The sum (p/2) Σ_{n=0}^{∞} an cos((k − n)ωmx) is the same but mirrored at the k-th harmonic. The two other sums are the same, just one partial higher and weighted with q/2. This means that we interpolate, via p and q, between a shifted-and-mirrored copy of the original spectrum placed at the k-th harmonic and one placed at the (k + 1)-th harmonic. Puckette calls this method movable ring modulation.

Fig. 4.17: The spectrum of the modulator used for the PAF algorithm.

If we now choose a modulator signal that has a peak at the 0-th partial and whose other partials drop off monotonically, we get a sound whose formant position can be controlled by the carrier frequency. Ideally, this formant signal should be parameterized in a way that allows us to control the width of the formant. Puckette gives a number of functions that have this property, of which we use the following waveshaping function: f(x) = e^(a(cos(xωm) − 1)). The bandwidth of the formant can be controlled with the factor a. If a is zero, we get a constant signal, i.e. one without partials except at frequency zero. For higher values of a, the partials spread out and create half of a bell shape (see fig. 4.17). This results in the following formulation of the PAF algorithm:

PAF(x) = e^(a(cos(xωm) − 1)) · (p cos(kωmx) + q cos((k + 1)ωmx))

Fig. 4.18: Example of a formant created by the PAF algorithm. The base frequency is 200 Hz, the formant center frequency 1430 Hz. The first and second spectra are combined to the final result on the right. The first spectrum is centered around 1400 Hz, the second, much quieter spectrum around 1600 Hz.

This algorithm allows us to easily create sounds with a constant pitch but changeable formant position. As all partials in the resulting sound are phase aligned (hence the name of the algorithm), we can easily combine the output of multiple PAF generators (as long as they are controlled by the same underlying oscillator which determines the frequency ωm) and build up complex spectra. In this case, the resulting spectra are easily predictable from the parameter values. This is a strong improvement over other waveshaping approaches, which are also able to create rich spectra but which are hard to predict and control.
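The formula translates almost directly into code. The sketch below (plain numpy; a sketch, not the Pd abstraction from appendix A.17) derives k, p and q from the desired formant center and applies the waveshaping modulator; the parameter values repeat those of figure 4.18:

    import numpy as np

    def paf(t, f0, formant_hz, a):
        wm = 2 * np.pi * f0
        ratio = formant_hz / f0
        k = int(ratio)                      # integer multiple below the formant centre
        q = ratio - k                       # (k + q) * f0 equals the formant centre
        p = 1.0 - q
        carrier = p * np.cos(k * wm * t) + q * np.cos((k + 1) * wm * t)
        modulator = np.exp(a * (np.cos(wm * t) - 1.0))   # a controls the formant bandwidth
        return modulator * carrier

    sr = 44100
    t = np.arange(sr) / sr
    y = paf(t, 200.0, 1430.0, 6.0)          # 200 Hz pitch, formant centred at 1430 Hz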

Puckette gives a lot more implementation details that make the calculation both fast and robust. He synchronizes formant frequency updates to the start of a phase to avoid artifacts created by discontinuities. A wavetable lookup based on a reformulation of the modulator makes the calculation more efficient. These details, though important from an implementation point of view, are not necessary for the evaluation in the present context. It suffices to say that the PAF algorithm is both elegant and fast. The implementation used was a slightly modified version of the example implementation already shipped with Pd. The example was basically broken down into smaller modules to allow for easier reuse in the context of the next section. Apart from that, some GUI controls and the ability to store settings were added (see appendix A.17).

The PAF algorithm proved to be quite useful even in its simplest use, a sound with a single formant (Example 21: formant sound created by the PAF algorithm). The predictable behavior when combining multiple formants using the PAF algorithm makes it promising for more complex mappings. Multiple streams can be mapped to various formants of a single sound. This of course would degrade the detectability of a single stream, but may give an intuitive overall impression of a subset of our data. One could for example combine the velocity and acceleration of a limb in a single sound whose two formants are controlled by the two quantities.

Creating Vowel Sounds

Vowels are an important part of every human language. By using the PAF algorithm, one can create a simple mapping of a data stream to vowel sounds. Instead of directly modulating the position of formants, we create certain formant structures that are present in human vowels. Then we use a data stream to interpolate between two such settings in order to smoothly interpolate between two vowels. Lee et al. [2005] present a similar idea, using a physical modelling approach to simulate the frequency response of the human vocal tract.

The PAF algorithm could easily be used to create vowel-like sounds (Example 22: vowels created by the PAF algorithm). Three PAF modules using the same underlying oscillator to keep them in phase were used to create a static sound resembling one vowel. To do so, each PAF module was tuned to a different formant center frequency. Basically, each vowel was represented by a three-dimensional vector. This implementation was easily done according to [Puckette, 2006, chap. 6]; a description of the module can be found in appendix A.18. The incoming data stream was then used to linearly interpolate between two of four selectable vowel settings ("Aah", "Eeh", "Ow" and "Ooh"). To do so, the vectors of the two respective vowels were simply linearly interpolated.
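The interpolation step can be sketched as follows; the formant center frequencies below are rough, illustrative values, not the settings stored in the paf vowel abstraction. Each vowel is a vector of three formant frequencies, and the data stream blends two such vectors linearly:

    import numpy as np

    VOWELS = {
        "aah": np.array([700.0, 1100.0, 2500.0]),   # three formant centres in Hz (assumed values)
        "eeh": np.array([350.0, 2000.0, 2800.0]),
    }

    def vowel_formants(x, a="aah", b="eeh"):
        """Blend two vowel vectors linearly for a data value x in [0, 1]."""
        return (1.0 - x) * VOWELS[a] + x * VOWELS[b]

    # Each returned frequency would drive one PAF generator; all three generators share
    # the same underlying oscillator (the same f0) so that their partials stay phase aligned.
    print(vowel_formants(0.25))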

4.4.4 The Tristimulus Model

Fox and Carlile [2005] present "Sonimime", a system that turns movement of the hand into immediate acoustic feedback. The sound rendering engine used was based on [Riley, 2004], which is a generative implementation of the so-called "tristimulus model", an approach to explaining human timbre perception.

The tristimulus model stems from the field of human vision, where it is used to explain human color recognition. The human eye contains three kinds of receptors for color, each of which is sensitive to a different range of wavelengths. Every color perceivable by the human eye can thus be encoded as a three-dimensional value (e.g. RGB (red-green-blue) values). Pollard and Jansson [1982] introduced the tristimulus model as a method for timbre classification. When applied to timbre perception, the tristimulus model makes the same assumption: timbre is a three-dimensional quantity. It states that the decision making process of the brain for timbre evaluation is based on the intensities of three distinct bands of partials in a harmonic sound: the fundamental, the second to fourth harmonic and the fifth to n-th harmonic.

Riley [2004] turned this assumption into a generative method for audio synthesis. As his implementation was also done in Pd, the results could easily be used in the present technical context. The implementation is based on additive synthesis. This means that the resulting signal is created by summing up sinusoids with varying amplitudes. The basic tristimulus model results in the following formulation:

tristim(t, ω) = l · cos(ωt) + m · Σ_{i=2}^{4} cos(iωt) + h · Σ_{j=5}^{20} cos(jωt)   (4.2)

where l, m and h are the amplitudes of the fundamental, middle and high partials, ω the desired frequency in radians and t a point in time. Riley used a maximum of 20 partials for synthesis. This is sufficient for most applications, as sounds with a fundamental frequency of 1000 Hz already have their highest partial at 20000 Hz, the upper limit of the frequency range perceivable by man.

Fig. 4.19: The partials for l = 0.5, m = 0.3, h = 0.2, d = 0.7, e = 0.3 in the extended tristimulus model. Every second partial is muted due to the low e value, which creates a nasal, square-wave-like sound. The d value below 1 damps the higher partials within the mid and high bands.

Riley added further parameters to the model, as the basic model proved to be very limited in the spectra it could produce. Riley notes: "This problem stems from the fact that the tristimulus method was originally an analysis method; it simply measures the energy contained in each of the critical bands and has no concern as to the distribution of energy in these bands." [Riley, 2004, p.21]. To expand the capabilities of the model, Riley added a decay factor d that damps the amplitude of higher partials within a band. He further added a parameter e that controls the strength of the even partials independently, allowing for spectra that contain predominantly odd partials. This results in the following formulation for the extended tristimulus model:

tristim(t, ω) = l · cos(ωt) + m · Σ_{i=2}^{4} e^((i+1) mod 2) · d^(i−2) · cos(iωt) + h · Σ_{j=5}^{20} e^((j+1) mod 2) · d^(j−5) · cos(jωt)   (4.3)
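Formula 4.3 can be transcribed directly; the following sketch is an illustration of the model, not Riley's Pd implementation, and the parameter values repeat those of figure 4.19:

    import numpy as np

    def tristim(t, freq, l, m, h, d=1.0, e=1.0):
        w = 2 * np.pi * freq
        out = l * np.cos(w * t)
        for i in range(2, 5):                                   # mid band: partials 2..4
            out += m * e ** ((i + 1) % 2) * d ** (i - 2) * np.cos(i * w * t)
        for j in range(5, 21):                                  # high band: partials 5..20
            out += h * e ** ((j + 1) % 2) * d ** (j - 5) * np.cos(j * w * t)
        return out

    sr = 44100
    t = np.arange(sr) / sr
    y = tristim(t, 220.0, 0.5, 0.3, 0.2, d=0.7, e=0.3)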

Riley also added three noise bands with center frequencies corresponding to the approximate middle frequencies ω, 3ω and 9ω of the three harmonic bands. The amplitudes of the bands corresponded to l, m and h. This method was also implemented for this work, but a discussion of it will be omitted for brevity, as similar results can be achieved with the method described in section 4.4.1.

Controlling the Tristimulus Model

The timbre of a sound does not change if we scale its overall amplitude. This means that there are different values for l, m and h that create the same timbre, just at different amplitudes.17 Therefore, if we just want to modulate timbre and keep amplitude constant, we have only two degrees of freedom in the basic tristimulus model. Riley used the following equation in his Pd patches to deduce the amplitude for the fundamental:

l + m + h = 1 =⇒ l = 1 − m − h

That way, each timbre created by the tristimulus model can be plotted as a point within a triangle using l, m and h as barycentric coordinates. Riley used a triangle shaped input controller as input for the amplitudes of the bands in his implementation.

Fig. 4.20: The partials for l = 0.5, m = 0.3, h = 0.2 in the basic tristimulus model. The large number of partials in the high band makes that band sound louder in total than the low band, which is supposed to have the largest impact on the overall sound.

There is however a problem with the way Riley calculates the relative amplitudes of the three bands in his implementation. After the model was re-implemented according to Riley's specifications (appendix A.37), first tests showed that the amplitude was not constant over the whole area of the triangle control (Example 23: tristimulus sound with changing amplitude). It is easy to see why: there are 16 high partials, 3 medium partials and one fundamental. If one raises the value for h, the value for the fundamental is automatically lowered. As there are many more partials in the high band than in the medium and low bands, the overall amplitude of the sound increases, which is what we wanted to prevent by using the barycentric coordinates. The values l, m and h are not the energy in the individual bands, they are just multipliers for the partials.

17 This can be compared to color perception, where the same color exists for different brightness levels.

But how do we create the proper multipliers for the bands so that we get a constant overall amplitude and the correct relative strengths of the three bands? To answer this question, we need the definition of the RMS amplitude which, as stated in section 2.1.3, is a measure for amplitude that closely resembles human hearing:

A_RMS(s(x)) = √( (1/p) ∫_0^p s(x)² dx )   (2.2)

We now need to calculate the amplitude of a tristimulus sound subject to l, m and h. To do so, we will first determine the RMS amplitude of a harmonic signal f with a finite number of partials:

f(ω) := Σ_{i=1}^{n} ai cos(iω + φi)   (4.4)

Inserting this into equation 2.2 gives us:

A_RMS(f(ω)) = √( (1/2π) ∫_0^{2π} [ Σ_{i=1}^{n} ai cos(iω + φi) ]² dω )
            = √( (1/2π) Σ_{i,j=1}^{n} ∫_0^{2π} ai cos(iω + φi) · aj cos(jω + φj) dω )   (4.5)

To solve this, we recall that

cos(a) cos(b) = (1/2) (cos(a − b) + cos(a + b))   (4.6)

That way, we get for all i ≠ j:

∫_0^{2π} ai cos(iω + φi) · aj cos(jω + φj) dω
  = (ai·aj/2) ∫_0^{2π} cos((i − j)ω + φi − φj) + cos((i + j)ω + φi + φj) dω
  = 0   (4.7)

This is because all the cosines have frequencies which are integer multiples of the base frequency, and integrating over a full period of the fundamental yields zero. This leaves the products where i = j:

∫_0^{2π} ai² cos²(iω + φi) dω = (ai²/2) ( ∫_0^{2π} cos(0) dω + ∫_0^{2π} cos(2iω + 2φi) dω )
  = (ai²/2) ( [ω]_0^{2π} + 0 )
  = ai² π   (4.8)

If we now apply 4.7 and 4.8 to equation 4.5 for the RMS amplitude of a harmonic signal with a finite number of partials, we get:

√( (1/2π) Σ_{i,j=1}^{n} ∫_0^{2π} ai cos(iω + φi) · aj cos(jω + φj) dω )
  = √( (1/2π) Σ_{i=1}^{n} ∫_0^{2π} ai² cos²(iω + φi) dω )      (by 4.7)
  = √( (1/2π) Σ_{i=1}^{n} ai² π )                               (by 4.8)
  = √( ( Σ_{i=1}^{n} ai² ) / 2 )   (4.9)

This is the desired RMS amplitude for a general sound with a finite number of harmonics. If we apply 4.9 to the basic tristimulus model, we get:

A_RMS(tristim(t, ω)) = √( (l² + 3m² + 16h²) / 2 )   (4.10)

As our amplitude is to remain at a constant value, we can ignore the root and division by 2.

1 = l² + 3m² + 16h²   (4.11)

The total value of the tristimulus model is not important, as long as it remains constant (we can simply scale it to a desired value later on). So now assume we want the three bands to contribute to the total signal with the relative weights l′, m′ and h′, with l′ + m′ + h′ = 1. We then get the correct multipliers for our three bands:

l = √(l′)   (4.12)
m = √(m′ / 3)   (4.13)
h = √(h′ / 16)   (4.14)

Fig. 4.21: The partials for l′ = 0.5, m′ = 0.3, h′ = 0.2 in the basic tristimulus model with correct factors for the partials. Here the correct handling of the barycentric coordinates raises the multiplier for the fundamental significantly, so it does indeed provide half the loudness of the total signal. Due to their large number, the multiplier for each high partial is smaller than h′.

The implementation in Pd yielded the desired results: the amplitude remained constant for all l′ + m′ + h′ = 1. This however failed to work if d ≠ 1 or e ≠ 1. We have to adapt our formulas for m and h to the extended tristimulus model (l is not affected by d and e):

m = √( m′ / (e² + d² + d⁴e²) )   (4.15)
h = √( h′ / (1 + d²e² + Σ_{i=1}^{7} (d^(4i) + d^(4i+2) e²)) )   (4.16)

These factors finally lead to a normalized amplitude over the whole range of valid values for l′, m′ and h′ for any value of e and d. This also improved the overall results of the model. Changes in timbre when using the correct factors were much more pronounced (Example 24: normalized tristimulus model). The audible effect can be compared to that of improved contrast when talking about colors.
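The corrected multipliers are cheap to compute on the fly. The sketch below (plain Python; the function name is illustrative) implements formulas 4.12 to 4.16 and reproduces the basic-model values when d = e = 1:

    import math

    def tristimulus_multipliers(lp, mp, hp, d=1.0, e=1.0):
        """Multipliers l, m, h for desired relative band weights l', m', h' (lp + mp + hp = 1)."""
        assert abs(lp + mp + hp - 1.0) < 1e-9
        l = math.sqrt(lp)
        m = math.sqrt(mp / (e**2 + d**2 + d**4 * e**2))
        high = 1.0 + d**2 * e**2 + sum(d**(4*i) + d**(4*i + 2) * e**2 for i in range(1, 8))
        h = math.sqrt(hp / high)
        return l, m, h

    print(tristimulus_multipliers(0.5, 0.3, 0.2))               # basic model: ~(0.707, 0.316, 0.112)
    print(tristimulus_multipliers(0.5, 0.3, 0.2, d=0.7, e=0.3)) # extended model of fig. 4.22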

Implementation

As we are interested in a continuous sonification of a movement, the model by Riley was reduced to the sound generation. The MIDI implementation was left out, as was the generation of amplitude envelopes. The implementation was done according to the Pd graphs given by Riley, with some modifications. The 20 individual patches for each harmonic were replaced by a single abstraction that got the number of the partial as a constructor argument. Instead of multiplying each individual partial frequency with an incoming vibrato18 wave, the frequency input to the partials was already an audio signal instead of Pd messages. That way, the multiplication with a modulator wave can be done once globally instead of individually for every partial. This design also allows another audio signal than a built-in LFO to be used as modulator, e.g. to multiply the tristimulus model frequency with another audio signal to achieve complex frequency modulation. The noise generation was implemented as described in the original paper. A barycentric GUI controller was also implemented (appendix A.38). The present implementation however uses an equilateral instead of a right-angled triangle to calculate the barycentric coordinates. As all three sides now have the same length, a change in position always results in the same "amount" of timbre change, as long as the start and end points are the same distance apart. The improved handling of partial amplitudes described in the previous section was of course also used.

18 "Vibrato" is a periodic, slow (0-10 Hz) change in pitch that is generally realized by modulating the frequency input with a Low Frequency Oscillator (LFO).

Fig. 4.22: The partials for l′ = 0.5, m′ = 0.3, h′ = 0.2, d = 0.7, e = 0.3 in the extended tristimulus model with correct factors for the partials. The amplitude of the fundamental does not differ from the one in fig. 4.21. The amplitude of the first partial in the high band is much higher than the one in the basic model. This compensates for the loss of power due to the damping of higher partials and even partials in the band.

4.5 Spatial Positioning

Throughout the experiments with the synthesis methods described in the previous sections it has turned out that individual audio streams could be kept apart better if they were given different positions in the stereo panorama. The results in Wenzel [1994] suggest that even better results can be achieved when not only the stereo position but spatial synthesis methods for true 3D positioning of sound sources are used. Especially in the context of motion data sonifications, where the data stems from the changing positions of the human limbs, dynamically coupling the 3D position of a sound to properties of the data stream could be promising. For our walking motion, for example, one could render the speed of the measured joints with one of the presented methods and additionally measure the spatial position of these joints with respect to the test person's head. If the test person listens to the sonification in realtime over wireless headphones, the sound sources could be virtually positioned to correspond to the true positions of the measurement points.

4.6 Maintaining a Constant Amplitude

All sonifications that change the timbre of a signal should attempt to keep the amplitude as constant as possible. This has several advantages:

• Many signals with constant amplitude can be mixed more easily. The audio results created during a sonification session are more predictable, and clipping becomes less of a problem.

• Fewer masking effects occur. If the amplitudes of the signals stay constant, one of the most prominent masking effects is removed.

• The features of the data modulating the sounds become clearer. An additional change in amplitude may distort the perceived features of the data stream.

There are several ways to achieve a constant amplitude. One way is to analyze the audio generation method and alter it in a way that guarantees a constant RMS amplitude. This was done in the case of the tristimulus model.

Another simple yet effective way is to measure the amplitude of the output of a certain method. If a simple characteristic can be deduced from that (e.g. a linear dependency on the incoming data), the respective module can be amplitude modulated by the same data stream that it is fed for changing the timbre. To normalize the audio output, the inverse of the amplitude characteristic determined in the measuring step is used to map the data stream to the amplitude modulation. This was done using the norm mapping object described in appendix A.16; a small sketch of the idea is given at the end of this section.

The last way is to use a so-called "compressor". This is the method used in professional audio productions, as the other two methods can obviously not be applied to a sung voice or an instrument played by a human. A compressor constantly measures the RMS or peak amplitude of a signal and damps it by a certain factor (the compression ratio) if it is higher than a certain threshold. Compressors were not used throughout this work, as either the first two methods could be applied or amplitude changes were negligible.
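The second approach can be sketched as follows (illustrative Python with an assumed, roughly linear amplitude characteristic; this shows the idea behind the norm mapping abstraction, not its actual implementation):

    import numpy as np

    control = np.linspace(0.0, 1.0, 11)          # control values used during the measurement step
    measured_rms = 0.1 + 0.9 * control           # assumed measured amplitude characteristic

    def normalizing_gain(x):
        """Inverse of the measured characteristic, interpolated for arbitrary control values."""
        return 1.0 / np.interp(x, control, measured_rms)

    # While the data stream x modulates the timbre, multiply the module's output by
    # normalizing_gain(x) to keep the overall level roughly constant.
    print(normalizing_gain(0.0), normalizing_gain(1.0))   # large gain for quiet settings, ~1 otherwise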

4.7 Summary

We have presented a number of methods to create parameter mapping sonifications. First, we have reviewed a number of simple methods and rules of thumb to improve the quality of pitch based methods. We have studied the problem of formant shift and suggested a simple method that allows us to use the large number of professional sample collections available for music production. We have briefly looked into amplitude based methods and found an interesting mapping that helps to improve results.

We have found that by changing the timbre instead of the pitch of a sound we could create a number of mappings that do not create such a stressing impression for the listener. The presented methods are mostly standard procedures used in sound design and electronic music. They could be implemented or reused from the vast set of modules available in Pd, which kept implementation time at a minimum.

With some of these mappings, the frequency detection of the ear could still be used to trace the data. The filtering of sound, especially in combination with high resonance or small filter bands, created a well traceable characteristic over the frequency spectrum. With the use of the PAF algorithm for sonification, another method for circumventing a pitch mapping was available that creates easily controllable and predictable results.

Looking into non-linear synthesis, we found a vast field of powerful methods to create a large variety of sounds. Nonlinear synthesis is such a vast field that "Sonification Using Nonlinear Synthesis" justifies separate research in its own right. The short overview together with the two examples serves to show the advantages of nonlinear synthesis: it is a simple and computationally cheap method to create complex, rich spectra that can be modulated over a wide range of timbres. But we also hinted at the drawbacks of these methods: often the results are hard to predict. A lot of trial and error is necessary to arrive at a specific sound. This is a flaw found in many synthesizers using such methods. A simple yet efficient mapping is easier to create with the other methods presented here. The discussion of the PAF algorithm however has shown that such methods can also be used to create easily controllable mappings. Still, the vast majority of waveshaping sounds seems to be hard to control.

Finally, we have seen a way of using additive synthesis to create varying timbres. The presented method, the tristimulus model, claimed to be able to create a vast array of sounds. Yet, the results were rather disappointing (Example 25: tristimulus compared to sawtooth and square wave). Even with the slight improvements made to the calculation of the coefficients, the tristimulus model could not produce significant variations in timbre. The adjustment of the "even" parameter simply led to interpolating between a sawtooth and a square wave like sound. The decay factor is basically an adjustment of brightness that produces less pronounced effects than a resonant filter (Example 26: tristimulus attenuation compared to lowpass filter). The amplitude of low, mid and high partials is more reminiscent of a change of equalizer settings than of a true change in timbre (Example 27: tristimulus bands compared to equalizers).

5. APPLICATIONS

In this chapter we present applications of the methods discussed in the previous chapter to two kinds of motion data: the data captured from the rowing machine and the motion tracking data from a walking person.

The sonification methods used in this work were tested with two sets of data. The first is motion capture data from a walking person. The measured values are the velocities of the ankles and wrists. The second data set is captured from the sensors of a sensor equipped rowing machine. The measured data values are the position of the seat, the distance by which the handle is pulled and the forces applied to handle and footrest. Both motions have in common that they are cyclic, i.e. their data streams are pseudo-periodic. An interesting feature of the walking data is that it is symmetrical: the streams for the left arm and leg resemble those for the right side. It is interesting to hear how this symmetry creates a recognizable pattern in the audio result.

It must be noted that the evaluations given here are highly individual. The perception of the various methods may differ from person to person. The effectiveness as far as human perception is concerned is still subject to scientific evaluation. Nevertheless, the first impressions given here may give valuable information as to which methods deserve more attention and study in the future and which do not. They should be treated as "educated guesses".

The presented sonifications share certain properties and settings. The audio streams are spread in the stereo panorama, as first tests showed that this improves the detectability of the individual streams. The sonifications of the walking motion all map the streams for the left limbs to the left and the right limbs to the right. The rowing motion's streams were spread evenly, separating the streams using the same sonification methods to improve detectability. The sonifications mix different kinds of algorithms, as a mixture of different timbres is easier to differentiate. That way, the individual gestalts stand out better in the sum. However, not every data stream was mapped to a unique method. Both movements' data streams could be divided into two groups. For the rowing motion, this was forces and distances; for the walking motion, this was the speeds of wrists and ankles. Every group was given the same type of sonification. This was done to emphasize similarities in the data (the left wrist is "similar" to the right wrist). To be able to keep the streams apart, they were not only separated in the stereo panorama but also by the basic settings (if available) of the respective methods.1

For sounds that have a recognizable pitch (i.e. a harmonic spectrum), the intervals between them have to be chosen with care. Harmonic intervals (unison, octave, fifth, third) sound more pleasant but tend to "merge" into one sound more easily. Slightly dissonant intervals (minor/major seventh, sixth, second) sound less pleasant but are easier to discern.2

1 This shows the difficulty of finding a proper selection of sonification mappings and settings: we want to emphasize similarities but also be able to keep the individual streams apart. The presented mappings are compromises between those two constraints.
2 It must be noted here that the perception of dissonance of course highly depends on the musical education, tastes and social background of the recipient.

Example 28: Rowing Sonification Using a Purely Pitch Based Approach

• both distances and forces mapped to pitch

• all four streams positioned at the center of the stereo panorama

The first example given here was not created by the presented work. It is the result of a native C++ application written earlier (see [Effenberg et al., 2005]) and serves as a reference for the aesthetic quality of the new approach. The basic data is the rowing motion. As far as feature detection is concerned, this sonification serves its purpose. The drawback of this method is that it is straining to hear after a short while. The pitch modulation is stressing and unpleasant. The following examples are to be evaluated with regard to this first one.

5.1 Rowing Motion Sonifications

Example 29: Rowing Sonification Using Pitch and Lowpass Filtering

• The distances are mapped to the pitch of two professional grade multisamples. The sounds chosen were that of a cello and a boys' choir.

• The forces are mapped to the cutoff of a filtered sawtooth wave.

Both forces are mapped to parameters changing the timbre, which is to emphasize their similarity (they are in the same "class" of data). More force results in brighter, more intensive sounds. To better keep the two sounds apart, they are separated by two octaves in base pitch. The filtering approach works well; for future experiments the use of different basic waveforms is recommended to keep the sounds better apart (e.g. sawtooth and square wave). The multisample based pitch mapping indeed creates better results. The multisamples used here provide four samples per octave. This greatly reduced the "chipmunk" effect in the choir sound. Even denser maps are expected to yield better results. By using two pitch and two timbre based methods instead of four pitch mappings, the overall impression is further improved. The pitch range was limited to one octave, which suffices to convey details in the stream.

Example 30: Rowing Sonification Using Bandpass Filtered Noise and the Tristimulus Model

• The distances are mapped to the center frequencies of bandpass filters filtering white noise.

• The forces are mapped to various parameters of the tristimulus model.

Using a mixture of noisy and harmonic sounds greatly helps keeping the streams apart. The bandpass filtered noise, though a rather simple method, proved highly effective. If a narrow enough passband is chosen, the position of the band is very easily detectable. The mapping uses the fine pitch resolution of the human ear without the stressing effects of pitch modulation. Low- and highpass filtered white noise was not used, as first tests showed that it tends to "hide" other streams in the mix. The very narrow spectrum of the bandpass filtered noise does not interfere with other audio streams so much.

The tristimulus method proved to be somewhat disappointing. In a first attempt we used just one tristimulus generator and tried to map one force to the middle and the other to the high partials. In the resulting sound the individual streams were no longer detectable as such. The present example uses two tristimulus generators; in addition to the band amplitudes, the relative strength of the even partials is also used as a modulation target. The result is nevertheless disappointing. Even with the discussed modifications, the tristimulus model is nothing more than a sawtooth / square wave like waveform (depending on the "even" setting) with equalizers for the middle and high frequencies. It is far from providing a large range of timbres.

5.2 Walking Motion Sonifications

Example 31: Walking Sonification Using the PAF Algorithm and Pink Noise

• The wrist speeds were mapped to the center frequencies of PAF generators creating a single formant each.

• The ankle speeds were mapped to the amplitude of pink noise.

This patch works without any direct pitch mapping. Again, the noisy sounds can be easily discerned from the harmonic ones, which improves the perception of the data features. The mapping to the formant frequency is well detectable, like the pitch change in the first example, but as the effect is created by emphasizing different partials without changing the base frequency, the siren-like character of the old pitch mappings is avoided.

Example 32: Walking Sonification Using Vowel Sounds and Lowpass Filtered Samples

• The wrist speeds are used to interpolate between the vowels "Aah" and "Eeh".

• The ankle speeds are mapped to the cutoff frequency of lowpass filtered samples. The samples are choir and string samples of the Mellotron.3

Because the vowel settings used for the wrist speeds are the same for the left and right side, the movements of the wrists can be compared easily. Yet they can still both be independently identified because of their different stereo positions. The interpolation of the vowels does not create as clearly detectable features in the result as the modulation of a single formant. In the case of harmonic sounds being filtered, lowpass filters proved more useful than the other filter types. The "hiding effect" that occurred when using lowpass filtered white noise in combination with other audio streams did not occur here. This is (probably) due to the less dense spectra of harmonic sounds. By setting the resonance of the filter high, the partials around the cutoff frequency were emphasized, which helped to track the cutoff position. By using samples with very long (2-3 seconds) loops, the sound could also be improved, as they sound less artificial and stressing.

3 The Mellotron is a popular vintage instrument from the 60s and 70s that used tape bands with recorded sounds for playback. It was used, among many others, by the Beatles.

Example 33: Walking Sonification Using FM Synthesis

• All four channels are mapped to the modulation index of an FM sound.

The settings of the carrier and modulation frequency are different for the ankle and wrist data streams, which gives them a slightly different sound character. The four FM modules are pitched one fifth apart, beginning at MIDI pitch 31. The ankle data streams are mapped to the two low sounds, the wrist data streams to the high sounds. Though all streams are sonified using the same method, they can still be told apart due to their different pitches, stereo positions and FM parameters.

6. A MORE MUSICAL APPROACH

Up to now our sonifications turned motion data into sound. We have discussed a number of methods with which we hoped to improve the subjective quality of the results while maintaining a high degree of information transfer. In this chapter we will try to take this one step further and get closer to creating actual musical structures.

6.1 A “Musical” Sonification?

The problem with many sonifications is that they tend to be stressful to listen to. Ideally a sonification should not only be bearable but pleasing in itself. This would encourage the recipient to listen more carefully, to pay closer attention to details and thus better understand the underlying data, which is the actual purpose of the sonification process. When thinking along these lines one is of course inclined to ask: "Why not represent the data through music?". For most people, music is the most pleasant form of sound. We listen to music for recreation, as an intellectual stimulus and for many more reasons. Music plays an important part in all cultures and we are used to listening to music from childhood on.

But trying to transform data streams into pieces of music poses numerous problems. First and foremost there is a general problem: which kind of music? Is there a "correct" musical representation of a given dataset? And perhaps even more substantial: what actually is music? Allegedly, polyphonic western music from the 15th to the late 19th century (and in the case of neoclassicism maybe the first half of the 20th century) is still regarded by many (even "experts") as the definition of music. One could easily apply parameter mapping to create "free stochastic music" as suggested (and realized) by Xenakis: simply map a parameter to the variance of a stochastic process that creates successive soundscreens1. This, however, will not be seen by many as a "musical" piece. When applying the concepts of serial music, one could control the evolution of a series by the data provided by a single stream. However, this again would not be regarded as a "proper" piece of music by many. Interestingly, this very question ("What is music?"), which is necessary at this point to achieve a qualitative description of a certain sonification technique, is a central topic of discussion in current musicology, as the answer defines the material that is subject to the whole discipline itself.

Regarding music only from a limited cultural point of view is a dead end when dealing with music in general, as it singles out many achievements of human culture beyond that narrow scope as not worthy of discussion. Then again, a general theory of musical sonification that would apply to any musical genre is simply beyond our (and probably anyone's?) scope. So we do have to narrow down our discussion to certain musical genres in order to arrive at usable definitions for what we want to achieve. But it is important to remember that the following decisions and approaches according to a definition of what is musically "right" present just one of many approaches to music and do not claim to be universally correct. We will use the musical paradigms here that we are most used to in order to arrive at usable results. An approach from another cultural background may yield substantially different results. It must further be noted that the following attempts at musical sonifications just scratch the surface of musical possibilities, even for the narrow view of harmonic, polyphonic western European music. The following discussion of scales and harmony is greatly oversimplified, as a thorough discussion of this matter is not possible within the scope of this paper. The interested reader is referred to the books of de la Motte ([de la Motte, 1976], [de la Motte, 1981]) and Salmen and Schneider [1987].

1 We will not delve further into a description of soundscreens here. An in-depth discussion can be found in Xenakis [1971].

6.2 Applying Sonification to Paradigms of Western European Music

Pitch is the dominating quantity in western European music. The pitches which are used by a certain piece are chosen from a musical scale that underlies the composition.2 The set from which the available notes are chosen is called the key of the piece. This could for example be C major. The key determines which notes within the range of one octave are available. By the compositional means of modulation this key can change in the progress of a piece. With the invention of the equally tempered scale3 in the late 17th century, the possibilities of modulation were greatly expanded. However, one cannot simply concatenate any selection of notes from a scale and arrive at a proper piece of music. Depending on the genre, different complex rules for constructing melodies apply. Again, we cannot go into detail here, and we will not attempt to generate sonifications that adhere to a certain set of melodic rules.

Fig. 6.1: The C major scale in the one-lined octave. Western European scales are periodic: you get the frequencies for the next octave by multiplying them by 2. Thus, by defining the scale for one octave, all available notes over the whole frequency spectrum are defined.

The usage of pitch in western music is further coupled with the notion of harmony. At any given point, most musical pieces present a certain harmonic context. When we take any of the notes in our scale as a basis and stack certain other notes from the scale above it (i.e. letting them sound at once), we get a chord. Again depending on the era and genre of music we are dealing with, a certain subset of these chords is applicable for our music. It is a characteristic of western European music that individual parts of a piece can be accompanied by sounding such chords. Even if a composition does not explicitly sound such a chord accompanying the melodies, often a certain chord is implicitly present. This is because the notes sounding over a certain time interval in the piece are chosen from one of the chords.4 Which chord is currently "active" and which ones precede and succeed it is called the harmonic context. Certain sequences of such chords are established. If we now limit our selection of pitches for sonification to a certain subset of the scale fitting an accepted harmonic context, we can approximate "correct" melodic progression. This results in melodic progressions that resemble arpeggios5, as we are mostly dealing with continuous data.

Another problem when mapping data streams to pitch scales (and thus creating melodies), when viewed from the perspective of western European music, is the realtime character of our data. "Classical" western music relies a lot on structure on various levels, from a small theme to larger phrases to sections to the overall structure of a piece. Tension is built up with a specific direction in mind. Certain harmonic decisions are made because the composer knows from the beginning where he is heading. This is why we cannot expect to turn a sonification session into a singular, "correct" piece of music. Depending on the information we have in advance about the performed motion, we can however try and approximate such a progression. A very simple attempt at this will be described in section 6.5.

2 Note that the concept of a scale is not exclusive to this kind of music.
3 The equally tempered scale introduces slight errors in the intervals between notes. Those are hardly detectable but allowed each of the scales used to that point to be expressed as a subset of that scale. The equally tempered scale is the underlying concept of the MIDI pitch scale.
4 Again, this is a large oversimplification: we omit the discussion of passing notes, changing-tones, harmonic ambiguity and many more means of musical expression.
5 An arpeggio is the fast successive sounding of all notes in a chord, generally in an ascending or descending order.

6.3 A Melodic Sonification

A first attempt at creating simple, melodic pitch progressions was outlined in the previous section: we choose a chord from a scale and map a data stream to the discrete pitch values used in that chord, instead of to continuous pitch values as described in section 4.2. If the chord is chosen appropriately, we have melodic material that will be accepted by the recipient. To achieve this, the arpeggiator scale object (see appendix A.1 and A.2) was built, which maps integer values to MIDI pitch values. The mapping is specified by providing m note names as constructor arguments. The note names are mapped to the lowest MIDI octave (note values 0-11) and then repeated for every other octave. More formally: if p0, . . . , p(m−1) ∈ [0, 11] are the given initial MIDI pitches, the following mapping is built:

f(x): R → N = p(⌊x⌋ mod m) + 12 · (⌊x⌋ div m)   (6.1)

This creates the desired, arpeggio-like pitch progressions. They indeed have melodic character and, if proper chords are chosen, sound much more pleasant than a continuous, siren-like pitch modulation (Example 34: rowing sonification using arpeggios). The concept of this object is not entirely new: as the name of the Pd abstraction hints at, this method is a variation of the arpeggiators found in many synthesizers. Arpeggiators are devices that take a number of (MIDI) notes as input and output arpeggios of the input notes. The difference here is that the rhythm and speed with which these pitch values are output is not determined by the arpeggiator but by the change in the input signal. A normal arpeggiator gets a number of Note-On messages as input and outputs the same sequence of notes as long as no corresponding Note-Off is received; it also chooses the notes of the arpeggio from those input notes. With this Pd abstraction, the notes that are output are predetermined, and the progression within the arpeggio is controlled by the input signal.
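Formula 6.1 amounts to only a few lines of code. The sketch below mirrors the idea of the arpeggiator scale abstraction rather than reproducing it; the C major chord is an illustrative choice:

    def arpeggiate(x, pitches=(0, 4, 7)):        # c-e-g as pitch classes in the lowest octave
        i = int(x)                               # floor of the (non-negative) scaled data value
        m = len(pitches)
        return pitches[i % m] + 12 * (i // m)

    print([arpeggiate(x) for x in range(9)])     # -> [0, 4, 7, 12, 16, 19, 24, 28, 31]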

Fig. 6.2: Formula 6.1 applied to the control data created by the position of the seat of the rowing machine. The original data is first quantized, then the quantized data is mapped to the notes of the C major chord (c-e-g). Note that the graphs in the top row are scaled to the same size. The effect of the note mapping is visible in the irregular jumps in the rightmost graph. The bottom row gives a very coarse symbolic rendering as notes of the resulting musical pattern. It does not reflect the slight rhythmic irregularities of the actual output.

6.4 Re-introducing Fine Grained Parameter Mapping

As pleasant as the discrete pitch mapping in the previous section sounds, it introduces a big problem. In section 3.1 we stated that one of the shortcomings of MIDI is its low parameter resolution. But if we take a chord consisting of four notes as the basis for the previously described mapping and let the notes range over three octaves (too large an octave range would sound unusual for melodic progressions), we have a resolution of only twelve discrete values! All the subtleties we want to detect in a data stream are lost. Movements that are slightly different would sound nearly the same, as it takes a large value change to reach the next note. This loss of information is due to the floor function in formula 6.1.

Our aim now is to make those lost decimal places available as audio feedback to the user. We achieve this by simply mapping this fractional part to a parameter of the generated sound. We mix the discrete pitch mapping with one of the methods described in chapter 4. The difference is that we do not map the complete data stream but only the decimal places, that is, we create a new data stream g from the original one:

g : ℝ → ℝ,  g(x) = x − ⌊x⌋    (6.2)

In order to do this, the arpeggiator scale abstraction was given a second outlet that emits the result of formula 6.2. This can be input into the modulation inlet of one of the modules described in chapter 4 (Example 35: rowing sonification using arpeggios and sound manipulation). The results are promising: not only do we still have the nice arpeggios of the discrete pitch mapping, we have also regained a much finer resolution than before. The fine variations are now even more pronounced: as a much smaller part of the data range now modulates the sound over its whole input range, the changes have much more effect. The pitch of the sound now gives a coarse evaluation of the progression of the movement, while the timbre change conveys the details. This has the further advantage that the sound is more dynamic while a constant pitch is held, as the sound of the note still follows the movement, which also sounds more pleasant. The effect of dividing a data stream like this is illustrated by the following example: the magnitude of the improper fraction 2183/25 is hard to judge, but written as 87 8/25 we have a much better idea of that number, about its general magnitude as well as its fractional part.
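A minimal Python sketch of this split, reusing the idea of the hypothetical arpeggiate helper from above; the chord and the example value are again illustrative assumptions. The integer part selects the chord note, the fractional part becomes a 0-1 control value for, e.g., a filter cutoff or a modulation index.

```python
import math

def split_stream(x, chord=(0, 4, 7)):
    """Return (discrete MIDI pitch per formula 6.1, remainder per formula 6.2)."""
    i = math.floor(x)
    pitch = chord[i % len(chord)] + 12 * (i // len(chord))
    remainder = x - i          # g(x) = x - floor(x), lies in [0, 1)
    return pitch, remainder

pitch, mod = split_stream(3.73)
# pitch -> 12 (the chord root one octave up), mod -> 0.73,
# which would be routed to a timbre parameter such as a filter cutoff.
```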

6.5 Creating Harmonic Progress

We previously stated that harmonic progression is one of the hallmarks of western European music. The pitch mapping we have established so far, however, only uses a constant set of pitches to which the incoming data is mapped. To move further towards a musical result it would help to change the harmonic context of a sonification over time. A complex progression of chords could of course be imposed on the current method, but how should we decide when to change from one set of notes to the other? Simply following a predetermined temporal progression would not make sense, as we would introduce a change in the resulting sonification that is not the result of some feature present in the incoming data.

Fig. 6.3: Here we have broken down the data stream of the seat position into a discrete part for mapping to chord notes and a remainder to be used for modulating sound parameters. Note how this is especially useful at the turning points of the data stream, where the discrete stream gives no further information about the progress of the original stream. This information is still present in the fine grained control data.

Normally, the music we use as an archetype for our considerations has a temporal ordering superimposed by a meter, bars and rhythm. In our case, the meter and rhythm are to be deduced from the incoming data. Indeed, the detection of rhythmical patterns in the movements is one of the main motivations for Effenberg [1996] to suggest sonification for motion learning. It would not be wise to introduce rhythmical features in the sonification that are not somehow connected to features in the underlying data. So we need to determine features in the underlying data stream that allow us to use them as markers for harmonic change. When we sonify motions, we generally have an idea about the motion we will deal with beforehand. That way, sections of movements that stand out and are long enough to be recognizable through a change in harmonic context can be identified and mapped to individual chords. This would lead to a further level of segmentation in the sonification:

• on the topmost level, the current harmonic context informs about the segment of the motion performed

• the melodic progression informs about the coarse structure in the data created by the movement

• finally the fine grained parameter mapping allows detection of subtle features.

Thus we map levels of musical organization (harmony, melody, sound progression) to different organizational levels of movement. To illustrate how this works, we will apply this technique to one of our movement examples, the rowing motion. Two segments obviously stand out in the rowing motion: the part where the rower pulls back the oar and the part where he slides forward again. As we have the position of the sliding seat available as a data stream, we can use a simple method to detect which of the two parts of the movement we are in: we simply take the sign of the first derivative (see appendix A.6) as an indication of the motion segment we are in (Example 36: rowing sonification using a simple harmonic progression). When the sign changes, we change the underlying chord of all arpeggiator modules.6 That way, all voices follow the chords determined by the seat position. After some lowpass filtering of the seat-position data stream, this proved to be a reliable indicator of the harmonic context (Example 37: rowing sonification using another simple harmonic progression). The harmonic change thus follows the rhythm of the movement.

6 This was actually realized by simply switching between two arpeggiator objects with different constructor arguments.
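The phase detection described above can be sketched in a few lines of Python; the chord contents, the smoothing coefficient and the function names are illustrative assumptions, not a transcription of the Pd patch.

```python
def make_chord_selector(pull_chord=(7, 11, 2, 6), release_chord=(0, 4, 7), alpha=0.1):
    """Pick the active chord from the sign of the (lowpass-filtered) seat velocity.

    pull_chord / release_chord are hypothetical note sets for the two rowing
    phases; alpha is a one-pole lowpass coefficient standing in for the
    smoothing of the seat-position stream mentioned in the text.
    """
    state = {"smoothed": None}

    def select(seat_position):
        prev = state["smoothed"]
        smoothed = seat_position if prev is None else (1 - alpha) * prev + alpha * seat_position
        state["smoothed"] = smoothed
        if prev is None:
            return release_chord
        # backward difference of the smoothed stream: its sign marks the phase
        return pull_chord if smoothed - prev >= 0 else release_chord

    return select

select_chord = make_chord_selector()
chord = select_chord(0.31)   # call once per incoming seat-position sample
```

All arpeggiator-style voices would then draw their notes from the returned chord.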

6.6 Summary

We have seen that by splitting up the control data in a simple way, we create a mapping that works on a coarse and a fine level simultaneously. That way we can combine the advantages of a discrete pitch mapping with those of a continuous parameter mapping: we can create much more acceptable and easily adjustable pitch progressions while still getting fine feedback about the structure of the data. By using a simple criterion, the derivative, on the underlying streams, we were able to divide the motion into two general parts, which allowed us to introduce a change of harmony in the pitch scale used. This principle can be applied to other movements as well, though different and probably more sophisticated means of detecting the phase of a motion have to be used. This of course might introduce a new problem: we want the listener to detect features in the underlying data stream. If we build too sophisticated means of analyzing a movement into the sonification process, we run the risk of relocating the task the listener is supposed to perform (feature detection) into the sonification process itself.

With the mapping we created in this chapter, MIDI again becomes a viable medium for transporting the sound information. As we now only express the remainder of a split-up data stream as sound changes, the resolution of MIDI controllers is sufficient. The fact that MIDI is now better usable for our purposes may also be a hint that our mapping is in some way "more musical", as this is what MIDI is supposed to transport: musical information, not sonification information.

7. FURTHER PROSPECTS

Though a broad range of aspects was covered in this work and the results are a definite improvement as far as aesthetic acceptability is concerned, much remains to be done. The following sections outline aspects that, in the author's view, deserve further consideration. Some of these are new directions which movement sonification might take, others are areas that build upon the present approaches.

7.1 MotionLab and OSC

In section 3.2 the advantages of the OSC protocol were discussed. The settings mechanism (section 3.5.1) used throughout the objects created for this work was designed with OSC compatibility in mind. For further sonification work, extending the MotionLab software with a plugin that enables MotionLab to act as an OSC client is proposed. That way, not only are the presented results made easily accessible from MotionLab; other audio solutions can also be used in addition to, or as a replacement for, Pd. Because OSC was designed with network communication in mind and is generally used over UDP, computationally heavy tasks can be more easily divided over the nodes of a network. Common functionality, like the scaling of values, can be implemented within the plugin using a general purpose language like C++ and supplied with an easy-to-use GUI. The OSC querying mechanism can be used to provide the user with possible sonification targets. To support OSC servers that do not implement querying, manual entry of targets must also be provided. The sv and ssymbol abstractions, originally designed for saving and restoring values, can easily be extended to serve as OSC nodes. That way the sonifications presented here become immediately reusable. For non-OSC sound sources, a fallback MIDI implementation should be provided that is accessible and adjustable through the same user interface. Basically, all scaling tasks, the saving of settings, etc. should be implemented in such a front end. With such a general purpose sonification tool, different sound sources can be accessed in a transparent fashion, using MIDI or OSC depending on the abilities of the target.
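To give an impression of how small the client side of such a plugin could be, here is a hedged Python sketch that sends two motion values to an OSC server such as Pd. It assumes the third-party python-osc package, a server listening on UDP port 9000, and a made-up address scheme; none of this is prescribed by the present work.

```python
from pythonosc.udp_client import SimpleUDPClient

# Assumption: a Pd patch (or any other OSC server) listens on UDP port 9000.
client = SimpleUDPClient("127.0.0.1", 9000)

def send_frame(seat_position, handle_force):
    # The address names are hypothetical; a real MotionLab plugin would
    # expose whatever targets the OSC querying mechanism reports.
    client.send_message("/motion/seat", float(seat_position))
    client.send_message("/motion/handle_force", float(handle_force))

send_frame(0.42, 310.0)   # example values, one call per motion frame
```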

7.2 More Sound Based Methods

As was often hinted at, the field of electronic music and sound design is vast. Though a broad overview of some of the most common methods was given, many promising approaches were neglected for lack of space. The already mentioned granular synthesis [Xenakis, 1971] is another promising approach to creating sonifications, as it can be used with existing material (by applying it to already existing sound samples). Granular synthesis allows for a wide range of sound manipulation methods. Not creating a sound from scratch but working on existing samples (apart from simply pitching them) seems promising, as an end-user can decide for himself what to use as the basis for the sonifications. Another simple method for doing that is degrading the sound quality in a number of ways. A first example was given in section 4.4.2, by clipping an arbitrary signal in order to make it "harsher". Other methods, which are also used in modern popular music, include downsampling a signal, reducing its bit depth, and more sophisticated ways of clipping and distorting a signal. Much effort is spent by developers of professional audio effects to digitally reproduce the behavior of real guitar amplifiers in overdrive. There is also a wide range of audio effects that were not discussed here. Modulation effects like flangers, phasers and chorus, equalizers, and effects like delay, echo and the simulation of reverberation1 allow for many more possibilities to create and shape timbres.

1 Of course all these effects are technically also filters.
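As an illustration of the degradation idea (not a method used in this report), here is a minimal numpy sketch that crudely lowers the effective sample rate by sample-and-hold and requantizes the signal to fewer bits; the parameter values are arbitrary.

```python
import numpy as np

def degrade(signal, hold=8, bits=6):
    """Lo-fi degradation: sample-and-hold decimation followed by bit reduction."""
    held = np.repeat(signal[::hold], hold)[: len(signal)]   # keep every hold-th sample
    levels = 2 ** (bits - 1)
    return np.round(held * levels) / levels                 # coarse requantization

t = np.arange(44100) / 44100.0
harsh = degrade(np.sin(2 * np.pi * 220.0 * t))              # a degraded 220 Hz tone
```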

7.3 Sophisticated Sound Design

Most examples and implementations reviewed in this work were used in very basic variations. This was done to give an impression of exactly the method under discussion. When these methods are used in a more artistic, musical context, they are often combined to create an overall sound. A sampled sound could be mixed with one created by waveshaping. The two sounds could each be slightly modulated in pitch to create a more lively result. Then they may be lowpass-filtered, sent through an all-pass filter and combined with further effects. Complex sounds like that offer a multitude of parameters that can be the target of modulation. Often, many of these parameters are changed at once to create drastic changes in sound. Such complex arrangements of course allow for much more sophisticated results than the ones presented here. The problem is that the results vary immensely from sound to sound and are much harder to categorize. One such combination of sampling, waveshaping, filtering, etc. could be a fantastic sound for sonification (perceived as pleasant by a large number of recipients, with good detectability of data features) while another fails to be useful for sonification purposes. This is why the methods presented here were used in such "bare" configurations. Otherwise the already difficult and of course subjective evaluation would have been even less useful.

This complexity of sound design suggests that further collaboration with people with an "artistic" background (musicians, sound designers) is promising, as this can further improve the perceived results. The challenge here is to quantify the results in terms of the amount of information that can be detected by the user. A very nice sounding sonification may be highly valuable from an artistic point of view, but its usability for detecting data features must not be neglected.

7.4 More General Methods for Audio Rendering

The methods applied throughout this work were variations of methods already used in music and sound design. Their drawback is their heterogeneous nature: each method works very differently from the others, and each has its distinct sound characteristics. Further research into, and application of, methods that allow a more general approach to timbre design is promising. An attempt at this is the presented tristimulus method, which however proved to be too restricted in the timbres it could create. Other methods that use many more dimensions to represent timbre could be a solution. Puckette [2004] suggests an analysis method that maps the progression of an arbitrary sound to points in a 10-dimensional space. By also applying this analysis in realtime to an input sound, he is able to control the progression of the sound analyzed beforehand by the progression of the input sound. Such a generic approach to controlling timbre seems promising for sonification. Another high-dimensional representation is given by Terasawa et al. [2006]. They use a 13-dimensional representation of timbre based on methods used in speech recognition. This representation has the advantage that it provides a Euclidean metric on the timbre space which allows the difference in steady timbre of two sounds to be expressed in a way that correlates with human perception. This approach may serve as a valuable hint in determining the effectiveness of a sonification as far as the detectability of features in a data stream is concerned.

A different approach to describing arbitrary sounds is outlined by Métois [1997]. Métois describes a way to create a generative model out of an arbitrary sound source using a high-dimensional embedding of sampled sound data. The dimension of the model used depends on the complexity of the input sound. The novelty of this approach lies in the fact that he does not use a spectral representation of the sound, but the so-called lag space. For each data sample in the sampled sound a point in the n-dimensional lag space is created: the first value is the sample value itself, the n − 1 other coordinates are the sample values of the n − 1 preceding samples.
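For readers unfamiliar with such embeddings, the following numpy sketch builds the lag-space points just described; the dimension n = 4 and the toy signal are arbitrary choices for illustration, not values from Métois' work.

```python
import numpy as np

def lag_space(samples, n=4):
    """Embed a 1-D sample stream into an n-dimensional lag space.

    Each point holds the current sample followed by the n - 1 preceding
    samples, as in the lag-space description above.
    """
    samples = np.asarray(samples, dtype=float)
    points = np.empty((len(samples) - n + 1, n))
    for k in range(n):
        points[:, k] = samples[n - 1 - k : len(samples) - k]
    return points

# lag_space([0, 1, 2, 3, 4], n=3) -> [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
```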

7.5 Need for Psychoacoustic Evaluation

The methods applied in this work to improve the aesthetic quality of sonifications were chosen with good detectability of the underlying data features in mind. A nice sounding sonification is useless if features of the sonified data are not discernible. Though an attempt at a neutral evaluation of these methods was made, those evaluations still remain subjective opinions of the author. To what extent they achieve the goal of transporting information to the recipient remains to be tested with psychoacoustic experiments. A measure such as the metric proposed by Terasawa et al. [2006] may prove to be a valuable aid in such a comparative analysis. Yet this metric, too, has to be justified by perceptual studies.

7.6 Conclusion

The main goal of this work, the improvement of perceived aesthetic quality, was achieved. The first results indicate that this was possible without significant loss of information transfer. The quality of pitch based methods could be improved by simple measures. The modulation of timbre was applied to great effect and a wide range of sound-based methods for sonification were reviewed. A novel musical approach to creating sonifications that accounts for a musical paradigm, harmony, was presented. The presented methods work in realtime on standard hardware and, through the use of Pure Data, have a simple and interactive user interface. Moreover, the implementations proved to be robust enough for use under real-world conditions.

APPENDIX

A. LIST OF PD ABSTRACTIONS AND EXTERNALS

This is a selection of the abstractions and externals created for this work. Each is presented with a screenshot and a short description. For lack of space, the source code / source patches are not displayed here. The objects are listed alphabetically. The objects listed here are mainly the high level objects used for the sonifications. Some of the more interesting low-level abstractions that were used to build the high level objects are also listed, if they implement a particular feature discussed in the previous chapters or if they might be interesting for people expanding the existing set of objects. Objects for tasks like interpolating values, mapping program numbers to GM program names etc. are omitted for brevity, as these tasks are self-explanatory. Abstractions with GUI controls can save their state with the settings object described in section 3.5.1 and thus need a unique name as a first parameter. All other constructor arguments are explained in the respective descriptions. Many of the GUI objects can also receive state information as messages. As this method is not necessary for use in sonification patches, the message descriptions are omitted here. They are explained in the source files of the abstractions. All audio and MIDI sonification objects except tristimulus and tristimulus model mute their output if no input arrives for a few milliseconds. Audio output is simply ramped to zero, MIDI objects emit the necessary Note Off event.

A.1 arpeggiator

The simplest module to create melodic progressions with (see section 6.3). It maps the incoming stream to a discrete scale of note values that are all separated by the same interval. The constructor argument determines the size of the interval; in the given example this is a fifth (seven semitones). The first outlet emits the discrete note values, the second outlet the remainder of the incoming value, which can be used to control some property of the sound, e.g. the filter cutoff.

A.2 arpeggiator scale

This is a more complex version of the previous arpeggiator abstraction that can create arbitrary scales (see section 6.3). The constructor arguments are the note names to which successive values are mapped. As with the arpeggiator , the remainder of the incoming data stream is output at the second outlet to be used to modulate the properties of a sound.

A.3 channel

The channel abstraction (section 3.5.3) can be used like a channel strip in a hardware audio mixer. It has two inlets, one for a monophonic audio signal and one for messages setting the channel parameters. The level meter gives information about the amplitude of the signal (peak and RMS). The two outlets output a stereo audio signal that is panned according to the panorama setting. A clip indicator is automatically checked if the peak amplitude exceeds 1.0.

A.4 clipping

This is a simple clipping waveshaper as described in section 4.4.2. The first inlet receives the audio signal, the second inlet a number which should lie between zero and one. Zero leaves the input signal unchanged, one clips it heavily and leads to much distortion. This is done by amplifying the signal internally.

A.5 crossfading loop sampler

This abstraction was built to remedy the problem of formant shift when pitching sampled sounds (see section 4.2.1). It takes pitch as input and outputs an accordingly pitched sample. The GUI allows loading of a sample map. This is a text file that consists of several lines, each line providing the information for one sample. A line consists of the original pitch of the sample, the path to the audio file relative to the map, the sample index of the loop start and the sample index of the loop end. Spaces in file paths must be escaped with a backslash. An example would be:

43 32.Violin\ G2.wav 40951 73404
48 33.C3.wav 42513 74025
53 34.F3.wav 39821 73524
58 35.A#3.wav 37043 69833

The sampler always plays the sample closest to the input pitch. If the closest sample changes, the old sample is crossfaded with the new one. The sampler permanently loops the part of the sample specified by the start and end indices given in the text file. Depending on the size of the sample map, loading may cause Pd to freeze for a moment, as message processing in Pd is an atomic operation and is not done in parallel. This is only temporary; Pd continues operating normally after loading the samples.
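As a side note, the sample-map format above is simple enough to parse in a few lines; the following Python sketch is only an illustration of the format, not the loader used inside the Pd abstraction, and the file name in the usage example is made up.

```python
def read_sample_map(path):
    """Parse lines of the form 'pitch file loop_start loop_end',
    where spaces in file names are escaped with a backslash."""
    samples = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            # protect escaped spaces before splitting on whitespace
            pitch, name, start, end = line.replace("\\ ", "\0").split()
            samples.append((int(pitch), name.replace("\0", " "), int(start), int(end)))
    return samples

# read_sample_map("samples.map") -> [(43, "32.Violin G2.wav", 40951, 73404), ...]
```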

A.6 derivative

This is a very simple implementation of the first derivative. It uses the backward difference of a stream of number messages to calculate the derivative. This was used to determine the phase of the rowing motion a test subject was in (see section 6.5).

A.7 ergometer

The ergometer external reads data from the serial port provided as constructor argument1. It expects the data to be in the frame format defined by FES (see section 3.6.3). It polls the serial port buffer every millisecond using Pd's built-in timing mechanism. On receiving a "connect" message it connects to the given serial port; a "disconnect" message closes the port again. The first four outlets emit the four sensor values, the fifth outlet is triggered every time a frame skip is detected. This low-level object is used by the more comfortable ergometer input .

1 COM[n] for Windows, /dev/ttyS[nn] for Linux

A.8 ergometer input

A more user-friendly wrapper of the ergometer external. The serial port is again given as the first constructor argument. The output is in the form of a list of four floats, so this input method is interchangeable with file and network input. A counter informs about the number of frame drops (see section 3.6.3).

A.9 file input

The file input object is used to read motion data from a text file (section 3.6.1). Each line in the file must consist of C "printf"-style floats separated by whitespace (not by commas or semicolons). If playback is started, each line is output as a list of floats at the rate set in the GUI. While playback is stopped, lines can be output manually with the "manual output" button. "Reset position" rewinds to the beginning of the file.

A.10 floatmap

This external makes the C++ STL map available in Pd. Pd has no built-in map data structure and the extended version of Pd proved to be too unreliable, so this custom implementation was built. The floatmap is needed for the sample lookup functionality of the crossfading loop sampler . Pd is ill suited to building data structures like maps, so the implementation was done in C++. Keys are always floats, the value can be any valid Pd message. A message consisting of a single float performs a lookup, a float followed by more data is considered a "put" operation. Messages beginning with "leq", "geq" and "closest" return the next less-or-equal, greater-or-equal or closest key/value pair respectively. A "clear" message removes all data from the map. The outlets return the key, the value (if one was found) and a trigger (if nothing was found).

A.11 fm1

A basic two-operator frequency modulation, realized by phase modulation. The input is mapped to the modulation index, which should lie between 0 and 5, depending on the settings for k and m. The higher the modulation index, the more partials are created. The values k and m are the multipliers for the carrier and modulator frequency. The base pitch determines the base frequency ω. Intuitively, you can control the spacing of the partials (the spectral "density" of the sound) with m and the position of the spectral peak with k. For an in-depth discussion of this synthesis method, see section 4.4.2.
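A minimal numpy sketch of the kind of two-operator phase modulation this abstraction performs; the parameter values, the sample rate and the MIDI-to-Hz conversion are assumptions for illustration, not the internals of the Pd patch.

```python
import numpy as np

def fm_tone(duration=1.0, sr=44100, base_pitch=57, k=1.0, m=2.0, index=3.0):
    """Two-operator FM realized as phase modulation: a carrier at k*omega whose
    phase is modulated by a sinusoid at m*omega, scaled by the modulation index."""
    omega = 440.0 * 2.0 ** ((base_pitch - 69) / 12.0)   # MIDI pitch -> frequency in Hz
    t = np.arange(int(duration * sr)) / sr
    modulator = np.sin(2.0 * np.pi * m * omega * t)
    return np.sin(2.0 * np.pi * k * omega * t + index * modulator)

tone = fm_tone()   # one second of a 220 Hz FM tone with sidebands spaced by m*omega
```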

A.12 hold note

The hold note abstraction holds a MIDI note as long as input arrives. On arrival of any Pd message a "Note On" MIDI message is sent to the MIDI output. If no input arrives for 200 ms, the corresponding "Note Off" is sent. The parameters of the note can be set with the GUI. This abstraction is meant to be used in combination with sonify control objects, as a MIDI controller needs a sounding note to work on.

A.13 master

This is another of the utilities described in section 3.5.3. The master abstraction is a convenient wrapper around the built-in dac∼ , which outputs audio data to the soundcard. It allows controlling the master volume and offers the possibility to write the incoming audio to a 16 bit stereo wave file. Note that it takes no name as first argument, as it is supposed to be a global object. If two of these are used in one session, their settings will interfere. It basically works like the channel abstraction.

A.14 median

An external implementing a median filter. The constructor argument is the window size. This external is useful for filtering the salt-and-pepper noise created by frame drops in serial port input from the ergometer (see section 3.6.3).

A.15 midi channel

The MIDI equivalent of the channel abstraction. It uses MIDI controllers 7 and 10 in an attempt to set channel volume and panorama. Controllers 6, 100 and 101 are used to set the pitch bend range, and program change events communicate MIDI patch settings. The corresponding General MIDI program name is shown as a reference. Note that the functionality of this abstraction depends on the abilities of the attached MIDI device, so some or all of the settings may be ignored! For non-GM devices the actual sound patch used will not correspond to the name shown. The constructor argument is the MIDI channel number starting at 1. The program value shown is the actual MIDI data transmitted and thus ranges from 0 to 127.

A.16 norm mapping

This abstraction takes a function of one variable as its argument. This function is evaluated with the input to the second inlet. The resulting value is treated as a decibel value by which the audio input to the left inlet is amplified (or attenuated). This is used to achieve the amplitude normalization discussed in section 4.6.
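A small numpy sketch of the decibel gain this abstraction applies; the mapping function used here is only a placeholder, the actual normalization curves are derived in section 4.6.

```python
import numpy as np

def norm_mapping(audio, control, db_curve=lambda x: -12.0 * x):
    """Amplify (or attenuate) the audio block by db_curve(control) decibels."""
    gain = 10.0 ** (db_curve(control) / 20.0)   # dB -> linear amplitude factor
    return np.asarray(audio, dtype=float) * gain

quieter = norm_mapping([0.5, -0.5], control=1.0)   # attenuated by 12 dB
```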

A.17 paf

This creates a sound with one formant, whose width is adjustable by the GUI. The base pitch of the sound, which is not modulated, is also set in the GUI. The input is in pitch units and modulates the position of the formant. The effect is described in section 4.4.3. This is a GUI wrapper around the implementation shipped with Pd.

A.18 paf vowel

The paf vowel abstraction uses the PAF algorithm (see 4.4.3) to create a sound with three phase-aligned formants. The input should lie between 0 and 1 and interpolates between two formant settings, which can be chosen with the selection boxes in the GUI. These settings correspond to human vowels. The base pitch and bandwidth are adjustable as for the paf abstraction. The effects of this sonification method are described in section 4.4.3.

A.19 pink noise

Using the iemlib pink∼ external, this abstraction creates 1/f noise whose amplitude is controlled by the input to the inlet. The input should lie between 0 (silent) and 1 (loud). Amplitude modulation is done in dB internally, so the input only needs to be scaled. The result sounds like waves on a shore, as 1/f noise closely mimics the noise created by water. For more details see section 4.3. This is a wrapper around the pink noise module shipped with iemlib.

A.20 reverb

This is simply a wrapper with GUI controls around the freeverb∼ object normally shipped with Pd-extended. As the Pd-extended release was not stable enough to use as a basis for the sonifications, freeverb∼ was recompiled against the stable official release and added manually. The usage of this object improves the sound quality of sonifications: it makes the results sound less "pressing", as it creates an illusion of space.

A.21 sampleloop

The simplest sampling based object. It takes the relative path to a wav sample as the first parameter and the original pitch of that sample as the second. Incoming data is treated as pitch values; the sample is constantly looped and output at the pitch received as input data.

A.22 sampleloop filter

This abstraction works like the ordinary sampleloop abstraction, except that the input is mapped to the cutoff of a lowpass filter. The input is still treated as pitch units, so an input of 69 maps to a cutoff frequency of 440 Hz. This abstraction was used to test the effect of filtering arbitrary samples as described in section 4.4.1.

A.23 settings

The settings abstraction is used to control the state saving mechanism described in detail in section 3.5.1. It provides the functionality to save and load the state of all sv and ssymbol objects present in the current session to/from a file. Adding a settings object to a patch makes the state of all objects in that patch that are built using sv and ssymbol abstractions savable. It does not need to be connected anywhere, so it has no in- or outlets.

A.24 sine pitch

The “Hello World” sonification: a sine wave that is modulated in pitch. The input is in MIDI pitch units. This abstraction is useful for reference tests, as it implements the simplest pitch modulation possible with an easy to detect waveform. As the sinusoid has no partials other than the base frequency, it creates hardly any masking effects.

A.25 sonify bend

sonify bend takes pitch bend values ([−8192, 8192]) as input and creates the corresponding pitch bend messages. As long as there is input, a MIDI note is held to create an audible result. The parameters of the basic note can be set via the GUI.

A.26 sonify control

The sonify control abstraction takes MIDI controller data values as input ([0, 127]) and creates the corresponding MIDI controller messages. The parameters of these messages can be set with the GUI. This abstraction creates no MIDI notes by itself to work on, as you will often want to combine several controllers. To create a note on the corresponding channel, either use a hold note object or combine the controller with a pitch sonification that creates notes.

A.27 sonify note cont

This is a combination of pitch bend and MIDI note sonification. It takes floats that lie in the MIDI pitch range as input. It tries to reach the input pitch by pitch bending, using the information about the bend range set in the GUI. If the pitch cannot be reached by bending alone, it turns off the current note and emits a new one closer to the desired pitch. That way the pitch range of MIDI notes is available, combined with the finer resolution of the pitch bend messages. This also reduces the effect of formant shifting when using sample based MIDI devices, as many samples are used instead of one. In this respect this abstraction works like the crossfading loop sampler . The drawback is that the beginning of a new note may be audible depending on the attack characteristics of the used sound, so the audible result is not as continuous as that of the sonify bend object.

A.28 sonify note dis

The simplest MIDI sonification. Each incoming data value creates a MIDI note with the input pitch. The length of the note (duration between emission of Note On and Note Off events) is controlled by the GUI, as is the velocity of the note.

A.29 sonify scale

This is the GUI for the scaling described in detail in section 3.5.2. It provides controls to enter the input and output intervals, the gamma correction factor and the tolerance in percent for the low and high input interval boundaries. The minimum and maximum of the input values so far are displayed in the top two number controls. Clicking the Reset button resets those values, which is useful if the test subject changes and the scaling has to be readjusted. Clicking the Set button sets the currently measured minimum and maximum as the input interval. This allows for fast adjustment of the scaling for a new test subject:

• Set a sensible tolerance value (e.g. 5-10%) to allow for later variations

• Reset the measured minimum and maximum

• Let the subject perform the motion a few times to measure the individual extent of the created values

• Then set the measured interval as input interval (a sketch of the resulting scaling follows below)
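A guess, in Python, at the scaling this procedure sets up: widen the measured interval by the tolerance, clamp, normalize, apply gamma correction, and map to the output interval. The exact behaviour of the Pd abstraction is described in section 3.5.2 and may differ in details.

```python
def sonify_scale(x, in_min, in_max, out_min, out_max, gamma=1.0, tolerance=0.05):
    """Map x from the measured input interval to the output interval."""
    span = in_max - in_min
    lo, hi = in_min - tolerance * span, in_max + tolerance * span   # widened interval
    x = min(max(x, lo), hi)                                          # clamp
    normalized = (x - lo) / (hi - lo)                                # 0..1
    return out_min + (normalized ** gamma) * (out_max - out_min)

# e.g. map a seat position measured between 0.1 and 0.9 to MIDI pitches 48..72:
pitch = sonify_scale(0.5, 0.1, 0.9, 48, 72)
```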

A.30 ssymbol

The ssymbol abstraction can be attached to an object emitting and receiving symbol messages to make its value savable. The state is saved in the ssymbol object, which can communicate it to a settings object for saving and loading (see section 3.5.1).

A.31 subtractive1

The subtractive1 abstraction has a harmonic sound as its basis (two optionally detuned sawtooth waves), which is sent through a resonant lowpass filter. The input can range from zero (filter closed, no signal) to one (filter open, bright sound). Optionally, the detuning of the two waveforms can also be modulated. As with all subtractive synthesis methods here, the quality ("q") of the filter is also adjustable. Higher q settings result in the harmonics close to the cutoff frequency being amplified, which creates a sharp "edge" around the cutoff frequency and makes changes easier to detect. The properties of this module are discussed in section 4.4.1.

A.32 subtractive2

A very simple form of subtractive synthesis: white noise filtered by a resonant lowpass filter. An input of 0 closes the filter, 1 opens it completely. The higher the q setting, the more the frequencies around the cutoff frequency are emphasized.

A.33 subtractive3

The same as subtractive2 , but with a highpass filter.

A.34 subtractive4

The bandpass version of the filtered noise abstractions. High q values cause a narrow noise band, low values a broad one. Narrower bands make the signal stand out more in sonifications with many audio sources.

A.35 subtractive5

The name of this abstraction is not 100% correct: it is actually a combination of non-linear distortion and subtractive synthesis. The sound generation is similar to subtractive1 , but in addition to manipulating the filter cutoff through the first inlet, the clipping can be modulated with the second inlet (see section 4.4.2). As with the first inlet, the input ranges from 0 to 1. A value of 0 leaves the signal unaltered, 1 distorts it heavily, which sounds more "aggressive". As with all subtractive synthesis objects, refer to section 4.4.1 for an in-depth discussion.

A.36 sv

This abstraction can be attached to an object sending and receiving float values (which most GUI objects do). It stores the float value internally and communicates it to a settings object (see section 3.5.1). As it wraps a normal v object, the saved value is also accessible by normal v objects with the same name, as all values in Pd share a global namespace.

A.37 tristimulus

This is the actual implementation of the tristimulus model (see section 4.4.4). It takes as input the frequency in Hz, the strength of the first, mid and high partials, the attenuation in each band (0 = only the first partial, 1 = all partials in a band have the same strength), the strength of the even partials (0 = no even partials, 1 = even partials as strong as odd ones) and the width of the noise bands. It has two audio outlets, one for the harmonic and one for the noisy signal. This abstraction can be used as an engine inside user interfaces for the tristimulus model.

A.38 tristimulus model

The GUI wrapper for the plain tristimulus object. The output is the same as for the tristimulus object itself. The first inlet takes a list of two floats as input, which encode the position of the center point in Euclidean coordinates. The barycentric coordinates of that point are then calculated and used as values for the fundamental, mid and high partials of the tristimulus model. The actual strength of the three values is calculated according to section 4.4.4. The triangle can also be controlled with the mouse. The other tristimulus parameters are set with the GUI.

A.39 waveshaping1

A simple waveshaping method normally known as syncing. The input is expected to lie between 0 and 1; an input of 0 results in an ordinary sawtooth wave. When the input is raised, pitch and timbre change.2 Raising the input raises the amplitude of the sawtooth, but unlike in the clipping case of the subtractive5 abstraction, the value is not simply clipped but wrapped around. This means that amplification by an integer factor creates another sawtooth at a multiple of the original frequency, while non-integer values cause a change in timbre. The resulting modulation is perceived as smooth, even though the pitch changes discretely.

2 The “normal” implementation of this algorithm used in analogue modular synthesizers uses two detuned oscillators, where one “synchronizes” the other: the moment the master oscillator finishes a cycle, it triggers the sync input of the slave, which causes the slave to restart its cycle.

B. PITCH RANGES

The following diagram gives an overview of the pitch ranges of some frequently used orchestra instruments. This helps in selecting proper pitch ranges for sonifications. The mapping of MIDI pitch to actual pitch can sometimes differ from the MIDI notes given here, as MIDI sounds are often optimized for use with a keyboard. Therefore, a sound may have a pitch offset so that it is shifted to a certain area of a keyboard controller. This chart also nicely shows the logarithmic characteristic of human pitch perception: the range between 50 and 2000 Hz is of particular interest and covered by many instruments, while pitches above 8000 Hz are not even considered in this chart as they are not used in musical contexts. Pitch ranges for sonifications should not lie outside the range of a piano, as above and below that range hearing degrades significantly.

The ranges given are cited from Michels [1977]. The selection of instruments is somewhat arbitrary and far from complete. It is supposed to serve as a general guideline for people without much musical background who want to develop sonifications, not as an exact reference for the study of musical instruments. In practice the ranges may differ slightly from the ones given here, depending on the individual instrument. Many instruments are available in different tunings, which may result in an offset to the ranges presented here.

C. AUDIO EXAMPLES

Example 1: Linearly rising pitch http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 01.ogg

This is a cosine wave rising from 20 Hz to 20,000 Hz. The pitch change is linear at about 2000 Hz per second. Note how the pitch seems to rise very fast in the first 2 seconds and almost seems constant for the remaining 8 seconds.

Example 2: Exponentially rising pitch http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 02.ogg

This is also a cosine wave rising from 20 Hz to 20,000 Hz. This time, the change is about one octave per second. This time the pitch seems to change at a constant rate while in fact the frequency is doubled every second. This is due to the logarithmic nature of the human ear.

Example 3: Comparison of peak and RMS amplitude http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 03.ogg

This example consists of four sounds: a 50% pulse wave and a 5% pulse wave at the same peak amplitude, and then again the same two sounds at the same RMS amplitude. The first sound sounds "louder" than the second, though they have the same peak amplitude. The fourth sound is the same as the second, this time raised to the same RMS amplitude as the 50% pulse wave. This time both seem to have about the same loudness. This serves to illustrate that RMS amplitude is a better measure for loudness than peak amplitude.

Example 4: Comparison of signals with constant amplitude but different pitch http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 04.ogg

We hear three successive signals, all with the same peak and RMS ampli- tude but at different frequencies (110 Hz, 440 Hz and 7040 Hz). The sound chosen is again a simple cosine wave. That way, the perceived loudness only depends on the frequency and not the spectrum. The perceived result is different for each individual. Most people perceive the lowest (first) tone as the most muted one, the medium tone as the loudest and the high tone as somewhere in between.

Example 5: Frequency masking http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 05.ogg

This is an example of a louder sound masking a quieter one. The masking sound is 1 second of white noise, the masked sound a 30 ms sinusoid at 440 Hz. In the first example, the sine wave has approximately the same RMS amplitude as the noise. They are first played separately, then together. Due to its high amplitude, the sine wave is still recognizable against the noise. Then this is repeated with the sinusoid's amplitude lowered by 20 dB. Again, we first hear the noise and sine wave separately, then together. This time the sine wave is masked by the noise.

Example 6: Temporal masking http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 06.ogg

As with the previous example, the signals are 1 second of white noise and a 30 ms sinusoid at 440 Hz. The example is repeated five times with the noise and sinusoid at the same amplitude. The sinusoid is started 200, 20 and 0 milliseconds after the end of the noise, 10 ms before the end of the noise (so that 10 ms coincide with the noise and 20 ms do not) and right in the middle of the noise. It is recognizable every time due to its high amplitude. Then the sinusoid's amplitude is lowered by 30 dB, it is played alone as a reference, and the experiment is repeated. This time the sinusoid becomes harder to recognize the smaller the gap between the end of the noise and the sinusoid gets.

Example 7: Zipper noise http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 07.ogg

This is an example of zipper noise. The incoming data stream is the force on the handle during a rowing motion. It is mapped to the cutoff of a lowpass filter filtering white noise. The zipper noise is recognizable as a constant crackling.

Example 8: Zipper noise removed by interpolation http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 08.ogg

This is the same as example 7 except that this time the incoming data stream is linearly interpolated. This removes the zipper noise effectively.

Example 9: Foldover created by the tristimulus model http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 09.ogg

This is an example of a high tristimulus sound without clipping the partials at half the sampling frequency. The high partials "fold over", thus landing on non-integer multiples of the base frequency. This creates a dissonant, metallic sound.

Example 10: The tristimulus model with foldover correction http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 10.ogg

This is the same sound as in example 9. This time, the frequency of the partials is limited to half the sampling frequency, which eliminates the aliasing.

Example 11: Formant shift of an oboe sample http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 11.ogg

This is the oboe sound with which figures 4.6 to 4.11 were created. First we hear an oboe sample at pitch B1, then the same sample pitched up by two octaves. The unnatural sound is clearly audible. Apart from that, due to the four times higher playback speed, the loop does not sound like a vibrato anymore but is too fast and annoying. This is another problem that can occur when samples are pitched over too large a range. This effect is known as "beating". The third sound is an oboe sample playing a B3; this time it is not a pitched sample but a sample of an oboe actually playing that note.

Example 12: Crossfading a set of oboe samples http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 12.ogg

In this example the crossfading loop sampler was used to continuously change the pitch of the oboe sound over two octaves, again from B1 to B3. The first example uses a multisample; you can try to listen for the crossfading between the samples: there are fourteen samples, which results in thirteen crossfades. The second example does the same but without crossfading and multisampling. Just one sample is pitched over two octaves, so there is no switching of samples but a strong formant shift instead.

Example 13: Dissonant voices with different ranges http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 13.ogg

This example is made with two sampled sounds: strings (depending on the pitch: bass, cello or viola) and a choir. At first we hear a partial rowing sonification with the position of the seat mapped to the strings' pitch and the force on the footrest mapped to the choir's pitch. We hear this sonification twice, once with both sounds in the same range, then with two octaves between them. Then we hear both sounds at constant pitch, 6 semitones apart. This is a very dissonant interval, the tritone. Then we place the sounds 18 semitones apart, i.e. we separate them by an additional octave. Technically this is still a dissonant interval, but the dissonance is not perceived as strongly.

Example 14: Sonification with extremely high pitch ranges http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 14.ogg

The pitch ranges chosen for this sonification are extremely high. As expected, the result sounds especially annoying.

Example 15: Linear and exponential filter sweep http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 15.ogg

This is an example of white noise filtered with a resonant lowpass filter. First, the cutoff frequency is swept linearly from 20 to 20000 Hz, then exponentially. Note how there seems to be a fast "jump" at the beginning of the first example and then little change for most of the time, followed by a fast "drop". In the second example there is a constant change in timbre.

Example 16: Highpass filtered string sample http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 16.ogg

This is a highpass filter applied to a looped sample of strings. The filter frequency is swept up to 20000 Hz in four seconds and then back.

Example 17: Filter sweep with and without resonance http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 17.ogg

We hear two examples of lowpass filtered white noise, the first without resonance, the second with high resonance values. In the second example the partials around the cutoff frequency are much more pronounced, which helps in tracking the frequency.

Example 18: Harmonic and noisy sounds bandpass filtered http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 18.ogg

We hear two bandpass filtered sounds. The first is a looped string sample, i.e. a harmonic signal, the second is white noise. The first sound changes strongly in character; its amplitude depends on the spectral energy in the currently passing frequency range. The second sweep maintains a constant character.

Example 19: Clipping http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 19.ogg

This is an example of a harmonic sound that is amplified and then clipped. The sound consists of two sawtooth waves, one octave apart, slightly detuned and lowpass filtered, a typical synthesizer sound. At first the example is unaltered, then the amplification rises steadily for 10 seconds and then returns to normal over 10 seconds. The effect of the clipping starts abruptly once the signal comes into clipping range, which makes mappings using clipping complicated.

Example 20: Harmonic and non harmonic FM sound http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 20.ogg

This example consists of two sounds. The first is an FM sound where carrier and modulator have a common base frequency. The modulation index is swept from 0 to 2.7 and back. The second is also a sweep of the modulation index, but this time the carrier and modulator do not have a common base frequency. The result is much harsher and has a dissonant metallic quality.

Example 21: Formant sound created by the PAF algorithm http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 21.ogg

We hear two examples of the PAF algorithm. The first is a sweep of the formant position. Then, the formant is again swept up, but then remains at a constant frequency while the bandwidth of the formant is changed.

Example 22: Vowels created by the PAF algorithm http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 22.ogg

First we hear the four vowel types that were created. Then we hear a sweep from “Aah” to “Eeh”, then one from “Ow” to “Eeh”.

Example 23: Tristimulus sound with changing amplitude http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 23.ogg

The tristimulus sound in this example changes its timbre so that at first only the fundamental, then only the middle partials, then just the high partials and finally again just the fundamental are audible. The amplitude is lowest at the beginning and highest when the many high partials are audible. The difference is about 8 dB in RMS and about 18 dB in peak amplitude.

Example 24: Normalized tristimulus model http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 24.ogg

This example is the same as the previous one, but this time the coefficients for the partials are calculated so as to guarantee a constant RMS amplitude. There is still a difference in peak amplitude between the settings, but the perceptually more important RMS amplitude remains constant.

Example 25: Tristimulus compared to sawtooth and square wave http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 25.ogg

First we hear the tristimulus model with a sound in which all three bands are equally strong; the strength of the even partials is modulated. Then we hear a sawtooth and a square wave between which we crossfade. The two effects sound very similar.

Example 26: Tristimulus attenuation compared to lowpass filter http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 26.ogg

Here we hear the effect of a change in the attenuation parameter of a tristimulus sound. Then we mimic the same effect with a mixture of sawtooth and square wave combined with a lowpass filter. Again, the effects are very similar. The main difference is that the tristimulus sound maintains one strong high partial even if the attenuation parameter is set to its minimum.

Example 27: Tristimulus bands compared to equalizers http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 27.ogg

In the final example we compare the effect of controlling the three tristimulus bands (which is the most important feature of the tristimulus model) to controlling the gain of three equalizers whose center frequencies are set to approximately the centers of the three bands. As expected, the two results sound very similar.

Example 28: Rowing sonification using a purely pitch based approach http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 28.ogg

A MIDI-based rowing sonification modulating the pitch of four sounds. See page 76.

Example 29: Rowing sonification using pitch and lowpass filtering http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 29.ogg

A sonification of the rowing motion using two sample sets and two lowpass filtered signals. See page 76.

Example 30: Rowing sonification using bandpass filtering and the tristimulus model http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 30.ogg

This sonification of the rowing motion uses bandpass filtered white noise and two instances of the tristimulus model. See page 77.

Example 31: Walking sonification using the PAF algorithm and pink noise http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 31.ogg

This pleasant sonification uses two simple PAF formant sounds and amplitude modulated pink noise. See page 77.

Example 32: Walking sonification using vowel sounds and lowpass filtered samples http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 32.ogg

A walking sonification using PAF generated vowel sounds and lowpass filtered samples. See page 78.

Example 33: Walking sonification using FM synthesis http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 33.ogg

A walking sonification using solely Frequency Modulation. See page 78.

Example 34: Rowing sonification using arpeggios http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 34.ogg

This is a sonification of the rowing motion using two lowpass filtered sample sounds (strings and choir) for the distances and two FM sounds for the measured forces. Unlike the sound based sonifications, we map the incoming data stream to pitch. The data stream is mapped to a discrete scale of notes, which creates four arpeggio-like melodies. The advantage is that it sounds much better than the usual, continuous pitch based methods; the disadvantage is of course a loss of precision. On the other hand, the discrete mapping creates rhythmical patterns that are not present in an ordinary pitch based sonification, which might give useful additional information about the structure of a data set, as it gives a coarse indication of the speed with which the data values change. Rhythmical patterns are likely easier to learn and compare than abstract continuous pitch progressions. The scale for the arpeggio is a little unusual: it is c d d# f# g. This is a chromatic pitch selection that is not part of a single major or minor scale. It was chosen due to its colorful, harmonically ambiguous character. As this example has no harmonic progressions, this chromatic selection of notes makes it more interesting to listen to in the long run than a simple major or minor chord.

Example 35: Rowing sonification using arpeggios and sound manipulation http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 35.ogg

This is the same sonification as example 34, only this time the remainder of the data values is mapped to a sound parameter of the underlying sound generation method. For FM synthesis this is of course the modulation index, for filtered sounds the filter cutoff. That way, the fine grained resolution of the data that was lost in the previous method is re-introduced into the sonification.

Example 36: Rowing sonification using a simple harmonic progression http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 36.ogg

This is similar to example 35, except that now a second scale is used for the first part of the motion. This adds another information level to the sonification. The notes chosen for the second part of the motion are c, f, g and b. This is a section of the circle of fifths, which gives this part an open, undetermined character. It is meant to serve as a counterweight to the first part of the motion.

Example 37: Rowing sonification using another simple harmonic progression http://cg.cs.uni-bonn.de/project-pages/sonification/CG-2007-4/example 37.ogg

This is the same sonification as example 36, only the chords used are different. Now the chord used for the first part of the motion (the pulling) is a Gmaj7 (g b d f#), and the release part uses a simple C major chord (c e g). C is the tonic to the dominant G. The major seventh was chosen instead of the minor seventh for no particular reason. These chords are a standard combination (except for the major seventh), and this combination was chosen as an additional example because the chords used in the previous example may sound too dissonant for some people. This example has less "tension".

BIBLIOGRAPHY
