
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz

Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Alberto de Campo

Dissertation

Graz, February 23, 2009

Supervisors: Prof Dr Robert Höldrich (IEM/KUG), Prof Dr Curtis Roads (MAT/UCSB)

Science By Ear. An Interdisciplinary Approach to Sonifying Scientific Data

Author: Alberto de Campo
Contact: [email protected]
Supervisors: Prof Dr Robert Höldrich (IEM/KUG), Prof Dr Curtis Roads (MAT/UCSB)
Contact: [email protected], [email protected]

Dissertation
Institute of Electronic Music and Acoustics - IEM, University for Music and Dramatic Arts Graz
Inffeldgasse 10, A-8020 Graz, Austria

February 23, 2009, 211 pages

Abstract

Sonification of Scientific Data is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains, in psychoacoustics, in the artistic design of synthetic sound, and in working with appropriate programming environments. The SonEnvir project hosted at IEM Graz put this view into practice: in four domain sciences, sonification designs for current research questions were realised.

This dissertation contributes to sonification research in three aspects: The body of sonification designs realised within the SonEnvir context is described, which may be reused in sonification research in different ways. The software framework built with and for these sonification designs is presented, which supports fluid experimentation with evolving sonification designs. A theoretical model for sonification design work, the Sonification Design Space Map, was synthesised based on the analysis of this body of sonification designs (and a few selected others). This model allows systematic reasoning about the process of creating sonification designs, and provides concepts for analysing and categorising existing sonification designs more systematically.

Deutsche Zusammenfassung - German abstract

Die Sonifikation von wissenschaftlichen Daten ist intrinsisch interdisziplinär: Sie verlangt Zusammenarbeit zwischen ExpertInnen in den jeweiligen wissenschaftlichen Gebieten, in Psychoakustik, in der künstlerischen Gestaltung von synthetischem Klang, und in der Arbeit mit geeigneten Programmierumgebungen. Das Projekt SonEnvir, das am IEM Graz stattfand, hat diese Sichtweise in die Praxis umgesetzt: in vier wissenschaftlichen Gebieten (domain sciences) wurden Sonifikations-Designs zu aktuellen Forschungsfragen realisiert.

Diese Dissertation trägt drei Aspekte zur Sonifikationsforschung bei: Der Korpus der im Kontext von SonEnvir entwickelten Sonification Designs wird detailliert beschrieben; diese Designs können in der Forschungsgemeinschaft in verschiedener Weise Weiterverwendung finden. Das Software-Framework, das für und mit diesen Designs gebaut wurde, wird beschrieben; es erlaubt fliessendes Experimentieren in der Entwicklung von Sonifikationsdesigns. Ein theoretisches Modell für die Gestaltung von Sonifikationen, die Sonification Design Space Map, wurde auf Basis der Analysen dieser (und ausgewählter anderer) Designs synthetisiert. Dieses Modell erlaubt systematisches Nachdenken (reasoning) über den Gestaltungsprozess von Sonifikationsdesigns, und bietet Konzepte für die Analyse und Kategorisierung existierender Sonifikationsdesigns an.

Keywords: Sonification, Sonification Theory, Perceptualisation, Interdisciplinary Research, Interactive Software Development, Just In Time Programming

Acknowledgements

First of all, I would like to thank Marianne Egger de Campo for designing several versions of the XENAKIS proposal with me - a sonification project with European partners that eventually became SonEnvir.

Then, I would like to thank my research partners in the SonEnvir project: Christian Dayé, Christopher Frauenberger, Kathi Vogt and Annette Wallisch, without whom this work would not have been possible. I would like to thank Robert Höldrich for his collaboration on the grant proposals, and for his contribution to the EEG realtime sonification; and Gerhard Eckel for leading the SonEnvir project for most of its lifetime.

I would like to thank the participants of the Science By Ear workshop, who have been very open to a very particular experimental setup in interdisciplinary collaboration, especially for the discussions which eventually led to formulating the concept of the Sonification Design Space Map. A very special thank you is in order for the brave people who were willing to try programming sonification designs just-in-time within this workshop: Till Bovermann, Christopher Frauenberger, Thomas Musil, Sandra Pauletto, and Julian Rohrhuber.

For the Spin Models, the following Science By Ear participants also worked on a sonification design for the Ising model (besides the SonEnvir team): Thomas Hermann, Harald Markum, Julian Rohrhuber and Tony Stockman. Concerning the background in theoretical physics, we would also like to thank Christof Gattringer, Christian Bernd Lang, Leopold Mathelitsch and Ulrich Hohenester.

For the piece Navegar, I would like to thank Peter Jakober for researching the detailed timeline, and Marianne Egger de Campo for suggesting the Gini index as an interesting variable.

Alberto de Campo
Graz, February 23, 2009

Contents

1 Introduction
  1.1 Motivation
  1.2 Scope
  1.3 Methodology
  1.4 Overview of this thesis

2 Psychoacoustics, Perception, Cognition, and Interaction
  2.1 Psychoacoustics
  2.2 Auditory perception and memory
  2.3 Cognition, action, and embodiment
  2.4 Perception, perceptualisation and interaction
  2.5 Mapping, mixing and matching metaphors

3 Sonification Systems
  3.1 Background
    3.1.1 A short history of sonification
    3.1.2 A taxonomy of intended sonification uses
  3.2 Sonification toolkits, frameworks, applications
    3.2.1 Historic systems
    3.2.2 Current systems
  3.3 Music and sound programming environments
  3.4 Design of a new system
    3.4.1 Requirements of an ideal sonification environment
    3.4.2 Platform choice
  3.5 SonEnvir software - Overall scope
    3.5.1 Software framework
    3.5.2 Framework structure
    3.5.3 The Data model

4 Project Background
  4.1 The SonEnvir project
    4.1.1 Partner institutions and people
    4.1.2 Project flow
    4.1.3 Publications
  4.2 Science By Ear - An interdisciplinary workshop
    4.2.1 Workshop design
    4.2.2 Working methods
    4.2.3 Evaluation
  4.3 ICAD 2006 concert
    4.3.1 Listening to the Mind Listening
    4.3.2 Global Music - The World by Ear

5 General Sonification Models
  5.1 The Sonification Design Space Map (SDSM)
    5.1.1 Introduction
    5.1.2 Background
    5.1.3 The Sonification Design Space Map
    5.1.4 Refinement by moving on the map
    5.1.5 Examples from the 'Science by Ear' workshop
    5.1.6 Conclusions
    5.1.7 Extensions of the SDS map
  5.2 Data dimensions
    5.2.1 Data categorisation
    5.2.2 Data organisation
    5.2.3 Task Data analysis - LoadFlow data
  5.3 Synthesis models
    5.3.1 Sonification strategies
    5.3.2 Continuous Data Representation
    5.3.3 Discrete Data Representation
    5.3.4 Parallel streams
    5.3.5 Model Based Sonification
  5.4 User, task, interaction models
    5.4.1 Background - related disciplines
    5.4.2 Music interfaces and musical instruments
    5.4.3 Interactive sonification
    5.4.4 "The Humane Interface" and sonification
    5.4.5 Goals, tasks, skills, context
    5.4.6 Two examples
  5.5 Spatialisation Model
    5.5.1 Speaker-based sound rendering
    5.5.2 Headphones
    5.5.3 Handling speaker imperfections

6 Examples from Sociology
  6.1 FRR Log Player
    6.1.1 Technical background
    6.1.2 Analysis steps
    6.1.3 Sonification design
    6.1.4 Interface design
    6.1.5 Evaluation for the research context
    6.1.6 Evaluation in SDSM terms
  6.2 'Wahlgesänge' - 'Election Songs'
    6.2.1 Interface and sonification design
    6.2.2 Evaluation
  6.3 Social Data Explorer
    6.3.1 Background
    6.3.2 Interaction design
    6.3.3 Sonification design
    6.3.4 Evaluation

7 Examples from Physics
  7.1 Quantum Spectra sonification
    7.1.1 Quantum spectra of baryons
    7.1.2 The Quantum Spectra Browser
    7.1.3 The Hyperfine Splitter
    7.1.4 Possible future work and conclusions
  7.2 Sonification of Spin models
    7.2.1 Physical background
    7.2.2 Ising model
    7.2.3 Potts model
    7.2.4 Audification-based sonification
    7.2.5 Channel sonification
    7.2.6 Granular sonification
    7.2.7 Sonification of self-similar structures
    7.2.8 Evaluation

8 Examples from Speech Communication and Signal Processing
  8.1 Time Series Analyser
    8.1.1 Mathematical background
    8.1.2 Sonification tools
    8.1.3 The PDFShaper
    8.1.4 TSAnalyser
  8.2 Listening test
    8.2.1 Test data
    8.2.2 Listening experiment
    8.2.3 Experiment results
    8.2.4 Conclusions

9 Examples from Neurology
  9.1 Auditory screening and monitoring of EEG data
    9.1.1 EEG and sonification
    9.1.2 Rapid screening of long-time EEG recordings
    9.1.3 Realtime monitoring during EEG recording sessions
  9.2 The EEG Screener
    9.2.1 Sonification design
    9.2.2 Interface design
  9.3 The EEG Realtime Player
    9.3.1 Sonification design
    9.3.2 Interface design
  9.4 Evaluation with user tests
    9.4.1 EEG test data
    9.4.2 Initial pre-tests
    9.4.3 Tests with expert users
    9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2
    9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2
    9.4.6 Qualitative results for both players (versions 2)
    9.4.7 Conclusions from user tests
    9.4.8 Next steps
    9.4.9 Evaluation in SDSM terms

10 Examples from the Science by Ear Workshop
  10.1 Rainfall data
  10.2 Polysaccharides
    10.2.1 Polysaccharides - Materials made by nature
    10.2.2 Session notes

11 Examples from the ICAD 2006 Concert
  11.1 Life Expectancy - Tim Barrass
  11.2 Guernica 2006 - Guillaume Potard
  11.3 'Navegar É Preciso, Viver Não É Preciso'
    11.3.1 Navigation
    11.3.2 The route
    11.3.3 Data choices
    11.3.4 Economic characteristics
    11.3.5 Access to drinking water
    11.3.6 Mapping choices
  11.4 Terra Nullius - Julian Rohrhuber
    11.4.1 Missing values
    11.4.2 The piece
  11.5 Comparison of the pieces

12 Conclusions
  12.1 Further work

A The SonEnvir framework structure in subversion
  A.1 The folder 'Framework'
  A.2 The folder 'SC3-Support'
  A.3 Other folders in the svn repository
  A.4 Quarks-SonEnvir
  A.5 Quarks-SuperCollider

B Models - code examples
  B.1 Spatialisation examples
    B.1.1 Physical sources
    B.1.2 Amplitude panning
    B.1.3 Ambisonics
    B.1.4 Headphones
    B.1.5 Handling speaker imperfections

C Physics Background
  C.1 Constituent Quark Models
  C.2 Potts model - theoretical background
    C.2.1 Spin models sound examples

D Science By Ear participants

E Background on 'Navegar'

F Sound, meaning, language

List of Tables

5.1 Scale types
5.2 The Keys
5.3 The Task
5.4 The Data
5.5 The Data

6.1 Sectors of economic activities

9.1 Equally spaced EEG band ranges
9.2 Questionnaire scales for EEG sonification designs

11.1 Navegar - Mappings of data to sound parameters
11.2 Some stations along the timeline of 'Navegar'

B.1 Remapping spatial control values

E.1 Os Argonautas - Caetano Veloso

List of Figures

2.1 Some aspects of auditory memory, from Snyder (2000)

3.1 Inclined plane for Galilei's experiments on the law of falling bodies
3.2 UML diagram of the data model

5.1 The Sonification Design Space Map
5.2 SDS Map for designs with varying numbers of streams
5.3 All design steps for the LoadFlow dataset
5.4 LoadFlow - time series of dataset (averaged over many households)
5.5 LoadFlow - time series for 3 individual households

6.1 The toilet prototype system used for the FRR field test
6.2 Graphical display of one usage episode (Excel)
6.3 FRR Log Player GUI and mixer
6.4 SDS Map for the FRR Log Player
6.5 GUI Window for the Wahlgesänge Design
6.6 SDS-Map for Wahlgesänge
6.7 GUI Window for the Social Data Explorer

7.1 Excitation spectra of N (left) and ∆ (right) particles
7.2 The QuantumSpectraBrowser GUI
7.3 The Hyperfine Splitter GUI
7.4 Schema of spins in the Ising model as an example for Spin models
7.5 Schema of the orders of phase transitions in spin models
7.6 GUI for the running 4-state Potts Model in 2D
7.7 Audification of a 4-state Potts model
7.8 Sequentialisation schemes for the lattice used for the audification
7.9 A 3-state Potts model cooling down from super- to subcritical state
7.10 Granular sonification scheme for the Ising model
7.11 A self similar structure as a state of an Ising model

8.1 The PDFShaper interface
8.2 The TSAnalyser interface
8.3 The interface for the time series listening experiment
8.4 Probability of correctness over ∆ kurtosis in set 1
8.5 Probability of correctness over ∆ kurtosis in set 2
8.6 Probability of correctness over ∆ skew in set 2
8.7 Probability of correctness over ∆ skew and ∆ kurtosis in set 2
8.8 Number of replays over ∆ kurtosis in set 2

9.1 The Sonification Design Space Map for both EEG Players
9.2 The EEGScreener GUI
9.3 The Montage Window
9.4 EEG Realtime Sonification block diagram
9.5 The EEG Realtime Player GUI
9.6 Expert user test ratings for both EEGScreener versions
9.7 Expert user test ratings for both RealtimePlayer versions

10.1 Precipitation in the Alpine region, 1980-1991
10.2 Orography of the grid of regions
10.3 SDSM map of Rainfall data set

11.1 Magellan's route in Antonio Pigafetta's travelogue
11.2 Magellan's route, as reported in Wikipedia
11.3 The countries of the world and their Gini coefficients
11.4 Terra Nullius, latitude zones
11.5 SDSM comparison of the ICAD 2006 concert pieces

B.1 The Spectralyzer GUI window

C.1 Multiplet structure of the baryons as a decuplet

Chapter 1

Introduction

Sonification of Scientific Data, i.e., the perceptualisation of data by means of sound in order to find structures and patterns within them, is intrinsically interdisciplinary: It requires collaboration between experts in the respective scientific domains the data come from, in psychoacoustics, in the artistic design of synthetic sound, and in working with appropriate programming environments to realise successful sonification designs. The concept of the SonEnvir project (hosted at IEM Graz from 2005 to 2007) has put this view into practice: in four science domains, sonification designs for current research questions were realised in close collaboration with audio programming specialists. The research reported here mainly took place in the SonEnvir project context. This dissertation contributes to sonification research in three ways:

• The body of sonification designs realised within SonEnvir is described in detail. They may be reused in sonification research by the community, both as concepts and as open-source implementations on which new solutions can be based.

• For realising these sonification designs, a software framework was built in the language SuperCollider3 that allows for flexible, rapid experimentation with evolving sonification designs (in Just In Time programming style; a minimal illustration follows after this list). Being open-source, this framework may be reused and possibly maintained by the research community in the future.

• The analysis of this body of sonification designs (and a few others of interest) has eventually led to a general model of sonification design work, the Sonification Design Space Map. This contribution to sonification theory allows systematic reasoning about the process of developing sonification designs; based on data properties and context, it suggests candidates for the next experimental steps in the ongoing design process. It also provides concepts for analysing and categorising existing sonification designs more systematically.
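To give a flavour of what Just In Time programming style means in practice, here is a minimal SuperCollider sketch. It is an illustration written for this text, not actual SonEnvir framework code; it assumes a booted server, and ~data stands in as a random placeholder for a real data series. A sound proxy that steps through the data is defined, played, and then redefined on the fly while it keeps sounding.

    // illustration only, not SonEnvir framework code; assumes the server is booted
    ~data = Array.fill(200, { 1.0.rand });   // placeholder for a normalised data series

    // first sketch: step through the data and map each value to the pitch of a short grain
    Ndef(\son, {
        var trig  = Impulse.kr(10);
        var index = Stepper.kr(trig, 0, 0, ~data.size - 1);
        var value = Index.kr(LocalBuf.newFrom(~data), index);
        SinOsc.ar(value.linexp(0, 1, 200, 2000)) * Decay2.kr(trig, 0.01, 0.08);
    }).play;

    // later, while it keeps playing, redefine the mapping on the fly:
    Ndef(\son, {
        var trig  = Impulse.kr(20);
        var index = Stepper.kr(trig, 0, 0, ~data.size - 1);
        var value = Index.kr(LocalBuf.newFrom(~data), index);
        Pulse.ar(150, width: value.linlin(0, 1, 0.1, 0.9)) * Decay2.kr(trig, 0.01, 0.05) * 0.2;
    });

Because the proxy crossfades to each new definition while running, variants of a design can be auditioned in quick succession during a discussion, which is the working style the framework is built to support.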

1.1 Motivation

Data are pervasive in modern societies: Science, politics, economics, and everyday life depend fundamentally on data for decisions. Larger and larger amounts of data are being acquired in the hope of their usefulness, taking advantage of continuing progress in information technology.

While data may contain obvious information (i.e., well-understood 'content'), very often one also assumes they contain implicit or even hidden facts about the phenomena observed; understanding these hitherto unknown facts is highly desired. The research field that most directly addresses this interest is Data Mining, or Exploratory Data Analysis. Two approaches are in common use for extracting new information from data: One is statistical analysis, the other is data perceptualisation, i.e., making data properties perceptible to the human senses; and many existing software tools combine both: from statistics programs like Excel and SPSS, science and engineering environments like MATLAB and Mathematica, to a host of special-purpose tools for specific domains of science or economy.

For scientists, perceptualisation of data is of vital interest; it is almost exclusively approached by visual means, for a combination of reasons (availability, traditions of scientific cultures, ease of publishing on paper, and many others). Visualisation tools have permeated scientific cultures to the point of being invisible; many scientists are well-versed in tools that visualise their results, and rarely do scientists question how accurately and adequately visual representations represent the data content. Many Virtual Reality systems, such as the CAVE (Cruz-Neira et al. (1992)) and others, claim scientific data exploration as one of their stronger usage scenarios. Nevertheless, sound often seems to be added to such systems only as an afterthought, usually with the intention to achieve better 'immersion' and emotional engagement (sometimes even alluding to cinema-like effects as the inspiration for the approach intended).

Sonification, the representation of data by acoustic means, is a potentially useful alternative and complement to visual approaches that has not reached the same level of acceptance. This is the starting point for the research agenda described here: To create an interdisciplinary research setting where scientists from different domains ('domain scientists') and specialists in artistic audio design and programming ('sound experts') work together on auditory representations ('sonification designs') for specific scientific data sets and their context. Such a venture should be well positioned to contribute to the progress of sonification as a scientific discipline. This has been the guiding strategy for the research project SonEnvir, described in some detail in section 4.1.

The thesis presented here analyses sonification design work done within the SonEnvir project; these analyses follow the notion of providing 'rich context', taken from Science Studies (see e.g. Latour and Woolgar (1986); Rheinberger (2006)). From these designs, it abstracts a general model for approaching sonification design work, from the general Sonification Design Space Map to detailed models of synthesis, spatialisation, and user interaction, presented in chapter 5. This abstraction process is based on Grounded Theory (Glaser and Strauss (1967)), aiming to design flexible theoretical models that capture and explain as much detail as possible of the observation data collected. Such an integrative approach appears to be the most promising way forward for sonification as a research discipline.

Finally, it should be noted that scientists are not the only social group that is interested in the role of data for modern societies: Artists have always taken part in the general discourse in society, and in recent years, media artists as well as musicians and sound artists have become interested in creating works of art that represent data in artistically interesting ways. This aspect certainly played a role in my personal motivation for this dissertation project.

1.2 Scope

While multimodal display systems are extremely interesting for data exploration, the complexity of interactions between modalities and individual differences in perception is considerable. Therefore, the research work in this thesis has been intentionally limited to audio-centric data representation; however, simple forms of visual representations and haptic interaction have been provided where it seemed appropriate and helpful. Abstract representations of data by auditory means are not at all well understood yet; thus providing collections of different approaches for discussion may well be fruitful for the community. Special importance has been given to design methodology, and to considering the human-computer interaction loop, ranging from interaction in the design process to interactive choices and control in a realtime sonification design.

Sonification designs may be intended for several different uses, with different aims. To give a few examples: Presentation entails clear, straightforward auditory demonstration of finished results; this may be useful in conference lectures, science talks, and similar situations. Exploration is all about interaction with the data, 'acquiring a feeling for one's data'; this must necessarily remain informal, as it is a heuristic for generating hypotheses, which will be cross-checked and verified later with every analysis tool available. Analysis requires well-understood, reliable tools for detecting specific phenomena, accepted by the conventions of the scientific domain they belong to. In Pedagogy, different students may learn to understand structures/patterns in data better when presented in different modalities; the auditory approach may be more appropriate and useful for some cases, e.g. people with visual impairments.

This thesis focuses on studying the viability of exploration and analysis of scientific data by means of sonification; thus we (meaning the author and the SonEnvir team) developed exemplary cases in close collaboration with the domain scientists, implemented sonification designs for these cases, and analysed them to understand their general usefulness. We built a software framework to support the efficient realisation of these sonification designs; this is reported on in section 3.5.1, and available as open-source code (https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/). The sonification design prototypes developed are also accessible online (https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/) and can be re-used both as concepts and as fully functional code. Note that the SonEnvir software environment is not a complete 'big system', but a flexible, extensible collection of approaches, and the infrastructure needed to support them. This software environment is freely extensible by others (being open source), and it aims to shorten development times for Auditory Display design sketches, thus allowing for freely moving between discussion and fast redesign. It also supports Auditory Display design pedagogy, as well as other uses, such as artistic projects involving data-related control of sound and image processes.

1.3 Methodology

The methodology employed in the SonEnvir project is centered on interdisciplinary collaboration: domain scientists bring current questions and related data from their research context, and learn the basic concepts of sonification and auditory perception. The questions are addressed with sonification design prototypes which are refined in iterative steps; common understanding and patience while learning is the key to eventual success.

This concept was condensed into an experimental setting of the interdisciplinary work process: The Science By Ear workshop brought together international sonification experts, mostly Austrian domain scientists, and audio programming specialists to work on sonification designs in a very controlled setting, within very short time frames. This workshop was received very favourably by the participants, and is reported on in section 4.2.

The methodology of the thesis is based on Grounded Theory (Glaser and Strauss (1967), see also section 5.1); in sociology, Grounded Theory is used inductively to create new hypotheses from observations or data collected with few pre-assumptions, in contrast to formulating hypotheses a priori and testing them by experiments. By looking at a body of sonification designs, and analysing their context, design approaches and decisions, a general, practice-based model is abstracted: the Sonification Design Space Map (SDSM). Aspects of this model that warrant further detail are given: models for synthesis approaches, spatialisation, and user/task/interaction.

The sonification designs analysed stem from the following sources:

• Work with SonEnvir domain scientists

• The Science By Ear workshop

• Submissions to the ICAD 2006 concert

1.4 Overview of this thesis

Chapter 2, Psychoacoustics, Perception, Cognition, and Interaction, provides the necessary perceptual background, covering mainly the psychoacoustics and auditory cognition literature that is directly relevant to sonification design work, rather than giving a general overview of the psychoacoustics literature.

Chapter 3, Sonification Systems, provides an introduction to sonification and its history, and covers some current systems that support sonification design work. The software system implemented for the SonEnvir project is described here from a more general perspective.

Chapter 4, Project Background, provides further details on the interdisciplinary nature of sonification research; here, the research design of the SonEnvir project, and two activities within it, namely the Science By Ear workshop and the ICAD 2006 Concert, are described.

Chapter 5, General Sonification Models, is the main contribution to sonification theory in this thesis. It describes a general model for sonification design work, divided into several aspects: Overall design decisions and strategies are covered by the Sonification Design Space Map (SDSM); appropriate synthesis approaches are covered in the Synthesis model; user interaction is covered in the User Interaction model; and spatial aspects of sonification design are covered in the Spatialisation model.

Chapters 6, 7, 8, and 9 present example sonification designs from the four domain sciences in SonEnvir, chapter 10 presents designs for two datasets explored in the Science By Ear workshop, and chapter 11 discusses and compares four works from the ICAD 2006 concert. This is the main practical and analytic contribution in this thesis. These chapters describe much of the body of sonification designs created within the SonEnvir project, as well as some others; this body of designs provided the background material for creating the General Sonification Models.

Chapter 12, Conclusions, positions the scope of work presented within the wider context of sonification research, and summarises the insights gained.

Chapter 2

Psychoacoustics, Perception, Cognition, and Interaction

2.1 Psychoacoustics

Psychoacoustics is a branch of psychophysics, the psychological discipline which studies the relationship between (objective) physical stimuli and their subjective perception by human beings; psychoacoustics then studies acoustic stimuli and their auditory perception. Consequently, much of its literature is mainly concerned with the physiological base of auditory perception, i.e., finding out how perception works by creating stimuli that force the auditory system into specific interpretations of what it hears. When considering the stimuli used in traditional psychoacoustics experiments as a world of sounds, this world has an extremely reduced vocabulary. Of course this reduction makes perfect sense for experiments which try to clarify how (especially lower level, more physiological) perceptual mechanisms (assumed to be hard-coded in the 'neural hardware') work, but the knowledge thus acquired is often only indirectly useful for sonification design work.

A number of works are considered major references for the field: For psychoacoustics in general, Psychoacoustics - Facts and Models (Zwicker and Fastl (1999)) is very comprehensive; a good introductory textbook that is also accessible for non-specialists is An Introduction to the Psychology of Hearing (Moore (2004)); Bregman thoroughly studies the organisation of auditory perception in more complex (and thus nearer to everyday life) situations in Auditory Scene Analysis (Bregman (1990)); for the spatial aspects of human hearing, the standard reference is Spatial Hearing (Blauert (1997)). The typical background of psychoacoustics research is speech, spatial hearing, and music; sonification is fundamentally different from all of these, possibly with the exception of conceptual similarity to experimental strands of electronic music.

The main concepts in these sources which are relevant for sonification research are:


Just Noticeable Differences (JNDs) for different audible properties of sounds (and consequently, the corresponding synthesis parameters) have been studied extensively; being aware of these helps to make sure that differences in synthetic sounds will be noticeable by users with normal hearing.

Masking Effects can occur when sonifications produce dense streams of events; understanding how these depend on properties of the individual events is important to avoid perceptually 'losing' information in the soundscape created by a sonification design.

Auditory Stream Formation and its rules are essential for multiple stream sonification; here it is important to control whether streams will tend to perceptually segregate or fuse into merged percepts (a minimal listening sketch follows after this list).

Testing Methodology can be employed to verify that sonification users are physically able to perceive the sensory differences of interest. In effect, this entails writing auditory tests for sonification designs, such that designers can test that they can hear the differentiation they are aiming for, and that users can acquire analytical listening skills from well-controlled examples.

Cognitive and Memory Limits determine how we understand common musical structures, and in fact, much music intended to be 'accessible' is created (unknowingly) conforming to these limits. Sonification design issues from choices for time scalings, to user interface options for quick repetitions, choosing segments to listen to, and others, also crucially depend on these limits.
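As a minimal listening sketch for the Auditory Stream Formation point above (my own illustration, not an example from the thesis), the following SuperCollider snippet plays the classic ABA- 'gallop' pattern: with a small frequency separation the tones fuse into one galloping stream, while a larger separation (try semis = 12) makes them segregate into two independent streams.

    (
    var base = 440, semis = 2, unit = 0.12;   // semis = 2: fused gallop; semis = 12: two streams
    Pbind(
        \freq, Pseq([base, base * semis.midiratio, base], inf),
        \dur,  Pseq([unit, unit, unit * 2], inf),   // A B A (pause)
        \legato, 0.4,
        \amp, 0.15
    ).play;
    )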

More recent research assumes the perspective of Ecological Psychoacoustics (Neuhoff (2004)), which takes into account that in daily life, hearing usually deals with complex environments of sounds, and thus allows for considering sonification designs from the perspective of ecologies of sounds that coexist. However, in a way sonification research and design work addresses a problem that is inverse to what psychoacoustics studies: rather than asking how we perceive existing worlds of sounds, the question in sonification is, how can we create a world of sounds that can communicate 'meaning' by aggregates of finely differentiated streams of sound events?

Bob Snyder actually addresses this inverse problem (i.e., how to create worlds of sounds that can communicate meaning) directly, if for a more traditional purpose: Music and Memory (Snyder (2000)) is a textbook for teaching composition to non-musicians in a perceptually deeply informed way, in a course Snyder gives at the Art Institute of Chicago. He describes how limitations of perception and memory influence artistic choices, and explains and demonstrates these with examples from a very wide range of musical cultures and traditions, almost entirely without traditional (Western) music notation. This is intended to give musicians/composers informed free choice to stay within these limitations (and be 'accessible'), or approach and transgress them intentionally. By covering a wide range of psychoacoustics and auditory perception literature from the perspective of art practice, and describing it in terms accessible for art students, many of whom do not have traditional musical or scientific training, Snyder has created a very useful resource for practicing sonification designers who are willing to learn more about creating perceptually informed (and artistically interesting) worlds of sounds.

2.2 Auditory perception and memory

This section is a brief summary of the first part of Music and Memory, to provide enough background for readers to follow auditory perception-related arguments made later.

Figure 2.1 shows a symbolic representation of the current models of both bottom-up and top-down perceptual processes. Bottom-up processes begin with sound exciting the eardrums, which gets translated into firing patterns of a large number of auditory nerves (ca. 30,000) coming from the ears. For a short time, a 'raw' representation of the sound just heard remains in echoic memory. This raw signal is held available for many concurrent feature extraction processes: these processes can include rather low-level aspects (which are almost certainly built into the neural 'hardware') like ascribing sound components coming from the same direction to the same sound source, but also higher-level aspects like a surprising harmonic modulation in a piece of music (which is certainly culturally learned).

The extracted features are then integrated into higher level percepts, often in several stages; in this process of abstraction, finer details are discarded, e.g. pitches in a musical context are categorised into a familiar tuning system, and nuances in rhythm and articulation usually also fade quickly from memory, unless one makes a special effort to retain them.

Feature extraction interacts very strongly with long term memory: personal auditory experience determines what is in long term memory, so for any listener, the extracted features will unconsciously activate related memory content, which may or may not become conscious. Note that unconsciously activated memories feed back into the feature extraction processes, potentially priming the perceptual binding that happens toward specific cultural or personal notions.

Short term memory (STM) is the only conscious part in figure 2.1: perceptual awareness of what one is hearing now, as well as the few related memories that become activated enough, are the only results of perception one becomes consciously aware of. Short term memory content can be rehearsed, and thus kept in working memory for a while, which increases its chance of being committed to long term memory eventually. On average, short term auditory memory can keep several seconds of sound around. This depends on 'chunking': generally, it is assumed that one can keep 7 ± 2 items in working memory at any moment; however, one can and does increase this number by forming groups of multiple items, which are then treated by memory as single (bigger) items (again with a limit of ca. 7 applying).

Figure 2.1: Some aspects of auditory memory, from Snyder (2000), p. 6. The connections shown are only a momentary configuration of the perceptual system, and will continuously change quite rapidly.

The longer the auditory structures one tries to keep in memory, the more this depends on abstraction, i.e. forming categories, simplifying detail, and grouping into higher level items. This imposes a limit that is relevant for sonification contexts: comparing a hard-to-categorise structural shape that only becomes recognisable over two minutes to a potentially similar episode of two minutes one hears an hour later is very difficult.

Generally, while bottom-up processes (usually equated with perception) are assumed to be properties of the human neural system, and thus quite universal for all people with normal hearing, top-down processes (often equated with cognition) are more personal: they depend on cultural learning and are informed by individual experience, and thus can vary much more between individuals.

2.3 Cognition, action, and embodiment

A closer connection to sonification research, as well as some useful terminology, can be found in Music Cognition research: Recent work, e.g. by Marc Leman (Leman and Camurri (2006), and Leman (2006)), defines terminology that works well for describing what sonification can achieve. Leman talks of proximal and distal cues: Proximal (near) cues refer to the perceptually relevant features of auditory events, i.e. the audible properties 'on the surface' of a sound event; by contrast, distal (further away) cues are actions inferred by the listeners that are likely to have caused the proximal cues. One example of distal actions would be a musician's physical actions; and a little further away, a performer's likely intentions behind her actions would also be considered distal cues.

In recent years, Cognition research has widely moved away from the traditionally abstract notion of 'cognitive' (meaning only dealing with symbols, and thus easy to model by computation); today the idea is widely accepted that cognition is deeply intertwined with the body, resulting in the concept of Embodied Cognition (see e.g. Anderson (2003)); applying this idea to auditory cognition, Leman says that the perception of gesture in music involves the whole body (of the performer and the listener). Music listeners who engage with listening may spontaneously express this by moving along with the music; when asked in experimental settings to make movements that correspond to the music they are listening to, even musically untrained listeners can be remarkably good at imitating performer gestures.

Appropriating this terminology and applying it to sonification, one can describe sonification elegantly in these terms: sound design decisions inform details of the created streams of sound, i.e. they determine the proximal cues; ideally, these design decisions lead to perceptual entities ('auditory gestalts'), which can create a sensation of plausible distal cues behind the proximal cues. In case of success, these distal cues, which arise within the listener's perception, create an 'implied sense' in the sounds presented (which could be called the 'sonificate'); thus these distal cues are likely to be closely related to 'data meaning' (the equivalent to performers' gestures, which are commonly taken to correspond closely to their intentions).

In reflecting on his research on design of experimental electronic music instruments, David Wessel argues that the equivalent of the 'babbling phase' (of small infants) is really essential for electronic music instruments: free-form, purpose-free interaction with the full possibility space of an instrument allows for more efficient and meaningful learning of what the instrument is capable of; just like children learn the phonetic possibilities of their vocal tract by (seemingly random) exploration (Wessel (2006)). He cites a classic experiment by Held and Hein, where two kittens are acquiring visual perception skills in very different ways: one kitten can move about the space, while the other kitten gets the same visual stimuli, but does not make its own choices of where to move - instead, it has the moving kitten's choices imposed on it. This second kitten sustained considerable perceptual impairments.
Wessel argues that the role of sensory-motor engagement is essential in auditory learning, but not well understood yet; he suggests designing electro-acoustic musical instruments such that they allow for the described forms of interaction by providing 'control intimacy', in short low-latency, high-resolution, multichannel control data from performer gestures. This strategy should create a long term chance of arriving at the equivalent of virtuosity on (or at least mastery of) that instrument. Transposed to the context of sonification for scientific data, this is in full agreement with an Embodied Cognition perspective, and is another strong argument for allowing as much 'user' interaction with sonification tools as possible: from haptic interfaces used e.g. for dynamic selection of data subsets, to access for tuning sound design parameters, to fully accessible code that defines how a particular sonification design operates.

2.4 Perception, perceptualisation and interaction

Perception of the physical world is intuitively non-modal and unified: events in the world are synchronous, so sensory input from different modalities is too (one interesting exception is far-away events that are both visible and audible; the puzzling difference between the speeds of sound and light led to the first measurements of the speed of sound). Many multimodal data exploration projects use virtual environments so that they can provide integrated visual, auditory and haptic modes for perception and interaction. The argument that learning is strongly dependent on sensory-motor involvement has found its way into HCI research literature; here, the common term is 'closing the human-computer interaction loop' (see e.g. Dix (1996)).

In the context of sonification research, this has led to a special conference series, the Interactive Sonification workshops (ISon, http://interactive-sonification.org/), so far held at Bielefeld (2004) and York (2007). In a special issue of IEEE Multimedia resulting from ISon 2004, the editors emphasise that learning how to 'play' a sonification design with physical actions, in fact similar to a musical instrument, really helps for an enactive understanding of both the nature of the perceptualisation processes involved and of the data under study (Hermann and Hunt (2005)). They find that there is a lack of research on how learning in interactive contexts takes place; obviously this applies equally to interactive visual display applications.
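A hypothetical SuperCollider sketch of such interactive 'playing' (my own illustration, not code from the cited work): the cursor position continuously controls two parameters of a simple sonification, closing a very basic human-computer interaction loop.

    (
    {
        var rate   = MouseX.kr(2, 40, 1);      // horizontal position -> event rate
        var centre = MouseY.kr(200, 4000, 1);  // vertical position -> resonance frequency
        Ringz.ar(Impulse.ar(rate), centre, 0.1) * 0.2;
    }.play;
    )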

2.5 Mapping, mixing and matching metaphors

Mapping data dimensions to representation parameters always involves choices. Walker and Kramer (1996) report interesting experiments on this topic: They play through a number of different permutations of mappings of the same data to the same set of display parameters, rated by the designers as 'intuitive', 'okay', 'bad', and random, and they test how well users accomplished defined tasks with them. Expert assumptions turned out to be not as accurate as they expected; users could learn quite arbitrary mappings nearly as well as supposedly more 'natural' ones. (This paper was republished in a recent special issue on sonification, with a new commentary; see Walker and Kramer (2005a,b).) Whether this also holds true for exploratory contexts, when there is no pre-defined goal to be achieved, is an open question. Here, performance in an easy-to-measure (but trivial) task is not a very interesting criterion for sonification designs. On the other hand, it is of course good design to reduce cognitive load while users are involved in data exploration (by using cognitively simple mappings). For visualisation systems designed for exploration, the idea of measuring 'insight' and the number of hypotheses formed in the exploration process has been suggested recently (Saraiya et al. (2005)); as far as we know this evaluation strategy has not been applied to exploratory sonification yet.

In de Campo et al. (2004), we make the case that the impression of perceiving the sources of representations (in Leman's terms, the distal cues) becomes easier when the metaphorical distance between the data dimension and the audible representation appears smaller; i.e., when a reasonably similar concept in the world of sound was found for the data property to be communicated. For example, almost all time-series data can be treated as if they were acoustic waveforms, which is what 'audification' essentially does. With more complex data, the option of accessing data subsets by interactive choice, browsing through the data space with different auditory perspectives, can potentially allow forming new hypotheses on the data.
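As a minimal illustration of this audification idea (an assumed sketch with a synthetic placeholder series, not code from the thesis; it expects a booted SuperCollider server), a time series is normalised, loaded into a buffer, and played back as an audio waveform at a chosen speedup:

    (
    // placeholder time series; in practice this would be loaded from the dataset under study
    var data = Array.fill(44100, { |i| sin(i * 0.02) * exp(i * -0.00005) });
    b = Buffer.loadCollection(s, data / data.abs.maxItem);   // normalise to +-1 and send to the server
    )
    // listen at a 4x speedup; at rate = 1, one data point is played per audio sample frame
    { PlayBuf.ar(1, b, rate: 4, doneAction: 2) * 0.3 }.play;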

2 http://interactive-sonification.org/ 3This paper was republished in a recent special issue of IEEE Spectrum Multimedia on Sonification, with a new commentary (Walker and Kramer(2005a,b)) Chapter 3

Sonification Systems

In a certain Chinese Encyclopedia, the Celestial Emporium of Benevolent Knowledge, "it is written that animals are divided into: (a) Those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those included in the present classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camelhair brush, (l) others, (m) those that have just broken a flower vase, (n) those that from a long way off look like flies."

in Jorge Luis Borges, The Analytical Language of John Wilkins (Borges (1980))

Perceptualisation of scientific data by visualisation has been extremely successful. It is by now completely established scientific practice, and a wide variety of visualisation tools exist for a wide range of applications. Given the different set of perceptual strengths of audition compared to vision, sonification has long been considered to have similar potential as an exploratory tool for scientists which is complementary to visualisation and statistics.

One strategy to realise more of this potential of sonification is to create a general software environment that supports fast development of sonification designs for a wide range of scientific applications, a design process in close interaction with scientific users, and simple exchange of fully functional sonification designs. This is the central idea of the SonEnvir project, as described in detail (in advance of the project itself) in de Campo et al. (2004).

There are a number of software packages for sonification and auditory display (Ben-Tal et al. (2002); Pauletto and Hunt (2004); Walker and Cothran (2003), and others), all of which make different choices: whether they are to be used as toolkits to integrate into applications, or whether they are full applications already; which data formats or realtime input modalities are supported; what sonification models are assumed (sometimes implicitly); and what kinds of interaction modes are possible and provided.

This chapter provides a very short overview of the history of sonification, and describes the most common uses of sonification. Then, some historical and current sonification toolkits and environments are described, and the main types of audio and music programming environments. Finally, the system developed for the present thesis is described.

3.1 Background

3.1.1 A short history of sonification

The prehistory and early history of sonification is covered very interestingly (within a very good general overview) in Gregory Kramer's Introduction to Auditory Display (Kramer (1994a)). Employing auditory perception for scientific research was not always as unusual as it is considered in today's visually dominated scientific cultures; in fact, sonification can be said to have had a number of precursors:

In medicine, the practice of auscultation, i.e., listening to the body's internal sounds for diagnostic purposes, seems to have been present in Hippocrates' time (McKusick et al. (1957)). This was long before Laennec, who is usually credited with the invention of the stethoscope in 1819. In engineering, mechanics tend to be very good at hearing which parts of a machine they are familiar with are not functioning well; just consider how much good car mechanics can tell just from listening to a running engine.

Moving on to technically mediated acoustic means of measurement, there is evidence that Galileo Galilei employed listening for scientific purposes: Following Stillman Drake's biography of Galilei (Drake (1980)), it seems plausible that Galilei used auditory information to verify the quadratic law of falling bodies (see figure 3.1). By running strings across the plane at distances increasing according to the quadratic law (1, 4, 9, 16, etc.), the ball running down the plane would ring the bells attached to the strings in a regular rhythm. In a reconstruction of the experiment, Riess et al. (2005) found that time measuring devices of the 17th century were likely too imprecise, while listening for rhythmic precision works well and is thus more plausible to have been used.

An early example of a technical device rendering an environment variable perceptible which humans do not naturally perceive is the Geiger-Müller counter: Incidence of a particle generated by radioactive decay on the detector causes an audible click; the density of the irregular sequence of such clicks informs users instantly about changes in radiation levels.

Figure 3.1: Inclined plane for Galilei's experiments on the law of falling bodies. This device was rebuilt at the Istituto e Museo di Storia della Scienza in Florence. © Photo Franca Principe, IMSS, Florence.
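A quick numerical check, added here for illustration (it is not part of the original text): with constant acceleration along the plane, the distance covered grows with the square of the elapsed time, so the time needed to reach a string at distance d grows with the square root of d, and quadratically spaced strings are reached at equal time intervals.

    (
    var distances = [1, 4, 9, 16, 25];     // string positions in arbitrary units
    var times = distances.sqrt;            // arrival times -> [1, 2, 3, 4, 5]
    times.differentiate.postln;            // intervals between rings -> [1, 1, 1, 1, 1]
    )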

Sonar is another interesting case to consider: Passive Sonar, where one listens to underwater sound to determine distances and directions of ships, has apparently been experimented with by Leonardo da Vinci (Urick (1967), cited in Kramer (1994a)); in Active Sonar, sound pulses are projected in order to penetrate visually opaque volumes of water, listening to reflections to understand local topography, as well as moving objects of interest, be they vessels, whales, or fish swarms.

In seismology, Speeth (1961) had subjects try to differentiate between seismograms of natural earthquakes and artificial explosions by playing them back speeded up. While subjects could classify the data very successfully, and rapidly (thanks to the speedup), little use was made of this until Hayward (1994) and later Dombois (2001) revived the practice and the discussion.

An interesting case of auditory proof of a long-standing hypothesis was reported in Pereverzev et al. (1997): In the early 1960s, Josephson and Feynman had predicted quantum oscillations between weakly coupled reservoirs of superfluid helium; 30 years later, the effect was verified by listening to an amplified vibration sensor signal of these mass-current oscillations (see also chapter 7).

One can say that the history of sonification research officially began with the first International Conference for Auditory Display (ICAD) in 1992, organised by Gregory Kramer to bring all the researchers working on related topics, but largely unaware of each other, into one research community. The extended book version of the conference proceedings (Kramer (1994b)) is considered the main founding document of this research domain, and the yearly ICAD conferences are still the central event for researchers, generating much of the body of sonification research literature. In 1997, the ICAD board wrote a report for the NSF (National Science Foundation) on the state of the art in sonification (http://icad.org/node/400); and more recently, a collection of seminal papers mostly presented at ICADs between 1992 and 2004 appeared as a special issue of ACM Transactions on Applied Perception (TAP, ACM (2004)), which shows the range and quality of related research.

Many interesting applications of sonification for specific purposes have been made: Fitch and Kramer (1994) showed that an auditory display of medical patients' life signs can be superior to visual displays; Gaver et al. (1991) found that monitoring a virtual factory (ArKola) by acoustic means works remarkably well for keeping it operating smoothly. The connection between neural signals and audition has its own fascinating history, from early neurophysiologists like Wedensky (1883) listening to nerve signals by telephone, to current EEG sonifications like Baier et al. (2007); Hermann et al. (2006); Hinterberger and Baier (2005); as well as musicians' fascination with brainwaves, beginning with Alvin Lucier's Music for Solo Performer (1965), among many others. (See also the ICAD concert 2004, described in section 4.3.)

The idea of listening for scientific insight keeps being rediscovered by researchers even if they seem to be unaware of sonification research; e.g., what James Gimzewski calls Sonocytology (Pelling et al. (2004), see also http://en.wikipedia.org/wiki/James_Gimzewski) is (in auditory display terminology) a form of audification of signals recorded with an atomic force microscope used as a vibration sensor. There are also current uses in Astronomy by NASA (Candey et al. (2006)), where one of the motivations given is providing better data accessibility for visually impaired scientists; and at the University of Iowa (http://www-pw.physics.uiowa.edu/space-audio/), mainly dealing with electromagnetic signals.

Nevertheless, a large number of scientists still appear quite surprised when they hear of the idea of employing sound to understand scientific data.

3.1.2 A taxonomy of intended sonification uses

Sonification designs may be intended for a wide range of different uses, with substantially different aims (note that the categories listed here may overlap; e.g. presentation and pedagogy certainly do):

Presentation calls for clear, straightforward, auditory demonstration of finished results; this may be useful in conference lectures, science talks, teaching contexts, and other situations.

Exploration is very much about interaction with the data, 'acquiring a feeling for the data'; while this seems a rather fuzzy target, and is in fact hard to measure, it is actually indispensable and central. Following Rheinberger (2006), exploration must necessarily remain informal; it is a heuristic for generating hypotheses - once they appear on the epistemic horizon, they will be cross-checked and verified with every analysis tool available. So generating some hypotheses that turn out to be wrong eventually is not a problem at all; in the worst case, if too many hypotheses are wrong, this can be an efficiency issue.

Analysis requires well-understood, reliable tools for detecting specific phenomena, which are accepted by the conventions in the scientific domain they belong to. The practice of auscultation in medicine may be considered to belong into this category, even though it only relies on physical means, with no electronic mediation. Also the informal practice of listening to seismic recordings belongs here.

Monitoring is intended for a variety of processes that benefit from continuous monitoring by human observers, whether in industrial production, in medical contexts like intensive care units, or in scientific experiments. Human auditory perception habituates quickly to soundscapes with little change; any sudden changes, even of an unexpected nature, in the soundscape are easily noticed, and enable the observer to intervene if necessary.

Pedagogy - Different students may learn to understand structures/patterns in data better when presented in different modalities; an auditory approach to presentation may be more appropriate and useful in some cases. For example, students with visual impairments may benefit from data representations with sound, as research on auditory graphs shows (e.g. Harrar and Stockman (2007); Stockman et al. (2005)).


Artistic Uses - Many works in sound art are sonification-based, whether they are sound-only installations, or more generally, data-driven multimedia works. The recent appearance of special topics issues like Leonardo Music Journal, Volume 16 (2006) confirms this trend, as do sonification research activities at art institutions like the Bern University of Arts (see http://www.hkb.bfh.ch/y.html).

The intended uses for which a specific sonification system has been designed largely determine the scope of its functionality, and its usefulness in different contexts.

3.2 Sonification toolkits, frameworks, applications

A number of sonification systems have been implemented and described since the 1980s. They all differ in the scope of their features and in their limitations; some are historic, meaning they run on operating systems that are now obsolete, while others are in current use, and thus alive and well; most of them are toolkits meant for integration into (usually visualisation) applications. Few are really open and easily extensible; some are specialised for very particular types of datasets. Current systems are given more space here, as they are more interesting to compare with the system developed for this thesis.

3.2.1 Historic systems

The Porsonify toolkit (Madhyastha(1992)) was developed at a time when realtime synthesis was still out of reach on affordable computers; thus Porsonify aimed to provide an interface for the Sun Sparc's audio device and two MIDI synthesizers. Behaviour defined for a single sound event (usually triggered from a single data point) is formulated in sonic widgets, which generate control commands for the respective sound device. Example sonifications were created using data comparing living conditions of different U.S. cities (cf. the accompanying CD to Kramer(1994b)), and multi-processor performance data.

The LISTEN toolkit (Wilson and Lodha(1996)) was written for SGI workstations, using (alternatively) the internal sound chip or external MIDI as sound rendering; it was meant to be easy to integrate into existing visualisation software, which was done for visualising geometric uncertainty of surface interpolants, and for algorithmic uncertainty in fluid flow.

The Musical Data Sonification Toolkit, or MUSE (Lodha et al.(1997)), was a followup project, aiming to map scientific data to musical sound. Also written for SGI, it uses mapping to very traditional musical notions: timbres are traditional orchestra instruments and vowel sounds generated with CSound instruments, rhythms come from a choice of seven dance rhythms, pitch is defined from the major scale, following rules for melodic shapes, and harmony is based on overtone ratios. It has been applied "to visualize (sic) uncertainty in isosurfaces and volumetric data".

A later incarnation, MUSART (Musical Audio transfer function Real-time Toolkit, see Joseph and Lodha(2002)), sonifies data by means of musical sound maps. It converts data dimensions into 'audio transfer functions', and renders these with CSound instruments. Users can personalise their auditory displays by choosing which data dimensions to map to which display parameters. In the article cited, the authors report uses for exploring seismic volumes for the oil industry. Again, the authors emphasize their use of musical concepts for sonification design.

While not a single software system, Auditory Information Design by Stephen Barrass (Barrass(1997)) is a fascinating collection of multiple concepts (all with catchy names): it encompasses a task-data analysis method ('TaDa'), a collection of use cases for finding auditory metaphors for design ('ear-benders'), a set of design principles ('Hearsay'), a perceptually linearised information sound space ('GreyMUMS'), and tools for designing sonifications ('Personify'). The practical implementations described show a wide variety of approaches; they all share a unix flavor, often being shell scripts that connect command-line programs. Thus it is not one consistent framework, but rather a collection of how-to examples. For data treatment, mostly perl scripts are used; for sound synthesis, CSound, which at the time was non-realtime. Some examples also appeared in the CSound book (Boulanger(2000)) mentioned below.

5 see http://www.hkb.bfh.ch/y.html

3.2.2 Current systems

xSonify (Candey et al.(2006)) has been developed at NASA; it is based on Java, and runs as a web service6. It aims at making space physics data more easily accessible to visually impaired people. Considering that it requires data to be in a special format, and that it only features rather simplistic sonification approaches (here called modi), it will likely only be used to play back NASA-prepared data and sonification designs.

SonART (Ben-Tal et al.(2002); Yeo et al.(2004)) is a framework for data sonification, visualisation and networked multimedia applications. In its latest incarnation, it is intended to be cross-platform and uses OpenSoundControl for communication between (potentially distributed) processes for synthesis, visualisation, and user interfaces.

The Sonification Sandbox (Walker and Cothran(2003)) has an intentionally limited range, but it covers that range well: Being written in Java, it is cross-platform; it generates MIDI output, e.g. to any General MIDI synth (such as the internal synth on many soundcards). One can import data from CSV textfiles and view these with visual graphs; a mapping editor lets users choose which data dimension to map to which sound parameter: Timbre (musical instruments), pitch (chromatic by default), amplitude, and (stereo) panning. One can select to hear an auditory reference grid (clicks) as context. It is very useful for learning basic concepts of parameter mapping sonification with simple data, and it may be sufficient for many auditory graph applications. Development is still continuing, as the release of version 4 (and later small updates) in 2007 shows.

The Sonification Integrable Flexible Toolkit (SIFT, see Bruce and Palmer(2005)) is again a toolkit for integration into other applications, typically for visualisation. While it is also written in Java and uses MIDI for sound rendering, it emphasizes realtime data input support from network sources. It has been used for oceanographic data sets; however, the paper cited describes the first prototype of this system, and no later versions of it seem to have been developed.

Sandra Pauletto's toolkit for Sonification (Pauletto and Hunt(2004)) is based on PureData (see section 3.3 below), and has been used for several application domains: Electromyography data for Physiotherapy (Hunt and Pauletto(2006)), helicopter flight data, and others. While it supports some data types well, adapting it for new data is rather cumbersome, mainly because PureData is not a general-purpose programming language.

SoniPy is a very recent and quite ambitious project, written in the Python language, and described in Worrall et al.(2007). It is still in the early stages of development at this time, but may well become interesting. Being an open source project, it is hosted at sourceforge7; at the beginning of this thesis, it did not exist yet.

All these toolkits and applications are limited in different ways, based on the resources for development available to their creators, and the applications envisioned for them. For the broad parallel approach we had in mind, and the flexibility required for it, none of these systems seemed entirely suitable, so we chose to build on a platform that is both a very efficient realtime performance system for music and audio processing and a full-featured modern programming language: SuperCollider3 (McCartney(2007)). To provide some more background, here is an overview of the three main families of music programming environments.

6 http://spdf.gsfc.nasa.gov/research/sonification
7 http://sourceforge.net/projects/sonipy

3.3 Music and sound programming environments

Computer Music has been dealing with programming to create sound and music structures and processes for over fifty years now; current music and sound programming environments offer many features that are directly useful for sonification purposes as well. Mainly, three big families of programs have evolved; most other music programming systems are conceptually similar to one of them:

Offline synthesis - MusicN to CSound

MusicN languages started in 1957/58, with the Music I program developed by Max Mathews and others at Bell Telephone Laboratories; Music IV (Mathews and Miller(1963)) already featured many concepts that remain central in such languages, such as the idea of a Unit Generator as the building block for audio processes (unit generators can be e.g. oscillators, noises, filters, delay lines, and envelopes). As the first widely used incarnation, Music V, was written in FORTRAN and thus relatively easy to port to new computer architectures, it spawned a large number of descendants.

The main strand of successors in this family is CSound, developed at the MIT Media Lab beginning in 1985 (Vercoe(1986)), which has been very popular in academic computer music. Its main approach is to use very reduced language dialects for orchestra files (consisting of descriptions of DSP processes called instruments), and score files (descriptions of sequences of events that each call one specific instrument with specific parameters at specific times). A large number of programs were developed as compositional front-ends, to write score files based on algorithmic procedures, such as Cecilia (Piché and Burton(1998)), Cmix, Music, and others; so CSound has in fact created an ecosystem of surrounding software.

CSound has a very wide range of unit generators and thus synthesis possibilities, and a strong community; e.g. the CSound Book (Boulanger(2000)) demonstrates its scope impressively. However, for sonification, it has a few disadvantages: Even though it is text-based, it uses specialised dialects for music, and thus is not a full-featured programming language. Any control logic and domain-specific logic would have to be built in other languages/applications, while CSound could provide a sound synthesis back-end. Being originally designed for offline rendering, and not built for high-performance realtime demands, it is not an ideal choice for realtime synthesis either. CSound has been ported to very many platforms.

Graphical patching - Max/FTS to Max/MSP(/Jitter) to PD/GEM

The second big family of music software began with Miller Puckette's work at IRCAM on Max/FTS in the mid-1980s, which later evolved into Opcode Max, which eventually became Cycling'74's Max/MSP/Jitter environment. In the mid-1990s, Puckette began developing an open source program called PureData (Pd), later extended with a graphics system called GEM. All these programs share a metaphor of 'patching cables', with essentially static object allocation of both DSP and control graphs.

This approach was never meant to be a full programming language, but a simple facility to allow for patching multiple DSP processes written in lower-level (and thus more efficient) languages; with Max/FTS, the programs actually ran on a DSP card built by IRCAM. Thus, the usual procedure for making patches for more complex ideas often entails writing new Max or Pd objects in C; while these can run very efficiently if well written, special expertise is required, and the development process is rather slow.

In terms of sound synthesis, Max/MSP has a much more limited palette than CSound, though a range of user-written MSP objects exist; support for graphics with Jitter has become very good recently. Both Max and Pd have a strong (and somewhat overlapping) user base; Pd is somewhat smaller, having started later than Max. While Max is commercial software with professional support by a company, Pd is open-source software. Max runs on Mac OS X and Windows, but not on linux, while Pd runs best on linux, reasonably well on Windows, and less smoothly on OS X.

Realtime text-based environments - SuperCollider, ChucK

The SuperCollider language and realtime system came from the idea of having both realtime synthesis and musical structure generation in one environment, using the same language. Like Max/PD, it can be said to be an indirect descendant of CSound. From SC1, written by James McCartney in 1996, it has gone through three complete rewriting cycles; thus the current version, SC3, is a very mature system. In version 2, SC2, it inherited much of its language characteristics from Smalltalk; in SC3 the language and the synthesis engine were split into a client/server architecture, and many syntax features from other languages were adopted as options. Its sound synthesis is fully dynamic like CSound's, it has been written for realtime use with scientific precision, and being a text-based, modern, elegant, full programming language, it is a very flexible environment for very many uses, including sonification (a minimal code example follows at the end of this section). The range of unit generators is quite wide, though not as abundant as in CSound; synthesis in SC3 is very efficient. SC3 also provides a GUI system with a variety of interface widgets, but its main emphasis is on stable realtime synthesis. SC3 has a somewhat smaller user community, which is nevertheless quite active. Having become open source with version 3, it has since flourished in terms of development activity. SC3 runs very well on OS X, pretty well on Linux, and less well on Windows (though the SonEnvir team put some effort into improving the Windows port).

The ChucK language has been written by Ge Wang and Perry Cook, starting in 2002. It is still under development, exploring specific notions such as being strongly-timed, and others. Like SC3, it is not really intended as a general purpose language, but as a music-specific environment. While being cross-platform, and having interfacing options similar to SC3 and Max, it has a considerably smaller palette of unit generator choices. One possible advantage of ChucK is that it has very fine-grained control over time; both synthesis and control can have single-sample precision.
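To give a concrete impression of this working style, here is a minimal sketch (not taken from the SonEnvir code base; the synth name \blip and all parameter values are purely illustrative) of the division of labour in SC3: a synth definition is sent to the synthesis server, and the language side then schedules events that play it in realtime.

s.boot;   // start the synthesis server

(
// a simple percussive sine grain
SynthDef(\blip, { |freq = 440, amp = 0.1, dur = 0.2, pan = 0|
    var env = EnvGen.kr(Env.perc(0.01, dur), doneAction: 2);
    Out.ar(0, Pan2.ar(SinOsc.ar(freq) * env * amp, pan));
}).add;
)

(
// the language side generates musical (or data-driven) structure as a stream of events
Pbind(
    \instrument, \blip,
    \freq, Pseq([220, 330, 440, 660], inf),
    \dur, 0.25
).play;
)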

3.4 Design of a new system

As the existing systems did not have the scope we required, we designed our own. A full description of the design of the Sonification Environment as it was before the SonEnvir project started is given in de Campo et al.(2004); the following section is updated from a post-project perspective.

3.4.1 Requirements of an ideal sonification environment

The main design aim is to allow fluid development of new sonification designs and modification of existing ones. By using a modular software design which decouples components like basic data handling objects, data processing, sound synthesis processes, mappings used, playback approaches, and real-time interaction possibilities, all the individual aspects of one sonification design can be re-used as starting points for new designs. A Sonification Environment should:

• Read data files in various formats. The minimum is human-readable text files for small data sets, and binary data files for fast handling of large data sets. Reading routines for special file formats should be writable quickly. Realtime input from network sources should also be supported.

• Perform basic statistics on the data for user orientation. This includes (for every data channel): minimum, maximum, average, standard deviation, and simple histograms. This functionality should be user-extensible in a straightforward way (a small sketch of this and the previous point follows at the end of this section).

• Provide basic playback facilities like ordered iteration (in effect, a play button with a speed control), loop playback of user-chosen segments, zooming while playing, data-controlled playback timing, and 2D and 3D navigation along user-chosen data dimensions. Later on, navigation along data-derived dimensions such as lower-dimensional projections of the data space is also desirable.

• Provide a rich choice of interaction possibilities: Graphical user interfaces, MIDI controllers, graphics tablets, other human interaction devices, and tracking data should be supported. (The central importance of interaction only became clear in the course of the project.)

• Provide a variety of possible synthesis approaches, and allow for changing and refining them while playing. (The initial design suggested a more static library of synthesis processes, which turned out to be unnecessary.)

• Allow for programming domain-specific models to run and generate data to sonify. This strongly suggests a full modern programming language. (This requirement only came up in the course of the project, for the physics sonifications.)

• Store sonification designs in human-readable text format: This allows for long-term platform independence of designs, provides possibilities for informal rapid exchange (text is easy to send by e-mail), and can be an appropriate and useful publication format for sonification designs that employ user interaction.

• Serve to build a library/database of high-quality sonification designs made in this environment, with real research data coming from a diverse range of scientific fields, developed in close collaboration with experts from these domains.

More generally, the implementation should be kept as lightweight, open, and flexible as possible to accommodate evolving new understanding of the design issues involved.
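As a small illustration of the first two requirements (file reading and basic statistics), the following sketch uses only plain SuperCollider3; the file path and channel layout are hypothetical, and the SEData classes described in section 3.5.3 wrap this kind of functionality more conveniently.

(
// hypothetical file: one row of numbers per sample, columns = data channels
~rows = FileReader.readInterpret("~/data/example.dat".standardizePath, true, true);
~channels = ~rows.flop;    // turn rows into one array per data channel

// simple per-channel statistics, computed by hand
~channels.do { |chan, i|
    var mean = chan.sum / chan.size;
    var std = ((chan - mean).squared.sum / (chan.size - 1)).sqrt;
    "channel %: min % max % mean % stddev %\n"
        .postf(i, chan.minItem, chan.maxItem, mean, std);
};
)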

3.4.2 Platform choice

While PureData was a platform option for a while, we soon decided to stay entirely in SuperCollider3, based on the list of requirements given above. This decision had some benefits, as well as some drawbacks. The benefits we experienced were:

• A fully open source programming language is easy to extend in ways that are useful for a wider community;

• Interpreted languages like SC3 provide a relatively simple entry into programming for users (starting with little scripts, and changing details for experimentation);

• Readability has turned out to be very useful, as the code of a script also serves as its full technical documentation;

• An interactive development environment encourages code literacy, and thus general competence, of sonification 'users'. In this context, the notion of Just In Time Programming (as described e.g. in Rohrhuber et al.(2005)) has turned out to be extremely useful for interdisciplinary team development sessions, see chapter 4.

The main drawback we encountered was that SC3 only runs really well on OS X, somewhat less comfortably on linux (which was not used by any of the team members), while on Windows (which we had to support) it was initially quite unusable; this led to SonEnvir taking care of substantially improving the Windows port.

3.5 SonEnvir software - Overall scope

The main goal of the SonEnvir sonification framework is to allow for the creation of meaningful and effective sonifications more easily. Such a sonification environment supports sonification designers by providing software components, and concepts for using them. It combines all the important aspects that need to be considered: data representation, interaction, mapping and rendering. A famous phrase about computer music programming systems is that they are kitchens, not restaurants, which also applies to SonEnvir: rather than giving users a menu of finished dishes to choose from (which other people created), it provides ingredients, utensils, recipes and examples.

3.5.1 Software framework

SuperCollider3 has a very elegant extension system; one can assemble components to be published in different ways: Classes, their respective Help files, UnitGenerator plugins, and all kinds of support files can be combined into packages which can be downloaded, installed, and de-installed directly from within SC3. Such packages are called Quarks. Currently, most of the code created in the project is under version control with Subversion at the SonEnvir website8. In order to achieve maximum reuse, some parts have been converted into Quarks, while for others, this is still in process. Many items of general usefulness have already been migrated directly into the main SC3 distribution. The sonification-specific components will remain available at the SonEnvir website, as will the collection of sonification designs. (For an overview, see the end of this section.) The subsequent sections briefly describe the overall structure of the framework and the design and implementation of the data representation module. For reference, the framework structure in the subversion repository is described in appendix A.

8 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/
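As a sketch of how such packages are handled from within the language (using the AmbIEM package mentioned in the next section as an example; any other quark name works the same way):

// browse the list of available quarks in a GUI window,
// with checkboxes for installing and de-installing them
Quarks.gui;

// or install a named quark directly
Quarks.install("AmbIEM");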

3.5.2 Framework structure

The SonEnvir framework implements a generic sonification model consisting of four aspects:

Data model The data model unifies the notions of how data are handled in the framework and deals with the diversity of data types that can be used for sonification.

User-Interaction model This aspect deals with interaction models for the exploration and analysis of data. It is mainly implemented in the JInT package (see below).

Synthesis model This aspect covers the mapping of data onto properties of sound, or the creation of more complex sound structures by a sonification model. As all the needed code infrastructure existed in the JITLib library within SC3, it is not coded as classes, but only as a conceptual model, described in section 5.3, Synthesis Models.

Spatialisation model This model takes care of the audio rendering of the designed sonification for different requirements and playback environments. It is described in detail in section 5.5, Spatialisation Model. Its code components reside partially in SC3 itself, in the Framework/Rendering folder, and in the AmbIEM package9, which is now a SuperCollider quark package.

All these models taken together allow for designing sonifications in a flexible way. As the data model is the most implementation-related aspect, it is described in detail here, and not in the more conceptual chapter on the general models (chapter 5).

3.5.3 The Data model

The aim of the data model is to provide a unified representation of different types of data that can be used in the sonification framework. This demands a highly flexible and abstract model, as data may have very different structures. The data model also provides functionality for input/output in the original form the data are supplied in, and includes various statistical functions for data analysis. All models are object-oriented in design, and the classes and their inter-relations are described using UML (Unified Modelling Language) charts. In order to avoid possible name-space conflicts with other class definitions on any target platform, the classes in the SonEnvir framework have a prefix "SE". Figure 3.2 illustrates the design of the data model in a UML graph.

Figure 3.2: UML diagram of the data model.

The SEData class is central to the design of the data model. It is the highest abstraction of any kind of dataset to be sonified. Besides providing properties for the name and the data source, the actual data is organised in channels. An SEData object contains instances of SEDataChannel, which is the base class for all different types of data channels and represents a single dimension in a dataset. Data channels can be numerical data, but also any sort of nominal data, with the only restriction that they are organised as a sequence and addressable by index. SENumDataChan specifies that the data values in the given channel are all numbers, and provides a basic set of numerical properties of this set of numbers. Besides the usual minimum, maximum, mean, and standard deviation values, it also implements functions that proved to be useful for sonifications, such as removing offsets or a drift, as well as normalising and 'whitening' the numbers.

Another important subclass of numeric data channel covers all time-based data channels. These basically refer to two types: time series (SETimeSeriesCh), providing a sample rate, and data with time stamps (SETimeStampsCh). Although basically a numeric data channel as well, we decided to introduce another basic type for vector-based data, with a subclass for 3D spatial data. Any of the data channel types mentioned above may be combined in order to form a dataset described through SEData. For convenience, there are two predefined classes derived from SEData that cover some common combinations of data channels: SETimeData and SESpatialData.

Every SEData instance is associated with an SEDataSource. This class abstracts the access to the raw data material. It takes care that the space required for big datasets is made available when needed, and uses different parsers for reading different file formats. If needed, it can be extended to include network resources and real-time data. Each SEDataSource also provides information about the type of each data series that is contained in the raw data. This might be available from the headers of some data formats, or it has to be set explicitly, such that SEData can create the appropriate SEDataChannels.

9 AmbIEM is a port of a subset of a system by Musil et al.(2005); Noisternig et al.(2003).

Like the entire framework, the data model is provided as a class library for SuperCollider3. Once the library is brought into place, it is compiled at startup of the SuperCollider3 language. The following listing illustrates using SEData objects in SC3:

// Example listing of data model usage in SC3.
(
// read an ascii data file
~vectors = FileReader.readInterpret("~/data/C179_T_s.dat", true, true);

// supply data channel names by hand
~chanNames = ['temperature', 'solvent', 'specificHeat', 'marker'];

// make an SEData object
~phaseData = SEData.fromVect(
    'phaseData',
    ~chanNames,
    ~vectors,
    SENumDataChan   // all numerical data, so use SENumDataChan class.
);

// provide simple statistics
~phaseData.analyse;
~phaseData.means.postln;
~phaseData.stdDevs.postln;
)

Chapter 4

Project Background

A physicist, a chemist, and a computer scientist try to go up a hill in an ancient car. The car crawls, stutters, and then stalls. The physicist says, "The transmission ratio is wrong - I'll take a look at it."; the chemist says, "No, the fuel mix is wrong, I'll experiment with it."; the computer scientist says, "Why don't we all get out, close the doors, get back in, and try again."

This chapter describes the research design for and the working methodology developed within the SonEnvir project, the design and the process of the Workshop 'Science By Ear' the project team held in March 2006, and the concert the team organised for the ICAD 2006 conference in London. As most of the work presented in this dissertation was done within the context of the SonEnvir project, it is helpful to provide some background on that context here.

4.1 The SonEnvir project

The central concept of the SonEnvir project was to create an interdisciplinary setting in which scientists from different domains and sonification researchers could learn how to work on data perceptualisation by auditory means. The project took place from January 2005 to March 2007, and it was the first collaboration of all four universities in Graz. SonEnvir was funded by the Future Funds of the Province of Styria.

4.1.1 Partner institutions and people

The project brought together the following institutions as partners:

• the Institute of Electronic Music and Acoustics (IEM), at the University of Music and Dramatic Arts Graz;


• the Theoretical Physics Group - Institute of Physics, at the University of Graz;

• the Institute for Sociology, at the University of Graz;

• the University Clinic for Neurology, at the Medical University Graz;

• and the Signal Processing and Speech Communication Laboratory SPSC, at the University of Technology Graz.

The IEM was the host institution coordinating the project, and the source of audio design and programming as well as sonification expertise in the project. The main researcher here was the author of this dissertation.

From the Institute of Sociology, Christian Dayé provided data from a variety of sociological contexts, and co-designed and experimented with sonifications for them, as discussed in section 6. He was also responsible for feedback and evaluation of the interdisciplinary work process from the perspective of sociology of science.

The Physics group had changing members in the course of the project: initially Bianka Sengl provided data from quantum physics research, namely from competing Constituent Quark models, as discussed in section 7.1 and appendix C.1. Later on, Katharina Vogt worked on a number of different physics topics and sonifications for them, including the Ising and Potts models discussed in section 7.2.

The Signal Processing and Speech Communication Laboratory was represented by Christopher Frauenberger, who worked on a number of different sonification experiments, among others on propagation of electromagnetic waves, and time series classification, as discussed in section 8. He also contributed substantially to the code implementations, and has become the main developer for the python-based Windows version of SuperCollider3.

For the Institute of Neurology, Annette Wallisch was the main researcher. She provided a variety of EEG data for experimenting with sonification designs for screening and monitoring, as described in section 9. She also dealt with an industry research partner, the company BEST medical systems (Vienna), and she wrote a dissertation (Wallisch (2007), in German) on the research done within SonEnvir.

4.1.2 Project flow

In order to create a broad base of sonification designs for a wide range of data from the scientific contexts described, the project was structured in three iterations. Each iteration began with identifying potentially interesting research questions from the domains, and collecting example data for these. Then sonification designs were created and tested, a process that became more collaborative and experimental as the project proceeded.

In each of the scientific fields, we started by building simple sonification designs to begin the discussion process. The key question here has turned out to be learning how to work in such a highly interdisciplinary group, how to build bridges for common understanding, and how to develop a common language for collaboration.

We focused on building sonification designs that demonstrate the usefulness of sonification by showing practical benefit for the respective scientific field. Identifying good research questions at this intermediate level of complexity was not trivial. Nevertheless, being able to come up with sufficiently convincing examples to reach the immediate partner 'audience' is very important.

Finally, the project goal was to integrate all the approaches that worked well in one context into a single software framework that includes all the software infrastructure, thus making them re-usable for a wide range of applications; this was intended to result in a meaningful contribution to the sonification community. The diversity of the research group and their problem domains forced us toward very flexible and re-usable solutions. By making our collection of implemented sonification designs freely accessible, we hope to capture much of what we have learned in a form that other researchers can build on.

4.1.3 Publications

Many research results were published in conference and journal papers, which are indicated in the respective chapters, and briefly listed here: de Campo et al.(2004) was a project plan for SonEnvir before the fact. Papers on sociological data (Dayé et al.(2005)), quantum spectra (de Campo et al.(2005d)), and the project in general (de Campo et al.(2005a)) were presented at ICAD and ICMC 2005. We wrote some papers with external collaborators, on electrical systems (Fickert et al.(2006)), and various kinds of lattice data (de Campo et al.(2005c), de Campo et al.(2006b), de Campo et al.(2006c), de Campo et al.(2005b)).

For ICAD 2006, we contributed an overview paper, de Campo et al.(2006a), and organised a concert of sonifications described in section 4.3, contributing a piece described in de Campo and Dayé(2006) and in section 11.3. At ICAD 2007, we presented papers on EEG (de Campo et al.(2007)), time series (Frauenberger et al.(2007)), Potts models (Vogt et al.(2007)), and on the Design Space Map concept (de Campo(2007b)). At the ISon workshop in York 2007, we presented work on juggling sonification (Bovermann et al.(2007)) and the Sonification Design Space Map (de Campo(2007a)). Some project results and insights in the sociological context were also presented in two journal publications: Dayé et al.(2006) and Dayé and de Campo(2006).

4.2 Science By Ear - An interdisciplinary workshop

This workshop was in our opinion the most innovative experiment in methodology within SonEnvir. Aiming to intensify the interdisciplinary work setting within SonEnvir, we brought in both sonification experts and scientists from different domains to spend three days working on sonification experiments. Considering participant responses (both during and after the event), this workshop was very successful. Detailed background is available online here1.

4.2.1 Workshop design

We chose the participants to invite so that they would form an ideal combination of competences: Eight international sonification experts, eight domain scientists (mainly from Austria), six audio specialists and programmers, and (partially overlapping with the above) the SonEnvir team itself (see appendix D). This group of ca. 24-28 people was just large enough to allow for different combinations over three days, but still small enough to allow for good group cohesion.

The workshop program consisted of five short lectures by the sonification experts, which served to inform less experienced domain scientists about sonification history, methodology, and psychoacoustics. This helped to bring all participants closer to a common language. Most of the workshop time was spent in sonification design sessions. For each day, three interdisciplinary teams were formed, composed of the three categories: 2-3 sonification experts, 2-3 domain scientists, 2 audio programmers, and 1 moderator (a SonEnvir member). These sessions typically lasted 2 hours, after which the group would report to the plenary about their results.

For the first two days, all three teams worked on the same problem at the same time (in parallel), which allowed for good comparisons of design results. On the last day, each group worked on a separate problem for two sessions, to allow working more in depth on the exploration of ideas.

4.2.2 Working methods

The design sessions focused on data submitted by the participating domain scientists; the scientific domains included Electrical Power Systems, EEG Rhythms, Global Social data, meteorology in the Alpine region, computational Ising models, Ultra-Wide-Band communication, and research in biological materials called Polysaccharides.

The parallel sessions began with a talk by the submitting domain specialist, introducing the problem dataset to the plenary group. Then the group split into the three teams, and the teams began their parallel sessions. The typical sequence in a session was to do some brainstorming first, to get ideas about what sonification strategies might be applicable. Once a few candidate ideas were around, experimentation began by coding little sonification designs (some administrative code like data reading routines was prepared beforehand). Time tended to be rather short, so decisions about what to try first were often based on what seemed doable within the limited time.

Toward the end of the session time, the group began preparing what they would report to the plenary meeting. This usually consisted of little demos of what the group had tried, many more ideas for experiments to do as follow-up steps, and an informal evaluation of what the group felt they had learned. On the final workshop day, spending two sessions on a topic was a welcome change. Having more time to experiment, and especially taking a break and then continuing work on a problem, allowed for more sophisticated mini-realisations.

Having a wiki set up for the workshop allowed us to distribute the latest versions of information materials, all the code examples written, and the notes that were taken during all sessions. Furthermore, most sessions and discussions were recorded (audio, and some video) to allow later analysis of the working process and the interactions taking place.

1 http://sonenvir.at/workshop/

4.2.3 Evaluation

Many of the designs ended up being adapted in some form for later work in SonEnvir; two that were not used elsewhere are described in section 10 for completeness. Based on feedback given by the workshop participants, it can be considered a highly successful experiment in methodology. Many participants commented very positively on the innovative aspects of this workshop: Actually doing design work in an interdisciplinary group setting rather than going through prepared examples was considered remarkable.

The major design tradeoff that was also discussed in the responses was how much time to spend on each data problem: time pressure limited the eventual usefulness of the designs that were created, so the alternative of working on far fewer data sets for much longer may be worth trying - at the potential risk of having less comprehensive overall scope.

Christian Dayé made a qualitative and quantitative content analysis of the audio recordings of the sessions that confirmed the overall positive response (publication still in progress), and he developed a number of guidelines for future similar events:

Prepare and distribute basic literature on the domains well beforehand. In the SBE workshop, there was sometimes a tendency for domain scientists to mainly listen, thus leaving the sonification experts and programmers to do most of the talking. From an interdisciplinary point of view, this is not ideal, as it does not create equally shared understanding.

Do more technical preparation together with the programmers beforehand. In some sessions, problems came up with reading and handling data properly, which made these sessions less productive than intended.

Have a scientist from the problem domain in every group. As the SBE workshop covered a wide range of problems, this was not feasible in the parallel sessions. This strategy would work well in combination with a more limited set of problems to work on.

4.3 ICAD 2006 concert

While the ICAD has been holding conferences since 1992, the first concert of sonifications at an ICAD conference took place only in 2004.

4.3.1 Listening to the Mind Listening

For the ICAD conference in Sydney 2004, Stephen Barrass organised a concert of sonifications of brain activity, called Listening to the Mind Listening2. The concert call3 invited participants to create sonifications of neural activity: a dataset was provided with five minutes of multichannel EEG recording of a person listening to a piece of music. A jury selected ten submissions for the concert, which took place in the Sydney Opera House. Even though the pieces were constrained to adhere to the time axis of the recording, the diversity of the approaches taken, and the variety in the sounding results, was extremely interesting. The pieces can be listened to here4, and the organisers published an analytical paper in Leonardo Music Journal comparing all the entries in a number of different ways (Barrass et al.(2006)). The concert was a great success, so it seemed likely to become a regular event at ICAD.

4.3.2 Global Music - The World by Ear

In 2006 the author was invited to be Concert Chair for the ICAD conference in London. Together with SonEnvir colleagues Christopher Frauenberger and Christian Dayé, we agreed that social data would be an interesting and accessible topic for a sonification concert/competition, and we proceeded to collect and prepare social data on the 190 nations represented in the United Nations. The concert call5 invited participants to contribute a sonification that illuminates aspects of the social, political and economic circumstances represented in the data. The following quote is the central part of the concert call.

2 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm
3 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert_call.htm
4 http://www.icad.org/websiteV2.0/Conferences/ICAD2004/concert.htm

Motivation

Werner Pirchner, Ein halbes Doppelalbum, 1973: "The military costs every person still alive roughly as much as half a kilogram of bread per day."

Global data are ubiquitous - one finds them in every newspaper, and they cover a range of themes, from global warming to increasing poverty, from individual purchasing power to the ageing of the world's population. Obviously these data are of a social nature: They describe specific aspects (e.g. ecological or economic) of the environment in which societies exist, which taken together determine culture, i.e. the way people live.

Rising awareness of these global interdependencies has led both to fear and concerns (e.g. captured in the notion of the risk society, see Beck(1992); Giddens(1990, 1999)), as well as hopes for eventual positive consequences of globalisation. Along with developments like the scientisation of politics (see Drori et al.(2003)), this growing understanding of global issues has redefined the context of the political discourse in modern societies: As modern societies claim to steer their own course based on self-observation by means of data, an information feedback loop is realised. Alternative choices of data that are important to consider, which data should be set in relation to each other, and a consideration of how to perceptualise these data choices meaningfully can enrich this discourse.

Closing the feedback loop by informing society about its current state and its development is a task that both scientists and artists have responded to, and this is the key point of this call:

• You can contribute to the discourse by perceptualising aspects of world societal developments,
• search for data that concern interesting questions, and devise strategies for investigating them, and
• demonstrate that sound can communicate information in an accessible way for the general public.

5 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concertcall.php

The reference dataset of 190 countries included data ranging from commonly expected dimensions like geographical data (capital location, area) and population number to basic social indicators such as GDP, access to sanitation and drinking water, and life expectancy. An extended dataset included data on education (years in school for males and females), illiteracy, housing situation, economic independence of males and females, and others.

The call went on to specify the following constraints: Using this reference dataset was mandatory, so countries, capital locations, population and area data should be used. Participants were strongly encouraged to extend this dataset with more dimensions, and possible resources for such data extensions were pointed out. The concert sound system was to be a symmetrical ring of eight speakers, so any spatialisation used in pieces should employ such a configuration. Finally, participants had to provide a short paper that documents the context and background of their data choices and sonification design.

An international jury composed of sociologists, computer musicians/composers, and sonification specialists wrote reviews rating the anonymous submissions, and eight pieces were finally selected for the concert6. Four of these pieces are described in more detail in section 11.

6 Papers and headphone-rendered mp3 files for all pieces are available at http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html.

Chapter 5

General Sonification Models

A British Euro-joke tells of a meeting of officials from various countries who listen to a British proposal, nodding sagely at its numerous benefits; the French delegate stays silent until the end, then taps his pencil and remarks: "I can see that it will work in practice. But will it work in theory?"
reported in Barnes(2007)

In this chapter, several models are proposed to allow better understanding of the main aspects of sonification designs:

Sonification Design Space Map - General orientation in the design process

Synthesis Model - Considerations of and examples for synthesis approaches

User Interaction Model - Understanding sonification usage contexts and users’ goals and tasks to be achieved

Spatialisation Model - Using spatial distribution of sound for sonification

Note the entangled nature of these aspects: splitting sonification designs into aspects is only a simplification that is temporarily useful for grasping the concepts. Because of their close connections, it will be necessary to cross-reference between sections. Generally, because of these interdependencies, the understanding of these sections will benefit from re-reading.

5.1 The Sonification Design Space Map (SDSM)

5.1.1 Introduction

This section describes a systematic approach for reasoning about experimental sonification designs for a given type of dataset. Starting from general data properties, the approach recommends initial strategies, and lists possible refinements to consider in the design process. An overview of the strategies included is presented as a mental (and visual) map called the Sonification Design Space Map (SDSM), and the refinement steps to consider correspond to movements on this map.

The main purpose of this approach is to extract 'theory' from 'observation' (in our case, of design practice), similar to Grounded Theory in sociology (Glaser and Strauss(1967)): to make implicit knowledge (often found in ad hoc design decisions which sonification experts consider 'natural') explicit and thus available for reflection, discussion, learning, and application in design work.

This approach is mainly the result of studying design sessions which took place in the interdisciplinary sonification workshop 'Science By Ear', described in detail in section 4.2. In order to explain the concept in practice as well, a set of workshop sessions on one simple dataset is analysed here in the terms proposed; in the chapters on implemented designs, many more of these are described in detail using SDSM terms.

5.1.2 Background

When collaborations on sonification for a new field of application start, sonification researchers may know little about the new domain, its common types of data, and its interesting research questions; similarly, domain scientists may know little about sonification, its general possibilities, and its possible benefits for them. In such early phases of collaboration, the task to be achieved with a single particular sonification is often difficult to define clearly, so it makes sense to employ an exploratory strategy which allows for mutual learning and exchange. Eventually, the interesting tasks to achieve become clearer in the process. Note that even when revisiting familiar domains, it is good methodological practice to start with as few implicit assumptions as possible, and to introduce any concepts from domain knowledge later, transparently and explicitly, in the course of the design process.

Rheinberger(2006) describes how researchers deal with 'epistemic things', which are by definition vague at first (they can be e.g. physical objects, concepts or procedures whose usefulness is only slowly becoming clear); they choose 'experimental setups' (ensembles of epistemic things and established tools, devices, procedures), which allow for endless repetitions of experiments with minimal variations. The differential results gained from this exhaustion of a chosen area in the possibility space can allow for new insights. Then, an experimental setup can collapse into an established device or practice, and become part of a later experimental setup. From this perspective, sonification designs start their lifecycle as epistemic things, which need to be refined under usage; they may in time become part of experimental setups, and if successful, eventually 'disappear' into the background of a scientific culture as established tools.

Some working definitions

The objects or 'content' to be perceptualised can be well-known information, or new, unknown data (or shades of gray in between). The aims for these two applications are very different: for information, establishing easy-to-grasp analogies is central; for data, the aim is enabling the perceptual emergence of latent phenomena of unforeseeable type in the data. As working terminology for the context here, we propose to define the following three terms:

Auditory Display is the rendering of data and/or information into sound designed for human listening. This is the most general, all-encompassing term (even though the term 'display' has a visual undertone to it). We further propose to differentiate between two subspecies of Auditory Displays:

Auditory Information Display is the rendering of well-understood information into sound designed for communication to human beings. It includes speech messages such as in airports and train stations, auditory feedback sounds on computers, alarms and warning systems, etc.

Sonification or Data Sonification is the rendering of (typically scientific) data into (typically non-speech) sound designed for human auditory perception. The informational value of the rendering is often unknown beforehand, particularly in data exploration. The model described here focuses on Data Sonification in the narrower sense.

These definitions are quite close to the current state of the evolving terminology; in the International Encyclopedia of Ergonomics and Human Factors, Walker and Kramer (2006) define the terms quite similarly:

"Auditory display is a generic term including all intentional, nonspeech audio that is designed to transmit information between a system and a user. ... Sonification is the use of nonspeech audio to present data. Specifically, sonification is the transformation of data relations into auditory relations, for the purpose of studying and interpreting the data."

Common sonification strategies

The literature usually classifies sonification approaches into Audification and Parameter Mapping (Kramer(1994b)), and Model-Based Sonification (Hermann(2002)). For the context here, we prefer to differentiate the categories more sharply, which will become clear along the way; so, our three most common approaches are: Sonification (or generally, perceptualisation) by Continuous Data Representation, Discrete Point Data Representation, and Model-Based Data Representation.

Continuous Data Representation treats data as quasi-analog continuous signals, and relies on two preconditions: equal distances along at least one dimension, typically time and/or space; and sufficient (spatial or temporal) sampling rate, so that the signal is free of sampling artifacts, and interpolation between data points is smooth. Both simple audification and parameter mapping involving continuous sounds belong in this category. Its advantages include: subjective perceptual smoothness; interpolation can make the sampling interval (which is an observation artifact) disappear; perception of continuous shapes (curves) can be appropriate; audition is very good at structures in time; mapping data time to listening time is metaphorically very close and thus easy to understand. Its drawbacks include: it is often tied to linear movement along one axis only; and events present in the data (e.g. global state changes in a system) may be difficult to represent well.

Discrete Point Data Representation creates individual events for every data point. Here, one can easily arrange the data in different orders, choose subsets based on special criteria (e.g. based on navigation input), and when special conditions arise, they can be expressed well. Its advantages include: more flexibility, e.g. subset selections of changeable sizes, based on changeable criteria, and random iterations over the chosen subsets; and the lack of an illusion of continuity may be more accurate to the data. Its drawbacks include: attention may be drawn to data-independent display parameters, such as a fixed grain repetition rate; and at higher event rates, interactions between overlapping sound events may occur, such as phase cancellations.

Model-Based Data Representation employs more complex mediation between data and sound rendering by introducing a model whose properties are informed by the data. Its advantages include: apart from data properties, more domain knowledge can be captured and employed in the model; and models may be applicable to datasets from a variety of contexts, as is commonly aimed for in Data Mining. Its drawbacks include: assumptions built into models may introduce bias leading away from understanding the domain at hand; there may be a sense of disconnection between data and sounding representations; and the higher complexity of model metaphors may be difficult to understand and interpret.
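As a minimal illustration of the continuous end of this spectrum, here is a small SuperCollider3 sketch of a simple audification; the data series is synthetic and purely illustrative, and a real design would of course read measured data instead (it also assumes the synthesis server s is already booted).

(
// purely illustrative data series: a slow oscillation plus noise
~values = Array.fill(44100, { |i| sin(i * 0.02) * 0.5 + 0.1.rand2 });

// load the series into a server buffer and treat it as an audio signal
~buf = Buffer.loadCollection(s, ~values);

// play it back; the rate control time-stretches or compresses the data
{ PlayBuf.ar(1, ~buf, rate: 0.5, loop: 1) * 0.2 }.play;
)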

5.1.3 The Sonification Design Space Map

Task/Data Analysis (Barrass(1997)) focuses on solving well-defined auditory information design problems: How to design an Auditory Display for a specific task, based on systematic descriptions of the task and the data. Here, the phenomena to be perceptualised are known beforehand, and one tries to render them as clearly as possible.

The Sonification Design Space Map given here addresses a similar but different problem: The aim to be achieved here is to find transformations that let structures/patterns in the data (which are not known beforehand) emerge as perceptual entities in the sound which jump to the foreground, i.e. as identifiable 'interesting audible objects'; these are closely related to 'sound objects' in the electronic music field (from 'objets sonores', see Schaeffer(1997)), and in the psychoacoustics literature, to 'auditory gestalts' (e.g. Williams (1994)).

In other words, the most general task in data sonification designs for exploratory purposes is to detect auditory gestalts in the acoustic representation, which one assumes correspond to any patterns and structures in the data one wants to find.

SDS Map axes

To facilitate this search for the unknown, the Design Space Map enables a designer, researcher, or artist to engage in systematic reasoning about applying different sonification strategies to his/her task or problem, based on data dimensionality and perceptual concepts. Especially while the task is not yet clearly understood and defined (which is often the case in exploratory contexts), reasoning about data aspects, and making well-informed initial choices based on perceptual givens, can help to develop a clearer formulation of useful tasks. So, the proposed map of the Sonification Design Space (see figure 5.1) has these axes:

X-axis: the number of data points estimated to be involved in forming one gestalt, or ’expected gestalt size’;

Y-axis: the number of data dimensions of interest, i.e. to be represented in the current sonification design;

Z-axis: the number of auditory streams to be employed for data representation.

Figure 5.1: The Sonification Design Space Map. The overlapping zones are fuzzy areas where different sonification approaches apply; the arrows on the right refer to movements on the map, which correspond to design iterations. For detailed explanations see sections 5.1.3 and 5.1.4.

To ensure that the auditory gestalts of interest will be easily perceptible, the most fundamental design decision is the time scale: in auditory gestalts (or sound objects) of 100 milliseconds and less it becomes more and more difficult to discern meaningful detail, while following a single gestalt for longer than, say, 30 seconds is nearly impossible, or at least takes enormous concentration; thus, a reasonable rule of thumb for single gestalts is to time-scale their rendering into the duration of echoic memory and short term memory, i.e. on the order of 1-3 seconds (Snyder(2000)). Sounds up to this duration can be kept in working memory with much detail information, keeping all the nuances and inflections while more perceptual processing goes on. This time frame can be called the 'echoic memory time frame'. The 'expected gestalt size' is the number of data points (of the dataset under study) that should be represented within this time frame to allow for perception of individual gestalts at this data subset size.

Note that the three-second time frame does not impose a limit on the number of data points represented: as a deep exploration of the world of Microsound (Roads(2002)) shows, clouds of short sound events can happen at very high densities in the micro-time scale; in fact this is a fascinating area for creating sound that is rich in perceptual detail and artistic possibilities.

SDS Map zones

The zones shown in figure 5.1 do not have hard borders; their extent is only meant to give an indication of how close-by (and thus how meaningfully applicable) the different strategies are for a given data 'gestalt size' and number of dimensions. Similarly, the number ranges given below are only approximate orders of magnitude, and mainly based on personal experience both in electronic music and in sonification research.

The Discrete-Point zone ranges roughly from gestalt size 1 - 1000 and from dimensions number 1 - 20; the transitions shown in the map from note-like percepts via textures to granular events which merge into clouds of sound particles are mainly perceptual.

The Continuous zone ranges roughly from gestalt size 10 - 100.000 and from dimensions number 1 - 20; the main transition here is between parameter mapping and audification, with various technical choices indicated along the way, such as using the continuous data signal as a modulation source, band-splitting it, and/or applying filtering to it.

The Model-Based zone ranges roughly from gestalt size 10 - 50.000 and from dimensions number 8 - 128; because the approach is so varied and flexible, there are no further orientation points in it yet. Existing varieties of model-based approaches are still to be analysed in the terms of this Sonification Design Space, and can eventually be integrated in appropriate locations on the map.

All these zones apply mainly to single auditory streams; generally, when multiple streams are used in a sonification design, the individual streams can and should use fewer dimensions. In fact, using multiple streams is the main strategy for reducing the number of dimensions while keeping the overall density of presentation constant.

5.1.4 Refinement by moving on the map

In the evolution of a sonification design, all intermediate incarnations can easily be conceptualised as locations on the map, based on how many data points are rendered into the basic time interval, how many data dimensions are being used in the representation, and how many perceptual streams are in use. A step from one version to the next can then be considered analogous to a movement on the map. This mental model aims to capture the design processes we could observe in concentrated form in the Science By Ear workshop ('SBE', described in detail in section 4.2), and in extended form in the development work in the main strands of the SonEnvir project.

Data anchor

For exploring a dataset, one can start by putting a reference point on the map, which we call the Data Anchor: this is the point on the map corresponding to the full number of data points and data dimensions. A first synopsis, or more properly Synakusis, of the entire dataset (within the echoic memory time frame of ca. 3 seconds) can then be created with one of the nearest sonification strategies on the map. Subsequent sonification designs and sketches will typically correspond to a movement down from this point, i.e. toward using fewer dimensions at a time, and to the left, toward listening to less than the total number of data points within the echoic memory time frame. Of course one can still listen to the entire dataset; the total presentation time will simply become longer.
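As a minimal sketch of this reasoning (with invented example numbers, e.g. a week of 15-minute measurements in 5 channels), the Data Anchor position and the resulting presentation times can be written down directly:

(
var numPoints = 672, numDims = 5;            // e.g. a week of 15-minute values in 5 channels
var frameDur = 3;                            // echoic memory time frame in seconds
var anchor = [numPoints, numDims];           // X/Y position of the Data Anchor on the map
var synakusisRate = numPoints / frameDur;    // data points per second for a full-data synakusis
var gestaltSize = 96;                        // a smaller gestalt size, e.g. one day of values
var totalTime = numPoints / gestaltSize * frameDur; // presentation time after a left shift
[anchor, synakusisRate, totalTime].postln;   // -> anchor [672, 5], 224 points per second, 21 seconds
)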

Shift arrows

Shift arrows, as shown in figure 5.1 on the right hand side, allow for moving one's current 'working position' on the Design Space Map, in effect deploying different sonification strategies in the exploration process. Note that some shifting operations are used for 'zooming' and leave the original data untouched, while others employ (temporary) data reduction, extension, or transformation; in any sonification design one develops, it is essential to differentiate between these kinds of transformations and to document the steps taken clearly. Finally, one can decide to defer such decisions and turn them into interaction possibilities, so that e.g. subsets are selected interactively.

A left-shifting arrow can be used to reduce the 'expected gestalt size', in effect using fewer data points within the echoic memory time frame. Some options are: investigating smaller, user-chosen data point subsets (this can be done by means of interaction, e.g. 'tapping' on a data region and hearing that subset); downsampling; subsets chosen by appropriate random functions; and other forms of data preprocessing.

A down-shifting arrow can be used to reduce the 'dimensions number', i.e. to employ fewer data properties (or dimensions) in the presentation. Some options are: dimensionality reduction by preprocessing (e.g. statistical approaches like Principal Component Analysis (PCA), or using locality-preserving space-filling curves in higher-dimensional spaces, e.g. Hilbert curves); and user-chosen data property subsets, keeping the option to explore others later. Model-based sonification concepts may also involve dimensionality reduction techniques, yet they are in principle quite different from mapping-based approaches.1

An up-shifting arrow can be used to increase the number of dimensions used in the sonification design, e.g. for better discrimination of components in mixed signals, or to increase 'contrast' by emphasizing aspects with relevance-based weighting. Some options are: splitting time series data into frequency bands to increase detail resolution; extracting the amplitude envelope of a time series and using it to accentuate its dynamic range2; other domain-specific forms of preprocessing may be appropriate for adding secondary data dimensions to be used in the sonification design.

1 Thomas Hermann, personal communication, Jan 2007.
2 Whether such transformations happen in the data preprocessing stage or in the audio DSP implementation of a sonification design makes no difference to the conceptual reasoning process.

A right-shifting arrow can be used to increase the number of data points used, which can help to reduce representation artifacts. Some options are: interpolation of the signal shape between data points; repetition of data segments (e.g. granular synthesis with slower-moving windows); local waveset audification (see section 5.3); and model-based sonification strategies, which can create e.g. physical vibrational models whose state may be represented in larger secondary datasets informed by comparatively few original data points. Interpolation in time-series data is often employed habitually and without further notice; the model proposed here strongly suggests notating this transformation as a right-shifting arrow. If one is certain that the sampling rate used was sufficient, using cubic (or better) interpolation instead of the actually measured steps creates a smoother signal which is nearer to the measured phenomenon than the sampled values. When such a smoothed signal is used for modulating an audible synthesis parameter, the potentially distracting presence of the time step unit should be less apparent.
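As a minimal sketch (using a placeholder signal, not one of the SonEnvir datasets), some of these shifts can be expressed as simple data preprocessing steps: a left shift by downsampling, an up shift by deriving a crude amplitude envelope as a secondary dimension, and a right shift by interpolating to a larger number of points:

(
var data = Array.fill(672, { |i| sin(i * 0.1) + 0.1.rand2 }); // placeholder time series

// left shift: fewer data points per gestalt, here by averaging groups of 4 values
var downsampled = data.clump(4).collect(_.mean);

// up shift: a derived secondary dimension - a rough local amplitude envelope
var avg = data.mean;
var envelope = data.clump(16).collect { |seg| (seg - avg).abs.maxItem };

// right shift: more data points by (linear) interpolation between the measured values
var interpolated = data.resamp1(data.size * 4);

[data.size, downsampled.size, envelope.size, interpolated.size].postln; // -> [672, 168, 42, 2688]
)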

Z axis shifts

So far, all arrows have concerned movement in the front plane of the map, where only a single auditory stream is used for data representation. After the time scale, the number of streams is the second most fundamental perceptual design decision. By presenting some data dimensions in parallel auditory streams (especially data dimensions of the same type, such as time series of EEG measurements for multiple electrodes), overall display dimensionality can be increased in a straightforward way, while the dimensionality in each individual stream can be lowered substantially, thus making each single stream easier to perceive. (The equivalent movement is difficult to represent well visually on a 2D map, but easy to imagine in 3D space; figure 5.2 shows a rotated view.) For multiple streams, all previous arrow movements apply as above, and two more arrows become available:

An inward arrow can be used to increase the number of parallel streams in the representation. Some options are: multichannel audio presentation; and setting one perceptual dimension of the parallel streams to fixed values with large enough differences to cause stream separation, thus in effect labelling the streams.

An outward arrow can be used to decrease the number of parallel streams in the representation. Some options are: selecting fewer streams to listen to; and intentionally allowing for perceptual merging of streams.

Experimenting with different numbers of auditory streams can be very interesting, as multiple perspectives on the same data 'content' may well contribute to more intuitive understanding of the dataset under study. Figure 5.2 shows the range of hypothetical variants of a sonification design for a dataset with 16 dimensions; the graph plane is at an expected gestalt size of 100 data points, and the axes shown are Y (number of data properties mapped) and Z (number of auditory streams employed).

Figure 5.2: SDS Map for designs with varying numbers of streams. Hypothetical variants of a sonification design for a dataset with 16 dimensions; see text.

Different designs might employ, for example, one stream with 16 mapped parameters, 2 streams with 8, 4 streams with 4, 8 with 2, or 16 streams with a single parameter each. Of course, depending on the character of the data dimensions, other, more asymmetrical combinations may be worth exploring; these will typically be located below the diagonal shown. Note that the map is slightly ambiguous between the number of generated versus perceived streams; parallel streams of generated sound may fuse or separate based on perceptual context. This is a very interesting phenomenon which can be quite fruitful: perceptual fusion between streams can be an appropriate expression of data features. In EEG recordings, for example, massive synchronisation of signals across electrodes may cause the streams to fuse, which can represent the nature of some epileptic seizures well.

5.1.5 Examples from the 'Science by Ear' workshop

In order to clarify the theoretical considerations given so far, we now turn to analysing design work done in an interdisciplinary setting. We report one exemplary set of design sessions as they happened, with added after-the-fact analysis in terms of the Sonification Design Space Map concept (short: SDSM). Where SDSM strongly calls for additional designs, these are provided and marked as additions. This is intended to demonstrate the potential of going from practice-grounded theory back to theory-informed practice. The workshop concept is described in section 4.2.

The workshop setting

True to the inherently interdisciplinary nature of scientific data sonification, the SBE workshop brought together three groups of people for three days: domain scientists who were invited to supply data they usually work with; an international group of sonification experts; and audio programmers/sound designers. Apart from invited talks by the sonification experts, the main body of work consisted of sonification design sessions, where interdisciplinary groups (ca. 8 people: domain scientists, sonification experts, programmers, and a moderator) spent 2 hours discussing one submitted data set and experimenting with different sonification designs; results were then discussed across groups in plenary meetings. In each session, discussion notes were taken as documentation; where possible, the sonification designs were kept as code, and all the sound examples played in the plenary meetings were rendered as audio files. All this documentation is available online3.

Load Flow - data background

The particular data set serving as a starting example came from electrical power systems: it captures electrical power usage for one week (December 18 - 24, 2004) across five groups of power consumers: households, trade and industry, agriculture, heating and warm water, and street lighting; a sum over all consumer groups was also provided. Clear daily cycles were to be expected, as well as changes between workdays and weekends/holidays. While this is not scientifically challenging, it is a good example of simple data with everyday relevance. We chose this dataset for the first parallel session, and it did serve well for exploring basic sonification concepts with novices. The full documentation for these sessions is available online4.

3 http://sonenvir.at/workshop/
4 http://sonenvir.at/workshop/problems/loadflow/. All sound examples can be found here, in the folders TeamA, TeamB, TeamC, and Extras; for layout reasons, relative links at this site are given here as ./TeamX/name.mp3 etc.

Figure 5.3: All design steps for the LoadFlow dataset. Steps are shown as locations labeled with team name and step number (A1, B2, C3, etc.), and arrows between locations.

The dataset was an Excel file with 5 columns for the consumer groups; consumption values were sampled at 15 minute intervals, so for a week there are 24 * 4 * 7 = 672 data points in the entire dataset. In SDSM terms, this puts the Data Anchor for this set right in the middle of the Design Space Map, in the overlap zone between Discrete-Point and Continuous sonification; see section 5.3.

Sonification designs

All sonification designs are shown as locations on the Design Space Map in figure 5.3, labeled A1, B1, C3, etc. Teams A and B created their design sketches in SuperCollider3, while Team C worked with the PureData environment.

[A1] Team A began by sonifying the entire dataset as five parallel streams, scaled to 13 seconds, i.e. one day scaled to ca. 2 seconds; power values were mapped to frequency with identical scaling for all channels5. The resulting five parallel streams were panned into one stereo panorama. After experimenting with larger and smaller timescales, agreement was reached that the initial choice of timescale was appropriate and useful. In SDSM terms, this means the team was looking for auditory gestalts at the scale of single days.

[A+] As the SDSM recommends starting with a synakusis into a timeframe of 3 seconds, this is provided here6; it was only added after the workshop.

Then, alternative sound parameter mappings were tried out based on team suggestions:

[A2] Mapping powers to the amplitudes of five tones labeled with different pitches7. While this is closer in metaphorical distance, it is perceptually less successful: one could not distinguish much shape detail in the amplitude changes.

[A3] Mapping powers to the amplitudes and the cutoff frequencies of resonant lowpass filters of five differently pitched tones8. This was clearer again, but still not as differentiated as mapping to tone frequencies.

[A4] Going back to mapping to frequencies, each tone was labeled with a different phase modulation index (essentially, a different level of brightness)9. While this allowed for better stream identification, the (very quickly chosen) scaling was not deemed very pleasant, if inadvertently amusing.

[A5] Finally, the team tried using fewer parallel streams and adding secondary data: the phase modulation depth (basically, the brightness) of two channels (household and agriculture) was controlled from the difference between the two data channels10. While this did not work very well, it seemed promising with better secondary data choices; however, at this point session time was over. In SDSM terms, design A5 is a move down - to fewer channels - and a move back up - derived data used to control additional parameters (the map only shows the resultant move).

Team B chose to do audification (following one sonification expert's request), and to use an interactive sonification approach: their design loaded the entire data for one channel (672 values, equivalent to one week of data time) into a buffer, and played back a movable 96-value segment (equal to one day) as a looped waveform. The computer mouse position was used to control which 24-hour segment is heard at any time. This maps the signal's local 'jaggedness' into spectral richness and its overall daily change into amplitude. (For the non-interactive sound examples that follow, the mouse is moved automatically through the week within 14 seconds.) While the team found the data sample rate and overall data size too low for much detail, an interesting side effect turned up: when audifying segments in this fashion, the difference between the same time of day on two adjacent days was emphasized; large differences at a specific time between adjacent days created strong buzzing11.

5 ./Team A/TeamA 1 FiveSines PowersToFreqs.mp3
6 ./extras/LoadflowSynakusis.mp3
7 ./Team A/TeamA 2 FiveTones PowersToAmps.mp3
8 ./Team A/TeamA 3 FiveTones PowersToAmpsAndFilterfreqs.mp3
9 ./Team A/TeamA 4 FiveFMSounds IDbyModDepth.mp3
10 ./Team A/TeamA 5 TwoFMSounds DiffToModDepth.mp3

In the next design step, two channels, households (left) and agriculture (right), were compared side by side12; for clearer separation, they were then labeled with different loop frequencies13. The final design example maps the power values corresponding to the current mouse position directly to the amplitude of a 50 Hz (European mains frequency) filtered pulse wave14. As above, in the fixed rendering here, the mouse moves through the week at constant speed within 14 seconds. In SDSM terms, the initial choices were to move all the way down on the map (to only 1, and then 2, out of 5 channels at a time), and essentially a move to the left: a chosen data subset was played by moving a one-day window within the data. Note that this move actually creates an interaction parameter for sonification design users, which is one of the many advantages of current interactive programming environments. Note also that the interpolation commonly used in audification is actually slightly dubious here: there may well have been meaningful short-time fluctuations within the 15 minute intervals which were not captured in the data as supplied.

Team C used PureData as their programming environment. Their approach was quite similar to Team A's, with interesting differences: they began with scaling each single data channel into 3 seconds, mapping the power in that channel both to frequency and to amplitude, and subsequently rendered all channels in this fashion15. Finally, this team also produced a version with six parallel streams (including the sum value), scaled into 12 seconds, and with different timbres16. In SDSM terms, they first moved to the bottom of the map while keeping full data scale, i.e. a synakusis-sized time window; example 7 moves back up (using all channels), and to the left (i.e. toward higher time resolution, gestalts on the order of single days of data).

5.1.6 Conclusions

Conceptualising the sonification design process in terms of movements on a design space map, one can experiment freely by making informed decisions between different strategies to use in the data exploration process; this can help to arrive at a representation which produces perceptible auditory gestalts more efficiently and more clearly. Understanding the sonification process itself, its development, and how all the choices made influence the sound representation one has arrived at is essential in order to attribute perceptual features of the sound to their possible causes: they may express properties of the dataset, they may be typical features of the particular sonification approach chosen, or they may be artifacts of the data transformation processes used.

11 ./Team B/1 LoadFlow B Households.mp3
12 ./Team B/2 LoadFlow B households agriculture.mp3
13 ./Team B/3 LoadFlow B households agriculture.mp3
14 ./Team B/4 LoadFlow B households agriculture.mp3
15 http://sonenvir.at/workshop/problems/loadflow/Team C/, sound examples 1-6.
16 ./Team C/TeamC AllChannels.mp3

As these analyses of some rather basic sonification design sessions show, the terminology and the map metaphor provide valuable descriptions of the steps taken; having the map available (mentally or physically) during a design work session seems very likely to provide good clues for the next experimental steps to take. Note that the map is open to extensions: as new sonification strategies and techniques evolve, they can be classified as new zones, as areas within existing zones, or as transforms belonging to one of the directional arrow categories; their appropriate locations on the map can then easily be estimated and assigned.

5.1.7 Extensions of the SDS map

There are several ways to extend the map and make it more useful, and this dissertation aims to provide some of them:

More and richer detail can be added by analysing the steps taken in observed design sessions, classifying them as strategies, and adding them if new or different. This is the object of chapters 6, 7, 8, 9, 10, and 11, the example sonification designs from different SonEnvir research activities.

A more detailed analysis of the existing varieties of model-based sonification can be undertaken, and that understanding can and should be expressed in the terms of the conceptual framework of the map; however, this is beyond the scope of this thesis.

Expertise can be integrated by interviewing sonification experts, tapping into their experience and inquiring about their favorite strategies, or about decisions they remember that made a big difference for a specific design process.

One can imagine building an application that lets designers navigate a design space map on which simple example data sets with coded sonification designs are located. When one moves in an area that corresponds to the dimensionality of the data under study, the nearest example pops up and can be adapted for experimentation with one's own data. Obviously such examples should be canonical and capture established sonification best practices and guidelines, e.g. concerning mapping (Walker(2000)), as well as sonification design patterns (Barrass and Adcock(2004)).

Finally, many of the strategies need not be fixed decisions made once; being able to delay many of the strategic choices, and to make them available as interaction parameters when exploring a dataset, can be extremely valuable.

5.2 Data dimensions

Before proceeding to synthesis models, it will be helpful to discuss the nature of data dimensions in more depth.

5.2.1 Data categorisation

In data analysis, data dimensions are classified by scale types: data may capture categorical differences, ordered differences, ordered differences with a metric, or differences with a metric and a natural zero.

Table 5.1: Scale types

Scale:     Characteristics:                              Example:
nominal    difference without order                      kind of animal
ordinal    difference with order                         degrees of sweetness
interval   difference with order and metric              temperature
ratio      difference, order, metric, and natural zero   length

For nominal scales (such as 'kind of animal') and ordinal scales, it is useful to know the set of all occurring values, or categories (such as cat, dog, horse). The size of this set greatly influences the choices of possible representations of the values in this data dimension. For metrical scales (interval and ratio), it is necessary to know the numerical range in order to make scaling choices; knowing the measurement resolution or increment (for example, age could be measured in full years, or in days since birth) and the precision (e.g. tolerances of a measuring device) is also useful.
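As a minimal sketch (with invented values), the scale type constrains the kind of mapping one can meaningfully use: nominal categories can only be assigned to discrete labels (e.g. one fixed pitch per category), while ratio values can be scaled continuously into a synthesis parameter range:

(
// nominal: a fixed set of category labels, each given an arbitrary label pitch
var categories = [\cat, \dog, \horse];
var labelPitches = [60, 67, 76];
var pitchFor = { |cat| labelPitches[categories.indexOf(cat)] };

// ratio: a continuous value with a known range, scaled into a frequency range
var freqFor = { |weight| weight.linlin(80, 900, 200, 2000) }; // assumed cow weights in kg

[pitchFor.(\dog), freqFor.(450)].postln; // -> [67, ca. 1012 Hz]
)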

5.2.2 Data organisation

Apart from the phenomena recorded, and their respective values, data may have different forms of organisational structure: individual data points may have different kinds of neighbour relations to specific other data points. The simplest case would be no organisation at all: measuring all the individual weights of a herd of cows gives just a set of measured values with no order. When recording health status at the same time, each data point has two dimensions, but there is still no order. If the cows' identities are recorded as well, similar measurements at different times can be compared. If the cows have names, the data can be sorted alphabetically (nominal scale); if the cows' birth dates are known as well, the data can also be sorted by age (interval).

Both sortings are derived from data dimensions, and there is no obvious 'best' or preferable order. Often the order in which data are recorded is considered an implicit order; however, in the example given, it may simply be the order in which the cows happened to be weighed. In social statistics, data for individuals or aggregates without obvious neighbour relations are the most frequent case. When physical phenomena are studied, measurements and simulations are often organised in time (e.g. time series of temperature) and space (temperature at n measuring stations in a geographical area, or force field simulations in a 3D grid of a specific resolution). These orders can actually be considered separate data dimensions; for clear differentiation, one may call a dimension which expresses a value (such as temperature) a value dimension, while a dimension that expresses an order (e.g. a position in time or space) can be called an ordering dimension or indexing dimension. Task Data analysis (TaDa) by Barrass(1997), chapter 4, provides a template that captures data dimensions systematically, as well as initial ideas for desirable ways of representing and interacting with the data under study. As a practical example, the TaDa analysis made for the LoadFlow dataset in preparation for the Science By Ear workshop is reproduced here.

5.2.3 Task Data analysis - LoadFlow data

Name of Dataset: Load Flow
Date: March 12, 2006
Authors: Walter Hipp, Alberto de Campo (TaDa)
File: LoadFlow.xls (original), .tab, .mtx
Format: Excel xls original, tab-delimited for SC3, mtx format for PureData
The file contains 672 lines with date and time, total electrical power consumption, and consumption for five groups of power consumers.

Scenario

The Story:
Load Flow describes how the electrical power consumption of different groups of consumers changes in time. A time series was taken for a week (in Winter 2004) of 15 minute average values, documenting date and time, total power consumption, and consumption for a) households, b) trade, c) agriculture, d) heating and warm water, and e) street lighting.
Tasks for this data set:

• Find out which kinds of patterns can be discerned at which time domain; e.g. daily cycles versus shorter fluctuations.

• Since all five individual channels have the same unit of measurement, find ways to represent them such that their values and their movements can be compared directly.

Table 5.2: The Keys

Question: Who uses how much power when? Are there patterns that recur? At what time scales? Are there overall periodicities?
Answers: One or several of the channels; yes/no; days/hours/times of day; categories of pattern shapes
Subject: Relative proportions, patterns of change in time
Sounds: ? (none at the time the TaDa analysis was written)

TaDa

Table 5.3: The Task

Generic question: What is it? How does it develop?
Purpose: Identify, compare
Mode: interactive
Type: continuous
Style: exploration

Table 5.4: The Data/Information:

Level: Intermediate and global
Reading: Conventional (possibly direct)
Type: 5 channels, ratio
Range: continuous
Organization: time

Table 5.5: The Data:

Type: 5 channels of ratio scale with absolute zero
Range: Individual channels 0 - 2.24, total power 1.08 - 4.55
Organisation: Time

Appendix

Figure 5.4: LoadFlow - time series of dataset (averaged over many households)

Figure 5.5: LoadFlow - time series for 3 individual households

5.3 Synthesis models

Perceptualisation designs always require decisions about the precise manner in which data values (the 'sonificate') determine perceptible representations (in the case of auditory representation, the sonifications). While section 5.1 focused on which data subsets are to be presented in the rendering, this section covers the question of which technical aspects of the sound synthesis algorithms deployed are to be determined by which data dimensions. The three sonification strategies defined in section 5.1.2 are discussed in more depth, and concrete examples of synthesis processes are provided in ascending complexity. All strategies, from the very simplest to the most complex model-based designs, require decisions on mappings (of data dimensions or model properties) to synthesis parameters; these decisions need to be informed by perceptual principles such as those covered in chapter 2. While building sonification designs may be technically simple, mapping choices are by no means trivial. One aspect to consider is metaphorical proximity: mappings that relate closely to concepts in the scientific domain may well reduce cognitive load and thus allow for better concentration on exploration tasks. (For a discussion of performance of clearly defined tasks with 'intuitive', 'okay', random, and intentionally 'bad' mappings, see Walker and Kramer(1996), described in section 2.5.) Another aspect is the clarity of the communicative function to be fulfilled in the research context: what will a perceptible aspect of the sound serve as? Some possible categories are (a sketch of the last category follows below):

• analogic display of a data dimension - a value dimension mapped to a synthesis parameter which is straightforward to recognise and follow

• a label identification for a stream - needed when several streams are heard in parallel

• an indexing strategy - ordering the data by one dimension, then indexing into subsets

• context information/orientation - mapping non-data, e.g. using clicks to represent a time grid
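As a minimal sketch of the last category (hypothetical proxy name ~mapwithgrid; it assumes the ProxySpace setup and the LoadFlow buffer q.buf1 prepared in the examples of the following subsections), a non-data time grid of one click per day can be added to a simple data-to-pitch mapping for orientation:

(
~mapwithgrid.play;
~mapwithgrid = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var pitch = datasignal.linlin(0, 2.24, 60, 96);   // analogic display: data -> pitch
    var sound = SinOsc.ar(pitch.midicps) * 0.2;
    var grid = Impulse.ar(7 / loopdur) * 0.3;          // context: 7 clicks per loop, one per day
    Pan2.ar(sound + grid);
};
)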

Finally, it is essential to understand the resolution of perceptual dimensions, such as their Just Noticeable Differences (JNDs). Note that sound process parameters need not be directly perceptible; they may govern aspects of the sound that indirectly produce differences which may be described perceptually in other terms. Perceptual tests can be integrated into the sonification design process, much like writing tests to verify that new code works as intended. Writing examples that test whether a specific concept produces audible differences for the data differences of interest can provide such immediate confirmatory feedback, as well as a direct learning experience for the test listeners at hand. Such examples also provide a good basis for discussions with domain specialists. Similar mapping decisions come up in the process of designing electronic or software-based music instruments: how the ranges of sensor/controller inputs (the equivalent of the data to be sonified) are scaled into synthesis parameter ranges determines how playing that instrument will feel to a performer.

5.3.1 Sonification strategies

The three most common concepts, Continuous Data Representation, Discrete Point Data Representation, and Model-Based Data Representation, correspond closely to the approaches described first in Scaletti(1994). The examples given again use the LoadFlow dataset, and loosely follow the order given by Scaletti. Pauletto and Hunt(2004) briefly describe how different data characteristics sound under different sonification methods: Static areas, trends, single outliers, discontinuities, noisy sections, periodicities (loops), or near-periodicities are simple characteristics that may occur in a single data dimension, and will be used as examples of easily detectable phenomena. The data for the code examples can be prepared as follows:

(
// load data file
q = q ? (); // a dictionary to store things by name
// load tab-delimited data file:
q.text = TabFileReader.read("LoadFlow.tab".resolveRelative, true, true);
// keep the 5 interesting channels, convert to numbers
q.data = q.text.drop(1).collect { |line| line[3..7].collect(_.asFloat) };
// load one data channel into a buffer on the server
q.buf1 = Buffer.loadCollection(s, q.data.flop[0]); // households
);

5.3.2 Continuous Data Representation

Audification is the simplest case of continuous data representation: typically, converting the numerical values of a long enough time series into a soundfile is a good first pass at finding structures in the data. Scaletti(1994) calls this 0th order sonification. Scaling the numerical values is straightforward, as one only needs to fit them into the legal range for the type of soundfile to be used; for high precision, 32 bit floating point data can be converted to sound file formats without any loss of information. For audification, one can simply scale the (expected or actual) maximum and minimum values to the conventional -1.0 to +1.0 range for audio signals at full level. This maps the data dimension under study directly to the amplitude of the audible signal. Making the playback rate user-adjustable allows for simple time-scaling: one can change the expected gestalt size interactively. The fastest timescaling value will typically be around 40-50 kHz, which includes the default sample rates of most common audio hardware; this puts roughly 100,000 data points into working memory, which makes audification the fastest option for screening large amounts of data with minimal preprocessing. Typical further operations to provide are: selection of an index range in the data, options for looped and non-looped playback, and a synchronised visual display of the waveform under study. The EEGScreener described in chapter 9.1 is an example of a powerful, flexible audification instrument.

Of the phenomena to be detected, static values will become silent: the human ear does not hear absolute pressure values, and while audio hardware may output DC offsets, loudspeakers do not render these as reproducible pressure offsets. Trends are also not represented clearly: ramp direction is not an audible property. Single outliers become sharp clicks, and discontinuities (e.g. large steps) become loud pops. Rapidly fluctuating sections will sound noisy, and periodicities will be easy to discern even if they are only weak components in mixed signals.

Code examples for 0th order - audification:

p = ProxySpace.push; // prepare sound
~audif.play; // start an empty sound source

// play entire week once, within 0.05 seconds
~audif = { PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05) * 0.1 };

// try agriculture data: replace the buffer contents with another channel
q.buf1.loadCollection(q.data.flop[2]);

// play the entire week looped
~audif = { PlayBuf.ar(1, q.buf1, q.buf1.duration / 0.05, loop: 1) * 0.1 };

The next example loops over an adjustable range of days; starting day within the week and loop length can be set in days.

(
~audif = { |dur = 0.05, day = 0, length = 1|
    var stepsPerDay = 96;
    var start = day * stepsPerDay;
    var rate = q.buf1.duration / dur;
    // read position in the data buffer
    var phase = Phasor.ar(1, rate, start, start + (length * stepsPerDay));
    BufRd.ar(1, q.buf1, phase, interpolation: 4) * 0.1;
};
)

The next example loops a single day, and allows moving that day-long time window through the week, thus navigating by mouse - this is the solution SBE Team B developed.

(
~audif = {
    var start = MouseX.kr; // time position in the week (0 - 1)
    var range = BufFrames.kr(q.buf1); // full range is one week
    var rate = 1 / 10; // guess a usable rate
    var phase = Phasor.ar(0, rate, 0, range / 7) + (start * range);
    var out = BufRd.ar(1, q.buf1, phase, interpolation: 4);
    out = LeakDC.ar(out * 0.5); // remove DC offset
};
)
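The audification text above also mentions converting a time series into a soundfile; as a minimal sketch (assuming the q.data array from the preparation example, and a hypothetical output file name), one channel can be scaled into the legal signal range and written to disk for audification in any audio program:

(
var channel = q.data.flop[0];               // households
var scaled = channel / channel.abs.maxItem; // fit into the legal -1.0 to +1.0 range
Buffer.loadCollection(s, scaled, action: { |buf|
    buf.write("LoadFlow_households.aiff".resolveRelative, "aiff", "float");
});
)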

Parameter mapping continuous sonification, or what Scaletti calls 1st-order sonification, maps data dimensions onto directly audible synthesis parameters, such as pitch, amplitude (of a carrier signal), brightness, etc. Here, the simplest case would be mapping to frequency (a synthesis parameter), or pitch (the corresponding perceptual property of the rendered sound). The first example maps the data range of 0 - 2.24 into a pitch range of (midinote) 60 - 96, i.e. frequencies between ca. 260 and 2000 Hz, time-scaled into 3 seconds.

// loop a week's equivalent of data
(
~maptopitch.play;
~maptopitch = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var pitch = datasignal.linlin(0, 2.24, 60, 96); // scale into 3 octaves
    var sound = SinOsc.ar(pitch.midicps) * 0.2;
    Pan2.ar(sound);
}
)

It may seem a little over-engineered here, but in general it is a good idea to consider what the smallest data variations of interest are, and whether they will be audible in the mapping used.

While data on Just Noticeable Differences for some perceptual properties of sound exist in the literature, their values depend on the experimental context and circumstances. Thus, rather than relying only on experiments which were conducted for other purposes, it makes sense to do at least some perceptual tests for the intended usage context. For the example given above, the data resolution is 0.01 units; scaling from the range [0, 2.24] into [60, 96] creates a minimum step of 0.01 * 36 / 2.24, or 0.16 semitones. The literature agrees that humans are most sensitive to pitch variation when it occurs at a (vibrato) rate of ca. 5 Hz, so a first test may use a pitch of 78 (the center of the chosen range), a drift/variation rate of 5 Hz, and a variation depth of +-0.08 semitones; all of these can be adjusted to find the marginal conditions where the pitch variation is just noticeable.

(
~test.play;
~test = { |driftrate = 5, driftdepth = 0.08, centerpitch = 78|
    var pitchdrift = LFNoise0.kr(driftrate, driftdepth);
    SinOsc.ar((centerpitch + pitchdrift).midicps) * 0.2
};
)

Changing driftrate, driftdepth and centerpitch will give an impression of how this behaves; to my ears, 0.08 is in fact very near the edge of noticeability. One could systematically test this by setting the drift depth to random start values above and below the expected JND, and having test persons do e.g. forced choice tests that would converge on the border for a given drift rate and center pitch.
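A minimal sketch of such a convergence procedure (a simple one-up one-down staircase with hypothetical names; it assumes ~test from above is playing) could adjust the drift depth after every listener response:

(
q.stairDepth = 0.2;           // start clearly above the expected JND
q.stairStep = 2 ** 0.25;      // multiplicative step size
f = { |heard|
    // decrease the depth when the drift was heard, increase it when it was not
    q.stairDepth = if (heard) { q.stairDepth / q.stairStep } { q.stairDepth * q.stairStep };
    ~test.set(\driftdepth, q.stairDepth);
    ("current driftdepth:" + q.stairDepth).postln;
};
)
// usage: f.(true); f.(false); ... the depth values should hover around the listener's JND.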

The next example maps the same data values to amplitude, which could seem metaphorically closer - the data value is consumed energy, and amplitude is directly correlated to acoustical energy. However, the rendering is perceptually not very clear: humans are good at filling in dropouts in audio signals, such as speech phonemes masked in noisy environments, or damaged by bad audio connections such as intermittent telephone lines. The patterns that emerged in the pitch example, where the last three days are clearly different, almost disappear. Changing to a linear mapping instead of an exponential one makes little difference.

(
~maptoamp.play;
~maptoamp = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var amp = datasignal.linlin(0, 2.24, -60, -10).dbamp;
    // var amp = datasignal * 0.2; // linear mapping
    var sound = SinOsc.ar(300) * amp; // the amplitude carries the data
    Pan2.ar(sound);
}
)

The next example shows what Scaletti calls a second-order mapping: the data are mapped to a parameter that controls another parameter, the phase modulation depth; however, perceptually this translates roughly to brightness (which could be considered a first-order audible property).

(
~maptomod.play;
~maptomod = { |loopdur = 3|
    var datasignal = PlayBuf.ar(1, q.buf1, q.buf1.duration / loopdur, loop: 1);
    var modulator = SinOsc.ar(300) * datasignal * 2;
    var sound = SinOsc.ar(300, modulator) * 0.2;
    Pan2.ar(sound);
}
)

5.3.3 Discrete Data Representation

As an alternative to creating continuous signals based on data dimensions, one can also create streams of events, which may sound note-like when slower than ca. 20 events per second; at higher rates, they can best be described in Microsound terminology, as granular synthesis. The example below demonstrates the simplest case: one creates one synthesis event for each data point, with a single data dimension mapped to one parameter. A duration of 3 seconds will create a continuous-seeming stream; 10 seconds will sound like very fast grains; while 30 seconds takes the density down to 22.4 events per second, which can seem like very fast marimba-like sounds.

(
~grain.play;
~grain = { |pitch = 60, pan|
    var sound = SinOsc.ar(pitch.midicps);
    var envelope = EnvGen.kr(Env.perc(0.005, 0.03, 0.2), doneAction: 2);
    Pan2.ar(sound * envelope, pan)
};
// ~grain.spawn([\pitch, 79]);

Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var power;
    q.data.do { |chans|
        power = chans[0]; // households
        ~grain.spawn([\pitch, power.linlin(0, 2.24, 60, 96)]);
        (duration / datachannel.size).wait;
    };
}).play;
)

5.3.4 Parallel streams

When the dimensions in a data set are directly comparable (as here, where they are all power consumption measured in the same units at the same time instants), it is conceptually convincing to render them as parallel streams. Auditory streams, as discussed in Bregman(1990) and Snyder(2000), are a perceptual concept: a stream is formed when auditory events are grouped together perceptually, and multiple streams can form when the auditory events separate into several groups. With the example above, a minimal change can be made to create two parallel streams: instead of creating one sound event for one data dimension, one creates two, and pans them left and right to separate the two streams by spatial location.

(
Tdef(\data, {
    var duration = 10;
    var datachannel = q.data;
    var powerHouse, powerAgri;
    ~grain.play;
    q.data.do { |chans|
        powerHouse = chans[0];
        powerAgri = chans[2];
        ~grain.spawn([\pitch, powerHouse.linlin(0, 2.24, 60, 96), \pan, -1]);
        ~grain.spawn([\pitch, powerAgri.linlin(0, 2.24, 60, 96), \pan, 1]);
        (duration / datachannel.size).wait;
    };
}).play;
)

When presenting several data dimensions simultaneously, one can obviously map them to multiple parameters of a single synthesis process, thus creating one stream with multiparametric controls. This makes the individual events fairly complex, and may require that each event has more time to unfold perceptually. (In the piece Navegar, a fairly complex mapping of this kind is used; see section 11.3.)
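As a minimal sketch of such a multiparametric single stream (hypothetical proxy name ~mgrain, and an assumed column order with street lighting as the last channel), three dimensions of each data point can control pitch, brightness and panning of one event stream:

(
~mgrain.play;
~mgrain = { |pitch = 60, modindex = 1, pan = 0|
    var envelope = EnvGen.kr(Env.perc(0.005, 0.05, 0.2), doneAction: 2);
    var modulator = SinOsc.ar(pitch.midicps) * modindex;
    var sound = SinOsc.ar(pitch.midicps, modulator);
    Pan2.ar(sound * envelope, pan)
};

Tdef(\multi, {
    var duration = 10;
    q.data.do { |chans|
        ~mgrain.spawn([
            \pitch, chans[0].linlin(0, 2.24, 60, 96),    // households -> pitch
            \modindex, chans[2].linlin(0, 2.24, 0, 3),   // agriculture -> brightness
            \pan, chans[4].linlin(0, 2.24, -1, 1)        // street lighting -> position
        ]);
        (duration / q.data.size).wait;
    };
}).play;
)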

It should be noted that what is technically created as one stream of sound events is not guaranteed to fuse into one perceptual stream - it may split into several layers, just as separately created multiple streams may perceptually merge into a single auditory stream. In fact, as perception is strongly influenced by a listener's attitude, one can intentionally choose analytic or holistic listening attitudes: either focusing on the details of rather few streams, or listening to the overall flow of the unfolding soundscape - whether it is a piece of music or a sonification.

5.3.5 Model Based Sonification

In Model Based Sonification (Hermann and Ritter(1999)), the general concept is that the data values are not mapped directly, but inform the state of a model; properties of that model (which is a kind of front-end) are then accessed when user input demands it (e.g. by exciting the model with energy, somewhat akin to playing a musical instrument). The model properties then determine how the sound engine renders the current user input; this backend inevitably contains some mapping decisions to which the considerations given here can be applied. Till Bovermann's example implementation of the Data Sonogram (Bovermann(2005)) is a good compact example of MBS. The approach is to treat the data values as points in n-dimensional space (for the example Iris data set, 4 dimensions); user input then triggers a circular energy wave propagating from a user-determined position, and the reflections off each data point are simulated by mapping distance (in 4D space) to amplitude and delay time, as if in natural 3D space. The other parameters of the sound grains (frequency, number of harmonics) are also determined by data-based mappings. The Wahlgesänge sonification based on this example uses a somewhat more elaborate mapping: distance in 2D is mapped to delay and amplitude, with user-tunable scaling; panning is determined by 2D circular coordinates; the data value of interest (voter percentage) is mapped to the sound grain parameter pitch, and controls for attack/decay times make the tradeoff between auditory pitch resolution and time resolution explicit. Both of these examples are too extended for the context here; but they are both available online, and Wahlgesänge is described in more detail in section 6.2. While it would be worthwhile to analyse more MBS examples in detail, this is beyond the scope of the present thesis. Further research will be necessary for a more fine-grained integration of the model-based approach into the context of the sonification models given here.
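As a minimal sketch of the general idea (not Bovermann's implementation; placeholder random data and hypothetical names ~ping and g), each data point can reflect an excitation from a user-chosen position, with distance in model space mapped to onset delay and amplitude:

(
// placeholder dataset: 200 points in a 4-dimensional unit hypercube
q.points = { { 1.0.rand } ! 4 } ! 200;

~ping.play;
~ping = { |freq = 800, amp = 0.1, pan = 0|
    var env = EnvGen.kr(Env.perc(0.002, 0.05), doneAction: 2) * amp;
    Pan2.ar(SinOsc.ar(freq) * env, pan)
};

g = { |center, speed = 2|  // excitation position, wave speed in model-space units per second
    q.points.do { |point|
        var dist = (point - center).squared.sum.sqrt;   // euclidean distance in nD
        var amp = 0.2 / (1 + (10 * dist * dist));        // rough attenuation with distance
        var freq = point[0].linlin(0, 1, 400, 2400);     // extra mapping: first dimension -> frequency
        fork { (dist / speed).wait; ~ping.spawn([\freq, freq, \amp, amp, \pan, 1.0.rand2]) };
    };
};
)
// usage: g.([0.5, 0.5, 0.5, 0.5]); // excite the model at the centre of the data space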

5.4 User, task, interaction models

Humans experience the world with all their senses, and interacting with objects in the world is the most common everyday activity, one they are well trained at. For example, handling physical objects may change their visual appearance, and touching, tapping or shaking them may produce acoustic responses that can be used to learn about objects of interest. Perception of the world, action in it, and learning are tightly linked in human experience, as discussed in section 2.3. In artificial systems that model aspects of the world, from office software to multimodal display systems, or sonification systems in particular, interaction crucially determines how users experience such a system: whether they can achieve tasks correctly with it (effectiveness), whether they can do so in reasonable amounts of time (efficiency), and whether they enjoy the working process (positive user experience, pleasantness). This section looks at potential usage situations of sonification designs and systems: the people working in these contexts ('sonification users'); the goals they will want to pursue by means of (or supported by) sonification; the kinds of tasks entailed in pursuing these goals; the kinds of interfaces and/or devices that may be useful for these goals; and some notions of how to go about matching all of these.

5.4.1 Background - related disciplines

Interaction is a field where a number of disciplines come into play:

Human Computer Interaction (HCI) studies the alternatives for communication between humans and computers (from translating user actions into input for a computer system to rendering computer state into output for the human senses), sometimes to amazing depths of detail and variety (Buxton et al.(2008); Dix et al.(2004); Raskin(2000)).

Musical instruments are highly developed physical interfaces for creating highly differentiated acoustic sound, with a very long tradition; in electronic music performance, achieving similar degrees of control flexibility (or better, control intimacy) has long been desirable. While the mainstream music industry has focused on a rather restricted set of protocols (MIDI) and devices (mostly piano-like keyboards, simulated tape machines, and mixing desks), experimental interfaces that allow very specific, sometimes idiosyncratic ideas of musical control have been an interesting source of problems for interested engineers. The research done at institutions like STEIM17 (see Ryan(1991)) and CNMAT18 (Wessel(2006)) has made interface and instrument design its own computer music sub-discipline, with its own conference (NIME19, or 'New Instruments/Interfaces for Musical Expression', since 2001).

17 http://www.steim.nl
18 http://cnmat.berkeley.edu
19 http://www.nime.org

Computer game controllers tend to be highly ergonomic and very affordable; thus they have become a popular resource for artistic (re-)appropriation as cheap and expressive music controllers: gamepads, and more recently Wii controllers, have both been adopted as-is and creatively rewired for specialised artistic uses. This has been part of an emerging movement toward more democratic electronic devices: beginning with precursors like Circuit Bending (Ghazala(2005)) - extending the design of sound devices by introducing controlled options for what engineers might consider malfunction - designers have created open-source hardware, such as the Arduino microcontroller board20, to simplify experimentation with electronic devices. With these developments, finding ways to create meaningful connections and new usage contexts for object-oriented hardware (Igoe(2007)) has become interesting for a much larger public than strictly electronics engineers and tinkerers.

5.4.2 Music interfaces and musical instruments

CD/DVD players or MP3 players tend to have rather simple interfaces: play the current piece, make it louder or softer, go to the next or previous track, use randomised or ordered playback of tracks.

A piano has a simple interface for playing single notes: one key per note, ordered systematically, and hitting a key with more energy will make the note louder. Thus, beginners can experience rather fast success at finding simple melodies on this instrument. Playing polyphonic music really well on the piano is a different matter; as Mick Goodrick puts it, in music there is room for infinite refinement (Goodrick(1987)). On a violin, learning to produce a good tone already takes a lot of practice; and playing in tune (for whichever musical culture one is in) requires at least as much practice again. (One is reminded of the joke where a neighbour asks, "why can't your children spend more time practicing later, when they can already play better?") Instruments from non-western cultures may provide interesting challenges: playing nose flutes is a good example of an instrument that involves the coordination of unusual combinations of body parts, thus developing (in Western contexts) rather unique skills. However, a violin allows very subtle physical interaction with the musical sound while it is sounding, and in fact requires that skill for playing expressively. On the piano, each note sounds by itself once it has been struck; thus the relations between the keys pressed, such as chord balance, micro-timing between notes, and agogics, are the main strategies for playing expressively on the piano.

In Electronic Music performance, mappings between user actions as registered by controllers (input devices like the ones HCI studies: buttons, sliders, velocity-sensitive keys, sensors for pressure, flexing, spatial position, etc.) and the resulting sounds and musical structures are essentially arbitrary - there are no physical constraints as in physical instruments.

20 http://www.arduino.cc

Designing satisfying personal instruments with digital technology is an interesting research topic in music and media art; e.g. Armstrong(2006) bases his approach on a deep philosophical background and discusses his example instrument in these terms, while Jordà Puig(2005) provides much historical context on electronic instruments and discusses an array of his own developments in that light. Thor Magnusson's (and others') ongoing work with ixi software21 explores applying intentional constraints to interfaces for creating music in interesting ways.

5.4.3 Interactive sonification

The main researchers who have been raising awareness of interaction in sonification are Thomas Hermann and Andy Hunt, who started the series of Interactive Sonification workshops, or ISon22. In the introduction to a special issue of IEEE Multimedia resulting from ISon2004, the editors give the following definition: "We define interactive sonification as the use of sound within a tightly closed human-computer interface where the auditory signal provides information about data under analysis, or about the interaction itself, which is useful for refining the activity." (Hermann and Hunt(2005), p 20)

In keeping with Hermann's initial descriptions of Model-Based Sonification (Hermann(2002)), they maintain that learning to 'play' a sonification design with physical interaction, as with a musical instrument, really helps users acquire an understanding of the nature of the perceptualisation processes involved and of the data to be explored. They find that there is not enough research on how learning in interactive contexts actually occurs. The Neuroinformatics group at University Bielefeld (Hermann's research group) has studied a number of very interactive interfaces in sonification contexts: recognizing hand postures to control data exploration (Hermann et al.(2002)), a malleable surface for interaction with model-based sonifications (Milczynski et al.(2006)), tangible data scanning using a physical object to control movement in model space (Bovermann et al.(2006)), and others.

At the University of York, in Music Technology, Hunt has studied both musical interface design issues (e.g. Hunt et al.(2003)) and worked on a number of sonification projects, mainly with Sandra Pauletto (e.g. Hunt and Pauletto(2006)). Pauletto's PhD thesis, Interactive non-speech auditory display of multivariate data (Pauletto(2007)), discusses interaction and sonification in great detail (pp. 56-67), and studies central sonification issues with user experiments.

21 http://www.ixi-software.net/
22 http://interactive-sonification.org/

The first two experiments compare listening to auditory displays of data (audifications of helicopter flight data, sonifications of EMG (electromyography) data) with their traditional analysis methods (visually reading spectra, signal processing analysis). In both cases, auditory display of large multivariate data sets turned out to be an effective choice of display.

Her third experiment directly studies the role of interaction in sonification: three alternative interaction methods are provided for exploring synthetic data sets to locate a given set of structures. A low interaction method allows selection of data range, playback speed, and play/stop commands. For the medium interaction method, a jog wheel and shuttle is used to navigate the sonification at different speeds and directions. The high interaction method lets the analyst navigate by moving the mouse over a screen area that corresponds to the data, like tape scrubbing. Both objective measurements and subjective opinions found the low interaction method less effective and efficient, and the two higher interaction modes were preferred. Interestingly, users preferred the medium interaction mode for its option to quickly set the sonification parameters and then let it play while concentrating on listening; the high interaction method requires constant user activity to keep the sound going. It should be noted that these results strictly apply only to the specific methods studied, and cannot be generalised; however, they do provide interesting background.

5.4.4 ”The Humane Interface” and sonification

The field of Human Computer Interaction (HCI) is very wide and diverse, and cannot be covered here in depth. However, a rather specialised look at some examples of interfaces may suffice to provide enough context for discussing the main issues in designing sonification interfaces. Rather than attempting to cover the entire field, I will take a strong position statement by an expert in the field as a starting point: Jef Raskin was responsible for the Macintosh Human Interface design guidelines that set the de facto standard for best practice in HCI for a long time, and his book "The Humane Interface" (Raskin(2000)) is an interesting mix of best practice patterns and rather provocative ideas. Here is a brief overview of the main statements by chapter:

1. Background - The central criterion for interfaces is the quality of the interaction; it should be made as humane as possible. Humane means responsive to human needs, and considerate of human frailties. As one example, the user should always determine the pace of interaction.

2. Cognetics - Human beings only have a single locus of attention, and a single focus of attention, which in interactions with machines is nearly always on the task they try to achieve.23 Computers and interfaces should not distract users from their intentions. Human beings always tend to form habits; user interfaces should allow the formation of good habits, as through benign habituation competence becomes automatic. A possible measure of how well an interface supports benign habituation is to imagine whether a blind user could learn it. As a more general point, humans mostly use computers to get work done; here, user work is sacred, and user time is sacred.

3. Modes - Modes are system states where the same user gesture can have different effects, and are generally undesirable; one should eliminate modes where possible. The exception to the rule is physically maintained modes, which he calls quasi-modes (entered e.g. by holding down a special key, and reverted to normal when the key is released). Visible affordances should provide strong clues as to their operations. If modes cannot be entirely avoided, monotonic behaviour is the next best solution: a single gesture always causes the same single operation; and in a mode where the operation is not meaningful, the gesture should do nothing. It is worth keeping in mind that everyone is both expert and novice at the same time when different aspects of a system are considered.

4. Quantification - Interface efficiency can be measured, e.g. with the GOMS Keystroke-Level Model. For most cases, 'back of the envelope' calculations give a good first indication of efficiency; standard times for hitting a key, pointing with the mouse, moving between mouse and keyboard, and mentally preparing an action are sufficient for that (a small worked example is sketched after this overview). Finding the minimum combination for a given task is likely to make that task more pleasant to perform. Obviously, the time a user is kept waiting for software to respond should be as low as possible; while a user is busy with other things, s/he will not notice waiting times.

5. Unification - This chapter ranges far beyond the scope needed here, eventually making a case that operating systems and applications should disappear entirely. Fundamental actions are catalogued, and variants of computer-wide string search are discussed as one example of how system-wide unified behaviour should work.

6. Navigation - The adjectives 'intuitive' and 'natural', when used for interfaces, generally translate to 'familiar'. Navigation as in the ZoomWorld approach might be interesting for organising larger collections of sonification designs; for the context of the SonEnvir project these ideas were not applicable.

7. Interface issues outside the user interface - Programming environments are notoriously bad interfaces, and have actually been getting worse: on a 1984 computer, starting up, running Basic, and typing a line to evaluate 3 + 4 may be accomplished in maybe 30 seconds; on a current (2000) computer, every one of these steps takes much longer, even for expert users.
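As a minimal sketch of such a back-of-the-envelope estimate (using commonly cited Keystroke-Level Model operator times, which are an assumption here, and a hypothetical task), one can quickly compare interaction variants of a sonification design:

(
// commonly cited KLM operator times in seconds (assumed values):
// M = mental preparation, H = homing between keyboard and mouse, P = pointing, K = keystroke
var tM = 1.35, tH = 0.4, tP = 1.1, tK = 0.2;
// hypothetical task: set a playback-rate parameter by pointing at a text field and typing 4 digits
var byTyping = tM + tH + tP + (tK * 4);
// hypothetical alternative: pick one of a few preset rates from a popup menu (two pointing actions)
var byMenu = tM + tH + tP + tP;
[byTyping, byMenu].postln; // -> ca. 3.7 and ca. 4.0 seconds
)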

23 Raskin's motto for the chapter is a quote from a character in the TV series Northern Exposure, Chris: "I can't think of X if I'm thinking of Y."

Relevance to sonification and the SonEnvir project

The most closely related notion to disappearing system software (chapter 5) is the Smalltalk heritage of SC3. Smalltalk folklore says that 'when Smalltalk was a little girl, she thought she was an operating system' - one could do almost everything within Smalltalk, including one of Raskin's major desirables, namely, defining new operations by text at any time, which change or extend the ways things work in a given environment. The question of what 'user content' is actually being created is extremely important in sonification work: In sonification usage situations, 'content to keep' can comprise uses of a particular data file, particular settings of the sonification design, perceptual phenomena observed with these data and settings, and text documentation, i.e. descriptions of all of the above and possibly user actions to take to cause certain phenomena to emerge. The text editor and code interface in SC3 are well suited for this: commands to invoke a sonification design (e.g. written as a class), code to access a specific data file, and notes of observations can be kept in a single document, as they are all just text. Across different sonification designs, SC3 behaves uniformly in this respect. Compared to most programming environments, the SC3 environment allows very fluid working styles. Documentation within program creation (literate programming, as Donald Knuth called it) is supported directly.
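A minimal sketch of such a working document in SC3 syntax may illustrate this (the class name EEGScreenPlayer, its methods, and the file name are hypothetical placeholders, not the actual SonEnvir code):

// load a dataset kept next to this document (hypothetical file name)
~data = TabFileReader.read("exampleEEG.txt", skipEmptyLines: true);

// invoke a sonification design written as a class (hypothetical class and methods)
~player = EEGScreenPlayer(~data);
~player.play(speedup: 20);

// observations are kept as comments in the same document:
// at speedup 20, channel F3 shows a rhythmic pattern around minute 12;
// to reproduce, set speedup to 20 and start playback at 11:30.

Since everything in such a document is plain text, it can be versioned, sent to collaborators, and re-executed as a whole.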

5.4.5 Goals, tasks, skills, context

From a pragmatic point of view, a number of compromises need to be balanced in interaction design, especially when it is just one of several aspects to be negotiated:

• Simple designs are quicker to implement, test and improve than more complex designs. Given that one usually understands requirements much better by implementing and discussing sketches, simpler designs will often be better.

• Exotic devices can be very interesting; however, they limit transferability to other users, and require extra cost and development time. Even when there is a strong reason to use a special interface device, including a fallback variant with standard UI devices is recommended.

• Functions should be made clearly available to the users; usually that means making them visible affordances. (Buxton et al.(2008) argues that the attitude 'you can already do that', meaning in some arcane way only experts may know about, leads to end users not actually using that implemented function.)

Goals are firmly grounded in the application domain and with the users. What do users want to achieve with the sonification design to be created? The goals will naturally be different for different domains, datasets, and contexts (e.g. research prototypes or applications for professional use); nevertheless, these examples may apply to most designs:

• experience the differences between comparable datasets of a given kind

• find phenomena of interest within a given dataset, e.g. at specific locations, with specific settings

• document such phenomena and their context, as they may become findings

• make situations in which phenomena of interest occurred repeatable for other users

The interaction design of a sonification design should allow the user's focus of attention to remain at least close to these top-level goals. Ideally, the design should add as little cognitive load as possible for the user, to keep her attention free for the goals. The sonification design's interface should offer ways to achieve all necessary and useful actions toward achieving these goals. The concepts for these actions should obviously be formulated in terms of the mental model the user has of the data and the domain they come from.

Tasks comprise all the actions users take to achieve their top-level goals. Tasks can be directly functional for attaining a goal, or necessary to change the system's state such that a desired function becomes available. Systems that often require complicated preparation to get things done tend to distract users from their goals, and are thus experienced as frustrating. Some example tasks that come up when using a sonification design are:

• load a sonification design of choice (out of several available)

• load a dataset to explore

• start the sonification

• compare with different datasets

• tune the sonification design while playing

• explore different regions of a dataset by moving through them

• look up documentation of the sonification design details

• start, repeat, stop sonification of different sections

• store a current context: a dataset, current selection, current sonification parameter settings, and accompanying text/explanation.

For all these tasks, there should be visible affordances that communicate to the user how the related tasks can be done. Ideally, a single task should be experienced as one clear sequence of individual actions (or subtasks). More complex tasks will be composed of a sequence of subtasks. As novice users acquire more expertise, they will form conceptual chunks of those operations that belong together. As long as these subtasks require meaningful decisions, it is preferable to keep them separate; if there is only a single course of action, one should consider making it available as a single task.

Skills are what users need to have or acquire to use an interface efficiently. These can include physical skills like manual dexterity, knowledge of operating systems, and other skills. In the HCI literature, two conflicting viewpoints can be found here: a. users already possess skills that should be re-used; one should add as little learning load as possible, and enable as many users as possible to use a design quickly; b. interfaces should allow for long-term improvement, and enable motivated users to eventually learn to do very complex things very elegantly. Which of these applies will depend on the context the sonification is designed for; in any case, it is advisable to consider carefully what one is expecting of users in terms of learning load. Some necessary knowledge / skills include:

• locating files (e.g. program files, data files)

• reading documentation files

• selecting and executing program text

• using program shortcuts (e.g. start, stop)

• using input devices like mice, trackballs, tablets

Context should be represented clearly to reduce cognitive load: all changeable settings, such as choice of data file, sonification parameter settings, current subset choice, and others, should be visible, e.g. on a graphical user interface. Often, the display elements for these can double as affordances that invite experimentation. In some cases, it can be useful to display the current data values graphically, or to double auditory events visually as they occur in realtime playback.
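As a minimal sketch of settings that are both displayed and adjustable (the ~player object and its setter methods are hypothetical placeholders):

(
var win = Window("sonification settings", Rect(100, 100, 360, 110)).front;
// each slider shows the current value and doubles as a control
EZSlider(win, Rect(10, 10, 340, 30), "speedup", ControlSpec(1, 100, \exp, 0.1, 10),
    { |ez| ~player.speedup_(ez.value) });   // ~player is a hypothetical placeholder
EZSlider(win, Rect(10, 50, 340, 30), "cutoff", \freq.asSpec,
    { |ez| ~player.cutoff_(ez.value) });
)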

5.4.6 Two examples

EEG players

In the course of the SonEnvir project, most of the interaction design was done in collaborative sessions. One exception that required more formal procedures was redesigning the EEG Screener and Realtime Player (discussed in depth in chapter 9.1), as the intended expert users were not available for direct discussion. These designs went through a full design revision, with a task analysis that is identical for most of the interface. The informal 'wish list' included: simple to use, start in very few steps, low effort, keep results reproducible; include a small example data file that can be played directly. The task analysis comprised these items: Goals:

• quickly screen large EEG data files to find episodes to look at in detail

Tasks:

1. locate and load EEG files in edf format

2. select which EEG electrode channels will be audible

3. select the data range for playback: which time segment within the file, speedup factor, filtering

4. play control: play, stop, pause, loop; feedback current location

5. document current state so it can be reproduced by others

6. include online documentation in German

7. later: prepare example files for different absences

All of these were addressed with the GUI shown in figure 9.2:

1. File selection is done with a 'Load EDF' button and a regular system file dialog; for faster access, the edf file is converted to soundfiles in the background, and feedback is given when ready.

2. Initially, this was only planned with popup menus and the electrode names; however, providing a symbolic map of the electrode positions on the head and letting users drag-and-drop electrode labels to the listening locations (see figure 9.3) proved to be much appreciated by the users.

3. The time range within the file was realised in multiple ways: graphical selection within a soundfile view showing the entire file; providing the start time, duration, and end time as adjustable number boxes; and showing the selected time segment in a magnified second soundfile view. This largely follows sound editor designs, which EEG experts are typically not familiar with.

4. Play controls are implemented as buttons; play state is shown by button color (white font means active) and by a moving cursor in both soundfile views. The cursor's location is also given numerically. Looping and filtering are also controlled by buttons; in looped mode, a click sound plays when the loop point is crossed. In filter mode, the volume controls for the individual bands are enabled; when filtering is off, these controls are disabled for clarity. Adjustable playback parameters are all available as named sliders, with the exact numerical values and units. (Recommended presets for different usage situations were planned, but eventually not realised.)

5. The current state can be documented with buttons and shortcuts: the 'Take Notes' button opens a text window, which contains the current filename; the current time and playback settings can be pasted into it, so they can be reconstructed later.

6. The 'Help' button opens a detailed help page in German.

The EEG Realtime Player re-uses this design with minimal extensions, as shown in figure 9.5; this reduces learning time for both designs, which are intended for the same group of users. The main differences are the use of different time units (seconds instead of minutes) and more parameter controls, as the synthesis concept is more elaborate.

Wahlgesänge

This design is described in detail in section 6.2; its GUI is shown in figure 6.5. As this design follows a Model-Based concept, the realtime interaction mode is central: Goals:

• compare geographical distribution of voters for ca. 12 parties in four elections in a region of Austria.

Tasks:

1. switch between a fixed range of elections and parties to explore

2. ’inject energy’ by interaction to excite the model at a visually chosen location

3. compare parties and elections quickly one after another

4. adjust free sonification parameters like timescale

1. Choosing which election and party results to explore is done with two groups of buttons which show all available choices. The currently active button has a white font.

2. As is common in Model-Based Sonification, this design requires much more interaction: to obtain sound, users must click on the geographical map. This causes a circular wave to emerge from the given location, which spreads over the entire extent of the map. Each data point is indexed by spatial location on the map; when the expanding wave hits it, a sound is played based on its value for the current data channel (voter turnout for one of the parties).

3. For faster comparisons, switching to a new election or party plays the sonification for the new choice with the last spatial location; switching between parties can also be done by typing reasonably mnemonic characters as shortcuts.

4. The free sonification parameters, like expansion speed of the wave, number of data points to play (to reduce spatial range), etc., can be adjusted with sliders which also show the precise numerical values.

Full explanations are given in a long documentation section before the program code, which was deemed sufficient at the time. An interesting possible extension here would be the use of a graphical tablet to obtain a pressure value when clicking on the map; this would be equivalent to a velocity-sensitive MIDI keyboard. However, in the interest of easier transfer to other users, we preferred to keep the design independent of specific non-standard input devices.

5.5 Spatialisation Model

The most immediate quality of a sound event is its localisation: What direction did that sound come from? Is it near or far away? We often spontaneously turn toward an unexpected sound, even if we were not paying attention earlier. Spatial direction is also one of the stronger cues for stream separation or fusion (Bregman(1990), Snyder(2000), Moore(2004)); when sound events come from different directions, they are unlikely to be attributed to the same physical source. Music technology has developed a variety of solutions for spatialising synthesised sound, and both SuperCollider3 and the SonEnvir software environment support multiple approaches for different source characteristics and different reproduction setups. Sources can either be continuous or short-term single events; while continuous sources may have fixed or moving spatial positions, streams of individual events may have different spatial positions for each event. In effect, giving each individual sound event its own static position in space is a granular approach to spatialisation.

(1D) Stereo rendering over loudspeakers works well for few parallel streams, where spatial location mainly serves to identify and disambiguate streams. The most common spatialisation method employed is amplitude panning, which relies on the illusion of phantom sound sources created between a pair of loudspeakers, with the perceived position depending on the ratio of signal levels between the two speakers. Panorama potentiometers (pan pots) on mixing desks employ this method. Sound localisation on such setups is of course compromised at listening positions outside the sweet spot.

(2D) Few-channel rendering is typically done with horizontal rings of 4 - 8 speakers. This has become easier in recent years with 5.1 (by now, up to 7.1) home audio systems, which can be used with external input from multichannel audio interfaces. Such systems can spatialise sources on the horizontal plane quite well, and can also be used as up to 7 static physical point sources.

(3D) Multichannel systems, such as the CUBE at IEM Graz with 24 speakers, or the Animax Multimedia Theater in Bonn with 40 speakers, are usually designed for symmetry, spreading a number of loudspeakers reasonably evenly on the surface of a sphere. This allows for good localisation of sources on the sphere, with common spatialisation approaches including vector based panning, Ambisonics, and Wave Field Synthesis. Source distances outside the sphere can be simulated well by reducing the level of the direct sound relative to the reverb signal, and lowpass filtering it.

(1D/3D) Headphones are a special case: they can be used to listen to stereo mixes for loudspeakers (and most listeners today are well trained at localising sounds with this kind of spatial information); and they can be used for binaural rendering, i.e. sound environments that feature the cues which allow for sound localisation in 'normal' auditory perception. For music, this may be done with dummy head recordings; for auditory display, it is done by applying simulations of these cues to all the sound sources individually to create their spatial characteristics.

5.5.1 Speaker-based sound rendering

Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual speakers as real physical sources. The main advantage is that physics really help in this case; when locations only serve to identify streams, as with few fixed sources, fixed single speakers work very well.

Amplitude Panning

The most thorough overview of amplitude panning methods is provided in Pulkki(2001). Note that all of the following methods work for both moving and static sources. Code examples for all of these are given in Appendix B.1.

1D: In the simplest case of panning between two speakers, equal power stereo panning is the standard method.

2D: The most common case here is panning to a horizontal, symmetrical ring of n speakers by controlling azimuth; in many implementations, the width, i.e. over how many speakers (at most) the energy is distributed, can be adjusted. In case the angles along the ring are not symmetrical, adjustments can be made by remapping, e.g. with a simple breakpoint lookup strategy. However, using the best geometrical symmetry attainable is always superior to compensating for asymmetries. Often it is necessary to mix multiple single-channel sources down to stereo: the most common technique for this is to create an array of pan positions (e.g. n steps from 80% left to 80% right), to pan every single channel to its own stereo position, and to sum these stereo signals. Mixing multiple-channel sources into a ring of speakers can be done the same way; the array of positions then corresponds to (potentially compensated) equal angular distances around the ring. Larger numbers of channels can be panned into rings of fewer speakers, and vice versa.

3D: For simple geometrical arrangements of speakers, straightforward extensions of amplitude panning will suffice. E.g. the CUBE setup at IEM consists of rings of 12, 8, and 4 speakers (bottom, middle, top); the setup at the Animax Multimedia Theater in Bonn adds a bottom ring of 16 speakers. For these systems, having two panning axes, one between the rings for elevation, and one for azimuth within each ring, works well. Again, the speaker setup should be as symmetrical as possible; compensation can be trickier here. Generally speaking, even where compensations for less symmetrical setups are mathematically plausible, spatial images will be worse outside the sweet spot; maximum attainable physical symmetry cannot be fully substituted by more DSP math. Compensating overall vertical ring angles and individual horizontal speaker angles within each ring is straightforward with the remapping method described above. For placement deviations that are both horizontal and vertical, using fuller implementations of Vector Based Amplitude Panning (VBAP, see e.g. Pulkki(2001)) is recommended 24; however, this was not required within the context of the SonEnvir project, or this dissertation.
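As a minimal sketch of the stereo mixdown strategy just described (panning n mono channels to spread positions and summing; this is not the Appendix B.1 code):

(
~mixToStereo = { |channels|   // channels: an array of mono signals
    var n = channels.size;
    var positions = Array.interpolation(n, -0.8, 0.8);   // 80% left to 80% right
    Mix(channels.collect { |chan, i| Pan2.ar(chan, positions[i]) });
};
// five sine tones spread across the stereo field
{ ~mixToStereo.(Array.fill(5, { |i| SinOsc.ar(300 * (i + 1), 0, 0.05) })) }.play;
)

SC3's Splay UGen provides essentially this behaviour as a built-in; for rings of speakers, PanAz distributes a signal around n channels by azimuth in the same spirit.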

Ambisonics

Ambisonics is a multichannel reproduction approach developed independently by several researchers in the 1970s (Cooper and Shiga(1972); Gerzon(1977a,b)), based on the idea that spherical harmonics can be used to encode and decode the directions from which sound energy comes; a good basic introduction to the Ambisonics math is online here25.

24VBAP has been implemented for SC3 in 2007 by Scott Wilson and colleagues, see http://scottwilson.ca/site/Software.html
25 http://www.york.ac.uk/inst/mustech/3d audio/ambis2.htm

The simplest form of Ambisonics, first order, can be considered an extension of the classic Blumlein MS stereo microphone technique: in MS, one uses an omnidirectional microphone as a centre channel (M for Mid), and a figure-of-8 microphone to create a Side signal (S). By adding and subtracting the side signal from the centre, one obtains Left and Right signals, e.g. L = M + S, R = M - S. By using figure-of-8 microphones for Left/Right, Front/Back, and Top/Bottom signals, one obtains a first order Ambisonic microphone, such as those made by the Soundfield company26. The channels are conventionally named W, X, Y, Z. Such an encoded recording can be decoded simply for speaker positions on a sphere. In the 1990s, the mathematics for 2nd and 3rd order Ambisonics were developed to achieve increasingly higher spatial resolution; these are formulated in Malham(1999), and also available online here27. Extensions to even higher orders were realised recently by IEM researchers (Musil et al. (2005); Noisternig et al.(2003)), with multiple DSP optimisations implemented as a PureData library. Using MATLab tools written by Thomas Musil, coefficients for encoding/decoding matrices for different speaker combinations and tradeoff choices can be calculated offline, and can then simply be read in from text files in the realtime platform of choice. The most complex use of this library so far has been the VARESE system (Zouhar et al.(2005)). This is a dynamic recreation of the acoustics of the Philips pavilion at the Brussels World Fair, for which Edgard Varèse's Poème électronique (and Iannis Xenakis' concrète PH) was composed.

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team decided to write a consistent new implementation of Ambisonics in SC3, based on a subset of the existing PureData libraries. This package was realised up to third order Ambisonics by Christopher Frauenberger for the AmbIEM package, available here28. It supports the main speaker setup of interest, the IEM Cube, as well as a setup for headphone rendering as described below.
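A minimal sketch of first-order B-format encoding in SC3, using the standard textbook formulas (this is not the AmbIEM implementation; azimuth and elevation are in radians):

(
~encodeFOA = { |sig, azimuth = 0, elevation = 0|
    var w = sig * 2.sqrt.reciprocal;                 // omnidirectional component
    var x = sig * cos(azimuth) * cos(elevation);     // front/back
    var y = sig * sin(azimuth) * cos(elevation);     // left/right
    var z = sig * sin(elevation);                    // up/down
    [w, x, y, z]
};
)
// decoding to a concrete speaker layout is then a matrix operation,
// e.g. with coefficients read from text files as described above.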

5.5.2 Headphones

For practical reasons, such as when working in one room with colleagues, scientists experimenting with sonifications often have to use headphones. Many standard techniques work well for lateralising sounds, which can be entirely sufficient for making streams segregate or fuse as desired. In order to achieve perceptually credible simulations of auditory cues for full localisation, for example making sounds appear to come from the front, or from above, more complex approaches are needed; the most common approach is to model the cues by means of which the human ear determines sound location.

26http://www.soundfield.com
27 http://www.york.ac.uk/inst/mustech/3d audio/secondor.html
28 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/

Sound localisation in human hearing depends on the differences between the sound heard at the left and right ears; in principle, three kinds of cues are involved (the third, spectral filtering by the outer ear, is discussed in the next section):

Interaural Level Difference (ILD) is the level difference of a sound source between the ears, dependent on the source's direction. This can roughly be simulated with amplitude panning, which is however limited to left/right distinction in headphones (usually called lateralisation). Being so similar to amplitude panning, it is fully compatible with stereo speaker setups.

Interaural Time Difference (ITD) is the difference in arrival time of a sound between the ears. This is on the order of a maximum of 0.6 msec: at a speed of sound of 340 m/sec, this is the time equivalent to a typical ear distance of 21 cm. This can be simulated well for headphones; but because delay panning does not transfer reliably to speakers (one hardly ever sits exactly on the equidistant symmetry axis of one's loudspeaker pair), it is hardly used there. Like amplitude panning, delay panning only creates lateralisation cues.
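A rough sketch of ITD-only lateralisation for headphones (the mouse position stands in for source direction; values are illustrative):

(
{
    var src = Impulse.ar(3) * 0.3;
    // a maximum ITD of ca. 0.6 ms corresponds to ca. 21 cm ear distance at 340 m/s;
    // negative values move the source to the left
    var itd = MouseX.kr(-0.0006, 0.0006);
    [
        DelayC.ar(src, 0.001, itd.max(0)),       // left ear delayed when source is right
        DelayC.ar(src, 0.001, itd.neg.max(0))    // right ear delayed when source is left
    ]
}.play;
)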

Head Related Transfer Functions - HRTF / HRIR

Head Related Transfer Functions (HRTFs), or equivalently, Head Related Impulse Responses (HRIRs), capture the fact that both ITD and ILD are frequency-dependent: For every direction of sound incidence, the sound arriving at each ear is coloured by reflections on the human pinna, head, and upper torso; such pairs of filters are quite characteristic for the particular direction they correspond to. Roughly speaking, localising a heard sound depends on extracting the effect of the pair of filters that coloured it, and inferring the corresponding direction from the characteristics of this pair of filters; obviously, this works more reliably for known sources. HRTFs/HRIRs can be measured by recording known sounds from a set of directions with miniature microphones at the ear, and extracting the effect of the filters. Obviously, HRTF filters are different for every person (as are people's ears and heads), and every person is completely accustomed to decoding sound directions from her own HRTFs. Thus, there is no miracle HRTF curve that works perfectly for everyone; however, because some features in HRTFs are generalisable (such as the directional bands described in Blauert(1997)), the idea of using HRTFs to simulate sounds coming from different directions has become quite popular. The KEMAR set of HRIRs (see Gardner and Martin(1994); the data are available online here29) is based on recordings made with a dummy head, and is considered to work reasonably well for different listeners. IRCAM has also published individual HRIRs of ca. 50 people for the LISTEN project (Warusfel(2003), online here30), so one can try to find a set that matches a particular listener well.

29http://sound.media.mit.edu/resources/KEMAR/full.tar.Z
30http://recherche.ircam.fr/equipes/salles/listen/

Implementing fixed HRIRs for fixed source locations is straightforward, as one only needs to convolve the sound source with one pair of HRIRs. However, this is not sufficient: static angles tend to sound like colouration (as caused by inferior audio equipment); in everyday life, we usually move our heads slightly, creating small changes in ITD, ILD and HRTF which quickly disambiguate any localisation uncertainties. Thus, convincing moving sources are needed for HRTF spatialisation, and this is not trivial: as a source's position changes, its impulse responses must be updated quickly and smoothly. There is no generally accepted scheme for efficient high-quality HRIR interpolation, and convolving every source separately is computationally expensive.
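For the simple static case, a minimal SC3 sketch could look like this (the HRIR file names are hypothetical placeholders; real sets such as KEMAR provide many such pairs, one per direction):

(
~hrirL = Buffer.read(s, "hrir_left.wav");    // hypothetical impulse response files
~hrirR = Buffer.read(s, "hrir_right.wav");
{
    var src = Dust.ar(20) * 0.5;             // a broadband test source
    [
        Convolution2.ar(src, ~hrirL, framesize: 512),
        Convolution2.ar(src, ~hrirR, framesize: 512)
    ]
}.play;
)

For moving sources, the virtual-Ambisonics approach described in the next section avoids convolving each source separately.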

Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach for binaural rendering (Musil et al.(2005); Noisternig et al.(2003)): in effect, one takes a virtual, symmetrical speaker setup and spatialises to that setup with Ambisonics; these virtual speakers are then rendered as point sources with their appropriate HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Ambisonic field can be rotated as a whole, which is really useful when head movements of the listener are tracked and the binaural rendering is designed to compensate for them. Also, the known problems with Ambisonics when listeners move outside the sweet zone disappear: when one carries a setup of virtual speakers around one's head, one is always right in the center of the sweet zone. This approach has been ported to SC3 by C. Frauenberger; its main use is in the VirtualRoom class, which simulates moving sources within a rectangular box-shaped room. This class is especially useful for preparing spatialisation for multi-speaker setups by headphone simulation. Among other things, the submissions for the ICAD 2006 concert31 (described also in section 4.3) were rendered from 8 channels to binaural for the reviewers, and for the web documentation32. One can of course also spatialise sounds on the virtual speakers with any of the simpler panning strategies given above; this trades off easy rotation of the entire setup for better point source localisation. To support simple headtracking, C. Frauenberger also created the ARHeadTracker application, also available as a SuperCollider3 Quark.

31 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php
32 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html

5.5.3 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical and well-controlled as possible. While it may not always be feasible to adjust mechanical positions of speakers freely for very precise geometry, a number of factors can be measured and compensated for; this is supported by several utility classes written in SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for the signals to arrive back at an audio input. The resulting list of measured per-channel latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class described below.

Spectralyzer

While inter-speaker latency differences are well-known and very often addressed, we have found another common problem to be more distracting for multichannel sonification: Each individual channel of the reproduction chain, from D/A converter to amplifier, cable, loudspeaker, and speaker mounting location in the room, can sound quite different. When changes in sound timbre can encode meaning, this is potentially really confusing! To address this, the Spectralyzer class allows for simple analysis of a test signal as played into a room, with optional smoothing over several measurements, and then tuning compensating equalizers by hand for reasonable similarity across all speaker channels.

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin to compensate for volume differences between channels (with big timbral differences between channels, measuring volume or adjusting it by listening is rather pointless). The SpeakerAdjust class expects specifications for relative amplitude, (optionally) delay time, and (optionally) as many parametric EQ bands as needed for each channel. Thus, a speaker adjustment can be created that runs at the end of the signal chain and linearises the given speaker setup as much as possible; of course, limiters for speaker and listener protection can be built into such a master effects unit as well.
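A generic per-channel correction chain in the spirit of SpeakerAdjust might look like the following sketch (this is not the SonEnvir class itself; all numbers are made-up examples):

(
// [ relative amp, delay (s), EQ freq (Hz), EQ rq, EQ gain (dB) ] per output channel
~spec = [
    [ 1.0,  0.0,    1200, 1.0, -3 ],
    [ 0.85, 0.0012,  900, 0.7,  2 ]
];
~adjustChannel = { |sig, chan|
    var chSpec = ~spec[chan];
    var fixed = DelayC.ar(sig * chSpec[0], 0.05, chSpec[1]);   // gain and delay compensation
    var eqd = MidEQ.ar(fixed, chSpec[2], chSpec[3], chSpec[4]); // one parametric EQ band
    Limiter.ar(eqd, 0.95)                                       // protect speakers and listeners
};
)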

Chapter 6

Examples from Sociology

Though sociology has early on been considered a promising field of application (Kramer (1994a)), sonification to date is not widely known within the social sciences. Thus, one purpose of collaborating with sociologists was to raise awareness of the potential benefits sonification can bring to social research. Three sonification designs and their research context are described and analysed as case studies here: the FRR Log Player, Wahlgesänge (election/whale songs), and the Social Data Explorer.

Social (or sociological) data generally show characteristics that make them promising for sonification: they are multi-dimensional, and they usually depict complex relations and interdependencies (de Campo and Egger de Campo(1999)). We consider the application of sonification to data depicting historical (or geographical) sequences as the most promising area within the social sciences. The fact that sound is inherently time-bound is an advantage here, because sequential information can be conveyed very directly by mapping the sequences onto the implicit time axis of the sonification.

In fact, social researchers are very often interested in events or actions in their temporal context. The importance of developmental questions is even growing due to the globalised notion of social change. Sequence analysis, the field methodologically concerned with these kinds of questions, assembles methodologies that are by now rather established, like event history analysis, and appropriate techniques to model causal relations over time (Abbott(1990, 1995); Blossfeld et al.(1986); Blossfeld and Rohwer(1995)). Like most methods of quantitative (multivariate) data analysis, sequence analysis methods need to be based on an exploratory phase. The quality of the analysis process as a whole depends critically on the outcome of this exploratory phase. As the amount of social data is continuously increasing, effective exploratory methods are needed to screen these data. On higher aggregation levels (such as the global or UN member states level), social data have both a time (e.g. year) and a space dimension (e.g. nation), and can thus be understood both as time and as geographical sequences. The use of sonification to explore data of social sequences was the main focus of the sociological part within the SonEnvir project.

6.1 FRR Log Player

An earlier stage of this work was described in detail in a poster for ICAD 2005 (Dayé et al.(2005)); it is briefly documented in the SonEnvir sonification data collection here1, and the full code example is available from the SonEnvir code repository here2.

Researchers in social fields, be they sociologists, psychologists or design researchers, sometimes face the problem of studying actions in an area which is not observable for ethical reasons. This was especially true in the context of the RTD project Friendly Rest Room (FRR)3 (see Panek et al.(2005)), which was partly funded by the European Commission. The project's aim was to develop an easy-to-use toilet for older persons and persons with (physical) disabilities. In order to meet that objective, an interdisciplinary consortium was set up, bringing together specialists from various backgrounds, such as industrial design, technical engineering, software engineering, user representation, and social science. In the final stage of the FRR project, a prototype of this toilet was installed at a day care center for patients with multiple sclerosis (MS) in Vienna, in order to validate the design concept in daily life use.

The sonification design described here was intended for sonifying the log data gathered during this validation phase, because difficulties had arisen with these analyses. Since observational data could not be gathered for ethical reasons, these log data are the only way to understand the actions taken by the users. The FRR researchers are interested in these data because they provide information on the users' interaction with the given technical equipment, and thus on the usability and everyday usefulness of the toilet system.

6.1.1 Technical background

The guests of this day care center are patients with varying degrees of multiple sclerosis (MS); some need support from nurses when using the toilet, while others can use it independently. Due to security considerations as well as for pragmatic reasons, not all components developed within the FRR project were selected for this field test (see Panek et al.(2005)). The main features of the installed conceptual prototype are:

• Actuators to change the height of the toilet seat, ranging from 40 to 70 cm.

1http://sonenvir.at/data/logdata1/ 2https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/FRR Logs/ 3http://www.is.tuwien.ac.at/fortec/reha.e/projects/frr/frr.html 83

• Actuators to change the seat’s tilt, ranging from 0 to 7 degrees forward/down.

• Six buttons on a hand-held remote control to use these actuators: toilet up, toilet down, tilt up, tilt down, as well as flush and alarm triggers.

• Two horizontal support bars next to the toilet that can be folded up manually.

• A door handle of a new type, which is easier to use for people with physical disabilities, was mounted on the outside of the entrance door.

Figure 6.1: The toilet prototype system used for the FRR field test. Left to right: the door with specially designed handle, the toilet prototype as installed at the day center, and an illustration of the tilt and height changing functionality.

As direct observation of the users’ interaction with the toilet system was out of the question, sensors were installed in the toilet area that continuously logged the current status of the toilet area. These sensors recorded:

• the height of the toilet seat (in cm, one variable),

• the tilt of the toilet seat (in degree, one variable),

• the status of the remote control buttons (pressed/not pressed, six variables),

• the status of the entrance door (open/not open, one variable); and,

• the presence of RFID-tagged smart cards (RFID mid-range technology) near the toilet seat to identify any persons present. The guests and the employees of the day care center were provided with such smart cards, and an RFID module in the toilet area registered the identities of up to four cards simultaneously.

The log data matrix recorded from these sensor data is quite unusual for sociological data, due to its time resolution of up to about 0.1 sec (which is high for social data), and the sequential properties of the information captured by the data. One log entry consists of about 25 variables, of which 11 are relevant for our analysis: a timestamp for when an entry was logged, and the ten variables described above. Of these eleven variables, seven are binary. Each log file records the events of one day. In case there is no event for a longer time (e.g. during the night), a 'watchdog' in the logging software creates a blank event every 18 minutes to show the system is still on.

In order to use these log files to understand what the users did, we needed to reconstruct sequences of actions of a user based on the events registered by the sensors. The technical data had to be interpreted in terms of the users' interaction with the equipment; otherwise the toilet prototype could not be evaluated. The technical data themselves are not sufficient for a validation, as we need to validate whether or not the proposed technical solution results in an improvement of the users' quality of life, which is the eventual social phenomenon of interest here. Due to the sequential nature of the information contained in the log files, established routines from multivariate statistics could not be applied, as they usually do not account for the fundamentally different nature of data composed of events in temporal sequence.

6.1.2 Analysis steps

Graphical Screening

On a graphical display (which is what the FRR researchers used), it is not at all easy to follow the sequential order of the events, above all because such a sequence consists of several variables. Yet, as the first step of analysis, we relied on graphs with the purpose of identifying episodes. An episode in our context is defined by a single user's visit to the toilet. A prototypical minimum episode consists of the following logged events:

door open
door close
tilt down (multiple events)
tilt up (multiple events)
button flush
door open

Note that, in this specific episode, the height and the tilt of the toilet bowl are adjusted via remote control by the user. Still, this episode is a very simple chain of events. Most of the logged events for tilt down and tilt up result only from the weight of the person sitting on the toilet seat.

Figure 6.2: Graphical display of one usage episode (Excel).

The first step in analysing the data material was to use graphical displays to look for sections that could be identified as one user's visit to the toilet prototype, and to chunk the data into such episodes, which formed our new entities of analysis. The episode displayed graphically in figure 6.2 is an example of a very simple, single episode. It is obvious that the graph is not easy to interpret due to its complexity (possibly additionally complicated on black and white printouts). The sequential character of the events can be read visually, if not very comfortably: one can see that the starting event is that the door opens, and then closes; followed by the event that the toilet bowl tilts forward (the tilting degree grows). We can assume that the person is now sitting on the toilet. Then the height is adjusted, and the tilt as well. After the tilt returns to a lower value (we can assume the weight has been removed, so we can infer that the person has stood up), the flush button is pressed, and the door opens and closes again. The other variables remain unchanged.

Investigating patterns of use

The FRR researchers were not interested primarily in the way a single person behaves in the Friendly Rest Room, but rather in whether different groups of people could be found who, for instance due to similar physical limitations, show similarities in interacting with the given technical equipment. Such typical action patterns of various user groups are interesting to cross-reference with data from other sources: characteristics like sex, weight, and age of a person, her/his physical and cognitive limitations, and additional information like whether s/he is using a wheelchair or crutches, are important to deepen the interpretation and allow for causal inferences. For this purpose, an identification module was mounted behind the cover of the water container of the FRR prototype, which was intended to recognise users wearing RFID tags.

To give just one example of how user identification can help: usually, people who use wheelchairs need more time than non-wheelchair users to open a door, enter the room and close the door again. This is partly because of the need to manoeuvre around the door when sitting in a wheelchair, and mainly because standard door handles are hard to use, especially when, as is the case with MS, people have restricted mobility in their arms. Thus, if an analysis shows that the time needed by wheelchair users to enter the room is on average shorter than with a standard door, one can conclude that the FRR-designed door handle is a usability improvement for wheelchair users. Similarly, one can identify further patterns of use and possibly relate them to user characteristics as mentioned above. However, these patterns are not only important for the evaluation of the equipment, but also for figuring out user IDs that were accidentally not recorded.

Comparing anonymous episodes with patterns

Unfortunately, RFID tag recognition only worked within a range of about 50 cm around the toilet, and so not every person using the toilet was identified. Thus there are anonymous episodes which cannot be related to personal data from other sources. From a heuristic perspective, these anonymous data are nearly useless. As this applies to 53% of the 316 episodes, this was a serious concern for the validity of the results. Thus it was decided to study the episodes of identified users in order to find patterns that may allow for eventual identification of anonymous episodes.
For some of the anonymous episodes, direct identification was possible. For others, most likely those of users who did not use the prototype often, we had to rely on conjecture based on what we could derive from the episodes of identified users. By comparing them with the patterns identified in step 2, we could still make use of the 'anonymous' episodes, analysing them in terms of empirically found categories.

6.1.3 Sonification design

The repertoire of sounds for the FRR Log player sonification design is:

• Door state (open or closed) is represented by coloured noise similar to diffuse ambient outside noise; this noise plays when open and fades out when closed.

• Button presses on the remote control for height and tilt change (up or down) play short glissando events, up or down, identified for height or tilt by different basic pitch and timbre.

• Alarm button presses are rendered by a doorbell-like sound - this button is mostly used to inform nurses that assistance is needed; use for emergency is rare.

• Flush button presses are represented with a decaying noise burst.

• Continuous state of height and tilt are both represented as soft background drones; when their values change, they move to the foreground, and when their values are static, they recede into the background again.

This design mixes discrete-event sonification (marker sounds for the button presses) and continuous sonification (tilt and height).
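As a rough illustration of this repertoire, here are two marker-type SynthDefs in the spirit of the design (these are not the original SonEnvir SynthDefs; names and parameter values are made up):

(
// glissando marker for height/tilt button presses; direction and register
// distinguish height from tilt and up from down
SynthDef(\markerGliss, { |out = 0, startFreq = 400, endFreq = 800, dur = 0.3, amp = 0.2|
    var env = EnvGen.kr(Env.perc(0.01, dur), doneAction: 2);
    var sig = SinOsc.ar(Line.kr(startFreq, endFreq, dur)) * env * amp;
    Out.ar(out, Pan2.ar(sig, 0));
}).add;

// decaying noise burst for the flush button
SynthDef(\markerFlush, { |out = 0, dur = 0.8, amp = 0.3|
    var env = EnvGen.kr(Env.perc(0.005, dur), doneAction: 2);
    Out.ar(out, Pan2.ar(PinkNoise.ar * env * amp, 0));
}).add;
)

// e.g. a 'tilt up' event:
Synth(\markerGliss, [\startFreq, 300, \endFreq, 600]);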

Figure 6.3: FRR Log Player GUI and sounds mixer.

6.1.4 Interface design

The graphical user interface shown in figure 6.3 provides visual feedback and possibilities for interaction: A button allows for selection of different episode files to sonify; it shows the filename when a file has been selected. If a user ID tag has been recorded in the log, that is shown.

For playing the sequence, Start/Stop buttons and a speed control are provided. Speed is the most useful control, as different patterns may appear on different timescales. A mixer for the levels of all the sound components is provided; for tuning the details of each sound, a ProxyMixer window can be called up from a button ('px mixer'). This window allows for storing all tuning details as code scripts, so that useful settings can be fully documented, communicated and reproduced.

The binary state variables are all visually displayed as buttons, and allow for triggering from the GUI: a button for the door, and buttons for the remote control buttons, turn red when activated in the log. When they are pressed on the GUI, they play their corresponding sound, so users can learn the repertoire of symbolic sounds very quickly. The continuous variables are all displayed: time within the log as hours:minutes:seconds; height and tilt of the seat as labeled numbers and as a line with variable height and tilt. Finally, the last 5 and the next 5 events in the log are shown as text; this was very useful for debugging, and it provides an extra layer of available information to the users of the sonification design.

6.1.5 Evaluation for the research context

For the research context these data came from, this sonification design was successful: it represented time sequence data with several parallel parameter streams and events to be detected efficiently, and it was straightforward to learn and use. The researchers reported being able to use rather high speedups, and to achieve good recognition of different user categories. In fact, the time scaling was essential for understanding the meaning behind the sequential order and timing of events. Especially the times between events, the breaks, were instructive, as they may point to problems the user had with the equipment being evaluated. In short, the sonification design solved the task at hand more efficiently than the other tools previously used by the researchers.

6.1.6 Evaluation in SDSM terms

Within the subset of 30 episodes used for design development (out of 316), the longest is 1660 lines and covers 32 minutes, while the shortest ones are ca. 180 lines and 5 minutes. The SDSM Map shows data anchors for this variety of files, and marks for three different playback speeds of these 2 example files: original speed (x1), and speedups of x10 and x100. At lower speeds, one can leave the continuous sounds (tilt and height) on, while at high speedups, the rendering is clearer without them. The 8 (or 6 at higher speeds) data properties used are actually rendered technically as parallel streams; whether they are perceived as such is a question of the episode under study, the listening attitude, and the playback speed. For example, one could listen to each button sound individually, but usually the timed sequence of button presses would be heard as one stream of related, but disparate sound events.

Figure 6.4: SDS Map for the FRR Log Player.

While this design was created before the SDSM concept existed, it conforms to all basic SDSM recommendations, as well as to the secondary guidelines. Time is represented as time; time scaling as the central SDSM navigation strategy is available as a user interaction parameter. Thus, users can experiment freely with different time scales to bring different event patterns into focus; in SDSM terms, the expected 'gestalt number' (here, the data time rescaling) can be adjusted to fit into the echoic memory time frame. This is supported here by adaptive time-scaling of the sound events: as time is sped up, sound durations shorten by the square root of the speedup factor (see below).

Recorded (binary) events in time are represented by simple, easily recognised marker sounds; they either sound similar to the original events (the flush button, the alarm bell), or they employ straightforward metaphors consistently (glissando up means up both for tilt and height), thus minimising learning load. Continuous state is represented by a background drone, which is turned louder when changes happen; this jumping to the foreground amplifies the natural listening behaviour of 'tuning out' constant background sounds and being alerted when the soundscape changes. For higher speedups, researchers reported that they often turned these components off completely, so the option to let users do that quickly was useful.

The time scaling of marker sounds is handled in a way that can be recommended for re-use: constant sound durations create too much overlap at higher speeds, while proportional scaling to the speedup factor deforms the symbolic marker sounds too much for easy recognition. So, the strategy invented for this sonification was to scale the durations of the marker sounds by 1/(timeScale^scaleExp), with scaleExp values between 0.0 (no duration scaling) and 1.0 (fully matching the sequence time scaling). For the time scaling range desired here, 1 to 100, scaling sound durations by the power of 0.5, i.e. the square root, has turned out to work well: the sounds are still easily recognised as transformations of their original type, and one can still follow dense sequences well.
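In code, this duration scaling is a one-liner; a minimal sketch (the variable names are hypothetical):

(
~scaledDur = { |origDur, timeScale, scaleExp = 0.5|
    origDur / (timeScale ** scaleExp)
};
~scaledDur.(0.3, 100);   // at speedup x100: 0.3 / (100 ** 0.5) = 0.03 seconds
)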

6.2 'Wahlgesänge' - 'Election Songs'

This work is also described in de Campo et al.(2006a), and in the SonEnvir data collection here4. The SC3 code for running this design can be downloaded from the SonEnvir svn repository here5. It was designed by Christian Dayé and the author, and it is based on an example for the Model-Based Sonification concept called 'Sonogram' described by Hermann(2002); Hermann and Ritter(1999) (not to be confused with standard spectrograms or medical ultrasound-based imaging). The code example by Till Bovermann is available here6.

With the sonification design presented here, we can explore geographical sequences. As a straightforward and familiar example of social data with geographical distributions, we use election results; in particular, from the Austrian province of Styria, for the provincial parliament elections in 2000 and 2005, and the national parliament election in 20067. Our interest focused on displaying social data both in their geographical distribution, and at a higher spatial resolution than usual. Whereas most common displays of social data focus on the level of districts (here, 17), we wanted to design a sonification that displays spatial distances and similarities in the election results among neighbouring communities.

The mental model is that of a journey through Styria. A journey can be defined as the transformation of a spatial distribution into a time distribution. A traveller who starts at community A first passes the neighbouring communities, and the longer she is on the way, the more space lies between her and community A. Hence, in this sonification, the spatial distances between communities are mapped onto the time axis.

4http://sonenvir.at/data/wahlgesaenge/
5 https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Prototypes/Soziologie/ElectionsDistMap/
6 http://www.techfak.uni-bielefeld.de/ tboverma/sc/tgz/MBS Sonogram.tgz
7Styria is one of nine federal states in Austria. It consists of 542 communities grouped in 17 districts, and about 1 190 000 people live here. In autumn 2005, more than 700 000 Styrian voters elected their political representatives. The result of this election was politically remarkable: the ruling conservative party ÖVP (Österreichische Volkspartei: Austrian People's Party) was defeated for the first time since 1945 by the left social-democratic party SPÖ (Sozialdemokratische Partei Österreichs: Social-Democratic Party of Austria).

6.2.1 Interface and sonification design

The communities are displayed in a two-dimensional window on a computer screen (see figure 6.5). For each community, the coordinates of the community’s administrative offices were determined and used as the geographical reference point of the respective community. The distances as well as the angles within our data thus correspond with the real distances and angles between the communities’ administrative offices.

Figure 6.5: GUI Window for the Wahlgesänge design. The left hand panel allows switching between different election results (and district/community levels of aggregation), and between the parties to listen to. It also allows tuning some parameters of the sonification, and it displays a short description of the closest ten communities. The maps window shows a map of Styria with the community borders; this map is the clicking interface.

This sonification design depends strongly on user interaction: like most Model-Based Sonifications, it needs to be played, like a musical instrument; without user actions, there is no sound. Clicking the mouse anywhere in the window initiates a circular wave that spreads in two-dimensional space. The propagation of this wave is shown in the window by a red circle. When the wave hits a data point, this point begins to sound in a way that reflects its data properties. In our case, these data properties are the election results within each community. Thus, the user first hears the data point nearest to the clicking point, from the proper spatial direction, with pitch being controlled by the turnout percentage of the currently selected party in that community (high pitch meaning high percentage); then the result for the second-nearest community, and so on.

The researcher can select different parties to listen to their results from the election under study. Further, the researcher can choose a direction in which to look and listen. In figure 6.5, this direction is North, indicated by the soft radial line within the circular wave. The line begins at the point where the researcher has initiated the wave, to provide visual feedback while listening, and to keep a trace of which initial location the current sound was generated for. Data points along this line are heard from the front, others are panned to their appropriate directions. While this sonification was designed for a ring of twelve speakers surrounding the listener, it can be used with standard stereo equipment as well: for stereo headphones, one changes to a ring of four, and listens to the front two channels. Then, data points along the main axis are heard from the center, those on the left (or right) are panned accordingly, with 90 degrees being all the way left or right8. Points at more than 90 degrees off axis progressively fade out, and those beyond 135 degrees off axis are silent.

The GUI provides the following sonification parameter controls:

• A distance exponent defines how much the loudness of a single data point decreases with increasing distance. For 2D spaces, 1/distance is physically correct, but stronger or weaker weightings are interesting to experiment with.

• The velocity of the expanding wave, in km/second. The default of 50 km/sec scales the entire area (when played from the centre) into a synakusis-like time scale of 3 seconds. Slower or higher speeds can be used to zoom further in or out.

• The maximum number of communities (Orte in German) that will be played. Selecting only the nearest 50 or so data points allows for exploring smaller areas in more detail.

• The decay time for each individual sound grain. At higher speeds, shorter decays create less overall overlap and thus provide more clarity; for smaller sets and slower speeds, longer decay times allow for more detailed pitch perception and thus higher perceptual resolution.

• The direction in which the wave is 'looking'; in the sound, this determines which direction will be heard from the front. The direction can be rotated through North, West, South and East.

For more detailed information, the ten data point locations nearest to the clicked point are shown in a list view.

8Note that for stereo speakers at +-30 degrees, the angles within +-90 degrees are scaled together to +-30 degrees; we find this preferable to keeping the angles intact and only hearing a 60 degree 'slice' of all the data points, which could be done by leaving the setting at 12 channels and only using the first 2.
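The core of the rendering can be sketched as follows (a simplification of the idea, not the original code: ~points is assumed to hold [xKm, yKm, turnoutPercent] triples, and spatial panning is omitted):

(
~playWave = { |clickX, clickY, points, speed = 50, distExp = 1, decay = 0.1|
    points.do { |p|
        var dist = hypot(p[0] - clickX, p[1] - clickY);   // distance in km
        var onset = dist / speed;                         // km / (km/sec) = seconds
        var freq = p[2].linexp(0, 100, 200, 2000);        // turnout % -> pitch
        var amp = 0.2 / (dist.max(0.1) ** distExp);       // loudness falls off with distance
        SystemClock.sched(onset, {
            { SinOsc.ar(freq) * EnvGen.kr(Env.perc(0.005, decay), doneAction: 2) * amp }.play;
            nil   // do not reschedule
        });
    };
};
)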

6.2.2 Evaluation

This sonification design is a good tool for outlier analysis. It works rather fast at a low level of aggregation (communities), and outliers are easily identified by tones that are higher than their surroundings. Typically, these are local outliers: in an area that has a local average value of, say, 30%, one can hear a 40% result 'sticking out'; when analysing the entire dataset statistically, this may not show up as an outlier. A second strong feature is the ability to get a quick impression of distributions of a data dimension with their spatial order intact, so achieving the tricky task of developing an intuitive grasp of the details of one's data becomes more likely.

This sonification design is not restricted to election data: other social indicators that are assessed at the community level (unemployment rates, labour force participation rate of women, and others) can be included. Representing them in conjunction with e.g. election results promotes the investigation of local dependencies that might be hidden by higher aggregation levels or by the mathematical operations of correlation coefficients. Finally, this sonification design is of course not restricted to the geographical borders of Styria. It can be used as an exploratory tool enabling researchers to quickly scan social data in their geographical distribution, at different aggregation levels. Given an interesting question to address at such higher levels, an adaptation to different geographical scales, i.e. European and global data distributions, is straightforward to do, e.g. with nations as the aggregation entity.

When considered from an SDSM perspective, this sonification design respects a number of SDSM recommendations: It shows the important role of interaction design, while the sound aspect of the sonification design itself remains rather basic. It also shows the central importance of time scaling/zooming between overview and details; in fact, this design was the source for recommending this particular time-scaling strategy within the SDSM concept. The design also demonstrates the metaphorical simplicity recommended by SDSM. An SDSM graph shows that the sonification can render one data property of the entire set within the echoic memory time frame, and zoom into more detail by selecting subsets, or by slowing down the propagation speed.

Figure 6.6: SDS-Map for Wahlgesänge.

6.3 Social Data Explorer

6.3.1 Background

This sonification design is a study for mapping geographically distributed multidimensional social data to a multiparametric sonification, i.e. a classical parameter mapping sonification. It offers a number of interaction possibilities, so that sociologists (the intended user group) can experiment with changing the mappings freely. This serves both for learning sonification concepts by experimentation and for finding interesting mappings, for instance, mappings that confirm known correlations between parameters. The example data file contains the distribution of the working population of all 542 communities in Styria by sectors of economic activities, given in table 6.1. This data file is quite typical for geographically distributed social data.

6.3.2 Interaction design

A number of interactions can be accessed from the user interface shown in figure 6.7: 'Order' allows sorting by a chosen parameter (alphabetically or numerically); 'up' is ascending, 'dn' is descending.

Table 6.1: Sectors of economic activities

Agrarian, Wood-, and Fishing Industries
Mining
Production of commodities
Energy and Water Industries
Construction
Trade
Hotel and Restaurant Trade
Traffic and Communication
Credit and Insurance
Realty, Company Services
Public Administration, Social Security
Pedagogy
Health, Veterinary, and Social Services
Other Services
Private Households
Exterritorial Organisations
First-time seeking work

The number-box is for choosing one data item to inspect by index in the sorted data, so e.g. 0 is the first data point of the current sorted order.

Every parameter of the sonification can be mapped by using the elements of a 'mapping line': For every synthesis or playback parameter, users can select a data dimension. The data range in minimum and maximum values is displayed. The data can have a 'warp' property, i.e. whether the data should be considered linear, exponential, or mapped with another characteristic function. The arrow-button below 'pushes' the range of the current data dimension to the editable number boxes, as this is the data scaling range ('mimax') to use for parameter mapping. This range can be adjusted in case this becomes necessary to experiment with a specific hypothesis. The second 'mimax' range is the synth parameter range, which is adjustable, as is the warp factor. Here, the arrow-button also pushes in the default parameter values. The 'range' display that follows shows the default synthesis parameter range (e.g. 20-20000 for frequency), and the popup menu under 'Synth Param' shows the name of the parameter chosen for that mapping line.

Setting 'playRange' determines the range of data point indices to play within the current sorted data, with 0 being the first data point; 'post current range' posts the current range in the current order. The final group of elements, labeled 'styrData', allows for starting and stopping the sonification playback.

Figure 6.7: GUI Window for the Social Data Explorer. The top line of elements is used for sorting data by criteria. The five element lines below are for mapping data dimensions to synth parameters, and scaling the ranges flexibly. The bottom line allows selecting a range of interest within the sorted data, and sonification playback control.

6.3.3 Sonification design

The sonification design itself is quite a simple variant of discrete-event parameter mapping. Three different synthesis processes ('synthdefs') are provided, all with control parameters for freq, amp, pan, and sustain. The synthdefs mainly vary in the envelope they use (one is quasi-gaussian, the other two percussive), and in the panning algorithm ('sineAz' is for multichannel ring-panning). Which of these sounds is used can be changed in the code. The player algorithm iterates over the chosen range of data indices. It maps the values of each data item to values for the synthesis event's parameters, based on the current mapping choices. If nothing is chosen for a given synthesis parameter, a default value is used (e.g. 0.1 seconds for the duration of the event).
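The following Python sketch illustrates the two mechanisms just described: a 'mapping line' that rescales a data dimension into a synthesis parameter range with a linear or exponential warp, and a player loop that turns each data item in the chosen range into one event. Names, default values and the event dictionary layout are illustrative assumptions; the actual implementation consists of SuperCollider3 synthdefs and a player task.

```python
def warp_map(value, data_min, data_max, param_min, param_max, warp='lin'):
    """Rescale one data value into a synthesis parameter range ('mimax' style).
    'exp' warp assumes param_min > 0 (e.g. frequency)."""
    pos = (value - data_min) / float(data_max - data_min)   # normalised 0..1
    if warp == 'exp':
        return param_min * (param_max / param_min) ** pos   # exponential warp
    return param_min + pos * (param_max - param_min)        # linear warp

def play_range(data, mapping_lines, defaults, start=0, end=None):
    """Iterate over a range of data items and build one synthesis event per item.
    mapping_lines: {synth_param: (data_key, data_min, data_max, p_min, p_max, warp)}
    defaults:      fallback values for unmapped synthesis parameters."""
    events = []
    for item in data[start:end]:
        event = dict(defaults)                               # e.g. {'dur': 0.1, 'amp': 0.1}
        for param, (key, dmin, dmax, pmin, pmax, warp) in mapping_lines.items():
            event[param] = warp_map(item[key], dmin, dmax, pmin, pmax, warp)
        events.append(event)
    return events
```

A mapping line for frequency might then read, for instance, `{'freq': ('trade_pct', 0, 100, 200, 2000, 'exp')}`, with all names here being hypothetical.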

6.3.4 Evaluation

For experimenting with parameter mapping sonification, this design allows for similar rendering complexity as the Sonification Sandbox (Walker and Cothran(2003)), though without parallel streams of sounds. Both the user interface and the sonification itself are sketches rather than polished applications; e.g. the user interface could allow loading data files, switching between instruments, and deriving its initial display state from the current state of the model. Given more development time, it would benefit from multiple and more complex sound functions, from making more functionality available through GUI elements, and from a fuller visual representation of the ongoing sonification. Nevertheless, according to the sociologist colleague who experimented with it, it supported exploration of the particular type of data file well enough to confirm its viability.

While we intended to experiment with designs bridging between the Wahlgesänge design and the Social Data Explorer, this was not pursued, mainly due to time constraints, and because other ventures within the SonEnvir project were given higher priority.

Chapter 7

Examples from Physics

In the course of the SonEnvir project, we began with sonifications of quantum spectra, and later decided to shift the focus to statistical spin models as employed in computational physics, for various reasons given below.

Sonification has been used in physics rather intuitively, without referring to the term explicitly. The classical examples are the Geiger counter and the sonar, both monitoring devices for physical surroundings. An early example of research using sonification is Galileo Galilei's inclined plane experiment. Following Drake(1980), it seems plausible that Galilei used auditory information to verify the quadratic law of falling bodies (see chapter 3, and figure 3.1.1). In reconstructing the experiment, Riess et al. (2005) found that time measuring devices of the 17th century (water clocks) were almost certainly not precise enough for these experiments, while rhythmic perception was.

In modern physics, sonification has already played a role: one example of audification is given in a paper by Pereverzev et al., where quantum oscillations between two weakly coupled reservoirs of superfluid helium-3 (predicted decades earlier) were found by listening: "Owing to vibration noise in the displacement transducer, an oscilloscope trace [...] exhibits no remarkable structure suggestive of the predicted quantum oscillations. But if the electrical output of the displacement transducer is amplified and connected to audio headphones, the listener makes a most remarkable observation. As the pressure across the array relaxes to zero there is a clearly distinguishable tone smoothly drifting from high to low frequency during the transient, which lasts for several seconds. This simple observation marks the discovery of coherent quantum oscillations between weakly coupled superfluids." (Pereverzev et al.(1997))

Next to sonification methods in physics, physics methods found their way into sonification, as in the model-based sonification approach by Hermann and Ritter(1999). For example, in so-called data sonograms, physical formalisms are used to explore high-dimensional data spaces; an adaptation of the data sonogram approach has been used in the 'Wahlgesänge' sonification design described in section 6.2.


Physics and sonification

In physics, sonification has particular advantages. First of all, modern particle physics is usually described in a four-dimensional framework. For a three-dimensional space evolving in time, a complete static visualisation is no longer possible. This makes such systems harder to grasp and thus rather abstract - so in both didactics and research, sonification may be useful. In the auditory domain, many sound parameters may be used to display a four-dimensional space, maintaining symmetry between the four dimensions by comparing different rotations of their mappings. A feature of auditory dimensions that has to be taken into account is that these dimensions are generally not orthogonal, but could rather be compared to mathematical subspaces (see Hollander(1994)). This concept is very common in physics, and thus easily applicable.

Furthermore, many phenomena in physics are wave phenomena happening in time, just as sound is. Thus sonification provides a very direct mapping. While scientific graphs usually map the time dimension of physical phenomena onto a geometrical axis, this is not necessary in a sonification, where physical time persists, and multiple parameters may be displayed in parallel.

While perceptualisation is not intended to replace classical analytical methods, but rather to complement them, there are examples where visual interpretation is superior to, or at least precedes, mathematical treatment. For instance, G. Marsaglia(2003) describes a battery of tests for the quality of numerical random number generators. One of these is the parking lot test, where mappings of randomly filled arrays in one plane are plotted and visually searched for regularities. He argues that visual tests are striking, but not feasible in higher dimensions. As nothing is known beforehand about the nature of patterns that may appear in less than ideal random number generators, there is no all-encompassing mathematical test for this task. Sonification is a logical continuation of such strategies, which can be applied to multidimensional data from physical research contexts.

The major disadvantage of sonification we encountered is that physicists (and probably natural scientists in general) are not familiar with it. Visualisation techniques and our learnt understanding of them have been refined since the beginnings of modern science. Regarding auditory perception in particular, we were e.g. confronted with the opinion that the hearing process is just a Fourier transformation, and could be fully replaced by Fourier analysis. This illustrates that much work is required before sonification becomes standard practice in physics.

7.1 Quantum Spectra Sonification1

Quantum spectra are essential for understanding the structure and interactions of composite systems in such fields as condensed matter, molecular, atomic, and subatomic physics. Put very briefly, quantum spectra describe the particular energy states which different subatomic particles can assume; as these cannot be observed directly, competing models have been developed that predict the precise values and orderings of these energy levels. Quantum spectra provide an interesting field for auditory display due to the richness of their data sets, and their complex inner relations.

In our experiments ('us' referring to the physics group within SonEnvir), we were concerned with the sonification of quantum-mechanical spectra of baryons, the most fundamental particles of subatomic physics observed in nature. The data under investigation stem from different competing theoretical models designed for the description of baryon properties. This section reports our attempts at finding valid and useful strategies for displaying, comparing and exploring various model predictions in relation to experimentally measured data by means of sonification. We investigated the possibilities of sonification in order to develop it as a tool for classifying and explaining baryon properties in the context of present particle theory.

Baryons - most prominently among them the proton and the neutron - are considered to be bound systems of three quarks, which are presently the ultimate known constituents. The forces governing their properties and behaviour are described within the theory of quantum chromodynamics (QCD). As this theory is as yet not exactly solvable for baryons (at low and intermediate energies), one resorts to effective models, such as constituent quark models (CQMs). CQMs have been suggested in different variants. Existing models differ mainly in which components they consider to constitute the forces binding the constituent quarks: all models include a so-called confinement component - as the distance between quarks expands, the forces between them grow, which keeps them confined - and a hyperfine interaction, which models interactions between quarks by particle exchange. As a result there is a variety of quantum-mechanical spectra for the ground and excited states of baryons. The characteristics of the spectra contain a wealth of information important for the understanding of baryon properties and interactions. Baryons are also classified by the combinations of quarks they are made up of, and by a number of other properties such as color, flavor, spin, parity, and angular momentum, which can be arranged in symmetrical orders. For more background on constituent quark models and baryon classification, please refer to Appendix C.1.

1 This section is based on material from two SonEnvir papers: de Campo et al.(2005d) and de Campo et al.(2006a).

7.1.1 Quantum spectra of baryons

The competing CQMs produce baryon spectra with characteristic differences due to the different underlying hyperfine interactions. In figure 7.1 the excitation spectra of the nucleon (N) and delta (∆) particles are shown for three different classes of modern relativistic CQMs. While the ground states are practically the same (and agree with experiments) for all CQMs, the excited states show different energies and thus level orderings. (For instance, in the OGE CQM the first excitation above the N ground state is $J^P = \frac{1}{2}^-$, whereas for the GBE CQM it is $J^P = \frac{1}{2}^+$.) Evidently the predictions of the GBE CQM reach the best overall agreement with the available experimental data.

Figure 7.1: Excitation spectra of N (left) and ∆ (right) particles. In each column, the three entries left to right are the energies (in MeV, or Mega-electronVolts) based on One-Gluon exchange (Eidelman(2004)), Instanton-induced (Glozman et al.(1998); Loering et al.(2001)), and Goldstone-Boson Exchange (Glantschnig et al.(2005)) constituent quark models. The shaded boxes represent experimental data, or more precisely, the ranges of imprecision that measurements of these data currently have (Eidelman(2004)).

7.1.2 The Quantum Spectra Browser

Sonifying baryon mass spectra

The baryon spectra as visualised by patterns such as in Fig. 7.1 allow discriminating how well the different CQMs describe experiment. One can also read off characteristic features of the different CQMs, such as the distinct level orderings. However, it is quite difficult to conjecture specific symmetries or other relevant properties in the dynamics of a given CQM by just looking at the spectra. Thus, there are a number of open research questions where we expected sonification to be helpful. We began by identifying phenomena that are likely to be discernible in sonification experiments:

• Is it possible to distinguish e.g. the spectrum of an $N\,\frac{1}{2}^+$ nucleon from, say, a $\Delta\,\frac{3}{2}^+$ delta by listening only?

• Is there a common family sound character for groups of particles, or for entire models?

• In the confinement-only model, the intentionally absent hyperfine interaction causes data points to merge into one: is this clearly audible?

We studied the sonification of baryon spectra with three specific data sets. They contain the N as well as ∆ ground state and excitation levels for three different dynamical situations: 1) the GBE CQM (Glozman et al.(1998)), 2) the OGE CQM (Theussl et al. (2001)), and 3) the case with confinement interaction only, i.e., omitting the hyperfine interaction component. Each of these data files is made up of 20 lists, and each list contains the energy levels of a particular N or ∆ multiplet $J^P$. The lists are different in length: depending on the given $J^P$ multiplet they contain 2-22 entries, since we only take into account energy levels up to a certain limit.

Sonification design

For sonification of baryon spectra, the most immediately interesting feature is the level spacing. The quantum-mechanical spectrum is bounded from below and its absolute position is fixed by the N ground state (at 939 MeV); above that, spectral lines up to ca 3500 MeV appear for the excited states in the spectrum of each particle. As the study of these level spacings depends on the precise nature of the distances between these lines within and across particles, a sonification design demands high resolution for that parameter; thus we decided to map these differences between the energy levels to audible frequencies. Several mapping strategies were tried for an auditory display of the spacings between the energy levels in the spectra: I) Mapping the mass spectra to frequency spectra directly, with tunable transposition together with optional linear frequency shift and spreading, and II) Mapping the (linear) mass spectra to a scalable pitch range, i.e. using perceptually linear pitch space as representation. Both of these approaches can be listened to as simultaneous static spectra (of one particle at a time) and as arpeggios with adjustable temporal spread against a soft background drone of the same spectrum.
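The two mapping strategies can be sketched as follows (in Python; the anchor frequency, the pitch zoom factor and the function names are illustrative assumptions, not values taken from the actual SuperCollider3 implementation):

```python
def levels_to_freqs_linear(levels_mev, fixed_freq=939.0, f_range_scale=1.0):
    """Strategy I: map the mass spectrum to a frequency spectrum directly.
    The N ground state (939 MeV) is anchored at fixed_freq in Hz; the spacings
    above it are scaled linearly by f_range_scale."""
    ground = levels_mev[0]
    return [fixed_freq + (m - ground) * f_range_scale for m in levels_mev]

def levels_to_pitches(levels_mev, base_freq=939.0, semitones_per_100mev=1.0):
    """Strategy II: map the linear mass spectrum into perceptually linear pitch
    space, so equal spacings in MeV become equal musical intervals."""
    ground = levels_mev[0]
    return [base_freq * 2 ** ((m - ground) / 100.0 * semitones_per_100mev / 12.0)
            for m in levels_mev]
```

Strategy I keeps the physical proportions of the level spacings intact as frequency ratios shift; strategy II trades that for a display that can be zoomed without leaving comfortable pitch ranges.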

Interface design

These models are implemented in SuperCollider3 scripts; for more flexible browsing, a GUI was designed (see figure 7.2). All the tuneable playback settings can be changed while playing, and they can be saved for reproducibility and an exchange of settings between researchers. Some tuning options have been included in order to account for known data properties: E.g., the values calculated for higher excitations in the mass spectra are considered to be less and less reliable; we modeled this with a tuneable slope factor that reduces the amplitude of the sounds representing the higher excitation levels in all models.

Figure 7.2: The QuantumSpectraBrowser GUI. The upper window allows for multiple selection of particles that will be iterated over in 2D loops; or alternatively, for direct playback of that particle by clicking. The lower window is for tuning all the parameters of the sonification design interactively.

For static data like these, flexible, interactive comparison between different subsets of the data is a key requirement; e.g. in order to find out whether discrimination by parity P is possible with auditory display, one will want to automatically play interleaved sequences alternating between selected particles with positive and negative parities.

The Quantum Spectra Browser window allows for the following interactions: The buttons Manual, Autoplay choose between manual mode (where a click on a button switches to the associated sound) and an autoplay mode that iterates over all the selected particles, either horizontally (line by line) or vertically (column by column). The buttons LoopStart, LoopStop start and stop this automatic loop; the numberbox stepTime sets for how many seconds each spectrum is presented. The three rows of buttons below Goldstone, OneGluon, Confinement allow for playing individual spectra, or for a multiple selection of which particles are heard in the loop.

The QSB Sound Editor allows for setting many synthesis/spatialisation parameters:

• fixedFreq sets the frequency that corresponds to the ground state; the default value is 939 Hz (for 939 MeV).

• fRangScale rescales the frequency range the other energy levels are mapped into: a scale of 1 is original values, 2 expands to twice the linear range. As this distorts proportions, we mostly left this control at 1.

• transpose transposes the entire spectrum by semitones, so a value of -24 is two octaves down. This leaves proportions intact, and many listeners find this frequency range more comfortable to listen to.

• slope determines how much the frequency components for higher energy levels are attenuated; this models the decreasing validity of higher energy levels. 0 is full level, 0.4 means each line is softer by a factor of 1 − 0.4 than the previous line. (The frequency-dependent sensitivity of human hearing is compensated for separately using the AmpComp unit generator.)

• panSpread sets how much spectral lines are separated spatially. With a spread of 1, and stereo playback, the ground state is all the way left, and the highest excited state all the way right; less than 1 means they are panned closer together. When using multichannel playback, this can expand over a series of up to 20 adjacent channels.

• panCenter sets where the center line will be panned spatially - 0 is center, -1 is all left, 1 is all right.

The remaining parameters tune the details of an arpeggiation loop: essentially, a loop of spread-out impulses excites the spectral lines individually, and they ring until a minimum level is reached.

• ringTime determines how long each component will take to decay (RT for -60dB) after an impulse.

• bgLevel maintains the presence of the entire spectrum as one gestalt: the spectral line sounds will only decay to this minimum level and remain there.

• attDelay determines when within the loop the first attack will play.

• attSpread determines how spread out the attacks will be within the loop time.

• loopTime determines the time for one cycle of impulses.

7.1.3 The Hyperfine Splitter

Addressing a more subtle issue, we then designed a Hyperfine Level Splitter, which allows for studying the so-called splittings of the energy levels due to a variable strength of the hyperfine interaction inherent in the CQMs. The hyperfine interaction is needed in order to describe the binding of three quarks more realistically, i.e. in closer accordance with experimental observation. When it is absent (in simulations), certain quantum states are degenerate, meaning that the corresponding energy levels of some particles coincide.

In the first demonstration example, we chose the excitation levels of two different particles (the nucleon n-1/2+ and the delta d3/2+), calculated within the same CQM, the Goldstone-Boson Exchange model (gbe; Glozman et al.(1998)). These two particles are degenerate when there is no hyperfine interaction present.

Sonification design

Mapped into sound, this means that one hears a chord of three tones for the ground states and the first two excitation levels, which are the same for both particles. Here, auditory perception is more demanding than in the Quantum Spectra Browser, as the mass spectra are played as continuous chords, and the hyperfine interaction may be 'turned up' gradually (to 100 percent). Thereby, the energy levels are pulled apart, and one hears a complex chord of six tones. The two particles being compared can now be distinguished acoustically, just as they are when observed in experiments. With the Level Splitter, the dynamical ingredients leading to these energy splittings may be studied in detail, and likewise the quantitative differences between distinct CQMs.

The underlying sonification design is an extension of that for the Quantum Spectra Browser. Mainly, some parameters are added to control the number of spectral lines to be represented at once, and a balance control between the two compared channels, which can play simultaneously or interleaved.
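The underlying operation is a simple per-level interpolation between the degenerate confinement-only spectrum and the spectrum of the full model. A minimal Python sketch, assuming the percent slider convention described above (the function name is hypothetical):

```python
def split_levels(confinement_levels, full_model_levels, percent):
    """Interpolate each energy level between the confinement-only model
    (degenerate levels, percent = 0) and the full CQM with its hyperfine
    interaction completely turned on (percent = 100)."""
    t = percent / 100.0
    return [c + t * (f - c) for c, f in zip(confinement_levels, full_model_levels)]
```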

Interface design

The Hyperfine Data Player window allows for the following interactions: The sets of pop-up menus labeled left and right select which model (GBE, OGE), which particle (Nukleon, Delta, etc.), which state (1/2, 3/2 etc.), and which parity (+, -, both) is chosen for sonification in that audio channel. The slider percent determines where to interpolate between the model points of choice and their corresponding points in the Confinement-only model; this is where the hyperfine interaction component of the model can be gradually turned on or off. The graphical view below, labeled l3, l2, l1 - l1, l2, l3, shows the precise values for the first several energy states of the two particles chosen. The very bottom is the ground state (939 MeV); the visible range above goes up to 3500 MeV. In the state shown, a so-called 'level crossing' can be seen (and heard): level 3 of GBE nucleon 1/2 (both parities) crosses below level 2; by comparison, in OGE, the same particle has monotonically ascending spectral energy states. The bottom row of buttons stops and starts the sonification, posts the current interpolated values, and recalls a number of prepared demo settings.

Figure 7.3: The Hyperfine Splitter GUI. The left window is for selecting two particles by model, particle name, spin, and parity; the hyperfine component is faded in and out with the slider in the middle. The bottom area shows the audible spectral lines central-symmetrically. The window on the right side is the editor for the synthesis and spatialisation parameters of the sonification design.

The Hyperfine Editor allows for setting many synthesis/spatialisation parameters familiar from the Quantum Spectra Browser, as well as several more:

• balance sets the balance between left and right channels.

• bgLevel sets the minimum level for arpeggiated settings, as above.

• brightness adds harmonic overtones (by frequency modulation) to the individual lines so that judging their pitch becomes easier.

• pitStretch rescales the pitch range the other energy levels are mapped into: a scale of 1 is original values, 2 expands to twice the intervallic range. (This is different from fRangScale above, which used linear scaling.)

• transpose transposes the entire spectrum by semitones, as above.

• melShift determines when within the loop the second channel's attack will play relative to the first. 0 means they play together, 3 means they are equally spaced apart (by 3 of 6 subdivisions); the maximum of 6 plays them in sync again.

• melSpread determines how much the attacks within one channel are arpeggiated; 3 means they appear equally spread in time. Together, these two controls allow alternating the two spectra as groups, or interleaving the individual spectral lines across the two spectra.

• ringAtt determines how fast the attack times are for both channels.

• ringDecay sets the decay time for the spectral lines of both channels.

• nMassesL, nMassesR is handled automatically when changing particle types and properties. This is the number of masses audible in the spectrum, which can be reduced by hand if desired.

• ampGrid sets the volume for a reference grid (clicks) which can be turned on for orientation.

7.1.4 Possible future work and conclusions

At this point, the physicists who had worked closely with these data in their own research agenda unfortunately had to leave the project, which meant that this line of experiments came to an end before more of the interesting ideas could be tried out. These were intended to explore a number of further aspects that may ultimately be relevant in the scientific study of particle physics by sonification; for completeness, these 'loose ends' are given here.

Comparison with experimental data

As can be seen from figure 7.1, several experimental data points are available for the energy levels. However, they are affected by experimental uncertainties. Consequently, their auditory display needs some adaptations. We intended to differentiate between (sharp) theoretical data as deduced from the CQMs and (spread) phenomenological data measured in experiment by adding narrow-band modulation to spread-out data bands. It should be quite interesting to qualify the theoretical predictions vis-a-vis the experimental data.

Representing symmetries with spatial ordering

Much effort has gone into finding visual representations for the multiple symmetries between particle groups and families. Arranging the sound representations in 3D-space with virtual acoustics in a spatial order determined by symmetry properties between particle groups may well be scientifically interesting; navigating such a symmetry space could become an experience that lets physicists acquire a more intuitive notion of the nature of these symmetries.

Temporal aspects

There are plenty of interesting time phenomena in quantum physics which could be made use of in numerous ways in further explorations. For example, there is an enormous variation in the half-life of different particles. This could be expressed quite directly in differentiated decay times for different spectral lines. In addition, including the probabilities for transitions between excited states and ground states would open promising possibilities for demonstrating the dynamical ingredients in the quark interactions inside baryons.

Conclusions

Our investigations have indicated that sonification is an interesting alternative and a promising complementary tool for analysing quantum-mechanical data. While many interesting design ideas came up in this line of research, which may well be useful for other contexts, the implemented sonification designs were not fully tested by domain experts in this quite specialised field. Given motivated domain science research partners, a number of good candidates for sonification approaches remain to be explored further in the context of quantum spectra.

7.2 Sonification of Spin models2

Spin models provide an interesting test case for sonification in physics, as they model complex systems that are dynamically evolving and not satisfactorily visualisable. While the theoretical background is largely understood, their phase transitions have been an interesting subject of study for decades, and results in this field can be applied to many scientific domains. While most classical methods of solving spin models rely on mean values, their most important feature, especially at the critical point of phase transition, is the spin fluctuations of single elements.

Therefore we started out with the fluctuations of the spins, and provided auditory information that can be analysed qualitatively. The goal was to display three-dimensional dynamic systems, distinguish the different phases and study the order of the phase transition. Audification and sonification approaches were implemented for the spin models studied, so that both realtime monitoring of the running model and analysis of pre-recorded data sets is possible. Sound examples of these sonifications are described in Appendix C.2.

7.2.1 Physical background

Spin systems describe macroscopic properties of materials (such as ferromagnetism) by computational models of simple microscopic interactions between single elements of the material. The principal idea of modeling spin systems is to study complex systems in a controlled way: the models are theoretically tractable, yet mirror the behaviour of real compounds. From a theoretical perspective, these models are interesting because they allow studying the behaviour of universal properties within certain symmetry groups. This means that some properties, such as the so-called order parameters giving the order of the phase transition, do not depend on details like the kind of material. Already in 1945, E. A. Guggenheim (cited in Yeomans(1992)) found that the phase diagrams of eight different fluids he studied show the very same coexistence curve3. A theoretical explanation is given by a classification in symmetry groups - all of these different fluids belonged to the same mathematical group.

2 This section is based on the SonEnvir ICAD conference paper Vogt et al.(2007).

3 This becomes apparent when plotted in so-called reduced variables: the reduced temperature is $T/T_{crit}$, the actual temperature relative to the critical one, and pressure is treated likewise.

7.2.2 Ising model

One of the first spin models, the Ising model, was developed by Ernst Ising in 1924 in order to describe a ferromagnet. Since the development of computational methods, this model has become one of the best studied models in statistical physics, and has been extended in various ways.

Figure 7.4: Schema of spins in the Ising model as an example for Spin models. The lattice size here is 8 x 8. At each lattice location, the spin can have one of two possible values, or states (up or down).

Its interpretation as a ferromagnet involves a simplified notion of ferromagnetism.4 As shown in figure 7.4, it is assumed that the magnet consists of simple atoms on a quadratic (or, in three dimensions, cubic) lattice. At each lattice point an atom (here, a magnetic moment with a spin of up or down) is located. In the computational model, neighbouring spins try to align with each other, because this is energetically more favorable. On the other hand, the overall temperature causes random spin flips. At a critical temperature $T_{crit}$, these processes are in a dynamic balance, and there are clusters of spins on all orders of magnitude. If the temperature is lowered below $T_{crit}$, one spin orientation will prevail. (Which one is decided by the random initial setting.) Macroscopically, this is the magnetic phase ($T < T_{crit}$). At $T > T_{crit}$, the thermal fluctuations are too strong for uniform clusterings of spins. There is no macroscopic magnetisation, only thermal noise.

4 There are many different application fields for systems with next-neighbour interaction and random behaviour. Ising models have even been used to describe social systems, as e.g. in P. Fronczak(2006), though this is a disputed method in the field.

7.2.3 Potts model

A straightforward generalisation of this model is the admission of more spin states than just up and down. This was realized by Renfrey B. Potts in 1952, and was accordingly called the Potts model. Several other extensions of models were studied in the past. We worked with the q-state Potts model and its special case for q = 2, the Ising model, both being classical spin models. For mathematical background, see Appendix C.2.

The order of the phase transition is defined by a discontinuity in the derivatives of the free energy (see figure 7.5). If there is a finite discontinuity in one of the first derivatives, the transition is called first order. If the first derivatives are continuous, but the second derivatives are discontinuous, it is a so-called continuous phase transition.

Figure 7.5: Schema of the orders of phase transitions in spin models. The mean magnetisation is plotted vs. decreasing temperature. (a) shows a continuous phase transition and (b) a phase transition of first order. In the latter, the function is discontinuous at the critical temperature. The roughly dotted line gives an approximation on a finite system, e.g. a computational model. The bigger the system, the better this approximation models the discontinuous behaviour.

Nowadays, spin models are usually simulated with Monte Carlo algorithms, giving the most probable system states in the partition function (Yeomans, 1992, p. 96). We implemented a Monte Carlo simulation for an Ising and a Potts model in SuperCollider3 (see figure 7.2.3). The lattice is represented as a torus (see fig. 7.8) and continually updated: for each lattice point, a different spin state is proposed, and the new overall energy calculated. As shown in equation C.3, it depends on the neighbour interactions $(S_i S_j)$ and the overall temperature (given by the coupling $J \sim 1/T$). If the new energy is smaller than the old one, the new state is accepted. If not, there is still a certain chance that it is accepted, leading to random spin flips representing the overall temperature.

To observe the model and draw conclusions from it, usually mean values of observables are calculated from the Monte Carlo simulation, e.g. the overall magnetisation. The simulation needs time to equilibrate at each temperature in order to model physical reality, e.g. with small or large clusters. Big lattices with a side length of e.g. 100 need many equilibration steps. With a typical evolution of the model, critical values or the order of the phase transition can be deduced. This is not rigorously doable, as on a finite lattice a function will never be continuous, compare figure 7.5. In a quantised system, the 'jump' in the observable will just look more sudden for a first order phase transition.

This last point is both an argument for using sonification and a research goal for this study: by using more information than the mean values, the order of the phase transition can be more clearly distinguished. Also, we studied different phase transitions with the working hypothesis that there might be principal differences in the fluctuations which might be heard better. (A Potts model with q ≤ 4 states has a continuous phase transition, whereas with q ≥ 5 states it has a phase transition of first order.) Thus researchers may gain a quick impression of the order of the phase transition.
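For reference, here is a minimal Python sketch of one Metropolis sweep for a q-state Potts model on a 2D lattice with periodic boundaries. It is a generic textbook-style sketch, not the SonEnvir SuperCollider3 implementation; in particular, absorbing the temperature into the coupling is an assumption following the J ~ 1/T convention above.

```python
import math
import random

def metropolis_sweep(lattice, q, coupling):
    """One Metropolis sweep over a square q-state Potts lattice (list of lists)
    with periodic boundaries; `coupling` plays the role of J ~ 1/T."""
    size = len(lattice)
    for _ in range(size * size):
        i, j = random.randrange(size), random.randrange(size)
        old, new = lattice[i][j], random.randrange(q)
        neighbours = [lattice[(i + 1) % size][j], lattice[(i - 1) % size][j],
                      lattice[i][(j + 1) % size], lattice[i][(j - 1) % size]]
        # Potts energy contribution: -J * (number of aligned nearest neighbours)
        d_energy = -coupling * (sum(n == new for n in neighbours)
                                - sum(n == old for n in neighbours))
        # accept energetically favourable changes always, others with Boltzmann probability
        if d_energy <= 0 or random.random() < math.exp(-d_energy):
            lattice[i][j] = new
    return lattice
```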

Implementing spin models

In all the analytical approaches, the solving procedures for the models are based on abstract mathematics. This gives great insight into the universal basics of critical phenomena, but often a quick glance at a graph complements classical analysis, as mentioned above. Thus, in areas where visualisation cannot be done, applying sonification can help to reach an intuitive understanding with relatively few underlying assumptions. Sonification tools can also serve as monitoring devices for highly complex and high dimensional simulations. The phases and the behaviour at the critical temperature can be observed. Finally, we were particularly interested in sonification of the critical fluctuations, with self-similar clusters on all orders of magnitude.

We wanted to provide for a more or less direct observation of data on all levels of the analysis, both to verify assumptions and not to overlook new insights. This should be done by observing the dynamic evolution of the spins, not only mean values. Thus, the important characteristic of spin fluctuations can be studied and the entire system continuously observed.

Spin model data features

Spin models have several basic characteristics, which were used in different sonification approaches. These properties refer to the structure of the model, the theoretical background and its interpretation, and they were exploited for the sonification as follows:

• The models are discrete in space by fixed lattice positions, and these are filled with discrete-valued spins. The data sets are rather big, on the order of a lattice size of 100 in two or three dimensions, and are dynamically evolving. Because of the specifics of the modeling, the simulations are only correct on the statistical average, and many configurations have to be taken into account together for correct interpretation. Considering that a single auditory event has to have some minimum duration to display perceptually distinguishable characteristics, we explored two options for the auditory display: a fast audification approach, and omission, i.e. representing only a subset of all spins, using a granular approach.

Figure 7.6: GUI for the running 4-state Potts Model in 2D. The GUI shows the model in a state above critical temperature, where large clusters emerge. The lattice size is 64x64. The averages below the spin frame show the development of the mean magnetisation for the 4 spin parities over the last 50 configurations. As the temperature is constant and the system has been equilibrated before, these mean values are rather constant.


• The models are calculated by next-neighbour interaction aligning the spins on the one hand, and random fluctuations on the other. We aimed to preserve the next-neighbour property at least partially by different strategies of moving through the data frame: either along a conventional torus path, or along a Hilbert curve, see fig. 7.8 (in approaches 7.2.4, 7.2.5 and 7.2.7). For the lossy (omission) approach, the statistical nature of the model was preserved by picking random elements for the granular sonification.

• There is a global symmetry in the spins, thus - in the absence of an exterior magnetic field - no spin orientation is preferred. This was mapped for the Ising model by choosing the octave for the two spin parities. In the audifications, every spin orientation is assigned a fixed value, and symmetry is preserved as the sound wave only depends on the relative difference between consecutive steps in the lattice.

• At the critical point of phase transition, the clusters of spins become self-similar on all length scales. We tried to use this feature in order to generate a different sound quality at the point of phase transition. This would allow a clear distinction between the two phases and the (third) different behaviour at the critical temperature itself.

7.2.4 Audification-based sonification

In this approach, we tried to utilise the full available information generated by the model. As the Sonification Design Space Map suggests audification for higher density auditory display, we interpreted the spins within each time instant as a waveform (see figure 7.7). This waveform can be listened to directly or taken as a modulator of a sine wave.5 When the temperature is lowered, regular clusters emerge, changing only slowly from time step to time step. Thus, if the audification preserves locality, longer structures will emerge aurally as well, resulting in more tone-like sounds. When one spin dominates, there is silence, except for some random thermal fluctuations at non-zero temperature.

5 While this would not qualify as an audification by the strictest definition, such a simple modulation is still conceptually quite close.

Figure 7.7: Audification of a 4-state Potts model. The first 3 milliseconds of the audio file of a model with 4 different states in the high temperature phase (noise).

Figure 7.8: Sequentialisation schemes for the lattice used for the audification. The left scheme shows a torus sequentialisation, where spins at opposite borders are treated as neighbours. This treats the 2D grid like a torus (a doughnut shape), as it is read row by row. On the right side a Hilbert curve is shown.

While fig. 7.7 shows how one line of data is handled for the sonification, the question remains how to move through all of them. Different approaches to sequentialisation are shown in fig. 7.8. The model has periodic boundary conditions, so a torus path is possible. We also experimented with moving through the lattice along a Hilbert curve. This is a space-filling curve for quadratic geometries, reaching every point without intersecting itself. This was intended to make the audification insensitive to differences which arise depending on whether rows or columns are read first, which can occur in the case of symmetric clustering. Eventually, it turned out that symmetric clustering mainly depends on unfavorable starting conditions and occurs only rarely, so we mostly used a torus path, as the model does in the calculation. The sounds were recorded directly from the interactive model, using the GUI shown in fig. 7.2.3 for a specific temperature. In order to judge the phase of the system, this simple method is most efficient.
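The basic audification step can be sketched compactly in Python: one lattice configuration is read out row by row (the torus path) and turned into a block of audio samples. The symmetric value mapping and the function name are assumptions; writing the samples to a sound file or a realtime buffer is left to the host environment (SuperCollider3 in the project).

```python
import numpy as np

def audify_configuration(lattice, q):
    """Read one q-state lattice configuration row by row (torus path) and map
    the spin states 0..q-1 symmetrically onto sample values in [-1, +1]."""
    spins = np.asarray(lattice).ravel()             # row-major readout
    return (2.0 * spins / (q - 1) - 1.0).astype(np.float32)
```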

At the time of recording, the model has already been equilibrated - its state represents a typical physical configuration for the specific temperature. When the temperature is cooled down continually, the system needs several transition steps at each new temperature before the data represents the new physical state correctly. Thus, in a second approach, data was pre-recorded and stored as a sound file. Contrary to our assumptions, the continuous phase transition is not very clearly distinguishable from the first order phase transition. This is partly due to the data - on a quantised lattice there are no truly continuous observables, so the distinction between first and second order transitions is fuzzy in principle.

A fundamental problem is that the equilibration steps (which are not recorded!) between the stored configurations cut out the meaningful transitions between them: that these equilibration steps are needed at all is in fact a common drawback in the established computational spin models. When one considers every complete lattice state as one sequence of single audio samples (e.g. 32x32 = 1024 lattice sites), then at a sampling frequency of 44100 Hz, every 23 ms a potentially completely different state is rendered, instead of a continuously evolving system with only few changes in the cluster structures from one frame to the next. This makes it more difficult to understand the dynamic evolution of the transitions. We tried to leave out as few equilibration steps as possible, to stick closely to a physically relevant state and still keep the transitions understandable. Consequently, for a 32x32 lattice we recorded e.g. every 32nd step, at 10 different couplings (temperatures) overall, with 32 recordings each. Thus, our soundfiles (described in appendix C.2) have (32 x 32) lattice sites x 10 couplings x 32 record steps = 327680 samples, and last 7.4 s. Still, when comparing a 4-state Potts model to one with 5 spin states, the change in the audio pattern is only slightly more sudden in the latter.

7.2.5 Channel sonification

We refined the audification approach by recording data for each spin state separately. This concept is shown in figure 7.9. The whole lattice is sequentialised like a torus (see fig. 7.8) and read out for every spin state separately. When data for spin A is collected, only lattice sites with spin A are set to 1, all others to 0. Conversely, when spin B data is collected, only lattice sites with spin B are set to 1 and all others (including spin A) to 0; and so forth.

Thus, the different spin states are separated and can be played on different channels. One remaining problem is that the channels are highly correlated: in the Ising model with only 2 states, the 2 channels are exactly complementary. Thus there may be phase cancellations in the listening setup that make it harder to distinguish the channels. Still, the overall impression is clearer than with the simple audification, and this approach is the most promising regarding the order of the phase transition.
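As a sketch of the indicator encoding just described (Python; everything beyond the 0/1 scheme is an assumption), each spin state gets its own channel read along the same torus path:

```python
import numpy as np

def channel_audification(lattice, q):
    """One indicator channel per spin state: sites holding state s become 1,
    all other sites 0; returns an array of shape (q, number_of_lattice_sites)."""
    spins = np.asarray(lattice).ravel()                    # torus path, row by row
    return np.stack([(spins == s).astype(np.float32) for s in range(q)])
```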

Figure 7.9: A 3-state Potts model cooling down from super- to subcritical state. The three states are recorded as audio channels, shown here with time from left to right. Toward the end, channel 2 dominates.

7.2.6 Granular sonification

In this approach, the data were pre-processed, which allowed for designing less fatiguing sounds. Also, more sophisticated considerations can be included in the sonification design. In a cloud sonification we first sonified each individual spin as a very short sound grain, and played these grains at high temporal density. A 32x32 lattice (1024 points) can be played within one second; allowing some overlap, this leaves on the order of 3 ms for each sound grain. One second is a longer than desirable time for going through one entire time instant, but this is simply a trade-off between representing all the available data for that instant and moving forward in time fast enough. For bigger lattices, this approach is too slow for practical use.

Thus the next step was calculating local mean values. We took randomly chosen averaged spin blocks in the Ising model6 (see figure 7.10), so the data was pre-processed for the sonification, and we did not use all available information.

6 In this sonification we stayed with the simpler Ising model due to realtime CPU limitations, but the results do transfer to the Potts model.

Figure 7.10: Granular sonification scheme for the Ising model. The spatial location of each randomly chosen spin block within the grid determines its spatialisation, and its averaged value determines pitch and noisiness of the corresponding grain.

At first, for each configuration a few lattice sites are chosen; then for each site, the average of its neighbouring region is calculated, giving a mean magnetic moment between −1 (all negative) and +1 (all positive), with 0 meaning the ratio of spins is exactly half/half. This information is used to determine the pitch and the noisiness of a sound grain: the more the spins in one block are alike, the clearer the tone (either lower or higher); the less alike, the noisier the sound. The location of the block in 3D space is given by the spatial position of the sound grain.7 The sound grains are very short and played quickly after one another from different virtual regions. With this setting, a three-dimensional gestalt of the local state of a cubic lattice is generated around and above the listener. Without seeing the state of the model, a clear picture emerges from the granular sound texture, and also untrained listeners can easily distinguish the phases of the model.

7 This spatial aspect can only be properly reproduced with a multi-channel sound system. We adapted the settings for the CUBE, a multi-functional performance space with a permanent multi-channel system at the IEM Graz. Using the VirtualRoom class described in section 5.5, one can also render this sonification for headphones.
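A Python sketch of the per-grain calculation - block choice, averaging, and the mapping to pitch, noisiness and position. The specific pitch range and noisiness law are assumptions; the actual grain synthesis and CUBE spatialisation are done in SuperCollider3.

```python
import random
import numpy as np

def grain_parameters(lattice, block=4, n_grains=32):
    """Pick random spin blocks from an Ising lattice (values ±1) and derive
    grain parameters: mean magnetisation -> pitch, disorder -> noisiness,
    block position -> spatial placement."""
    a = np.asarray(lattice)
    size = a.shape[0]
    grains = []
    for _ in range(n_grains):
        i = random.randrange(size - block)
        j = random.randrange(size - block)
        mean = a[i:i + block, j:j + block].mean()      # -1 .. +1
        pitch = 440.0 * 2 ** mean                      # one octave down/up around 440 Hz (assumed)
        noisiness = 1.0 - abs(mean)                    # mixed blocks sound noisier
        position = (i / size, j / size)                # normalised grid position for spatialisation
        grains.append((pitch, noisiness, position))
    return grains
```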

7.2.7 Sonification of self-similar structures

To study one detail aspect of the above approach, we looked at self-similar structures at the point of phase transition by means of sonification. Music has been considered to exhibit self-similar structures, beginning with Voss and Clarke(1975, 1978); later on, the general popularity of self-similarity within chaos theory also extended to computer music, and the hypothesis that self-similar structures may be audible has led to much experimentation and many compositions with this conceptual background. In internal listening tests we tried to display structures on several orders of magnitude in parallel. These were calculated by a blockspin transformation, which returns essentially the spin orientation of the majority of points in a region of the lattice. It was our goal to make such structures of different orders of magnitude recognisable as similarly moving melodies, or as a single sound stream with a special sound quality.

Figure 7.11: A self-similar structure as a state of an Ising model. This is used as a test case for detecting self-similarity. Blockspins are determined by the majority of spins in a certain region.

In our design, three orders of magnitude in the Ising model were compared to each other, as shown in figure 7.11. The whole lattice (on the right side, with the least resolved blockspins) was displayed in the same time as a quarter of the middle and an eighth of the left blockspin structure (second from the left); the original spins are shown on the left. Comparing three simultaneous streams for similarities turned out to be a demanding cognitive task: trying to follow three streams and comparing their melodic behaviour at the same time is not trivial, even for trained musicians. Thus we experimented with an alternative: the three streams representing different orders of magnitude are interleaved quickly. When the streams are self-similar, one only hears a single (random) stream; as soon as one stream is recognisably different from the others, a triple grouping emerges. While this method works well with simple test data as shown in fig. 7.11, we could not verify self-similarities in noisy data of running spin models. We suspect that self-similar structures do not persist long enough for detection in running models, but for time reasons we did not pursue this further.
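The blockspin transformation itself is a simple majority-vote coarse-graining; a minimal Python sketch (block size and the tie-breaking rule are assumptions):

```python
import numpy as np

def blockspin(lattice, block=2):
    """Blockspin transformation for an Ising configuration (values ±1): each
    block x block region is replaced by the orientation of its majority,
    yielding a coarser lattice one order of magnitude 'up'."""
    a = np.asarray(lattice)
    n = (a.shape[0] // block) * block
    blocks = a[:n, :n].reshape(n // block, block, n // block, block)
    # sum over the block axes; +0.5 breaks exact ties toward +1
    return np.sign(blocks.sum(axis=(1, 3)) + 0.5).astype(int)
```

Applying this function repeatedly yields the successive orders of magnitude that the three melodic streams of this design are drawn from.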

7.2.8 Evaluation

Domain Expert Opinions

A listening test with statistical analysis was not appropriate, as not enough subjects familiar with researching spin models were available. Thus, as a qualitative evaluation we obtained opinions from experts in the field. These were four professors of Theoretical Physics in Graz who were not directly involved in the sonification designs. The results were explained to them, and they were given a few questions on the applicability and usefulness of the results.

The overall attitude may be summed up as curious but rather sceptical, even if the opinions differed in the details. Asked whether they themselves would use the sonifications, all of them answered they would do so only for didactic reasons or popular scientific talks. The possibility of identifying different phases was acknowledged, but was not seen as superior to other methods (e.g. studying graphs of observables, as would be the standard procedure). One subject remarked that, for research purposes, the 'aha-moment' was missing. This might be due to the fact that the Ising and Potts models have both been studied for decades and are well understood. As the data is mainly thermal noise, there is only little information to extract; our sonifications reveal no new physical findings for the models we chose. A three-dimensional display seems interesting to the experts, even if the dimensions are not experienced explicitly (in the audification approach there is a sequentialisation for displaying one dimension), and the sound grain approach as implemented only applies to three physical dimensions.

Another application that was discussed is a quick overview of large data sets: e.g. checking numerical parameters (that there are enough equilibration steps, for instance) or getting a first impression of the order of the phase transition. This seems plausible to all subjects, even if the standard procedure, e.g. a program for pattern recognition, would still be equivalent and - given the familiarity with such tools - preferable to them. The main point of criticism was the idea of a qualitative rather than quantitative approach towards physics, which is seen as a possible didactic tool but 'not hard science'. General sonification problems were discussed as well: it was noted that visualisation techniques play a more and more important role in science, and that they are tough competitors. Also with regard to the current state of publishing, sonification is at a disadvantage.

Besides this expected scepticism, it can be remarked that all subjects immediately heard the differences in the sound qualities. Metaphors for the sounds came up spontaneously during the introduction, e.g. boiling water for the point of phase transition. The experts came up with several ideas for future projects to discuss; this kind of interest is an encouraging form of feedback.

Conclusions and Possible Future Work

Spin models are interesting test cases for studying sonification designs for running models. We implemented both Monte Carlo simulations of Potts and Ising models and sonification variants in SuperCollider3. These models produce dynamically evolving data whose main characteristic is the fluctuation of single spins; although analytically well defined, finite computational models can only reproduce a numerical approximation of the predicted behaviour, which has to be interpreted. A number of different sonifications were designed in order to study different aspects of these spin models.

We created tools for the perceptualisation of lattice calculations which are extensible to higher dimensions and a higher number of states. They allow both observing running models, and analysing pre-recorded data to obtain a first impression of the order of the phase transition. Experimenting with alternative sonification techniques for the same models, we found differing sets of advantages and drawbacks: Granular sonification of spin blocks gives a reliable classification of the phase the system is in, and allows observing running simulations, using the random behaviour of spin models. Audification-based tools allow us to make use of all the available data, and even to track each spin orientation separately in parallel; these tools are used to study the order of the phase transition. Additionally, we worked on sonifications of self-similar structures.

With this study, sonification was shown to be an interesting complementary data representation method for statistical physics. Useful future directions for extending this work would include increased data quality and choices of different input models, which would lead to classification tools for phase transitions that allow studying models of higher dimensionality. Continued work in this direction could lead to applications in current research questions in the field of computational physics. The research project QCDAudio, hosted at IEM Graz with SonEnvir participant Kathi Vogt as lead researcher, will explore some of these directions.

Chapter 8

Examples from Speech Communication and Signal Processing

The Signal Processing and Speech Communication Laboratory at TU Graz focuses on research in the area of non-linear signal processing methods, algorithm engineering, and applications thereof in speech communication and telecommunication. After investigating sonification approaches to the analysis of stochastic processes and wave propagation in ultra-wide-band communication (briefly mentioned in de Campo et al.(2006a)), the focus for the last phase in SonEnvir was on the analysis of time series data.

In signal processing and speech communication, most of the data under study are sequences of values over time. There are many properties of time series data that interest the researcher. Besides analysis in the frequency domain, the statistical distribution of values provides important information about the data at hand. With the Time Series Analyser, we investigated the use of sonification in analysing the statistical properties of amplitude distributions in time series data. From the domain science's point of view, this can be used as a method for the classification of signals of unknown origin, or for the classification of surrogate data to be used in experiments on telecommunication systems.

8.1 Time Series Analyser1

The analysis of time series data plays a key role in many scientific disciplines. Time series may be the result of measurements, unknown processes, or simply digitised signals of a variety of origins. Although time series are usually visualised and analysed through statistics, their inherent relationship to time makes them particularly suitable for a representation by means of sound. 1This section is based on the SonEnvir ICAD paper Frauenberger et al.(2007).

8.1.1 Mathematical background

The statistical analysis of time series data is concerned with the distribution of values without taking into account their sequence in time. As we will see later, changing the sequence of values in a time series completely destroys the frequency information while keeping the statistical properties intact. The best-known statistical properties of time series data are the arithmetic mean (8.1) and the variance (8.2).

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (8.1)

\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (8.2)

However, higher order statistics provide more properties of time series data, describing the shape of the underlying probability function in more detail. They all derive from the statistical moments of a distribution, defined by

\mu_n = \sum_{i=1}^{n} (x_i - \alpha)^n \, P(x_i) \qquad (8.3)

where n is the order of the moment, α the value around which the moment is taken, and P(x) the probability function. The moments are most commonly taken around the mean, which is equal to the first moment µ1. The second moment around the mean (or second central moment) is equal to the variance σ², i.e. the square of the standard deviation σ. Higher order moments define the skewness and kurtosis of the distribution. The skewness is a measure of the asymmetry of the probability function: a distribution has high skewness if its probability function has a more pronounced tail toward one end than toward the other. The skew is defined by

\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (8.4)

with µi being the i-th central moment. The kurtosis describes the 'peakedness' of a probability function: the more pronounced the peak of the probability function, the higher the kurtosis of the distribution. It is defined by

\beta_2 = \frac{\mu_4}{\mu_2^2} \qquad (8.5)

Both values distinguish time series data and are significant properties in signal processing. From the SDSM point of view, the inherent time line and the typically large number of data values in time series data suggest the use of the most direct approach to auditory perceptualisation - audification. When interpreted as a sonic waveform, the statistical properties of time series data become acoustical dimensions which may be perceived: the variance corresponds directly to the power of the signal, and hence (though non-linearly) to its perceived loudness. The mean, however, is nothing more than an offset and is not perceivable. The question of interest is whether the skewness and the kurtosis of signals can be related to perceptible dimensions as well.
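To make the link between these definitions and practice concrete, the following is a minimal SuperCollider sketch (not part of the original tools) of how the central moments, the skewness (8.4), and the kurtosis (8.5) of a signal held in an array could be estimated; the test signal is a placeholder.

(
var sig = Array.fill(44100, { 1.0.sum3rand });   // placeholder test signal, roughly bell-shaped
var mean = sig.mean;
var m2, m3, m4, skew, kurtosis;
m2 = (sig - mean).squared.mean;     // second central moment (variance), cf. (8.2)
m3 = ((sig - mean) ** 3).mean;      // third central moment
m4 = ((sig - mean) ** 4).mean;      // fourth central moment
skew = m3 / (m2 ** 1.5);            // gamma_1 as in (8.4)
kurtosis = m4 / m2.squared;         // beta_2 as in (8.5)
[\mean, mean, \variance, m2, \skew, skew, \kurtosis, kurtosis].postln;
)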

8.1.2 Sonification tools

In order to investigate the statistical properties of time series data by audification, we first developed a simple tool that allows for defining arbitrary probability functions for noise. Subsequently, we built a more generic analysis tool that makes it possible to analyse any kind of signal. This tool was also used as the underlying framework for the experiment described in section 8.2.

8.1.3 The PDFShaper

The PDFShaper is an interactive audification tool that allows users to draw probability functions and listen to the resulting distribution as an audification in real time. Figure 8.1 shows the user interface. PDFShaper provides four graphs (top down): the probability function, the mapping function, the measured histogram, and the frequency spectrum of the time series synthesised as specified by the probability function. The tool allows the user to draw interactively in the first graph to create different kinds of amplitude distributions. It then calculates a mapping function, defined by

C(x) = g^{-1}(x) = \int_0^x P(t)\,dt \qquad (8.6)

where C(x) is the cumulative probability function and g(x) is a mapping function which, applied to a uniformly distributed value y, produces values distributed according to the probability function P(t). This mapping function essentially shapes values from a uniform distribution into any desired probability function P(t). In the screenshot shown, the probability function drawn into the top graph is a shifted exponential function. After applying the mapping function shown in the second graph to white noise, the third graph shows the real-time histogram of the result; it approximately resembles the target probability function. Note that both skew and kurtosis are relatively high in this example, as the probability function is shifted to the right and has a sharp peak.
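The following SuperCollider sketch illustrates the shaping principle behind equation (8.6) on the language side; it is a simplified stand-in for the PDFShaper (which works on the server in real time), and the hand-drawn probability function is replaced here by a hypothetical decaying exponential.

(
var pdf, cdf, invCdf, shaped, resolution = 512;
// hypothetical "drawn" probability function: a decaying exponential
pdf = Array.fill(resolution, { |i| (i / resolution * -4).exp });
pdf = pdf / pdf.sum;                          // normalise so the values sum to 1
cdf = pdf.integrate;                          // cumulative probability function C(x)
// numerical inverse of C(x): for each quantile, find the first bin that exceeds it
invCdf = Array.fill(resolution, { |i|
    cdf.indexOfGreaterThan(i / resolution) ?? (resolution - 1)
}) / resolution;
// shape uniformly distributed values through the inverse CDF (the mapping function g)
shaped = Array.fill(44100, { invCdf.blendAt(1.0.rand * (resolution - 1)) });
shaped.histo(50).plot("measured histogram");  // should roughly resemble the drawn PDF
)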

Figure 8.1: The PDFShaper interface

8.1.4 TSAnalyser

The TSAnalyser is a tool to load any time series data and analyse its statistical properties. Figure 8.2 shows the user interface. Besides providing statistical information about the loaded file (aiff format), it shows a histogram and a spectrum. Its main feature is the ability to "scramble" the signal.

Figure 8.2: The TSAnalyser interface

That is, it randomly re-orders the values in the time series and hence destroys all spectral information. When analysing amplitude distributions, the spectral information is often distracting; scrambling a signal results in a noise-like sound with the same statistical properties as the original. In the screenshot, the loaded file is a speech sample that comes with every SuperCollider installation. When scrambled, the spectrum at the bottom shows an almost uniform distribution in the frequency domain. Both PDFShaper and TSAnalyser are implemented in SuperCollider, and are available via svn as part of the SonEnvir Framework2.
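A minimal sketch of the scrambling idea follows (assuming a standard SuperCollider installation and a booted server s; the sample path may differ between SC versions, and this is not the TSAnalyser source code):

(
var path = Platform.resourceDir +/+ "sounds/a11wlk01.wav";  // the speech sample shipped with SC
var file = SoundFile.openRead(path);
var data = FloatArray.newClear(file.numFrames);
var scrambled;
file.readData(data);
file.close;
scrambled = data.as(Array).scramble;    // random re-ordering of the samples
// the statistical properties are unchanged ...
[\mean, scrambled.mean, \variance, (scrambled - scrambled.mean).squared.mean].postln;
// ... but the audification now sounds like coloured noise
Buffer.loadCollection(s, scrambled, action: { |b|
    { PlayBuf.ar(1, b, doneAction: 2) * 0.2 ! 2 }.play;
});
)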

2https://svn.sonenvir.at/svnroot/SonEnvir/trunk/src/Framework/

8.2 Listening test

The experiment described here was designed to investigate whether the higher order statistical properties of arbitrary time series data are perceptible when rendered by audification. If so, what are the perceptual dimensions that correlate with these properties, and what are the just noticeable difference levels?

8.2.1 Test data

The first challenge in designing the experiment was to create appropriate data. The data should not contain any spectral information, and the statistical properties should be fully controllable, ideally independently. Unfortunately, it is a non-trivial task to define probability functions with given statistical moments, as this is an ill-defined problem. We settled on a random number generator for the Lévy skew alpha-stable distribution (Wikipedia(2007)). It was chosen because it features parameters that directly control the resulting skew and kurtosis, which can also be made atypically high. It is defined by the probability function

f(x; \alpha, \beta, c, \mu) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \varphi(t)\, e^{-itx}\, dt \qquad (8.7)

\varphi(t) = \exp\!\left( it\mu - |ct|^{\alpha} \left( 1 - i\beta\, \mathrm{sign}(t)\, \Phi \right) \right) \qquad (8.8)

\Phi = \tan\!\left( \frac{\pi\alpha}{2} \right) \qquad (8.9)

where α is an exponent, β directly controls the skewness, and c and µ are scaling parameters. There is no analytic solution to the integral, but there are special cases in which the distribution behaves in specific ways; for example, for α = 2 the distribution reduces to a Gaussian distribution. Fortunately, the Lévy distribution is implemented as a number generator in the GNU Scientific Library (GSL, see GSL Team(2007)). It allows for generating sequences of numbers of any length for a distribution determined by the α and β parameters. For the experiment we generated 24 signals with skew values ranging from -0.19 to 0.25 and kurtosis ranging from 0.17 to 14. It turned out to be impossible to completely decouple skew from kurtosis, so we decided to generate two sets: one with insignificant changes in skew but a range in kurtosis of 0.16 to 14, while the other set covered the full range for skew and 0.15 to 5 for kurtosis. All signals were normalised to a variance of 0.001 and were 3 seconds long (at a samplerate of 44.1 kHz), with 0.2 second fade-in and fade-out times.

8.2.2 Listening experiment

The experiment was designed as a similarity listening test. Participants listened to sequences of three signals and had to select the two signals they perceived as most similar. Each sequence was composed of the signal under investigation (each of the 24), a second signal chosen randomly from the 24, and the first signal scrambled; the three signals were presented in random order. It was pointed out to participants that they would not hear two exactly identical sounds within a sequence, but they were asked to select the two that sounded most similar. The signal under investigation and its scrambled counterpart were essentially different signals, but shared identical statistical properties. It was not specified which quality of the sound they should listen for to make this decision; this, together with the scrambling, was done to make sure that participants focused on a generic quality of the noise rather than on specific events within the signals. After a brief written introduction to the problem domain and the nature of the experiment, participants started off with a training phase of three sequences to learn the user interface. For this training phase, the signals with the largest differences in skew and kurtosis were chosen to give people an idea of what to expect. Subsequently, each of the sets was played: set one with 9 sequences, set two with 15. The order of the sets alternated between participants. Participants were able to replay each sequence as often as they wished and to adjust the volume to their taste. Figure 8.3 shows the user interface used.

Figure 8.3: The interface for the time series listening experiment.

A post-questionnaire asked for the sound quality participants used to distinguish the signals, and asked them to assign three adjectives to describe this quality. Furthermore, participants were asked whether they could tell any difference between the sets, and whether they felt there was any learning effect, i.e., whether the task became easier during the experiment.

8.2.3 Experiment results

Eleven participants took part in the experiment, most of them colleagues or students at the institute. Four participants were members of the SonEnvir team and had a more substantial background on the topic, which, however, did not seem to have any impact on their results. The collected data show a significant increase in the probability of choosing the correct signals as the difference in kurtosis and skew increased. Figure 8.4 shows the average probabilities in four different ranges of ∆ kurtosis. The skew in this set was nearly constant (±0.001), so the resulting difference in correct answers is related to the change in ∆ kurtosis.

Figure 8.4: Probability of correctness over ∆ kurtosis in set 1

While up to a difference of 5 in kurtosis the probability is only insignificantly higher than 0.333 (the probability of random answers), and even decreases, there is a considerable increase thereafter, topping at over 70% for differences of around 11. This indicates that 5 is the just noticeable difference threshold for kurtosis. This is also supported by the results from set 2, as shown in figure 8.5. For skewness the matter was more difficult, as we had no independent control over it. Although the data from set 2 suggest an increase in probability with increasing difference in skew (as shown in figure 8.6), this might also be related to the difference in kurtosis. Looking at the probability of correctness over both the difference in kurtosis and the difference in skew (as in figure 8.7) reveals that it is unlikely that the increase is related to the change in ∆ skew: while in every spine in which ∆ skew is constant the probability increases with increasing ∆ kurtosis, this is not the case vice versa.

Figure 8.5: Probability of correctness over ∆ kurtosis in set 2

Figure 8.6: Probability of correctness over ∆ skew in set 2

Figure 8.7: Probability of correctness over ∆ skew and ∆ kurtosis in set 2

Summarising, we found evidence that participants could reliably detect changes in kurtosis greater than 5, but we did not find comparable evidence for skewness. This may indicate that we need a different dataset with larger differences in skew while keeping kurtosis values small; however, this requires finding another family of distributions. The number of times participants used the replay option seemed to have no impact on their performance. Figure 8.8 shows the number of replays for all data points over ∆ kurtosis; red crosses indicate correct answers, black dots incorrect ones. Although participants replayed a sequence more often when the difference in kurtosis was small, there is no evidence that they were more successful when using more replays.

The answers to the post-questionnaire must be seen in the light of the data analysis above. The quality participants reported as driving their decisions must be linked to the kurtosis rather than the skewness of the signal. The most common answers for this quality were crackling and the frequency of events; others included roughness and spikes. However, some participants also stated that they heard different colours of noise and other artefacts related to the frequency spectrum. This is a common effect of being exposed to noise signals for a longer period of time: even if the spectrum of the noise is not changing at all (as in our case), listeners often start to imagine hearing tones and other frequency-related patterns.

Figure 8.8: Number of replays over ∆ kurtosis in set 2

Asked for adjectives to describe the quality, participants provided cracking, clicking, sizzling, annoying, rhythmic, sharp, rough, and bright/dark. In retrospect, this correlates nicely with kurtosis describing the 'peakedness' of the probability function. There was no agreement over which set was easier: most participants said there was hardly any difference, while some named one or the other. Finally, on average, participants felt that there was no learning curve involved, and that the examples were short enough for them not to tire of listening to them.

8.2.4 Conclusions

In this section we presented an approach for analysing statistical properties of time series data by auditory means. We provided some background on the mathematics involved and presented the tools for audification of time series data that were developed. Subsequently, we described a listening test designed to investigate the perceptual dimensions that correlate with higher order statistical properties like skew and kurtosis, and discussed the data chosen and the design of the experiment. The results show evidence that participants improved in distinguishing noise signals as the difference in kurtosis increased; the data suggest that in this setting the just noticeable difference was 5. For skew, however, we were not able to find similar evidence. In a post-questionnaire we probed for the qualities that participants used to distinguish the signals and obtained a set of related adjectives. Future work will have to investigate why nothing could be found for skewness in the signals. It might be the case that our range of values did not allow for segregation by skew, and a different data source will have to be found to allow independent control over skew. However, it might also be the case that skew is not perceivable in direct audification, and a different sonification approach has to be chosen to make this property perceptible. In SDSM terms, the listening experiment respected the 3 second echoic memory time limit, maximising the number of data points to fit into that time frame by audifying at a samplerate of 44.1 kHz.

Chapter 9

Examples from Neurology

9.1 Auditory screening and monitoring of EEG data

This chapter describes two software implementations for EEG data screening and realtime monitoring by means of sonification. Both have been designed in close collaboration with our partner institution, the University Clinic for Neurology at the Medical University Graz. Both tools were tested in depth with volunteers, and then tested with the expert users they are intended for, i.e. neurologists who work with EEG data daily. In the course of these tests, a number of improvements to the designs were realised; both the tests and the final versions of the tools are described in detail here. The work reported here is intended to provide an integrated description and analysis of all aspects of the design process, from sonification design issues, interaction choices, and user acceptance to steps towards clinical use. This work is described with much more neurological background in the PhD thesis by Annette Wallisch (Wallisch(2007), in German). This chapter is based on a SonEnvir paper for ICAD 2007 (de Campo et al.(2007)), and this work is also briefly documented online in the SonEnvir data collection1, with accompanying sound examples.

9.1.1 EEG and sonification

As the general background for EEG and sonification is covered extensively in a number of papers (Baier and Hermann(2004); Hermann et al.(2006); Hinterberger and Baier(2005); Mayer-Kress(1994); Meinicke et al.(2002)), it is kept rather brief here. EEG is short for electroencephalogram, i.e. the registration of the electrical signals originating from the brain that can be measured on the human head. There are standard systems specifying where to locate electrodes on the head, called montages; e.g., the so-called 10-20 system, which spaces electrodes at similar distances over the head (see Ebe and

1 http://sonenvir.at/data/eeg/


Homma(2002) and many other EEG textbooks). The signal from a single electrode is often analysed in terms of its characteristic frequency band components: the useful frequency range is typically given as 1-30 Hz, sometimes extended a little higher and lower. Within this range, different frequency bands have been associated with particular activities and brain states; e.g., the 'alpha' range between 8 and 13 Hz is associated with a general state of relaxation and non-activity of the brain region for visual perception; thus alpha activity is most prominent with eyes closed. For both sonification designs presented, we split the EEG signal into frequency ranges which closely correspond to the traditional EEG bands2, as shown in table 9.1.

Table 9.1: Equally spaced EEG band ranges.

EEG band name    frequency range
deltaL(ow)       1 - 2 Hz
deltaH(igh)      2 - 4 Hz
theta            4 - 8 Hz
alpha (+ mu)     8 - 16 Hz
beta             16 - 32 Hz
gamma            32 - 64 Hz
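As a rough illustration of how such a band split can be realised, the following SuperCollider sketch filters a signal into the six octave bands of table 9.1; it is a hedged sketch rather than the SonEnvir implementation, with placeholder noise standing in for an audified EEG signal already sped up by a factor speedUp, and fixed per-band levels instead of GUI sliders.

(
var speedUp = 60;
var bandEdges = [1, 2, 4, 8, 16, 32, 64] * speedUp;  // table 9.1 bands mapped to audio Hz
{
    var sig = WhiteNoise.ar(0.1);       // placeholder for the audified EEG signal
    var levels = [1, 1, 1, 1, 1, 1];    // per-band volumes (sliders in the actual tools)
    var bands = 6.collect { |i|
        var lo = bandEdges[i], hi = bandEdges[i + 1];
        // bandpass centered on the geometric mean, bandwidth spanning the octave
        BPF.ar(sig, (lo * hi).sqrt, (hi - lo) / (lo * hi).sqrt) * levels[i]
    };
    Splay.ar(bands)
}.play;
)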

9.1.2 Rapid screening of long-time EEG recordings

For a number of neurological problems, it is standard practice to record longer time stretches of brain activity. A stationary recording usually lasts more than 12 waking hours; night recordings are commonly even longer, up to 36 hours. For people with so-called 'absence' epileptic seizures (often children), recordings with portable devices are made over similar stretches of time. These recordings are then visually screened, i.e. looked through in frames of 20-30 seconds at a time; this process is both demanding and slow. For the particular application toward 'absences', rapid auditory screening is ideal: these seizures tend to spread over the entire brain, so the risk of choosing only a few electrodes to screen acoustically is not critical; furthermore, the seizures have quite characteristic features, and are thus relatively easy to identify quickly by listening. For more general screening, finding time regions of interest quickly (by auditory screening) potentially reduces workload and increases overall diagnostic safety. With visual and auditory screening combined, the risk of failing to notice important events in the recorded brain activity is quite likely reduced.

2The alpha band we employ is slightly wider than the common 8-13 Hz; we merge it with the slightly higher mu-rhythm band to maintain equal spacing.

9.1.3 Realtime monitoring during EEG recording sessions

A second scenario that benefits from sonification is realtime monitoring while recording EEG data. This is a long-term attention task: an assistant stays in a monitor room next to the room where the patient is being recorded; s/he watches both a video camera view of the patient and the incoming EEG data on two screens. In the event of atypical EEG activity (which must be noticed, so one can intervene if necessary), a patient may or may not show peculiar physical movements; watching the video camera, one can easily miss atypical EEG activity for a while. Here, sonification is potentially very useful, because it can alleviate constant attention demands: one can easily habituate to a background soundscape which is known to represent 'everything is normal'. When changes in brain activity occur, the soundscape changes (in most cases, activity is increased, which increases both volume and brightness), and this change in the realtime-rendered soundscape automatically draws attention. A sonification design that aims to render EEG data in real time is also useful for studying brain activity as recorded by EEG devices at its natural speed: one can easily portray activity in the traditional EEG frequency bands acoustically; as many of the phenomena are commonly considered to be rhythmical, auditory presentation is particularly appropriate here, see Baier et al.(2006). Realtime uses of biosignals have other applications too, see e.g. Hinterberger and Baier(2005); Hunt and Pauletto(2006).

9.2 The EEG Screener

9.2.1 Sonification design

For rapid EEG data screening, there is little need for an elaborate sonification design. As the signal to be sonified is a time signal, and a signal speed of several tens of thousands of points per second is deemed useful for screening, straightforward audification is the obvious choice recommended by the Sonification Design Space Map. Not doing any other processing keeps the rich detail of the signals entirely intact. With common EEG sampling rates around 250 Hz, a typical speedup factor is 60x faster than real time, which transposes our center band (alpha, 8-16 Hz) to 480-960 Hz, well in the middle of the audible range. For more time resolution, one can go down to 10x, or for more speedup, up to 360x. See figure 9.1 for the locations on the Sonification Design Space Map.
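A minimal audification sketch along these lines is given below (assumptions: a single EEG channel is available as a float array recorded at 250 Hz, the server s is booted, and the placeholder data must be replaced by real samples; this is not the EEGScreener code):

(
var eegSampleRate = 250, speedUp = 60;
var eegData = Array.fill(eegSampleRate * 600, { 0.05.sum3rand });  // placeholder: 10 minutes of "EEG"
Buffer.loadCollection(s, eegData, action: { |b|
    {
        var rate = eegSampleRate * speedUp / SampleRate.ir;  // 60x tape-speed playback
        var sig = PlayBuf.ar(1, b, rate, doneAction: 2);
        LPF.ar(sig, 30 * speedUp) ! 2 * 0.3   // 30 Hz EEG lowpass, scaled with the speedup
    }.play;
});
)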

Figure 9.1: The Sonification Design Space Map for both EEG Players. As there is no total size for EEG files (they can be anything from a few minutes to 36 hours and more), Data Anchors are given for one minute and for one hour (center and far right). The labels Scr x10, Scr x60, and Scr x360 show the map locations for minimum, default, and maximum settings of speedUp, i.e. the time scaling of the EEGScreener (bottom right). The labels RTP 1band and RTP 6bands show the locations for a single band and all six bands of the EEGRealtimePlayer. Note that the use of two audio channels moves both of these designs inwards along the 'number of streams' axis, which is not shown here for simplicity.

This allows for a wide range of time scales of local structures in the data to be put into the optimum time window (the ca. 3 second window of echoic memory, see section 5.1 and de Campo(2007b)), while keeping the inner EEG bands well in the audible range; if needed, one can compensate for reduced auditory sensitivity to the outer bands by raising their relative amplitudes. A lowpass filter for the EEG signal is available from 12 to 75 Hz, with a default value of 30 Hz, to provide the equivalent of the visual smoothing used in EEG viewer software. Our users wanted that feature, and it is a simple way to reduce higher band activity, which is mostly considered noise (from a visual perspective, that is). A choice is provided between the straight audified signal and a mix of six equal-bandwidth layers, which can all be individually controlled in volume. This allows both for focused listening to individual bands of interest, and for identification of the EEG band in which a particular audible component occurs. A further reason to include this band-splitting was to introduce the concept in a simpler form, such that users could transfer the idea to their understanding of the realtime player.

Figure 9.2: The EEGScreener GUI. The top rows are for file, electrodes, and time range selection. Below the row for playback and note-taking elements are the playback parameter controls, and the band filtering display and controls.

9.2.2 Interface design

The task analysis for the Screener demanded that the graphical user interface be simple to use (low effort, little training needed), fast, and provide means of keeping reproducible records of screening sessions. Furthermore, it should offer choices of what to listen to, and visual feedback of what exactly one is hearing, and how. The GUI elements are similar to those of sound file editors (which audio specialists are familiar with, but EEG specialists usually are not).

File, electrode, and range selection

The button Load EDF is for selecting a file to be screened. Currently, only .edf3 files are supported, but other formats are easy to add if needed. The text views next to it (top line) provide file data feedback: file name, duration, and montage type the file was recorded with4. The button Montage opens a separate GUI for choosing electrodes by location on the head (see figure 9.3).

Figure 9.3: The Montage Window. It allows for electrode selection by their location on the head (seen from above, the triangle shape on top being the nose). One can drag the light gray labels and drop them on the white fields ’Left’ and ’Right’.

The popup menus Left and Right let users choose which electrode to listen to on which of the two audio channels. Like many soundfile editors, the signal views Left and Right show a full-length overview of the signal of the chosen electrodes. During screening, the current playback position is indicated by a vertical cursor. The range slider Selection and the number boxes Start, Duration, End show the current selection and allow for selecting a range within the entire file to be screened. The number box Cursor shows the current playback position numerically. The signal views Left Detail and Right Detail show the waveform of the currently selected electrodes zoomed in for the current selection.

3A common format for EEG files, see http://www.edfplus.info/
4As edf files do not store montage information, this is inferred from the number of EEG channels in the file; at our institution, all the raw data montage types have different numbers of channels.

Playback and note taking

The buttons Play, Pause, Stop start, pause, and stop the sound. The button Looped/No Loop switches between once-only playback and looped playback (with a click to indicate when the loop restarts). The button Filters/Bypass switches playback between Bypass mode (the straight audified signal, only low-pass-filtered) and Filters mode, the mixable band-split signal. The button Take Notes opens a text window for taking notes during screening. The edf file name, selected electrodes and time region, and current date are pasted in as text automatically. The button Time adds the current playback time at the end of the notes window's text, and the button Settings adds the current playback settings (see below) to the notes window text. To let the user concentrate on listening while screening a file, it is possible to stay on the notes window entirely: key shortcuts allow for pausing/resuming playback (e.g. to type a note), for adding the current time as text (so one can take notes for a specific time), and for adding the current playback settings as text.

Playback Controls

These control the parameters of the screener's sound synthesis. speedUp sets the speedup factor, with a range of 10-360; the default value of 60 means that one minute of EEG is presented within one second. Note that this is straightforward tape-speed acceleration, which preserves full signal detail. The option to compare different time-scalings of a signal segment allows for learning to distinguish mechanical artifacts (electrode movements) and electrical artifacts (muscle activity) from EEG signal components. lowPass sets the cutoff frequency for the lowpass filter, with a range of 12 to 75 Hz and a default of 30 Hz. clickVol sets the volume of the loop marker click, and volume sets the overall volume. In Bypass mode, only the meter views are visible in this section; they display the amount of energy present in each of the six frequency bands (deltaL, deltaH, theta, alpha, beta, gamma). In Filters mode, the controls become available, and one can raise the level of bands one wants to focus on, or turn down bands that distract from details in other bands. The buttons All On / All Off allow for quickly resetting all levels to defaults.

9.3 The EEG Realtime Player

The EEGRealtimePlayer allows listening to the details of EEG data in real time (or up to 5x faster when playing back files), in order to follow temporal events in or near their original rhythmic contour. This design (and its eventual distribution as a tool) has been developed in two stages: Stage one is a data player, which plays recorded EEG data files at realtime speed with the same sonification design (and the same adjustment facilities) as the final monitor application. This allows for familiarising users with the range of sounds the system can produce, for experimenting with a wide variety of EEG recordings, and for finding settings which work well for a particular situation and user. This stage is described here. Stage two is an add-on to the software used for EEG recording, diagnosis, and administration of patient histories at the institute. Currently, this stage is implemented as a custom version of the EEG recording software which simulates data being recorded now (by reading a data file) and sends the 'incoming' data over the network to a special version of the Realtime player (i.e. the sound engine and interface). Here, the incoming data is sonified with the same approach as in the player-only version. Eventually, this second program is meant to be implemented within the EEG software itself.

9.3.1 Sonification design

The sonification design for real time monitoring is much more elaborate than the screener's. It was prototyped by Robert Höldrich in MATLAB, and subsequently adapted and implemented for realtime interactive use in SC3 by the author. For a block diagram, see fig. 9.4. The EEG signal of each channel listened to is split into six bands of equal relative bandwidth (one octave, 1-2, 2-4, ... 32-64 Hz). Each band is sonified with its own oscillator and a specific carrier frequency: based on a user-accessible fundamental frequency baseFreq, the carriers are by default integer multiples of baseFreq (1, 2, ... 6). If one wants to achieve more perceptual separation between the individual bands, one can deform this overtone pattern with a stretch factor harmonic, where 1 is pure overtone tuning:

carFreq_i = baseFreq \cdot i \cdot harmonic^{i-1} \qquad (9.1)

The carrier frequency in each band is modulated with the band-filtered EEG signal, thus creating a representation of the signal shape details as deviation from the center pitch. The amplitude of each oscillator band is determined by the amplitude extracted from the corresponding filter band, optionally stretched by an expansion factor contrast; this creates a stronger foreground/background effect between bands with low energy and bands with more activity. For realtime monitoring as a background task, a second option for emphasis exists: high activity levels activate an additional sideband modulation at carFreq * 0.25, which creates a new fundamental frequency two octaves lower. This should be difficult to miss even when not actively attending.
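A simplified reading of this design is sketched below in SuperCollider (assumptions: one EEG channel standing in as smoothed noise, fixed values for the parameters baseFreq, harmonic, freqMod and contrast, and omission of the sideband emphasis; it is meant to show the principle, not to reproduce the SonEnvir implementation):

(
{
    var baseFreq = 120, harmonic = 1, freqMod = 1, contrast = 2;  // cf. the GUI parameters
    var eeg = LFDNoise3.ar(8, 0.5);    // placeholder for the incoming EEG signal (1-64 Hz range)
    var bands = 6.collect { |i|
        var lo = 2 ** i, hi = 2 ** (i + 1);                   // 1-2, 2-4, ... 32-64 Hz
        var band = BPF.ar(eeg, (lo * hi).sqrt, (hi - lo) / (lo * hi).sqrt);
        var carFreq = baseFreq * (i + 1) * (harmonic ** i);   // equation (9.1)
        var amp = Amplitude.kr(band, 0.1, 0.5) ** contrast;   // foreground/background contrast
        SinOsc.ar(carFreq * (1 + (band * freqMod))) * amp     // pitch deviation follows the band signal
    };
    Splay.ar(bands) * 0.3
}.play;
)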

Figure 9.4: EEG Realtime Sonification block diagram.

Figure 9.5: The EEG Realtime Player GUI. Note the similarities to the EEGScreener GUI; the main difference is the larger number of synthesis control parameters.

Finally, for file playback, crossing the loop point of the current selection is acoustically marked with a bell-like tone.

9.3.2 Interface design

Most elements (buttons, text displays, signal views, notes window) have the same functions as in the EEGScreener. The main difference to the EEGScreener is that there are many more playback controls, since the sonification model (as described above) is much more complex. The playback controls are ordered by importance from top to bottom: contrast ranges from 1-4; values above 1 expand the dynamic range, making active bands louder and thus moving them to the foreground relative to average-activity bands. For background monitoring, levels between 2-3 are recommended. baseFreq is the fundamental frequency of the sonification, between 60-240 Hz; this can be tuned to user taste - and our users have in fact expressed strong preferences for their personal choice of baseFreq. freqMod is the depth of frequency modulation of the carrier for each band: at 0, one hears a pure harmonic tone with varying overtone amplitudes; at greater values, the pitch of the band is modulated up and down, driven by the filtered signal of that band. Thus the signal details of the activity in that band are rendered in high perceptual resolution; a value of 1 is normal deviation. emphasis fades in a new pitch two octaves below baseFreq for very high activity levels; this can be used for extra emphasis in background monitoring. harmonic is the harmonicity of the carrier frequencies: a setting of 1 means purely harmonic carrier frequencies, less compresses the spectrum, and more expands it; this can be used to achieve better perceptual band separation. clickVol sets the volume of the loop marker click, volume sets the overall volume of the sonification, and speed controls an optional speedup factor for file playback, with a range of 1-5, 1 being realtime; in live monitoring mode, this control is disabled.

Band Filter Controls and Views

The buttons All On and All Off allow for setting all levels to medium or zero. The meter views show the amount of energy present in each of the six frequency bands, and the sliders next to them set the volume of each frequency band.

9.4 Evaluation with user tests

9.4.1 EEG test data

For development and testing of the sonification players described, a variety of EEG recordings - containing typical epileptic events and seizures - was collected. This database was assembled at the Department for Epileptology and Neurophysiological Monitoring (University Clinic of Neurology, Medical University Graz), using the in-house archive system; it contains anonymised data of currently or recently treated patients. For the expert user tests, three data examples were chosen for each player, suited to its special purpose. For the Screener, rather large data sets were selected, to test with a realistic usage example: two measurements of absences, and one day/night EEG with seizures localized in the temporal lobe. The Realtime Player was tested with three short data files: one normal EEG (containing eye movement artefacts and alpha waves), and two pathological EEGs (generalized epileptic potentials, and fronto-temporal seizures). The experts we worked with considered the use of audition in EEG diagnostics very unusual. We expected them to find it difficult to associate sounds with the events, so they did some preliminary sonification training: for all data examples, they could look at the data with their familiar EEG viewer software after having listened first, and try to match what they had heard with the visual graphs familiar to them.

9.4.2 Initial pre-tests

An initial round of tests was done to get a first impression of usability and data appropriateness; it also contained experimental tasks (learning to listen). In order to obtain independent and unbiased opinions, two interns were invited to test the first versions of the screener and the realtime player by listening through the entire prepared database at their own pace. They were instructed to take detailed notes of the phenomena they heard (including inventing names for them), and of where in which files they occurred; they spent roughly 40 hours on this task. The documentation of their listening experiments was then verified in internal re-listening and testing sessions. After these pre-tests, we decided to reduce some parameter ranges to prevent users from choosing too extreme settings, and we chose a smaller number of data sets for the second test round with expert users.

9.4.3 Tests with expert users

As the eventual success of these players depends on acceptance by the users in a clinical setting, it was essential to do an evaluation with medical specialists. This was done by means of two feedback trials; using the results of the primary expert test round, the players were then improved in many details. For both players we made pre/post comparisons of user ratings between the different versions. Even though we tested with the complete potential user group at our partner institution, the test group is rather small (n=4); thus we consider the tests, and especially the open question/personal interview sections, as more qualitative than quantitative data. To prepare the four specialists for their separate test sessions, they were introduced to the new aspects of data evaluation and experience by sonification in a group session. For each EEG player a separate test session was scheduled to avoid 'listening overload' and potential confusion.

Questionnaire

The questionnaire contained the 11 scales shown in table 9.2. The ratings to give for each statement ranged from 1 (strongly disagree) to 5 (strongly agree).

Table 9.2: Questionnaire scales for EEG sonification designs

1  Usability
2  Clarity of interface functions
3  Visual design of interface
4  Adjustability of sound (to individual taste)
5  Freedom of irritation (caused by sounds)
6  Good sound experience (i.e. pleasing)
7  Allows for concentration
8  Recognizability of relevant events in data by listening
9  Comparability (of observations) with EEG-Viewer software
10 Practicality in Clinical Use (estimated)
11 Overall impression (personal liking)

In addition to the 11 standardized questions, space for individual documentation and description was provided. Moreover, an open question asked for further comments, observations, and suggestions.

Results of first expert tests

This initial round of tests resulted in a number of improvements in both players: elaborate EEG waveform display and data range selection were added to both; the visual layout was unified to emphasize elements common to both players; and the screener was extended with band filtering, which is both useful in itself and a good mediating step toward the more complex realtime sonification design.

9.4.4 Analysis of expert user tests - EEG Screener 1 vs. 2

Optimizing the interface and interaction possibilities for version 2 of the Screener improved most of its ratings substantially: it was considered to offer more comfortable use (+1) and more attractive visual design (+1). The sound experience for the medical specialists improved somewhat (+0.5), while the freedom of irritation experienced improved very much (+2.0). While all other criteria improved substantially, recognizability of events, comparability with viewer software, and clinical practicality received lower ratings (between -0.5 and -0.25). We suspect that the better ratings in the first test round may have reflected enthusiasm about the novelty of the tool. Personal conversations with the expert users after the tests showed how strongly opinions differed: one user did not feel 'safe' and comfortable with the screener and could not trust his own hearing skills enough to discriminate relevant information from (technical) artefacts by listening.

Figure 9.6: Expert user test ratings for both EEGScreener versions.

By contrast, the three others were quite relaxed and felt positively reassured to have done their listening tasks properly and effectively. Furthermore, the users were probably less motivated to compare the EEG viewer with the 'listening result' (which one question asked for), as they had already done that carefully in the first tests. Overall, all users reported much higher satisfaction with version 2 of the screener (+1). The answers in the open comments section can be summarized as follows: all users confirmed better usability, design, clarity and transparency of version 2. Some improvements were suggested in the visualization of the selected EEG channels, in particular when larger files are analysed. Moreover, integration of the sonification into the real EEG viewer would be much appreciated. A plug-in version of the player for the EEG software used (NeuroSpeed by B.E.S.T. medical) was already in preparation before the tests; in effect, the expert users confirmed its expected usefulness.

9.4.5 Analysis of expert user tests - RealtimePlayer 1 vs. 2

The mean ratings of the second realtime player version show a positive shift in nearly all scales of the questionnaire. Moreover, the range of the ratings is smaller than before, so the answers were more consistent. The best ratings were given for visual design (+1), adjustability of sound (+1), and comparability to viewer (+1.5), all estimated as 'good to very good'. The 'overall impression' was now estimated as 'good' (+1), as were 'usability' (+0.5), 'clarity of interface' (+0.5), and 'good sound experience' (+1). The aspects 'recognizability of relevant EEG events' (+1) and 'practical application' (+1) were rated as similarly satisfying. The only item that remains at the same mean rating is 'freedom of irritation', estimated as a little better than average. The same rating was given for 'allowed concentration' (+1.5), which has improved very much. Probably, these two aspects correspond to each other: in spite of the improved control of irritating sounds and a learning effect, the users were still untrained in coping with the rather complex sound design. This sceptical position was taken in particular by two users, affecting items 5 to 9.

Figure 9.7: Expert user test ratings for both RealtimePlayer versions.

All in all, the ratings indicate good progress in the realtime player's design. This may well have been influenced by the strong time constraints on these tests: as our experts have very tight schedules in clinical work, it has been difficult to obtain enough time for reasonably unhurried, pressure-free testing. Comparing the ratings across the two first versions, the Realtime Player 1 was not rated as highly as the Screener 1. We attribute this to the higher complexity of the sound design (which did not come across very clearly under the time pressure given), the related non-transparency of some parameter controls, and the ensuing doubts about the practical benefit of this method of data analysis. Only the rating for irritation is better than for Screener 1, which indicates that the sound design is aesthetically viable for the users. All these concerns were addressed in the Realtime Player 2: in order to clarify the band-splitting technique, GUI elements indicate the amount of power present in each band and allow for interactive choice of which bands to listen to; fewer parameter controls are made available to the user5, with simpler and clearer names. Much more detailed help info pages are also provided now. Finally, band-splitting (adapted to audification) was integrated into the Screener 2 as well, which gives users a clearer understanding of this concept across different sonification approaches.

9.4.6 Qualitative results for both players (versions 2)

For both players, all users mentioned easy handling (usability), good visual design, and transparency of functionality. Further positive comments on the Screener were 'higher creativity' (by using the frequency controls) and that irritating sounds had nearly disappeared. One user explained this by a training effect, and we agree: it seems that as users learn to interpret the meaning of 'unpleasant' sounds (such as muscle movements), the irritation disappears. Regarding the realtime player, users mentioned good visual correlation with the sound, because of the new visual presentation of the EEG on the GUI. One user noted that acoustic side-localisation of the recorded epileptic seizure works well. Further improvements were suggested: for both players, the main wish is synchronization of sound and visual EEG representation (within the familiar software); in the case of realtime monitoring, this would allow better comparison of the relevant activities. As far as screening is concerned, the visual representation of larger files on the GUI was considered not very satisfying. For the realtime player, presets of the complex parameters matched to specific seizure types were suggested as very helpful. Moreover, usability could still be improved a bit more (although no specific wishes were given), and irritating sounds should be further reduced. This wish may also be due to the fact that the parameter controls offered for reducing disturbing sounds may not have been used fully; this can likely be addressed by more training.

9.4.7 Conclusions from user tests

According to the experts' evaluation of the EEG Screener, intensive listening training will be essential for its effective use in clinical practice - in spite of the improved usability and acceptance of the second version. As the visual mode is still dominant in clinical EEG diagnostics and data analysis, widespread use of sonification tools will require an alternative approach to time and training management. After such training, our new tools may well help to successively reduce the effort and time spent in data analysis, decrease clinical diagnostic risk, and, in the longer term, offer new ways of exploring EEG data.

5Version 1 had some visible controls mainly of interest to the developer.

9.4.8 Next steps

A number of obvious steps could be taken next (given followup research projects): For the Realtime Player, the top priority would be integration of the network connection for realtime monitoring during EEG recording sessions. Then, user tests in real-world long-term monitoring settings can be conducted; these tests should result in recommended synthesis parameter presets for different usage scenarios. For the sound design, we have experimented with an interesting variant which emphasizes the rhythmic nature of the individual EEG bands more (see Baier et al.(2006); Hinterberger and Baier(2005)). This feature can be made available as an added user parameter control ('rhythmic'), with a value of 0 maintaining the current sound design, and 1 accentuating the rhythmic features more strongly. For both Realtime Player and Screener, eventual integration into the EEG administration software used at our clinic was planned; however, this can only be done after another round of longer-term expert user tests, and when the ensuing design changes have been finalised.

9.4.9 Evaluation in SDSM terms

The main contributions to the Sonification Design Space Map concept resulting from work on the EEG players were the following lessons:

• Adopt domain concepts and terminology wherever possible (band splitting).

• Make interfaces as simple and user-friendly as possible.

• Provide lots of visual support for what is going on (here, showing band amplitudes).

• Provide opportunities to understand complex representations interactively, by offering options to take them apart (here, listening to single bands at a time).

• Give users enough time to learn (this did not happen for the Realtime Player).

Chapter 10

Examples from the Science by Ear Workshop

For more background on the Science By Ear workshop, see section 4.2, and here1. The dataset LoadFlow and the experiments made with it in the SBE Workshop are instructive basic examples; they are given as first illustrations of the Sonification Design Space Map in section 5.1. Other SBE datasets and topics (EEG, Ising, UltraWideband, Global Social Data) were elaborated in more depth in mainstream SonEnvir research activities, and are thus covered in the examples from the SonEnvir research domains. The remaining two datasets, RainData and Polysaccharides, are described briefly here for completeness.

10.1 Rainfall data

These data were provided and prepared by Susanne Schweitzer and Heimo Truhetz of the Wegener Center for Climate and Global Change, Graz. The data describe the precipitation per day over the European alpine region from 01.01.1980 to 01.01.1991. Additionally, associated orographic information (i.e. describing the average height of the area) was provided. Such data are quite common in climate physics research. The precipitation for 24 hours is measured as the total precipitation between 6:00 UTC2 and 6:00 UTC of the next day. The data were submitted in a single large binary file of the following format: each single number is precipitation data in mm/day over the European alpine region (latitude 49.5N - 43N, longitude 4E - 18E) with 78 x 108 grid points. The time range covers 11 years, from 1980-1990; this equals 4018 days. The data are stored in 4018 arrays (one after another) of 78 x 108 (rows x columns) values. The first array contains precipitation data over the selected geographic region for day 1 (1.1.1980), the 2nd array for day 2 (2.1.1980), and so on. A visualisation of the average precipitation over the 11 years given is shown in figure 10.1.

1 http://sonenvir.at/workshop/
2 Coordinated Universal Time

Figure 10.1: Precipitation in the Alpine region, 1980-1991.

A second file provides associated information on the orography of the European alpine region, i.e., the terrain elevation in meters. This data is stored in a single 78 x 108 array. General questions the domain scientists deemed interesting were whether it would be possible to hear all three dimensions (geographical distribution and time) simultaneously, and to find a meaningful representation of the distribution of precipitation in space and time. They also speculated that it might be relaxing to listen to a synthetic rendering of the sound of rain. As possible topics to investigate, they suggested:

• 10-year mean precipitation in the seasons

• variability of precipitation via standard deviations (i.e., do neighbouring regions more often swing together or against each other?)

• identification of regions with similar characteristics via covariances (do different regions sound different?)

• extreme values (does the rain fall regularly, or are there long droughts in some regions?)

• correlations in height (does precipitation behave similarly in similar orographic heights?)

• distribution of precipitation amounts (on how many days is the precipitation higher than 20 mm, 19 mm, 18 mm, etc.?)

As a test of the proper geometry of the data format, the SC3 starting file for the sessions provided a graphical representation of the orographic dataset, with higher regions shown as brighter gray, see figure 10.2. We also provided example reading routines for the data file itself.

Figure 10.2: Orography of the grid of regions.

Session team A

In the brainstorming phase, team A came up with the idea to use spatial distribution for the definition of features like variability, entropy, etc., possibly using large regions such as quarters of the entire grid. The team agreed that the data should be used as time series, since rhythmical properties are expected to be present. The opinion was that the main interest lies in the deviations from the average yearly shape. Thus, the team decided to try an acoustic representation of the data series conditioned on the average yearly shape as a benchmark curve, as follows: if the value in question is higher than average, high pitched dust (single-sample crackle through a resonance filter) is audible; if it is lower than average, lower pitched dust is heard. The amplitude should scale with the absolute deviation from the average, and the dust density should scale with the absolute rain values. In this fashion, one could sonify different locations at the same time by assigning the sonifications of different locations to different audio channels; this should produce temporal rhythms if there are systematic dependencies between the locations. As data reading turned out to be more difficult than expected, the team began experimenting with dummy data to design the sounds and behaviour of the sonification, while the second programmer worked on data preparation. In the end, the team ran out of time before the real data became accessible enough for replacement.
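A hedged SuperCollider sketch of what this (never finished) design might have looked like for a single location follows, with placeholder values standing in for the real data and the benchmark curve:

(
{
    var rain = MouseY.kr(0, 40);      // placeholder: today's rainfall in mm/day
    var avg = 10;                     // placeholder: benchmark value from the average yearly shape
    var dev = rain - avg;
    var density = rain.linlin(0, 40, 1, 200);      // dust density scales with absolute rain value
    var freq = Select.kr(dev > 0, [500, 2000]);    // low pitch below average, high pitch above
    var amp = dev.abs.linlin(0, 30, 0, 0.5);       // amplitude scales with absolute deviation
    Resonz.ar(Dust.ar(density), freq, 0.05) * amp ! 2
}.play;
)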

Session team B

Team B discussed many options while brainstorming: as the data set was quite large, choosing data subsets, e.g. by regions; looking for possible correlations; perhaps listening to the entire time range for a single location; perhaps using a random walk as a reading trajectory; selecting a location by pointing (mouse); comparing a reference time-series sonification to the data subset under study. The team found a good solution for the data reading difficulties: they read only the data points of interest directly from the binary file, which turned out to be fast enough for real time use. The designs written explored comparing the time series for two single locations over ten years; the sound examples produced demonstrate these pairs played sequentially and simultaneously on the left and right channels. The sounds are produced with discrete events: each data point is rendered by a Gabor grain with a center frequency determined by the amount of rain for the day and location of interest. In the final discussion, the team found that a comparison of different regions would be valuable, where the area over which to average should be flexible. Such averaging could also be considered conceptually similar to fuzzy indexing into the data; modulating the averaging range and providing fuzziness in three dimensions would be worth further exploration.
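A sketch of the direct-access idea is given below; the file path, float size and endianness are assumptions that must match the actual data file, and the day-major layout follows the format description above.

(
var path = "~/rainData.bin".standardizePath;   // hypothetical location of the binary file
var rows = 78, cols = 108, numDays = 4018;
var readSeries = { |row, col|
    var file = File(path, "rb"), series;
    series = numDays.collect { |day|
        // 4 bytes per value, day-major order: day, then row, then column
        file.pos_((day * rows * cols + (row * cols) + col) * 4);
        file.getFloatLE;    // assumption: little-endian 32-bit floats
    };
    file.close;
    series;
};
// eleven years of daily rainfall for one grid point, e.g. row 40, column 60
readSeries.(40, 60).plot("precipitation, one location");
)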

Session team C

Team C had the most difficulties getting the data loaded properly; this was certainly a deficiency in the preparation. After converting a subset of the data with Excel, they decided to compare the data for January of all years, and to listen for patterns and differences across different regions. Some uncertainty remained about whether the conversions were fully correct, but this was considered relatively unimportant in the experimental workshop context. The sonification design entailed calculating a mean January value for all locations, and comparing each individual day to this mean value. This was intended to show how the precipitation varies, and to identify extreme events. All 8424 stations are scanned along north/south lines, which slowly move from west to east. The ratio of the day's rainfall to the mean was mapped to the resonance value of a bandpass filter driven by white noise. The sound examples provided cover January 5, 15, and 25 for the years 1980 and 1981, scaled into 9 seconds; a much slower variant with only 190 stations is presented as well for comparison, and this shows a much smoother tendency. Varying filter resonance as rapidly as described above is not likely to be very clearly audible.

Comparison in SDSM terms

The data set given is quite interesting from an SDSM perspective: it has 2 spatial indexing dimensions, with 78 * 108 = 8424 geographical locations, for which an orographic data dimension (average elevation above sea level) is also given. For each location, data are given for 1 (or maybe 2) time dimensions, namely 365 (resp. 366) days * 11 years = 4018 time steps (days). Thus, multiple locations are possible for its data anchor (see figure 10.3), depending on the viewpoint taken. From a temporal point of view, one would treat the 8424 locations as the data size, and create a 'day anchor' at x: 8424, y: 1 and a month anchor at x: 8424, y: 30; the year anchor and the 11-year anchor are both outside the standard map size. For a single location, an anchor could be at x: 4018, y: 2. In any case, whatever one considers to be the unit size of this kind of data set is arbitrary, as both time and space dimensions could have different sizes and/or resolutions. Team A mapped one year into 7.3 seconds, and presented two streams of two mapped dimensions each (pitch label for deviation polarity, and intensity for deviation amount). These choices put its SDSM point at an expected gestalt size of 150 (x), dimensions at 2 (y), and streams at 2 (z). Continuous parameter mapping is a reasonable choice for this location on the map. Team B begins with 8424 data points per 4 second loop; this is a rather dense gestalt size of ca. 6000. The design choice of averaging over 9-10 values scales this to ca. 600, which seems well suited for granular synthesis with a single data dimension used, mapped to the frequency parameter (y-axis), and using two parallel streams (z). Team C maps 8424 values into 9 seconds, which creates a gestalt size of ca. 3000 (label C1 on the map); this seems very fast for modulation synthesis of filter bandwidth, although it uses only a single stream and a single dimension, so the y and z values are both 1.

Figure 10.3: SDSM map of Rainfall data set.

The slower version (190 values in 9 seconds, C2) is more within SDSM recommended practice, at a gestalt size of ca. 60. While the SDSM concept recommends making indexing dimensions available for interaction, this was too complex for the workshop setting.

10.2 Polysaccharides

This problem was worked on for two two-hour sessions, so the participants had more time to reflect and consider how to proceed. The data were submitted by Anton Huber of the Institute of Chemistry at the University of Graz.

10.2.1 Polysaccharides - Materials made by nature3

Polysaccharides make up most of the biological substance of plant cells. Their molecular geometries, such as their symmetries, determine the physical properties of most plant-based materials. Even materials from trees of the same kind have different properties because of the environment they come from; so understanding the properties of a given sample is of crucial importance to materials scientists.

3 This was the title of Anton Huber's introductory talk.

A typical question that occurs is: are the given datasets (which should be the same) somehow different? In aqueous media, polysaccharides form so-called supermolecular structures. Very few of these molecules can structurise amazing amounts of water: water clusters can be several millimeters large. By comparison, the individual molecules are measured in nanometers, so there is a scale difference of six orders of magnitude! In a given measurement setup, the materials are physically sorted by fraction: on the left side, particles with big molecules (high mol numbers) are found, on the right, small ones. Rather few bins (on the order of 30) of sizes and corresponding weights are conventionally considered sufficiently precise for classification, both in industry and science. The data for this session were analysis data of four samples of plant materials: beech, birch, oat and rice. Three different measurements were given, along with their indexing axes: channel 1 is an index (corresponding to mol size) of the measurement at channel 2, channel 3 is an index of channel 4, and channel 5 is an index of channel 6. Channels 1 and 2 contain the measured delta-refraction index of electromagnetic radiation aimed at the material sample; i.e. how strongly light of a given wavelength is diverted from its direction by the size-ordered regions along the sample. (The exact wavelength used was not given.) Channels 3 and 4 contain the measured fluorescence index under electromagnetic radiation, again dependent on the size-ordered regions along the sample. Channels 5 and 6 contain the measured dispersion of the material sample under light, or more precisely, how much the dispersion differs from that of clear water, based on molecule size along the size-ordered axis of the sample.

10.2.2 Session notes

The notes for this session were reconstructed shortly after the workshop.

Brainstorming

One of the first observations made was that the data look like FFT analyses - so the team considered using FFT and convolution on the bins directly. An alternative could be a multi-filter resonator with e.g. 150 bands, maybe detuned from a harmonic series. As was noted several times in the workshop by those favouring audification, it seemed desirable to obtain 'rawer' data as directly as possible from the measurements; these might be interesting to treat as impulse responses. The first idea the team decided to try was to create a "signature sound" of 1-2 seconds for one channel of each data file by parameter mapping to about 15-20 dimensions; a second step should be to compare two such signature sounds (for two channels of the same file) binaurally.

Experimentation

A look at the data revealed that across all files, channel 1 seemed massively saturated in the upper half, so we decided to take only the undistorted part of channel 1, downsample it to e.g. 50 zones, and turn these into 50 resonators, which would ring differently for the different materials when excited. The resonator frequencies were scaled according to the index axis, which is roughly equal to particle size: small particles are represented by high sounds, and big particles by lower resonant frequencies. Based on this scheme, we proceeded to make short 'sound signatures' for the four materials, using delta-refraction index (channel 2) and fluorescence (channel 4) data, with two different exciter signals: noise and impulses. The sound examples provided here4 present all four materials in sequence:
Delta refraction index, impulse source: Materials1 Pulse BeechBirchOatRice.mp3
Delta refraction index, noise source: Materials1 Noise BeechBirchOatRice.mp3
Fluorescence, impulse source: Materials2 Pulse BeechBirchOatRice.mp3
Fluorescence, noise source: Materials2 Noise BeechBirchOatRice.mp3
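As a minimal sketch of this kind of 'sound signature' (not the session code; the data, frequency range and ring times are placeholder assumptions), one material's 50 downsampled values can drive the amplitudes of a fixed resonator bank excited by impulses or noise:

// Sketch only: 50 downsampled channel values -> amplitudes of a 50-band resonator signature
(
~values = Array.fill(50, { 1.0.rand }); // placeholder for one material's downsampled data
~freqs = (1..50).collect { |i| i.linexp(1, 50, 6000, 200) }; // assuming index grows with particle size
~amps = ~values.normalizeSum * 2;
~rings = 0.3 ! 50;
{
    var exciter = Impulse.ar(2, 0, 0.1); // try PinkNoise.ar(0.01) for the noise variant
    Klank.ar(`[~freqs, ~amps, ~rings], exciter) ! 2 * 0.5;
}.play;
)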

The team also started making these playable from a MIDI drum pad for a more interactive interface, but did not have enough time to finish this approach.

Evaluation

The group agreed that having time for two sessions was much better for deeper discussion and more interesting results. Even so, more time would be desirable. In this particular session, the sound signatures made were easy to distinguish, so in principle, this approach works. What could be next steps? It would be useful to implement signatures of more than one channel to increase the reliability of property tracking; e.g. for materials production monitoring, this could be a useful application. It would also be interesting to try a nonlinear complex sound generator (such as a feedback FM algorithm) and control its inputs from the data, using on the order of 20-30 dimensions; this holistic approach would be interesting from the perspective of sonification research, as it might lead to emergent audible properties without requiring detailed matching of individual data dimensions to specific sound parameters.

4 http://sonenvir.at/workshop/problems/biomaterials/sound descr

While there was no time to attempt this within the workshop setting, the idea would certainly warrant further research. In SDSM terms, the dimensionality of each data point is unusually high here. The sonifications render each material (consisting of 680 measurements) to a reduced range of the data, downsampled to 50 values, as resonator specifications, i.e., intensity and ringtime of each band. Given an interactive design, such as one allowing tapping on the different materials or probes, one can easily compare on the order of 5-8 samples within short-term memory limits.

Chapter 11

Examples from the ICAD 2006 Concert

The author was Concert Chair for the ICAD 2006 Conference at Queen Mary University London, and together with Christian Dayé and Christopher Frauenberger, organized the Concert Call, the review process for the submissions, and the concert itself (see section 4.3 for full details). This chapter discusses four of the eight pieces played in the concert, chosen for diversity of the strategies used, and for clarity and completeness of documentation.

11.1 Life Expectancy - Tim Barrass

This section discusses a sonification piece created by Tim Barrass for the ICAD 2006 Concert, described in Barrass(2006), and available as headphone-rendered audio file1. Life Expectancy is intended to allow listeners to find relationships between life expectancies and living conditions around the world. The sounds he chooses are quite literal representations of their meanings, making them relatively easy to 'read', even though the piece is quite dense in information. It is structured in three parts, beginning with a 20 second section which mainly provides spatial orientation, a long middle section representing the living conditions for each country in a dense 2-second soundscape, and a short final section illuminating gender differences in life expectancy. The opening section presents the spatial locations of all country capitals, ordered by ascending life expectancy. The speaker ring is treated as if it were a band around the equator, with the listener inside near the center of the globe. Each capital location is marked by a bell sound (which is easy to localise), spatialised in the ring of speakers according to the capital's longitude; latitude (distance to the equator, North or South) is represented by the bell's pitch, where North is higher. A whistling tone represents ascending life expectancy for each country, and as it is not spatialised, it is easy to

1 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/life.mp3

follow as one stream. Each country has roughly 0.1 second for its bell and whistle tone. The main section of the piece is about six minutes long, and presents a rich, complex audio vignette for every country, at the length of a musical bar of two seconds. The most intriguing aspect here is the ordering of the countries: first we hear the country with the highest life expectancy, then the lowest, the second highest, the second lowest, and so on until the interleaved orders meet in the median. Each sound vignette consists of the following sound components: Two bell sounds whose pitch indicates latitude, first that of the equator, then that of the country's capital, their horizontal spatial position being longitude. A chorus speaking the country's name, with the number of voices representing the population number, and whose extent represents the country's area. The capital name is also spoken, at its spatial location. A fast ascending major scale represents life expectancy, once for male, once for female inhabitants of the country. The number of notes of the scale fragment represents the number of life decades, so a life expectancy of 75 years would be represented as a scale covering 8 steps (up to the octave) with the last note shortened by 50 percent. The gender differences between each pair of scales, and the alternation of extreme contrasts in the beginning of the sequence, articulate this aspect very interestingly. Clinking coins signify economic aspects: average income by density of the coin sounds, while gross domestic product (GDP) is indicated by reverb size. The sound of water filling a vessel indicates access to drinking water and sanitation: a full vessel indicates good access, an empty vessel little access. Three pulses of this sound provide total, rural, and urban values. Sanitation is rendered by adding distortion to the water pulses when sanitation values are low (suggesting 'dirty' water). The final short section of the piece focuses on gender differences in life expectancy. As the position bell moves from the North Pole to the South Pole, life expectancies for each country are represented with a tied note, going from the value for male to female (usually rising), and spatialised at the capital's location. Tim Barrass is very modest in commenting on the piece (Barrass(2006)):

I have taken a straightforward and not particularly musical approach, in an attempt to gain a clear impression of the dataset. The sound mapping is ”brittle”, designed specifically for the dataset. I would not expect this approach to provide a flexible base to explore the musical, sonic and in- formational possibilities of similar material, but it may at least serve as an example of one direction that has been tried.

While the piece may appear 'artless' in representing so much of the dataset with apparently simplistic sound mappings, I find the piece extremely elegant, both as a sonification and as a composition. The sound metaphors are so clear that they almost disappear, as does the spatial representation. It is quite an achievement to create concurrent sound layers that are both rich, complex, and dense enough to be demanding to listen to, and transparent enough to allow for discovering different aspects as the piece proceeds. This piece certainly provided the richest information representation of all entries for the concert. The beginning and end sections work beautifully as frames for the piece, as orientation help, and as alternative perspectives on the same questions. For me, the questions that remain long after listening to the piece come from the strongest intervention in the piece, the idea of sorting the countries so as to begin with the most extreme contrasts in life expectancy, and moving toward the average lifespan countries.

11.2 Guernica 2006 - Guillaume Potard

This section discusses a piece created by Guillaume Potard for the ICAD 2006 Concert, described in Potard(2006), and available as headphone-rendered audio file2. Guernica 2006 sonifies the evolution of world population and the wars that occurred between the years 1 and 2006. Going far beyond the data supplied with the concert call, Potard has compiled a comprehensive list of 507 documented wars, with geographical location, start and end year, and a flag indicating whether it was a civil war or not. He also located estimates for world population for the same time period. The sonification design represents the temporal and geographical distribution chronologically. The temporal sequence follows historical time: the start year of each war determines when its representing sound begins. As many more wars have occurred toward the end of the period observed, the time axis was slowed down logarithmically in the course of the piece, so the duration of a year near the end of the piece is 4 times longer than at the beginning. This maintains the overall tendency, but still provides better balance of the listening experience. The years 1, 1000 and 2000 are marked by gong sounds for orientation. The entire piece is scaled to a time frame of five minutes. The start time of each war is indicated by a weapon sound; the sounds chosen change with the evolution of weapon technology. In the beginning, horses, swords, and punches are heard, while after the invention of gunpowder, cannons, guns, and explosions dominate. Newer technology such as helicopters is heard only toward the end of the piece, after the year 1900. Civil wars are marked independently by the additional sound of breaking glass. The spatial distribution of the sounds was handled by vector-based amplitude panning for the directions of the sound sources relative to the reference center, the geographical location of London. Sound distance was rendered by controlling the ratio of direct to reverberation sound.

2 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/guernica.mp3

The evolution of world population is sonified concurrently as a looping drone, with playback speed rising as population numbers rise. Guernica 2006 was certainly the most directly dramatic piece in the concert. The use of samples communicates the intended context very clearly, without requiring much prior explanation. As Potard(2006) states, richer data representation with this approach would certainly be possible; he considers representing war durations, distinguishing more types of war, and related factors like population migrations in future versions of the piece.

11.3 ’Navegar E´ Preciso, Viver N˜ao E´ Preciso’

This section discusses a sonification piece created by Alberto de Campo and Christian Dayé for the ICAD 2006 Concert, described in de Campo and Dayé(2006), and available as headphone-rendered audio file3. As this piece was co-written by the author of this dissertation, much more background can be provided than with the other pieces discussed. In this piece, we chose to combine the given dataset containing current (2005) social data of 190 nations with a time/space coordinates dataset of considerable historical significance: the route taken by the Magellan expedition to the Moluccan Islands from 1519-1522, which was the first circumnavigation of the globe.

11.3.1 Navigation

The world data provided by the ICAD 2006 Concert Call all report the momentary state for the year 2005, and are thus free of the idea of historical progression. Also, the choice of which variables to include in the sonification, and how, must be based on theoretical assumptions which are not trivial to formulate on a level of aggregation involving 6,513,045,982 individuals (the number of people estimated to have populated this planet on April 30, 2006, see U.S. Census Bureau(2006)). The data do provide detailed spatial information, so we decided to choose a familiar form of data organization that combines space and time: the journey. Traveling can be defined as moving through both space and time. While the time dimension as we experience it is unimpressed by the desires of the traveler, s/he can decide where to move in space. The art and science that has enabled mankind to find out where one is, and in which direction to go to arrive somewhere specific, is known as Navigation. Navigation as a practice and as a knowledge system has exerted major influence on the

3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/navegar.mp3

development of the world. The Western world has been changed drastically by the consequences of the journeys led by explorers like Christopher Columbus or Vasco da Gama. (The art of navigation outside Europe, especially in Polynesia, is covered very interestingly in Conner(2005), pp 41-58.) The first successful circumnavigation of the globe, led by Ferdinand Magellan, proved beyond all scholastic doubts that the earth in fact appears to be round. This would not have happened without the systematic cultivation of all the related sciences in the school for navigation, map-making and ship-building founded by the Portuguese prince Henry the Navigator in the 15th century. (Conner(2005) also describes their methods of knowledge acquisition vividly as mainly coercion, appropriation, and information hoarding, see chapter: Blue Water Navigation, pp. 201ff.) For all these reasons, Magellan's route became an interesting choice for temporal and spatial organization for our concert contribution.

11.3.2 The route

Leaving Seville on August 10, 1519, the five ships led by Magellan (called Trinidad, San Antonio, Concepción, Victoria, and Santiago) crossed the Atlantic Ocean to anchor near present-day Rio de Janeiro after five months (Pigafetta(1530, 2001); Wikipedia(2006b); Zweig(1983)). Looking for a passage into the ocean later called the Pacific, they moved further south, where the harsh winter and nearly incessant storms forced them to anchor and wait for almost six months. While exploring unknown waters for this passage, the Santiago sank in a sudden storm, and the San Antonio deserted back to Spain; the remaining three ships succeeded and found the passage in the southernmost part of South America which was later called the Magellan Straits, in late October 1520. The ships then headed across the Mar del Sur, the ocean Magellan named the Pacific, towards the archipelago which is now the Philippines, where they arrived four months later. Seeking the mythical Spice Islands, Magellan and his crew visited several islands in this area (Limasawa, Cebu, Mactan, Palawan, Brunei, and Celebes); on Mactan, Magellan was killed in a battle, and a monument in Lapu-Lapu City marks the site where he died. In spite of their leader's death, the crew decided to fulfil their mission. By now diminished to 115 persons on just two ships (Trinidad and Victoria), they finally managed to reach the Spice Islands on November 6, 1521. Due to a leak in the Trinidad, only the Victoria "set sail via the Indian Ocean route home on December 21, 1521. By May 6, 1522, the Victoria, commanded by Juan Sebastián Elcano, rounded the Cape of Good Hope, with only rice for rations. Twenty crewmen died of starvation before Elcano put into Cape Verde, a Portuguese holding, where he abandoned 13 more crew on July 9 in fear of losing his cargo of 26 tons of spices (cloves and cinnamon)." (Wikipedia(2006b)). On September 6, 1522, more than three years after she left Seville, Victoria reached the port of San Lucar in Spain with a crew of 18 left. One is reminded of a song by Caetano

Figure 11.1: Magellan's route in Antonio Pigafetta's travelogue (Primo Viaggio Intorno al Globo Terracqueo - First travel around the terracqueous globe, see Pigafetta(1530)).

Figure 11.2: Magellan’s route, as reported in wikipedia. http://wikipedia.org/Magellan

Veloso, who, pondering the mentality and fate of the Argonauts, wrote: "Navegar é preciso, viver não é preciso" - "Sea-faring is necessary, living is not" (see appendix E).

11.3.3 Data choices

The explorers of the 15th and 16th centuries were interested in spices (which Europe was massively addicted to at the time), gold, and the prestige earned by gaining access to good sources of both. Nowadays, other raw materials are considered premium goods. What would someone who undertakes such a journey today hope to gain for his or her exertions; what is as precious today as gold and spices were in the 16th century? We imagine today's conquistadores (or globalizadores) would likely ask first about economic power: how rich is an area? Second, they would probably check geographical potential; and chances are that if any one resource will be as central to economic activity in the future as spices were centuries ago, it will be drinking water resources. Water might well become the new pepper, the new cinnamon, or even the new gold. (As the Gulf wars showed, oil would have been the obvious current choice; however, we found the future perspective more interesting.) Thus we chose to focus on two main dimensions: one depicting economic characteristics of every country we pass, and another informing us about its inhabitants' current access to drinking water.

11.3.4 Economic characteristics

The variable 'GDP per capita' included in the given data set provides some insight into the overall economic performance of a country. Obviously, the 'GDP per capita' variable lacks information about the distribution of the income; it only says how much money there would be per person if it were equally distributed. This is never the case; on the contrary, scientists find that the rich get richer and the poor get poorer, both in intra-national and international contexts. For example, in the US of 1980, the head of a company earned on average 42 times as much as an employee; by the year 1999, this ratio was more than ten times higher: a company leader earned 475 times more than an average employee (Anonymous(2001)).

Figure 11.3: The countries of the world and their Gini coefficients. From http://en.wikipedia.org/wiki/Gini.

A measure that captures aspects of income distribution is the Gini coefficient on income inequality (Wikipedia(2006a)). Developed by Corrado Gini in the 1910s, the Gini coefficient is defined as the ratio of the area between the Lorenz curve of the distribution and the curve of the uniform distribution, to the area under the uniform distribution. More common is the Gini index, which is the Gini coefficient times 100. The higher the Gini index, the higher the income differences between the poorer and the richer parts of a society. A value of 0 means perfectly equal distribution, while 100 means that one person gets all the income of the country and the others have zero income. However, the Gini index does not report whether one country is richer or poorer than the other. Our sonification tries to balance the limitations of these two variables by combining them: We include two factors that go into a Gini calculation; the ratio of the top to the bottom 10 percent of all incomes in a population, and the ratio of the top to the bottom 20%. In Denmark, at Gini index rank 1 of 124 nations for which Gini data exist, the top 10% earn 4.5x as much as the bottom 10%; for the UK (rank 51), the ratio is 13.8:1; the US (rank 91) ratio is 15.9:1; in Namibia, at rank 124, the ratio is 128.8:1. (In the sonification, missing values here are replaced by a dense cluster of near-center values, which is easy to distinguish acoustically from the known occurring distributions.)
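Restated compactly (the standard formulation, not quoted from the sources above): if L(p) denotes the Lorenz curve, i.e. the cumulative income share of the poorest fraction p of the population, then

\[ G = \frac{A}{A+B} = 1 - 2\int_0^1 L(p)\,dp, \qquad \text{Gini index} = 100\,G, \]

where A is the area between the line of perfect equality and the Lorenz curve, and B is the area under the Lorenz curve; since A + B = 1/2, perfect equality gives G = 0 and maximal inequality gives G = 1.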

11.3.5 Access to drinking water

An interesting variable provided by the ICAD06 Concert data set is 'Estimated percentage of population with access to improved drinking water sources total'. Being part of the so-called "Social Indicators" (UN Statistics Division(1975, 1989, 2006)), the data are reported to the UN Statistics Division by the national statistical agencies of the UN member states. Unfortunately, this indicator has a high percentage of missing values (46 of 190 countries, or 24.2%). This percentage can be reduced to 16.3% (31 countries) by excluding missing values from countries which are not touched by our Magellanian route. Still, the problem is fundamental and must be addressed. The strategy we chose was to estimate the missing values on the basis of the data values of the neighboring countries, being aware that this procedure does not satisfy scientific rigor. In most cases, though, we claim that our estimates are likely to match reality: for instance, it is very likely that in France and Germany (as in most EU countries), very close to 100% of the population do have access to "improved drinking water resources", and that this fact is considered too obvious to be statistically recorded.

11.3.6 Mapping choices

We deliberately chose rather high display complexity; while this requires more listener concentration and attention for maximum retrieval of represented information, hopefully a more complex piece invites repeated listening, as audiences tend to do with pieces of music they enjoy. Every country is represented by a complex sound stream composed of a group of five resonators; the central resonator is heard most often, the outer pairs of resonators ('satellites') sound less often. All parameters of this sound stream are determined by (a) data properties of the associated country and (b) the navigation process, i.e. the ship's current distance and direction towards this country. At any time, the 15 countries nearest to the route point are heard simultaneously. This is both to limit display complexity for the sake of clarity, and to keep the sonification within CPU limits for realtime interactive use. The mapping choices are given in detail in table 11.1. In order to provide a better opportunity to learn this mapping, the author has written a patch which plays only a single sound source/country at a time, where it is possible to switch between the parameters for all 192 countries. This allows comparing the multidimensional changes as one switches from, say, Hong Kong (very dense population, very rich) to Mongolia (very sparse population, poor). In public demonstrations and talks, this has proven to be quite appropriate for this relatively complex mapping.

Table 11.1: Navegar - Mappings of data to sound parameters

Population density of country -> density of random resonator triggers
GDP per capita of country -> central pitch of the resonator group
Ratio of top to bottom 10% -> pitches of the outermost (top and bottom) 'satellite' resonators
Ratio of top to bottom 20% -> pitches of the inner two 'satellite' resonators (missing values for these become dense clusters)
Water access -> decay time of resonators (short tones mean dry)
Distance from ship -> volume and attack time (far away is 'blurred')
Direction toward ship -> spatial direction of the stream in the loudspeakers (direction North is always constant)
Ship speed, direction, winds -> direction, timbre and volume of wind-like noise

When hearing the piece after experimenting for a while with an example of its main components, many listeners report understanding the sonification much more clearly. It has also been helpful to provide some points of orientation that can be identified while the piece unfolds, as listed in table 11.2.
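As an illustration of the mapping in table 11.1 (a simplified sketch, not the actual Navegar patch; all country values and scaling ranges below are invented placeholders), one country could be rendered as a group of five resonators excited by random impulses:

// Sketch only: one country as a five-resonator group; all values are placeholders
(
{
    var popDensity = 100, gdpPerCapita = 8000, ratio10 = 15, ratio20 = 8, waterAccess = 0.7;
    var trigDensity = popDensity.linexp(1, 7000, 0.5, 30);       // denser population: more triggers
    var centerFreq = gdpPerCapita.linexp(500, 60000, 200, 2000); // richer: higher central pitch
    var satFreqs = centerFreq * ([ratio10.reciprocal, ratio20.reciprocal, 1, ratio20, ratio10] ** 0.1);
    var decay = waterAccess.linlin(0, 1, 0.05, 1.5);             // dry countries: short tones
    var exciters = Dust.ar(trigDensity * [0.3, 0.5, 1, 0.5, 0.3], 0.1); // satellites sound less often
    Splay.ar(Ringz.ar(exciters, satFreqs, decay));
}.play;
)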

11.4 Terra Nullius - Julian Rohrhuber

This section discusses a sonification piece created by Julian Rohrhuber for the ICAD 2006 Concert. It is described in Rohrhuber(2006), and available as a headphone-rendered audio file4.

11.4.1 Missing values

The concept for ’Terra Nullius’ builds on a problem present (or actually, absent) in data from many different contexts: missing values. Rohrhuber(2006) states that in sonification, data are assumed to have implicit meaning, and that sonifications try to communicate such meaning. In the specific case of the data given for the concert, most data dimensions are quantitative; thus the data can be ordered along any such dimension, and the value for one dimension of a given data point can be mapped to a sonic property of a corresponding sound event. For example, one could order by population size, and map GDP per capita to the pitch of a short sound event. However, with missing values the situation becomes considerably more complicated:

4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/audio/concert/terra.mp3 170

Table 11.2: Some stations along the timeline of ’Navegar’

0:00-0:10 Very slow move from Sevilla to San Lucar
0:20-0:26 Cape Verde: very direct sound (i.e. near the capital), rather low, dense spectrum (poor country, unknown income distribution)
0:54-1:00 Uruguay/Rio de la Plata: very direct sound, passing close by
1:05-2:40 Port San Julian, Patagonia: very long stasis, everything is far away, six months long winter break in Magellan's travel
2:45-3:00 Moving into Pacific Ocean: new streams, many dense spectra; unknown income distributions
3:20 Philippines: very direct sound (near capital), high satellites: unequal income distribution
4:00 Brunei: very direct, high, dense sound: very rich, unknown distribution ... towards Moluccan Islands
4:50 East Timor: direct, mostly clicking, only very low frequency resonances (very poor, little access to water, unknown income distribution)
5:15 Into Indian Ocean: 'openness', sense of distance
5:50 Approaching Africa: more lower centers, with very high satellites: poor, with very unequal distributions (but at least statistics available)
5:55 Pass Cape of Good Hope: similar to East Timor
6:10 Arrive back at San Lucar, Spain

Rohrhuber states that "These non-values break gaps into the continuity of evaluation - they belong to another dimension within their dimension. Missing data not only fail to belong to the dimension they are missing from, they also fail to belong in any uniform dimension of 'missing'." Furthermore, one must consider that there are no fully valid strategies for dealing with missing values: removing data points with missing values distorts the comparisons in other data dimensions; substituting likely data values introduces possible errors and reduces data reliability; marking them by recognizably out-of-range values may be logically correct, but these special values can be quite distracting in a sonification rendering.

11.4.2 The piece

The piece consists of multiple cycles, each moving around the globe once. For every cycle, all countries within a zone parallel to the equator are selected and sonified one at a time in East to West order, as shown in figure 11.4. In the beginning, the zone contains latitudes similar to England, or actually London, as the capitals determine geographical position. The sound is spatialised accordingly in the ring of speakers, so one cycle around the globe moves around the speaker ring once. With every cycle, the zone of latitudes widens until all countries are included.

Figure 11.4: Terra Nullius, latitude zones

To sonify the missing values in the 46 data dimensions given, a noise source is split into 46 frequency bands. When a value for a dimension is present, the corresponding band remains silent; the band only becomes audible when the value for that dimension in the current country is missing. After all countries are included in the cycle, the latitude zone narrows again over several cycles, and ends with the latitude and longitude of London.
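A minimal sketch of this idea (not Rohrhuber's code; the band layout, filter settings and the missing-value mask below are assumptions) could gate one band of filtered noise per data dimension with a 0/1 mask for the current country:

// Sketch only: 46 noise bands, each audible only where the corresponding value is missing
(
~missing = Array.fill(46, { [0, 0, 0, 1].choose }); // placeholder mask: 1 = value missing
{
    var freqs = (1..46).collect { |i| i.linexp(1, 46, 100, 8000) };
    var bands = BPF.ar(PinkNoise.ar(0.5), freqs, 0.05) * ~missing;
    Splay.ar(bands);
}.play;
)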

For this second half, the filters have smaller bandwidths, so there is more separation between the dimensions. Gradually, constant decorrelated noise fades in on all speakers, and remains for a few seconds after the end of the last cycle. 'Terra Nullius' plays very elegantly with different orders of 'missingness', in fact creating what could be called 'second-order missing values' of what is being sonified: "... A band of filtered noise is used for each dimension that is missing, i.e. the noisier it is, the less we know. In the end the missing itself seems quite rich of information - only about what?" (Rohrhuber(2006)) Personally, I find this the most intriguing work of art in the ICAD 2006 concert. Subtly shifting the discussion to recursively higher levels of consideration of what it is we do not know, it is an invitation to deeper reflection on many questions about meaning and representation.

11.5 Comparison of the pieces

In order to study the variety of approaches that artists and sonifiers took in creating pieces, SDSM terminology and viewpoint turned out to be quite useful. For the dataset given, a clear anchor can be provided at 190 data points and 26 dimensions for the basic dataset, and 44 for the extended set (see figure 11.5). Life Expectancy chooses a rather large set of data dimensions, and sonifies aspects of it in three distinct ways: an overview for spatial orientation, sorted by life expectancy (LE1), a long sequence of 2 second vignettes, densely packed with information (LE2), and a final sequence of life expectancies sorted North-South (LE3).

Orientation - LE1
Within 20 seconds, a signal sound is played for each country, ordered by total life expectancy; this renders 3 mapped dimensions (life expectancy, latitude, longitude).

Vignettes - LE2
Five streams make up each vignette: two bell sounds - 2 dimensions: latitude and longitude; spoken country and capital name - 6 dimensions: 2 names, spatial location again (2), population size, and area; scale fragment - 2 dimensions: life expectancy for males and females; clinking coins - 2 dimensions: average income (coin density) and GDP (reverb size); water vessel - 3x2 dimensions: 3 pulses with 2 values each, 'fullness' and distortion. This combination of parallel/interlocked streams with [2, 6, 2, 2, 6] dimensions each renders a total of 16 dimensions per vignette of 2 seconds! While these could also be rendered visually as a 'sideways' view of the SDS map (showing the Y and Z axes), they are shown here as 16 parallel dimensions for better comparability.

Figure 11.5: SDSM comparison of the ICAD 2006 concert pieces.

Ending - LE3
This section is again short (30 seconds with intro and ending clicks, 17 without), and compares the 2 life expectancy values for males and females, with the countries sorted North/South; including spatial location, it uses 4 dimensions. Overall, the piece has very literal, easy to read mappings to sounds; it employs a really complex, differentiated soundscape, and it is very true to the concept of sonification.

Guernica uses its own data, thus requiring its own data anchor. The piece renders world population as one auditory stream with a single dimension (GUE+), while each war is its own stream of 3 dimensions; while the maximum number of simultaneous wars in Potard's data is around 35, the piece does not use war durations, so the maximum number of parallel streams is not documented. The order of the data is chronological, and at 507 wars within 300 seconds, it has an average event density of 5 within 3 seconds. Three dimensions are used for each event: the war's starting year, and its latitude/longitude. The parallelism of streams is roughly sketched with copies of the label GUE receding along the Z axis; as this is dynamically changing, there is no satisfying visual representation. Like Life Expectancy, Guernica features very literal sound mappings (samples of fighting sounds); it is based on additional data collected on wars and population since the year 1, which extend the starting dataset considerably; and it adds the notion of a timeline and historical evolution.

Navegar orders the data along a historical time/space route. Within 6 minutes, 134 countries are rendered (the others are too far away from the route to be 'touched'), which puts the average data point density around 1 per 3 seconds. At any point, the nearest 15 countries are rendered as one stream each, with 7 dimensions per stream (NAV): latitude/longitude (with moving distance and direction), population density, GDP/capita, top 10 and top 20 richest to poorest ratios, and water access. The parallelism is again indicated symbolically as multiple NAV labels along the Z axis. Additionally, ship speed, direction, and weather conditions are represented, based on 76 timeline points (NAV+). Like Guernica, Navegar introduces a historical timeline; unlike it, it juxtaposes that with current social data. It uses metaphorically more indirect mappings than most of the other submissions. Uniquely within the concert context, it creates a soundscape of stationary sound sources with a subjective perspective: a moving observer (listener), and it also sonifies context (here, speed and travel conditions).

Terra Nullius organizes the data by two criteria: selection by latitude zone, ordering by longitude. A maximum of all 46 dimensions is used throughout the piece, which sets its Y value on the SDSM map. Within 19 cycles, larger and larger data subsets are chosen; first, 14 countries within 18 seconds, putting it at a gestalt size of 2-3 (TN1 in the map). This speeds up to 190 countries at a rate of 100/sec, or a gestalt size of ca. 35 (TN2), and eventually returns to roughly the original rate. What sets Terra Nullius apart from all other entries is that it assumes a meta-perspective on data perceptualisation in general by studying missing values exclusively.

Conclusion
While the SDSM map view of all four pieces shows the large differences between the approaches taken, it cannot fully capture or describe the radical differences in concepts manifested in this subset of the pieces submitted.
On the one hand, that would be asking a lot of an overview-creating, orientational concept; on the other, it is interesting to find that even within rather tightly set constraints like a concert call, creativity easily defies straightforward categorisation.

Chapter 12

Conclusions

This work consists of three interdependent contributions to sonification research: a theoretical framework that is intended for systematic reasoning about design choices while experimenting with perceptualisations of scientific and other data; a software infrastructure that pragmatically supports the process of fluid iterative prototyping of such designs; and a body of sonifications realised using this infrastructure. All these parts were created within one work process in parallel, interleaved streams: design sketches suggested ideas for infrastructure that would be useful; observing and analysing design sessions led to deeper understanding which informed the theoretical framework; and both the growing framework and the theoretical models eventually led to a more effective design workflow. The body of sonifications created within this system, and the theoretical models derived from the analyses of this body of practical work (and a few selected other sonification designs of interest), form the permanent results of this dissertation. They contribute to the field of sonification research in the following respects:

• The Sonification Design Space Map and the related models provide a sonification-specific alternative to TaDa Auditory Information Design, and they suggest a clearer, more systematic methodology for future sonification research, in particular for sonification design experimentation.

• The SonEnvir framework provided the first large-scale in-depth test of Just In Time programming for scientific contexts, which was highly successful. The sonification community and other research communities have become aware of the flexibility and efficiency of this approach.

• The theoretical models, the practical methodology and the individual solutions developed here may help to reduce time spent to cover large design spaces, and thus contribute to more efficient and fruitful experimentation.

The work presented here was also employed in sonification workshop settings, and in numerous talks and demonstrations given by the author. It proved to be helpful in giving

interested non-experts a clear impression of the central issues in sonification design work, and has been received favourably by a number of experts in the field.

12.1 Further work

Within the SonEnvir project, many compromises had to be made due to time and capacity constraints. Also, given the breadth of the overall approach chosen, many ideas could not be fully explored, and would thus warrant further research. In the theoretical models, the main desirable future research aims would be:
1. Integration of more analyses of the growing body of Model-Based Sonification designs.
2. Expansion of the user interaction model based on a deeper background in HCI research.
In the individual research domains, several areas would warrant continued exploration. Here, it is quite gratifying to see that one of the research strands has led to a direct followup project: the QCDaudio project hosted at IEM Graz continues and extends research begun by Kathi Vogt within SonEnvir. For the EEG research activities, two strategies seem potentially fruitful and thus worth pursuing: continuing the planned integration into the NeuroSpeed software, and starting closer collaborations with other EEG researchers, such as the Neuroinformatics group in Bielefeld, and individual experts in the field, such as Gerold Baier. It is quite unfortunate that none of the designs created within this research context would be directly usable for visually impaired people. In my opinion, providing better access to scientific and other data for the visually impaired is one of the strongest motivations for developing a wider variety of sonification design approaches, and would be well worth pursuing more deeply. I hope the work presented will be found useful for future research in that direction. For me personally, experimenting with different forms of sonification in artistic contexts has become even more intriguing than it was before embarking on this venture. As the entries for the ICAD concerts, as well as many current network art pieces show, creative minds find plenty of possibilities for experimentation with data representation by acoustic, visual and other means; creating work that is both aesthetically interesting and scientifically well-informed is still a fascinating activity. When more perceptual modalities are included in more interactive settings, the creative options and the possibility spaces to explore multiply once again.

Appendix A

The SonEnvir framework structure in subversion

This section describes which parts of the framework reside in which folders in the SonEnvir subversion repository. Note that the state reported below is temporary; pending discussion with the SC3 community, more SonEnvir work will move into the main distribution, as well as into general SC3 Quarks, or SonEnvir-specific Quarks.

A.1 The folder ’Framework’

This folder contains the central SC3 classes written during the project, and their respective help files. The sub-folders are structured as follows:

Data: contains all the SC3 classes for different kinds of data (see the Data model discussion above), such as EEG data in .edf format; it also includes some applications written as classes: the TimeSeriesAnalyzer (described in section 8), the EEGScreener and EEGRealTimePlayer (described in section 9).

Interaction: contains the MouseStrum Class. Most of the user interface devices/interaction classes are covered by the JInT quark written by Till Bovermann, and available from the SC3 project site at sourceforge.

Patterns: contains the HilbertIndex, a pattern class that generates 2D and 3D indices along Hilbert space filling curves; note that for 4D Hilbert indices there is a quark package. It also includes support patterns for Hilbert index generation, and Pxnrand, a pattern that avoids repeating the last n values of its own output.

Rendering: contains two UGen classes, TorusPanAz and PanRingTop, and a utility for adjusting the individual speakers of multichannel systems for more balanced sound, SpeakerAdjust. See also section 5.5.


Synthesis: includes a reverb class (AdCVerb, used in the VirtualRoom class), several classes for cascaded filters, a UGen to indicate loop ends in buffer playback, PhasorClick (both are used in the EEG applications); and a dual band compressor.

Utilities: includes a model for QCD simulations, Potts2D, a library of singing voice formants, and various extension methods.

osx, linux, windows: these folders capture platform-specific development; of these, only the OSX folder is in use for OSX-specific GUI classes. These will eventually be converted to a cross-platform scheme.

A.2 The folder ’SC3-Support’

QtSC3GUI: contains GUIs written in Qt, which were considered an option for SC3 on Windows; this strand of development was dropped when sufficiently powerful versions of the cross-platform GUI extension package swingOSC became available.

SonEnvirClasses, SonEnvirHelp: these contain essentially obsolete variants of SonEnvir classes; they are kept mainly in case some users still need to run examples using these classes.

A.3 Other folders in the svn repository

CUBE: contains the QVicon2Osc application, which can connect the Vicon tracking system (which is in use at the IEM Cube) to any software that supports OpenSoundControl, and a test for that system using the SonEnvir VirtualRoom for binaural rendering.

Prototypes: contains all the sonification designs ('prototypes') written, sorted by scientific domain. These are described extensively and analysed in the chapters on sonification designs for the domain sciences, 6 - 9.

Psychoacoustics: contains some demonstrations of perceptual principles written for the domain scientists.

SC3-Training: contains a short Introduction to SuperCollider for sonification; this was written for the domain scientists, both in German and in English.

SOS1, SOS2: contain demo versions of sonification designs for two presentations (called Sound of Science 1 and 2) at IEM Graz.

testData: contains anonymous EEG data files in .edf format, for testing purposes only.

A.4 Quarks-SonEnvir

This folder contains all the SC3 classes written in SonEnvir that have been migrated into Quarks packages for specific topics. Each folder can be downloaded and installed as a Quark.

QCD contains some Quantum Chromodynamics models implemented in SC3.

SGLib contains a port of a 3D graphics library for math operations on tracking data.

gui-addons contains platform-independent GUI extensions to SC3.

hilbert contains a file reader for loading pre-computed 4D Hilbert curve indices from files.

rainData contains a data reader class for the Rain data used in the SBE workshop (see section 10).

wavesets contains the Wavesets class, which analyses mono soundfiles into Wavesets, as defined by Trevor Wishart. This can also be used for applying granular synthesis methods to time series-like data.

A.5 Quarks-SuperCollider

These extension packages contain all the SC3 classes written in SonEnvir that have been migrated into Quarks packages for specific topics. They can be downloaded and installed from the sourceforge svn site of SuperCollider.

AmbIEM: This package for binaural sound rendering using Ambisonics has become an official SuperCollider extension package (’Quark’). ARHeadtracker is an interface class to a freeware tracking system.

The statistics methods implemented within SonEnvir have moved to the general SC3 quark MathLib, while others have become quarks themselves, such as the JustInTerface quark (JInT) written by Till Bovermann (within SonEnvir). Finally, the TUIO quark (Tangible User Interface Objects, also by Till Bovermann, of University Bielefeld) is of interest for sonification research with strongly interactive approaches.

Appendix B

Models - code examples

B.1 Spatialisation examples

B.1.1 Physical sources

For multiple speaker setups, a simple and very effective strategy is to use individual speakers as real physical sources. The main advantage is that physics really helps in this case; when locations only serve to identify streams, as with few fixed sources, fixed single speakers work very well. SuperCollider supports this directly with the Out UGen: it determines which bus a signal is written to, and thus, which audio hardware output it is heard on.

// a mono source playing out of channel 4 (indices start at 0)
{ Out.ar(3, Ringz.ar(Dust.ar(30), 400, 0.2)) }.play;

The JITLib library in SuperCollider3 supports a more flexible scheme: sound processes (in JITLib speak, NodeProxies) run on their own private busses by default; when they should be audible, they can be routed to the hardware outputs with the .play method.

~snd = { Ringz.ar(Dust.ar(30), 400, 0.2) }; // proxy inaudible, but plays
~snd.play(3); // listen to it on hardware output 4.

NodeProxies also support more flexible fixed multichannel mapping very simply: The .playN method lets one route each audio channel of the proxy to one or several hardware output channels, each with optional individual level controls.

// a 3 channel source
~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };


// to individual speakers 1, 3, 5:
~snd3ch.playN([0, 2, 4]);

// to multiple speakers, with individual levels:
~snd3ch.playN(outs: [0, [1,2], [3,4]], amps: [1, 0.7, 0.7]);

B.1.2 Amplitude panning

All of the following methods work for both moving and static sources. 1D: In the simplest case the Pan2 UGen is used for equal power stereo panning.

// mouse controlled pan position
{ Pan2.ar(Ringz.ar(Dust.ar(30), 400, 0.2), MouseX.kr(-1, 1)) }.play;

2D: The PanAz UGen pans a single channel to a symmetrical ring of n speakers by azimuth, with adjustable width over how many speakers (at most) the energy is distributed.

(
{
    var numChans = 5, width = 2;
    var pos = MouseX.kr(0, 2);
    var source = Ringz.ar(Dust.ar(30), 400, 0.2);
    PanAz.ar(numChans, source, pos, width: width);
}.play;
)

In case the ring is not quite symmetrical, adjustments can be made by remapping; however, using the best geometrical symmetry attainable is always superior to post-compensation. In order to remap dynamic spatial positions to a ring of speakers at unequal angles such that the resulting directions are correct, the following example shows the steps needed: Given a five-speaker system, equal speaker angles would be [0, 0.4, 0.8, 1.2, 1.6, 2.0] with 2.0 being equal to 0.0 (this is the behaviour of the PanAz UGen); the actual asymmetric speaker angles could be for example [0, 0.3, 0.7, 1, 1.5, 2.0]; so remapping should map a control value of 0.3 (where speaker 2 actually is) to a control value of 0.4 (the control value that positions this source directly in speaker 2). The full map of corresponding values is given in table B.1.

(
// remapping unequal speaker angles with asMapTable and PanAz:
a = [0, 0.3, 0.7, 1, 1.5, 2.0].asMapTable;
b = Buffer.sendCollection(s, a.asWavetable, 1);
{ |inpos = 0.0|
    var source = Ringz.ar(Dust.ar(30), 400, 0.2);

Table B.1: Remapping spatial control values

desired spatial position    mapped control value (breakpoints for equally spaced output)
0.0                         0.0
0.3                         0.4
0.7                         0.8
1.0                         1.2
1.5                         1.6
2.0 (== 0.0)                2.0 (== 0.0)

    var pos = Shaper.kr(b.bufnum, inpos.wrap(0, 2));
    PanAz.ar(a.size - 1, source, pos);
}.play;
)

Mixing multiple channel sources down to stereo: The Splay UGen mixes an array of channels down to 2 channels, at equal pan distances, with adjustable spread and center position. Internally, it uses a Pan2 UGen.

~snd3ch = { Ringz.ar(Dust.ar([1,1,1] * 30), [400, 550, 750], 0.2) };
~snd3pan = { Splay.ar(~snd3ch.ar, spread: 0.8, level: 0.5, center: 0) };
~snd3pan.playN(0);

Mixing multiple channel sources into a ring of speakers: The SplayZ UGen pans an array of source channels into a number of output channels at equal distances; spread and center position can be adjusted. Larger numbers of channels can be splayed into rings of fewer speakers, and vice versa. Internally, SplayZ uses a PanAz UGen.

// spreading 4 channels equally into a ring of 6 speakers
~snd4ch = { Ringz.ar(Dust.ar([1,1,1,1] * 30), [400, 550, 750, 900], 0.2) };
~snd4pan = { SplayZ.ar(6, ~snd4ch.ar, spread: 1.0, level: 0.5, center: 0) };
~snd4pan.playN(0);

3D: The SonEnvir extension TorusPanAz does the same for setups with rings of rings of speakers. Again, the speaker setup should be as symmetrical as possible; compensation can be trickier here. (In general, even while compensations for less symmetrical setups seem mathematically possible, spatial images will be worse outside the sweet spot. Maximum attainable physical symmetry cannot be fully substituted by more DSP math.)

(
// panning to 3 rings of 12, 8, and 4 speakers, cf. IEM CUBE.
~snd = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~toruspan = {
    var hAngle = MouseX.kr(0, 2); // all the way around (2 == 0)
    var vAngle = MouseY.kr(0, 1.333); // limited to highest ring
    TorusPanAz.ar([12, 8, 4], ~snd.ar(1), hAngle, vAngle);
};
~toruspan.playN(0);
)

Compensating overall vertical ring angles and individual horizontal speaker angles within each ring is straightforward with the asMapTable method as shown above. For placement deviations that are both horizontal and vertical, it is preferable to have Vector Based Amplitude Panning in SC3, which has been implemented recently by Scott Wilson and colleagues1. However, this was not needed within the context of the SonEnvir project.

1 See http://scottwilson.ca/site/Software.html

B.1.3 Ambisonics

While some Ambisonics UGens previously existed in SuperCollider, the SonEnvir team decided to write a consistent new implementation of Ambisonics in SC3, based on a subset of the existing PureData libraries. This package was realised up to third order Ambisonics by Christopher Frauenberger for the AmbIEM package, available here2. It supports the main speaker setup of interest (a half-sphere of 12, 8 and 4 speakers, the CUBE at IEM, with several coefficient sets for different tradeoff choices), and a setup with 1-4-7-4 speaker rings, mainly used as a more efficient lower resolution alternative for headphone rendering, as described below.

2 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/

(
// panning two sources with 3rd order ambisonics into CUBE sphere.
~snd0 = { Ringz.ar(Dust.ar(30), 400, 0.2) };
~snd1 = { Ringz.ar(Dust.ar(30), 550, 0.2) };
~pos0 = [0, 0.01]; // azimuth, elevation
~pos1 = [1, 0.01]; // azimuth, elevation

~encoded[0] = { PanAmbi3O.ar(~snd0.ar, ~pos0.kr(1, 0), ~pos0.kr(1, 1)) };
~encoded[1] = { PanAmbi3O.ar(~snd1.ar, ~pos1.kr(1, 0), ~pos1.kr(1, 1)) };

~decode24 = { DecodeAmbi3O.ar(~encoded.ar, 'CUBE_basic') };
~decode24.play(0);
)

B.1.4 Headphones

Ambisonics and Virtual Binaural Rendering

For complex changing scenes, the IEM has developed a very efficient approach for binaural rendering (Musil et al.(2005); Noisternig et al.(2003)): In effect, taking a virtual, symmetrical speaker setup (such as 1-4-7-4), and spatializing to that setup with Ambisonics; then rendering these virtual speakers as point sources with their appropriate HRIRs, thus arriving at a binaural rendering. This provides the benefit that the Ambisonic field can be rotated as a whole, which is really useful when head movements of the listener are tracked, and the binaural rendering is designed to compensate for them. Also, the known problems with Ambisonics when listeners move outside the sweet zone disappear; when one carries a setup of virtual speakers around one's head, one is always right in the center of the sweet zone. This approach has been ported to SC3 by C. Frauenberger; its main use is in the VirtualRoom class, which simulates moving sources within a rectangular box-shaped room. This class has turned out to be very useful as a simple way to prepare both experiments and presentations for multi-speaker setups by relatively simple headphone simulation.

(
// VirtualRoom example - adapted from help file.
// preparation: reserve more memory for delay lines, and boot the server
s.options.memSize_(8192 * 16).numAudioBusChannels_(1024);
s.boot;
// make a proxyspace
p = ProxySpace.push;
// set the path for the folder with Kemar files.
VirtualRoom.kemarPath = "KemarHRTF/";
)
(
// create a virtual room
v = VirtualRoom.new;
// and start its binaural rendering
v.init;
// set the room properties (reverberation time and gain,
// hf damping on reverb and early reflections gain)
v.revTime = 0.1;
v.revGain = 0.1;
v.hfDamping = 0.5;
v.refGain = 0.8;
)
(
// set room dimension [x, y, z, x, y, z]:
// a room 8m wide (y), 5m deep (x) and 5m high (z)
// - nose is always along x
v.room = [0, 0, 0, 5, 8, 5];
// make it play to hardware stereo outs
v.out.play;
// listener is listener position, a controlrate nodeproxy;
// here movable by mouse.
v.listener.source = { [MouseY.kr(5, 0), MouseX.kr(8, 0), 1.6, 0] };
)

// add three sources to the scene
(
// make three different sounds
~noisy = { Decay.ar(Impulse.ar(10, 2), 0.2) * PinkNoise.ar(1) };
~ringy = { Ringz.ar(Dust.ar(10), [400, 600, 950], [0.3, 0.2, 0.05]).sum };
~dusty = { Dust.ar(400) };
)
// add the three sources to the virtual room:
// source, name, xpos, ypos, zpos
v.addSource( ~noisy, \noisy, 1, 2, 2.5);    // bottom right corner
v.addSource( ~ringy, \ringy, 1.5, 7, 2.5);  // bottom left
v.addSource( ~dusty, \dusty, 4, 5, 2.5);    // top, left of center

v.sources[\noisy].set(\xpos, 4, \ypos, 6, \zpos, 2);   // set noisy position
v.sources[\noisy].getKeysValues;                       // check its position values
v.sources[\ringy].set(\xpos, 2.5, \ypos, 4, \zpos, 2);

// remove the sources
v.removeSource(\noisy);
v.removeSource(\ringy);
v.removeSource(\dusty);

v.free;   // free the virtual room and its resources
p.pop;    // and clear and leave proxyspace

Among other things, the submissions for the ICAD 2006 concert3 (described also in section 4.3) were rendered from 8 channels to binaural for the reviewers and for the web documentation4. One can of course also spatialize sounds on the virtual speakers by any of the simpler panning strategies given above; this trades off easy rotation of the entire setup for better point source localisation. To support simple headtracking, C. Frauenberger also created the ARHeadTracker application, also available as a package from the SonEnvir website5.

3 http://www.dcs.qmul.ac.uk/research/imc/icad2006/concert.php
4 http://www.dcs.qmul.ac.uk/research/imc/icad2006/proceedings/concert/index.html
5 http://quarks.svn.sourceforge.net/viewvc/quarks/AmbIEM/

B.1.5 Handling speaker imperfections

All standard spatialisation techniques work best when speaker setups are as symmetrical and well-controlled as possible. While it may not always be feasible to adjust mechanical positions of speakers freely for very precise geometry, a number of factors can be measured and compensated for, and this is supported by several utility classes written in SuperCollider, which are part of the SonEnvir framework.

Latency

The Latency class plays a test signal for a given number of audio channels, and waits for the signals to arrive back at an audio input. The resulting list of measured per-channel latencies can be used to create compensating delay lines, e.g. in the SpeakerAdjust class described below.

// test 2 channels, max delay expected 0.2 sec,
// take default server, mic is on AudioIn 1:
Latency.test(2, 0.2, Server.default, 1);

// stop measuring and post results
Latency.stop;

// results are posted like this:
// measured latencies:
// in samples: [ 1186.0, 1197.0 ]
// in seconds: [ 0.026893424036281, 0.027142857142857 ]
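As a minimal sketch (not part of the Latency class itself, with the measured values simply typed in by hand), such per-channel latencies could be turned into compensating delays so that all channels align with the slowest one:

(
// hypothetical values, rounded from a Latency.test result as posted above
var latencies = [0.026893, 0.027143];
// delay each channel by its difference to the slowest channel
var compDelays = latencies.maxItem - latencies;
{
    var src = Impulse.ar(2) ! 2;        // a 2-channel test signal
    DelayN.ar(src, 0.2, compDelays);    // per-channel compensating delays
}.play;
)

The SpeakerAdjust class described below integrates such delay compensation with per-channel gain and EQ correction.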

Spectralyzer

While inter-speaker latency differences are well-known and very often addressed, we have found another common problem to be more distracting for multichannel sonification: each individual channel of the reproduction chain, from D/A converter to amplifier, cable, loudspeaker, and speaker mounting location in the room, can sound quite different. When changes in sound timbre can encode meaning, this is potentially really confusing! To address this, the Spectralyzer class allows for simple analysis of a test signal as played into a room, with optional smoothing over several measurements, and then tuning compensating equalizers by hand for reasonable similarity across all speaker channels. While this could be written to run automatically, we consider it more of an art than an engineering task; a more detailed EQ intervention will make the frequency response flatter, but may color the sound more by smearing its impulse behaviour.

x = Spectralyzer.new;            // make a new spectralyzer
x.start; x.makeWindow;           // start it, open its GUI
x.listenTo({ PinkNoise.ar });    // pink noise should look flat
x.listenTo({ AudioIn.ar(1) });   // should look similar from microphone

Figure B.1: The Spectralyzer GUI window. For full details see the Spectralyzer help file.

(
// tuning 2 speakers for better linearity
p = ProxySpace.push;
~noyz = { PinkNoise.ar(1) };   // create a noise source
~noyz.play(0, vol: 0.5);

// filter it with two bands of parametric eq
~noyz.filter(5, { |in, f1 = 100, rq1 = 1, db1 = 0, f2 = 5000, rq2 = 1, db2 = 0|
    MidEQ.ar(MidEQ.ar(in, f1, rq1, db1), f2, rq2, db2);
});
)
// tweak the two bands for better acoustic linearity
~noyz.set(\f1, 1200, \rq1, 1, \db1, -5);   // take out low presence bump
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 3);   // boost bass dip

~noyz.getKeysValues.drop(1).postcs; // post settings when done

// move on to speaker 2
~noyz.play(1, vol: 0.5);
// tweak the two bands again for speaker 2
~noyz.set(\f1, 1200, \rq1, 1, \db1, 0);   // likely to be different ...
~noyz.set(\f2, 150, \rq2, 0.6, \db2, 0);  // from speaker 1.
~noyz.getKeysValues.drop(1).postcs;       // post settings.

SpeakerAdjust

Once one has achieved usable EQ curves for every speaker channel, one can begin to compensate for volume differences between channels (with big timbral differences between channels, measuring volume or adjusting it by listening is rather pointless). The SpeakerAdjust class expects simple specifications for each channel:

amplitude (as multiplication factor, typically below 1.0),
optionally: delaytime (in seconds, to be independent of the current samplerate),
optionally: eq1-frequency, eq1-gain, eq1-relative-bandwidth,
optionally: eq2-frequency, eq2-gain, eq2-relative-bandwidth, and so on for as many EQ bands as desired.

// From SpeakerAdjust.help:
// adjustment for 2 channels, amp, dtime, eq specs;
// you can add as many triplets of eqspecs as you want.
(
var specs;
specs = [
    // amp, dtime, eq1: frq, db, rq; eq2: frq, db, rq
    [ 0.75, 0.0,   [ 250, 4, 0.5], [ 800, -4, 1]],
    [ 1,    0.001, [ 250, 2, 0.5], [ 5000, 3, 1]]
];

{
    var ins;
    ins = Pan2.ar(PinkNoise.ar(0.05), MouseX.kr(-1, 1));
    SpeakerAdjust.ar(ins, specs)
}.play;
)

Such a speaker adjustment can be created and added to the end of the signal chain to linearise the given speaker setup as much as possible; of course, limiters for speaker and listener protection can be built into such a master effects unit as well.
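A minimal sketch of such a master unit follows; the spec values are placeholders, the limiter threshold is illustrative, and the only SonEnvir-specific class assumed is SpeakerAdjust as used above.

(
// a master effect at the end of the node order: read the hardware output
// busses, apply per-speaker correction, then a safety limiter, and write
// the result back to the same busses.
var specs = [
    [ 0.75, 0.0,   [ 250, 4, 0.5], [ 800, -4, 1]],
    [ 1,    0.001, [ 250, 2, 0.5], [ 5000, 3, 1]]
];
{
    var ins = In.ar(0, 2);                          // the 2 channels to be corrected
    var adjusted = SpeakerAdjust.ar(ins, specs);    // amp, delay, EQ per channel
    ReplaceOut.ar(0, Limiter.ar(adjusted, 0.95));   // protect speakers and listeners
}.play(addAction: \addToTail);
)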

Appendix C

Physics Background

C.1 Constituent Quark Models

The concept of constituent quarks was introduced in the 1960s by Gell-Mann (1964) and Zweig (1964), based on symmetry considerations in the classification of hadrons, the strongly interacting elementary particles. The first CQMs for the description of hadron spectra were introduced in the early 1970s by de Rújula et al. (1975). The original CQMs relied on simple models for the confinement of constituent quarks (such as the harmonic oscillator potential) and employed rudimentary hyperfine interactions. Furthermore, they were set up in a completely nonrelativistic framework. In the meantime, CQMs have undergone a vivid development. Over the years more and more notions deriving from QCD have been implemented, and CQMs are constructed within a relativistic formalism. Modern CQMs all use a confinement potential of linear form, as suggested by QCD. For the hyperfine interaction of the constituent quarks, several competing dynamical concepts have been proposed. A prominent representative is the one-gluon-exchange (OGE) CQM, whose dynamics for the hyperfine interaction basically relies on the original ideas of Zweig (1964): the effective interaction between the constituent quarks is generated by the exchange of a single gluon. For the data we experimented with, we considered a relativistic variant of the OGE CQM as constructed by Theussl et al. (2001). A different approach is followed by the so-called instanton-induced (II) CQM (Loering et al. (2001)), whose hyperfine forces derive from the 't Hooft interaction. Several years ago, the physics group at Graz University suggested a hyperfine interaction based on the exchange of Goldstone bosons. This type of dynamics is motivated by the spontaneous breaking of chiral symmetry (SBχS), which is an essential property of QCD at low energies. The SBχS is considered to be responsible for the quarks acquiring a (heavier) dynamical mass, and their interaction should then be generated by the exchange of Goldstone bosons, the latter being another consequence of SBχS. The Goldstone-boson-exchange (GBE) CQM was originally suggested in a simplified version, based on the exchange of pseudoscalar bosons only (Glozman et al. (1998)). In the meantime, an extended version has been formulated by Glantschnig et al. (2005).

Quantum-Mechanical Solution of Constituent Quark Models

Modern CQMs are constructed in the framework of relativistic quantum mechanics (RQM). They are characterised by a Hamiltonian operator H that represents the total energy of the system under consideration. For baryons, which are considered as bound states of three constituent quarks, the corresponding Hamiltonian reads

H = H_0 + \sum_{i<j} \left[ V_{\mathrm{conf}}(i,j) + V_{\mathrm{hf}}(i,j) \right]     (C.1)

The first term on the right-hand side denotes the relativistic kinetic energy of the system (of the three constituent quarks), and the sum includes all mutual quark-quark interactions. It consists of two parts, the confinement potential V_conf and the hyperfine interaction V_hf. The confinement potential prevents the constituent quarks from escaping the volume of the baryon (which is of the order of 10^{-15} m); no free quarks have ever been observed in nature. The hyperfine potential provides for the fine structure of the energy levels in the baryon spectra. Different dynamical models lead to distinct features in the excitation spectra of baryons. In order to produce the baryon spectra of the CQMs, one has to solve the eigenvalue problem of the Hamiltonian in equation C.1. Several methods are available to achieve solutions to any desired accuracy. The Graz group has applied both integral-equation (Krassnigg et al. (2000)) and differential-equation techniques (Suzuki and Varga (1998)). Upon solving the eigenvalue problem of the Hamiltonian, one ends up with the eigenvalues (energy levels) and eigenstates (quantum-mechanical wave functions) of the baryons. They are characterised according to the conserved quantum numbers, the total angular momentum J (which is half-integer in the case of baryons) and the parity P (being positive or negative). The different baryons are distinguished by the 'flavor' of their constituent quarks, which can be u, d, and s (for 'up', 'down', and 'strange'). For example, the proton is uud, the neutron is udd, the ∆++ is uuu, and the Σ0 is uds.
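Written out, the eigenvalue problem mentioned above takes the familiar form (a notational illustration only, independent of the particular solution technique):

H \, \Psi_n = E_n \, \Psi_n ,

where each eigenstate \Psi_n is labelled by its quantum numbers J^P and its flavor content; the proton, for example, is the lowest state in the J^P = 1/2^+ channel with flavor content uud.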

Classification of Baryons

The total baryon wave function Ψ_{XSFC} is composed of spatial (X), spin (S), flavor (F), and color (C) degrees of freedom, corresponding to the product of symmetry spaces

\Psi_{XSFC} = \Psi_{XSF} \, \Psi_C^{\mathrm{singlet}} ,     (C.2)

It is antisymmetric under the exchange of any two particles, since baryons must obey Fermi statistics. There are several visual representations of the symmetries between the different baryons based on their combinations of quarks; figure C.1 shows one of them.

Figure C.1: Multiplet structure of the baryons as a decuplet. In this ordering of baryon flavor symmetries, all the light and strange baryons are in the lowest layer.

Quarks are differentiated by the following properties:

Color The color quantum numbers are r, b, g (for ’red’, ’blue’, and ’green’). Only white baryons are observed in experiment. Thus the color wave function corresponds to a color singlet state and is therefore completely antisymmetric. As a consequence the rest of the wave function (comprising spatial, spin, and flavor degrees of freedom) must be symmetric.

Flavor According to the Standard Model (SM) of particle physics there are six quark flavors: up, down, strange, charm, bottom, and top. Quarks of different flavours have different masses. Normal hadronic matter (i.e. atomic nuclei) is basically composed only of the so-called light flavors u and d. CQMs consider hadrons with flavors u, d, and s. These are also the ones that are most affected by the SBχS.

Correspondingly, one works in SU(3)F and deals with baryons classified within singlet, octet, and decuplet multiplets. For example, the nucleons (proton and neutron) are in an octet, together with the Λ, Σ, and Ξ particles.
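As a reminder of the underlying group theory (a standard result, not derived here): combining the three light quark flavors in SU(3)_F gives

\mathbf{3} \otimes \mathbf{3} \otimes \mathbf{3} = \mathbf{1} \oplus \mathbf{8} \oplus \mathbf{8} \oplus \mathbf{10} ,

i.e. one singlet, two octets, and one decuplet, which is why the baryons are classified within singlet, octet, and decuplet multiplets.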

Spin All quarks have spin 1/2. The spin wave function of the three quarks is constructed within SU(2)_S and is thus symmetric, mixed symmetric, or mixed antisymmetric. The total spin of a baryon is denoted by S.

Orbital Angular Momentum and Parity The spatial wave function corresponds to a given orbital angular momentum L of the three-quark system. Its symmetry property under spatial reflections determines the parity P .

Total Angular Momentum The total angular momentum J is composed of the total orbital angular momentum L and the total spin S of the three-quark system according to the quantum-mechanical addition rules of angular momenta: J = L + S. It is always half-integer. The total angular momentum J is a conserved quantum number and, together with the parity P, serves for the distinction of baryon multiplets J^P.
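As a worked illustration (standard textbook assignments, not results of the CQM calculations discussed here): in the nucleon ground state the three quarks have total orbital angular momentum L = 0 and couple to total spin S = 1/2, so that

J = L + S = 0 + \tfrac{1}{2} = \tfrac{1}{2} , \qquad P = +1 , \qquad \text{i.e.} \quad J^P = \tfrac{1}{2}^{+} ,

while the lowest ∆ resonance has L = 0 and S = 3/2, giving J^P = 3/2^+.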

C.2 Potts model: theoretical background

In mathematical terms, the Hamilton function H defines the overall energy, which any physical system, and thus also a Potts model, will try to minimize:

H = -J \sum_{\langle i,j \rangle} S_i S_j - M \sum_i S_i     (C.3)

where J is the coupling parameter between a spin S_i and its neighbouring spin S_j. J is inversely proportional to the temperature; M is the field strength of an exterior magnetic field acting on each spin S_i. The first sum runs over nearest neighbours and describes the coupling term; it is responsible for the phase transition. If J = 0, only the second term remains, and the Hamiltonian describes a paramagnet, which is only magnetised in the presence of an exterior magnetic field. In our simulations, M was always 0. When studying phase transitions macroscopically, the defining quantity is the free energy F.

F(T, H) = -k_B T \, \ln Z(T, H)     (C.4)

It is proportional to the logarithm of the so-called partition function Z of statistical physics, which sums over all possible spin configurations and weights them with a Boltzmann factor (k_B being the Boltzmann constant). Energetically unfavorable states are less probable in the partition function than energetically favorable ones.

Z = \sum_{S_n} e^{-H / (k_B T)}     (C.5)

The partition function Z (eq. C.5) is not calculable in practice due to combinatorial explosion: a three-dimensional lattice with a side length of 100 and two possible spin states has 2^{100^3} = (2^{10})^{10^5} ∼ 10^{300,000} configurations that would have to be summed up - at every time step of the simulation. Also in analytical derivation, only few spin models have been solved exactly, and in three dimensions not even the simple Ising model is analytically solvable. Therefore the classical treatment relies mainly on approximation methods, which partly allow one to estimate critical exponents, and can be outlined briefly as follows: Early theories addressing phase transitions, like the Van der Waals theory of fluids and the Weiss theory of magnetism, can be subsumed under Landau theory or mean-field theory. Mean-field theory assumes a mean value for the free energy. Landau derived a theory where the free energy is expanded as a power series in the order parameter, and only terms are included which are compatible with the symmetry of the system. The problem is that all of these approaches ignore fluctuations by relying only on mean values. (For a detailed review of phase transition theories please refer to Yeomans (1992).) Renormalization group theory by K. G. Wilson (Wilson (1974)) solved many problems of critical phenomena, most importantly the understanding of why continuous phase transitions fall into universality classes. The basic idea is to apply a transformation that changes the scale of the system but not its partition function. Only at the critical point do the properties of the system not change under such a transformation; the system is then described by so-called fixed points in the parameter space of all Hamiltonians. This is why critical exponents are universal for different systems.

C.2.1 Spin models sound examples

The following audio files can be downloaded from http://sonenvir.at/downloads/spinmodels/.

The first part describes sonifications that enable the listener to classify the phase of the model (sub-critical, critical, super-critical).

Granular sonifications: Random, averaged spin blocks were used to determine the sound grains. The spatial setting cannot be reproduced in this recording. But even without having a clear gestalt of the system, the different characteristics of IsingHot, IsingCritical and IsingCold may easily be distinguished.

Audification approaches: (Please consider that a few clicks in the audio files below are artifacts of the data management and buffering in the computer.)

1. Noise: NoiseA gives the audification of a 3-state Potts model at thermal noise (coupling J = 0.4). NoiseB gives the same for the 5-state Potts model (J = 0.4); evidently the sound becomes smoother the more states are possible, but its overall character stays the same.

2. Critical behaviour: this example was recorded with a 4-state Potts model at and near the critical temperature. SuperCritical - near the critical point, clusters emerge. These are rather big but homogeneous, hence a regularity is still perceivable (J = 0.95). Critical - at the critical point itself, clusters of all orders of magnitude emerge, thus the sound is much more unstable and less pleasant (J = 1.05).

3. SubCritical - as soon as the system is equilibrated in the subcritical domain (at T < T_crit), one spin orientation predominates, and only few random spin flips occur due to thermal fluctuations. (Recorded with the Ising model at J = 1.3.)

The next examples study the order of the phase transition.

Direct audification displays only very subtle differences between the two types of phase transitions:

1. The 4-state Potts model is played in ContinousTransition.
2. A more sudden change can be perceived in FirstOrderTransition for the 5-state Potts model.

Audification with separate spin channels: For each spin orientation, the lattice is sequentialised and the resulting audification is played on its own channel. The lattice size was 32x32, and the system was equilibrated at each step. The examples finish with one spin orientation prevailing, which means that only random clicks from a non-vanishing temperature remain. (A minimal sketch of this sequentialisation idea is given after the list below.)

1. The transitions in the 2-state Ising model and the 4-state Potts model are continuous; the change is smooth.
2. In the 5-state and 8-state models the phase transition is abrupt (the data is more distinct the more states are involved).
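A minimal sketch of the sequentialisation idea, using a hypothetical toy lattice instead of the original SonEnvir spin model code: one spin orientation is extracted from a flattened 32x32 lattice, loaded into a buffer, and looped as an audio signal.

(
// toy 3-state lattice, flattened to one long sequence of spin values
var lattice = Array.fill(32 * 32, { 3.rand });
// channel for spin state 0: +1 where the spin has that orientation, -1 elsewhere
var chan0 = lattice.collect { |spin| if(spin == 0) { 1.0 } { -1.0 } };
// load the sequence into a buffer on the (booted) default server s and loop it
Buffer.loadCollection(s, chan0, 1, { |buf|
    { PlayBuf.ar(1, buf, BufRateScale.kr(buf), loop: 1) * 0.1 }.play;
});
)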

Appendix D

Science By Ear participants

The following people took part in the Science By Ear workshop:

SonEnvir members/moderators

Dayé, Christian
De Campo, Alberto
Eckel, Gerhard
Frauenberger, Christopher
Vogt, Katharina
Wallisch, Annette

Programming specialists

Bovermann, Till, Neuroinformatics Group, Bielefeld University
De Campo, Alberto
Frauenberger, Christopher
Pauletto, Sandra, Music Technology Group, York University
Musil, Thomas, Audio/DSP, Institute of Electronic Music (IEM) Graz
Rohrhuber, Julian, Academy of Media Arts (KHM) Cologne

Sonification experts

Baier, Gerold, Dynamical systems, University of Morelos, Mexico
Bonebright, Terri, Psychology/Perception, DePauw University
Bovermann, Till
Dombois, Florian, Transdisciplinarity, Y Institute, Arts Academy Berne
Hermann, Thomas, Neuroinformatics Group, Bielefeld University


Kramer, Gregory, Metta Organization
Pauletto, Sandra
Stockman, Tony, Computer Science, Queen Mary Univ. London

Domain scientists

Baier, Gerold
Dombois, Florian
Egger de Campo, Marianne, Sociology, Compass Graz
Fickert, Lothar, Electrical power systems, University of Technology (TU) Graz
Grond, Florian, Chemistry / media art, ZKM Karlsruhe
Grossegger, Dieter, EEG Software, NeuroSpeed Vienna
Hipp, Walter, Electrical power systems, TU Graz
Huber, Anton, Physical Chemistry, University of Graz
Markum, Harald, Atomic Institute of the Austrian Universities, TU Vienna
Plessas, Willibald, Physics Institute, University of Graz
Shutin, Dimitri, Electrical power systems, TU Graz
Schweitzer, Susanne, Wegener Center for Climate and Global Change, University of Graz
Witrisal, Klaus, Electrical power systems, TU Graz

Appendix E

Background on ’Navegar’

The saying 'Navegar é preciso, viver não é preciso' has a long history. Plutarch ascribes it to General Pompeius, who spoke this line to soldiers he sent off on a suicide mission, and Veloso may well have read it in a famous poem by Fernando Pessoa. Here are Veloso's lyrics:

Table E.1: Os Argonautas - Caetano Veloso

O barco, meu coração não aguenta / the ship, my heart cannot handle it
Tanta tormenta, alegria / so much torment, happiness
Meu coração não contenta / my heart is discontent
O dia, o marco, meu coração, o porto, não / the day, the limit, my heart, the port, no
Navegar é preciso, viver não é preciso / sea-faring is necessary, living is not

O barco, noite no céu tão bonito / the ship, night in the beautiful sky
Sorriso solto perdido / the free smile, lost
Horizonte, madrugada / horizon, morning dawn
O riso, o arco, da madrugada / the laugh, the arc, of morning
O porto, nada / the port, nothing
Navegar é preciso, viver não é preciso / sea-faring is necessary, living is not

O barco, o automóvel brilhante / the ship, the brilliant automobile
O trilho solto, o barulho / the free track, the noise
Do meu dente em tua veia / of my tooth in your vein
O sangue, o charco, barulho lento / the blood, the swamp, slow soft noise
O porto silêncio / the port - silence
Navegar é preciso, viver não é preciso / sea-faring is necessary, living is not

(Literal English translation: Alberto de Campo.)

Appendix F

Sound, meaning, language

Sounds can change their meanings in different contexts. This ambiguity has also been of interest in poetry, as this work by Ernst Jandl shows.

Ernst Jandl - Oberflächenübersetzung ('Surface Translation')

mai hart lieb zapfen eibe hold
er rennbohr in sees kai.
so was sieht wenn mai läuft begehen,
so es sieht nahe emma mähen,
so biet wenn ärschel grollt
ohr leck mit ei!
seht steil dies fader rosse mähen,
in teig kurt wisch mai desto bier
baum deutsche deutsch bajonett schur alp eiertier.

Original poem by William Wordsworth:

My heart leaps up when I behold
a rainbow in the sky.
so was it when my life began,
so is it now I am a man,
so be it when I shall grow old
or let me die!
The child is father of the man
and I could wish my days to be
bound each to each by natural piety.

Bibliography

Abbott, A. (1990). A Primer on Sequence Methods. Organization Science, 1(4):375–392.

Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology, 21:93–113.

Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial Intelligence, 149(1):91–130.

Anonymous (March 20, 2001). L'histoire: PDG surpayés. Libération.

Armstrong, N. (2006). An Enactive Approach to Digital Musical Instrument Design. PhD thesis, Princeton University.

Baier, G. and Hermann, T. (2004). The Sonification of Rhythms in Human Electroencephalogram. In Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.

Baier, G., Hermann, T., Sahle, S., and Ritter, H. (2006). Sonified Epileptic Rhythms. In Proc. Int Conf. on Auditory Display (ICAD), London, UK.

Baier, G., Hermann, T., and Stephani, U. (2007). Event-based sonification of EEG rhythms in real time. Clinical Neurophysiology, 118(6).

Barnes, J. (2007). ”The Odd Couple”. Review of ”That Sweet Enemy: The French and the British from the Sun King to the Present” by Robert and Isabelle Tombs. New York Review of Books, LIV(5):4–9.

Barrass, S. (1997). Auditory Information Design. PhD thesis, Australian National University.

Barrass, S. and Adcock, x. (2004). Sonification Design Patterns. In Proc. Int. Conf. on Auditory Display (ICAD), Sydney, Australia.


Barrass, S., Whitelaw, M., and Bailes, F. (2006). Listening to the Mind Listening: An Analysis of Sonification Reviews, Designs and Correspondences. Leonardo Music Journal, 16:13–19.

Barrass, T. (2006). Description of Sonification for ICAD 2006 Concert: Life Expectancy. In Proc. Int Conf. on Auditory Display (ICAD), Lon- don, UK.

Beck, U. (1992). Risk Society: Towards a New Modernity. Sage, New Delhi.

Ben-Tal, O., Berger, J., Cook, B., Daniels, M., Scavone, G., and Cook, P. (2002). SONART: The Sonification Application Research Toolbox. In Proc. ICAD, Kyoto, Japan.

Blauert, J. (1997). Spatial Hearing: The Psychophysics of Human Hearing. MIT Press.

Blossfeld, H.-P., Hamerle, A., and Mayer, K. U. (1986). Ereignisanalyse. Statistische Theorie und Anwendung in den Wirtschafts- und Sozialwissenschaften. Campus, Frankfurt.

Blossfeld, H.-P. and Rohwer, G. (1995). Techniques of event history modeling. New approaches to causal analysis. Lawrence Erlbaum Associates, Mahwah (N. J.).

Borges, J. L. (1980). The analytical language of John Wilkins. In Labyrinths. Penguin.

Boulanger, R. (2000). The Csound Book: Perspectives in Software Synthesis, Sound Design, Signal Processing, and Programming. MIT Press, Cambridge, MA, USA.

Bovermann, T. (2005). MBS-Sonogram. http://www.techfak.uni-bielefeld.de/~tboverma/sc/.

Bovermann, T., de Campo, A., Groten, J., and Eckel, G. (2007). Juggling Sounds. In Proceedings of Interactive Sonification Workshop ISon2007.

Bovermann, T., Hermann, T., and Ritter, H. (2006). Tangible data scanning sonification model. In Proc. of the International Conference on Auditory Display, London, UK.

Bregman, A. S. (1990). Auditory Scene Analysis. Bradford Books, MIT Press, Cambridge, MA.

Bruce, J. and Palmer, N. (2005). SIFT: Sonification Integrable Flexible Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Limerick, Ireland.

Buxton, Bill with Billinghurst, M., Guiard, Y., Sellen, A., and Zhai, S. (2008). Human Input to Computer Systems: Theories, Techniques and Technology. http://www.billbuxton.com/inputManuscript.html.

Candey, R., Schertenleib, A., and Diaz Merced, W. (2006). xSonify: Sonification Tool for Space Physics. In Proc. Int Conf. on Auditory Display (ICAD), London, UK.

Conner, C. D. (2005). A People’s History of Science: Miners, Midwives and ”Low Mechanicks”. Nation Books, New York, NY, USA.

Cooper, D. H. and Shiga, T. (1972). Discrete-Matrix Multichannel Stereo. J. Audio Eng. Soc., 20:344–360.

Cruz-Neira, C., Sandin, D. J., DeFanti, T. A., Kenyon, R. V., and Hart, J. C. (1992). The CAVE: Audio Visual Experience Automatic Virtual Environment. Commun. ACM, 35(6):64–72.

Dayé, C. and de Campo, A. (2006). Sounds sequential: Sonification in the Social Sciences. Interdisciplinary Science Reviews, 31(6):349–364.

Dayé, C., de Campo, A., and Egger de Campo, M. (2006). Sonifikationen in der wissenschaftlichen Datenanalyse. Angewandte Sozialforschung, 24(1/2):41–56.

Dayé, C., de Campo, A., Fleck, C., Frauenberger, C., and Edelmayer, G. (2005). Sonification as a tool to reconstruct user's actions in unobservable areas. In Proceedings of ICAD 2005, Limerick.

de Campo, A. (2007a). A Sonification Design Space Map. In Proceedings of Interactive Sonification Workshop ISon2007.

de Campo, A. (2007b). Toward a Sonification Design Space Map. In Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

de Campo, A. and Dayé, C. (2006). Navegar É Preciso, Viver Não É Preciso. In Proc. Int. Conf. on Auditory Display (ICAD), London, UK.

de Campo, A. and Egger de Campo, M. (1999). Sonification of Social Data. In Proceedings of the 1999 International Computer Music Conference (ICMC), Beijing.

de Campo, A., Frauenberger, C., and Höldrich, R. (2004). Designing a Generalized Sonification Environment. In Proceedings of the ICAD 2004, Sydney.

de Campo, A., Frauenberger, C., and Höldrich, R. (2005a). SonEnvir - a progress report. In Proc. Int. Computer Music Conf. (ICMC), Barcelona, Spain.

de Campo, A., Frauenberger, C., Vogt, K., Wallisch, A., and Dayé, C. (2006a). Sonification as an Interdisciplinary Working Process. In Proceedings of ICAD 2006, London.

de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Vogt, K. (2006b). Sonification of lattice data: Dirac spectrum and monopole condensation along the deconfinement transition. In Proceedings of the Miniconference in honor of Adriano Di Giacomo on the Sense of Beauty in Physics, Pisa, Italy.

de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Sengl, B. (2005b). Sonification of Lattice Data: The Spectrum of the Dirac Operator Across the Deconfinement Transition. In Proc. XXIIIrd Int. Symposium on Lattice Field Theory, Trinity College, Dublin, Ireland.

de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Sengl, B. (2005c). Sonification of Lattice Observables Across Phase Transitions. In International Workshop on Xtreme QCD, Swansea.

de Campo, A., Hörmann, N., Markum, H., Plessas, W., and Vogt, K. (2006c). Sonification of Monopoles and Chaos in QCD. In Proc. of ICHEP'06 - the XXXIIIrd International Conference on High Energy Physics, Moscow, Russia.

de Campo, A., Sengl, B., Frauenberger, C., Melde, T., Plessas, W., and Höldrich, R. (2005d). Sonification of Quantum Spectra. In Proc. Int Conf. on Auditory Display (ICAD), Limerick, Ireland.

de Campo, A., Wallisch, A., Höldrich, R., and Eckel, G. (2007). New Sonification Tools for EEG Data Screening and Monitoring. In Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

de Rújula, A., Georgi, H., and Glashow, S. L. (1975). Hadron masses in a gauge theory. Phys. Rev., D12(147).

Dix, A. (1996). Closing the loop: Modelling action, perception and information. In Catarci, T., Costabile, M. F., Levialdi, S., and Santucci, G., editors, AVI'96 - Advanced Visual Interfaces, pages 20–28. ACM Press.

Dix, A., Finlay, J., Abowd, G., and Beale, R. (2004). Human-Computer Interaction. Prentice Hall, Harlow, 3rd edition.

Dombois, F. (2001). Using Audification in Planetary Seismology. In Proc. Int Conf. on Auditory Display (ICAD), Espoo, Finland.

Drake, S. (1980). Galileo. Oxford University Press, New York.

Drori, G. S., Meyer, J. W., Ramirez, F. O., and Schofer, E. (2003). Science in the Modern World Polity: Institutionalization and Globalization. Stanford University Press, Stanford.

Ebe, M. and Homma, I. (2002). Leitfaden für die EEG-Praxis. Urban und Fischer bei Elsevier, 3rd edition.

Eidelman, S. et al. (2004). Review of Particle Physics. Phys. Lett., B592(1).

Fickert, L., Eckel, G., Nagler, W., de Campo, A., and Schmautzer, E. (2006). New developments of teaching concepts in multimedia learning for electrical power systems introducing sonification. In Proceedings of the 29th ICT International Convention MIPRO, Opatija, Croatia.

Fitch, T. and Kramer, G. (1994). Sonifying the Body Electric: Superiority of an Auditory over a Visual Display in a Complex Multivariate System. In Kramer, G., editor, Auditory Display. Addison-Wesley.

Frauenberger, C., de Campo, A., and Eckel, G. (2007). Analysing time series data. In Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

Gardner, B. and Martin, K. (1994). HRTF measurements of a KEMAR dummy-head microphone. Online.

Gaver, W. W., Smith, R. B., and O’Shea., T. (1991). Effective Sounds in Complex Systems: The ARKola Simulation. In Proceedings of CHI ’91, New Orleans, USA.

Gell-Mann, M. (1964). A Schematic Model of Baryons and Mesons. Phys. Lett., 8:214.

Gerzon, M. (1977a). Multi-System Ambisonic Decoder, Part 1: Basic Design Philosophy. Wireless World, 83(1499):43–47.

Gerzon, M. (1977b). Multi-System Ambisonic Decoder, Part 2: Main Decoder Circuits. Wireless World, 83(1500):69–73.

Ghazala, R. (2005). Circuit-Bending: Build Your Own Alien Instruments. Wiley, Hoboken, NJ.

Giddens, A. (1990). The Consequences of Modernity. Stanford University Press.

Giddens, A. (1999). Runaway World. A series of lectures on globalisation for the BBC. http://news.bbc.co.uk/hi/english/static/events/reith_99/.

Glantschnig, K., Kainhofer, R., Plessas, W., Sengl, B., and Wagenbrunn, R. F. (2005). Extended Goldstone-boson-exchange Constituent Quark Model. Eur. Phys. J. A.

Glaser, B. and Strauss, A. (1967). The Discovery of Grounded Theory. Aldine.

Glozman, L., Papp, Z., Plessas, W., Varga, K., and Wagenbrunn, R. F. (1998). Unified Description of Light- and Strange-Baryon Spectra. Phys. Rev., D58(094030).

Goodrick, M. (1987). The Advancing Guitarist. Hal Leonard.

GSL Team (2007). GNU Scientific Library. http://www.gnu.org/software/gsl/manual/gsl-ref.html.

Harrar, L. and Stockman, T. (2007). Designing Auditory Graph Overviews. In Proceedings of ICAD 2007, pages 306–311. McGill University.

Hayward, C. (1994). Listening to the Earth Sing. In Kramer, G., editor, Auditory Display, pages 369–404. Addison-Wesley, Reading, MA, USA.

Hermann, T. (2002). Sonification for Exploratory Data Analysis. PhD thesis, Bielefeld University, Bielefeld, Germany.

Hermann, T., Baier, G., Stephani, U., and Ritter, H. (2006). Vocal Sonification of Pathologic EEG Features. In Proceedings of ICAD 2006, London.

Hermann, T. and Hunt, A. (2005). Introduction to Interactive Sonification. IEEE Multimedia, Special Issue on Sonification, 12(2):20–24.

Hermann, T., Nölker, C., and Ritter, H. (2002). Hand postures for sonification control. In Wachsmuth, I. and Sowa, T., editors, Gesture and Sign Language in Human-Computer Interaction, Proc. Int. Gesture Workshop GW2001, pages 307–316. Springer.

Hermann, T. and Ritter, H. (1999). Listen to your Data: Model-Based Sonification for Data Analysis. In Advances in intelligent computing and multimedia systems, pages 189–194, Baden-Baden, Germany. Int. Inst. for Advanced Studies in System research and cybernetics.

Hinterberger, T. and Baier, G. (2005). POSER: Parametric Orchestral Sonification of EEG in Real-Time for the Self-Regulation of Brain States. IEEE Multimedia, Special Issue on Sonification, 12(2):70–79.

Hollander, A. (1994). An Exploration of Virtual Auditory Shape Perception. Master's thesis, Univ. of Washington.

Hunt, A. and Pauletto, S. (2006). The Sonification of EMG data. In Proceedings of the International Conference on Auditory Display (ICAD), London, UK.

Hunt, A. D., Paradis, M., and Wanderley, M. (2003). The importance of parameter mapping in electronic instrument design. Journal of New Music Research, 32(4):429–440.

Igoe, T. (2007). Making Things Talk. Practical Methods for Connecting Physical Objects. O’Reilly.

Jordà Puig, S. (2005). Digital Lutherie. Crafting musical computers for new musics' performance and improvisation. PhD thesis, Departament de Tecnologia, Universitat Pompeu Fabra.

Joseph, A. J. and Lodha, S. K. (2002). MUSART: Musical Audio Transfer Function Real-time Toolkit. In Proc. Int. Conf. on Auditory Display (ICAD), Kyoto, Japan.

Kramer, G. (1994a). An Introduction to Auditory Display. In Kramer, G., editor, Auditory Display: Sonification, Audification, and Auditory Interfaces, chapter Introduction. Addison-Wesley.

Kramer, G., editor (1994b). Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley, Reading, Menlo Park.

Krassnigg, A., Papp, Z., and Plessas, W. (2000). Faddeev Approach to Confined Three-Quark Problems. Phys. Rev., C(62):044004.

Latour, B. and Woolgar, S. (1986). Laboratory Life: The Construction of Scientific Facts. Princeton University Press, Princeton, NJ, (Revised edition with an introduction by Jonas Salk and a new postscript by the authors.) edition.

Leman, M. (2006). The State of Music Perception Research. Talk at ’Connecting Media’ conference, Hamburg.

Leman, M. and Camurri, A. (2006). Understanding musical expressiveness using interactive multimedia platforms. Musicae Scientiae, special issue.

Lodha, S. K., Beahan, J., Heppe, T., Joseph, A., and Zane-Ulman, B. (1997). MUSE: A Musical Data Sonification Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Palo Alto, CA, USA.

Loering, U., Metsch, B. C., and Petry, H. R. (2001). The light baryon spectrum in a relativistic quark model with instanton-induced quark forces: The non-strange baryon spectrum and ground-states. Eur. Phys. J., A10:395.

Madhyastha, T. (1992). Porsonify: A Portable System for Data Sonification. Master's thesis, University of Illinois at Urbana-Champaign.

Malham, D. G. (1999). Higher Order Ambisonic Systems for the Spatialisation of Sound. In Proceedings of the ICMC, Beijing, China.

Marsaglia, G. (2003). DIEHARD: A Battery of Tests for Random Number Generators. http://www.csis.hku.hk/diehard/.

Mathews, M. and Miller, J. (1963). Music IV programmer’s manual. Bell Telephone Laboratories, Murray Hill, NJ, USA.

Mayer-Kress, G. (1994). Sonification of Multiple Electrode Human Scalp Electroencephalogram. Poster presentation demo at ICAD '94, http://www.ccsr.uiuc.edu/People/gmk/Projects/EEGSound/.

McCartney, J. (2003-2007). SuperCollider3. http://supercollider.sourceforge.net.

McKusick, V. A., Sharpe, W. D., and Warner, A. O. (1957). Harvey Tercentenary: An Exhibition on the History of Cardiovascular Sound Including the Evolution of the Stethoscope. Bulletin of the History of Medicine, 31:463–487.

Meinicke, P., Hermann, T., Bekel, H., Müller, H. M., Weiss, S., and Ritter, H. (2002). Identification of Discriminative Features in EEG. Journal for Intelligent Data Analysis.

Milczynski, M., Hermann, T., Bovermann, T., and Ritter, H. (2006). A malleable device with applications to sonification-based data exploration. In Proc. of the International Conference on Auditory Display, London, UK.

Moore, B. C. (2004). An Introduction to the Psychology of Hearing. Elsevier, fifth edition.

Musil, T., Noisternig, M., and Höldrich, R. (2005). A Library for Realtime 3D Binaural Sound Reproduction in Pure Data (PD). In Proc. Int. Conf. on Digital Audio Effects (DAFX-05), Madrid, Spain.

Neuhoff, J. (2004). Ecological Psychoacoustics. Springer.

Noisternig, M., Musil, T., Sontacchi, A., and Höldrich, R. (June, 2003). A 3D Ambisonic based Binaural Sound Reproduction System. In Proc. Int. Conf. Audio Eng. Soc., Banff, Canada.

P. Fronczak, A. Fronczak, J. A. H. (2006). Ferromagnetic fluid as a model of social impact. International Journal of Modern Physics, 17(8):1227–1235.

Panek, P., Dayé, C., Edelmayer, G., et al. (2005). Real Life Test with a Friendly Rest Room (FRR) Toilet Prototype in a Day Care Center in Vienna – An Interim Report. In Proc. 8th European Conference for the Advancement of Assistive Technologies in Europe, Lille.

Pauletto, S. (2007). Interactive non-speech auditory display of multivariate data. PhD thesis, University of York.

Pauletto, S. and Hunt, A. (2004). A Toolkit for Interactive Sonification. In Proceedings of ICAD 2004, Sydney.

Pelling, A. E., Sehati, S., Gralla, E. B., Valentine, J. S., and Gimzewski, J. K. (2004). Local Nanomechanical Motion of the Cell Wall of Saccharomyces cerevisiae. Science, 305(5687):1147–1150.

Pereverzev, S. V., Loshak, A., Backhaus, S., Davies, J., and Packard, R. E. (1997). Quantum Oscillations between two weakly coupled reservoirs of superfluid 3He. Nature, 388:449–451.

Piché, J. and Burton, A. (1998). Cecilia: A Production Interface to Csound. Computer Music Journal, 22(2):52–55.

Pigafetta, A. (1530). Primo Viaggio Intorno al Globo Terracqueo (First Voyage Around the Terraqueous World). Giuseppe Galeazzi, Milano.

Pigafetta, A. (2001). Mit Magellan um die Erde. (Magellan’s Voyage: A Narrative Account of the First Circumnavigation). Edition Erdmann, Lenningen, Germany. (First edition Paris 1525.).

Potard, G. (2006). Guernica 2006: Sonification of 2006 Years of War and World Population Data. In Proc. Int Conf. on Auditory Display (ICAD), London, UK.

Pulkki, V. (2001). Spatial Sound Generation and Perception by Amplitude Panning. PhD thesis, Helsinki University of Technology, Espoo.

Raskin, J. (2000). The Humane Interface. Addison-Wesley.

Rheinberger, H.-J. (2006). Experimentalsysteme und Epistemische Dinge (Experimental Systems and Epistemic Things). Suhrkamp, Germany.

Riess, F., Heering, P., and Nawrath, D. (2005). Reconstructing Galileo's Inclined Plane Experiments for Teaching Purposes. In Proc. of the International History, Philosophy, Sociology and Science Teaching Conference, Leeds, UK.

Roads, C. (2002). Microsound. MIT Press.

Rohrhuber, J. (2006). Terra Nullius. In Proc. Int Conf. on Auditory Display (ICAD), London, UK.

Rohrhuber, J., de Campo, A., and Wieser, R. (2005). Algorithms Today - Notes on Language Design for Just In Time Programming. In Proceedings of the ICMC 2005, Barcelona.

Ryan, J. (1991). Some Remarks on Musical Instrument Design at STEIM. Contemporary Music Review, 6(1):3–17. Also available online: http://www.steim.org/steim/texts.phtml?id=3.

Saraiya, P., North, C., and Duca, K. (2005). An insight-based methodology for evaluating bioinformatics visualizations. Transactions on Visualization and Computer Graphics, 11(4):443–456.

Scaletti, C. (1994). Sound Synthesis Algorithms for Auditory Data Representations. In Kramer, G., editor, Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley.

Schaeffer, P. (1997). Traité des objets musicaux. Le Seuil, Paris.

Snyder, B. (2000). Music and Memory. MIT Press.

Speeth, S. D. (1961). Seismometer sounds. J. Acoust. Soc. Am., 33:909–916.

Stockman, T., Nickerson, L. V., and Hind, G. (2005). Auditory graphs: A summary of current experience and towards a research agenda. In Proc. ICAD 2005, Limerick.

Suzuki, Y. and Varga, K. (1998). Stochastic variational approach to quantum-mechanical few-body problems. Lecture Notes in Physics, m54.

TAP, ACM (2004). ACM Transactions on Applied Perception. New York, NY, USA.

Theussl, L., Wagenbrunn, R. F., Desplanques, B., and Plessas, W. (2001). Hadronic Decays of N and Delta Resonances in a Chiral Quark Model. Eur. Phys. J., A12:91.

UN Statistics Division (1975). Towards A System of Social Demographic Statistics. United Nations, Available online at UN Statistics Division (2006).

UN Statistics Division (1989). Handbook of Social Indicators. UN Statistics website.

UN Statistics Division (2006). Social Indicators. http://unstats.un.org/unsd/demographic/products/socind/default.htm.

Urick, R. J. (1967). Principles of Underwater Sound. McGraw-Hill, New York, NY, USA.

U.S. Census Bureau (2006). World POPClock Projection. http://www.census.gov/ipc/www/popclockworld.html.

Vercoe, B. (1986). CSOUND: A Manual for the Audio Processing System and Supporting Programs. M.I.T. Media Laboratory, Cambridge, MA, USA.

Vogt, K., de Campo, A., Frauenberger, C., Plessas, W., and Eckel, G. (2007). Sonification of Spin Models. Listening to Phase Transitions in the Ising and Potts Model. In Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

Voss, R. and Clarke, J. (1975). 1/f noise in speech and music. Nature, (258):317–318.

Voss, R. and Clarke, J. (1978). 1/f Noise in Music: Music from 1/f Noise. J. Acoust. Soc. Am., 63:258–263.

Walker, B. (2000). Magnitude Estimation of Conceptual Data Dimensions for Use in Sonification. PhD thesis, Rice University, Houston.

Walker, B. and Cothran, J. (2003). Sonification Sandbox: A Graphical Toolkit for Auditory Graphs. In Proceedings of ICAD 2003, Boston.

Walker, B. N. and Kramer, G. (1996). Mappings and Metaphors in Auditory Displays: An Experimental Assessment. In Frysinger, S. and Kramer, G., editors, Proc. Int. Conf. on Auditory Display (ICAD), pages 71–74, Palo Alto, CA.

Walker, B. N. and Kramer, G. (2005a). Mappings and Metaphors in Auditory Displays: An Experimental Assessment. ACM Trans. Appl. Percept., 2(4):407–412.

Walker, B. N. and Kramer, G. (2005b). Sonification Design and Metaphors: Comments on Walker and Kramer, ICAD 1996. ACM Trans. Appl. Percept., 2(4):413–417.

Walker, B. N. and Kramer, G. (2006). International Encyclopedia of Ergonomics and Human Factors (2nd ed.), chapter Auditory Displays, Alarms, and Auditory Interfaces, pages 1021–1025. CRC Press, New York.

Wallisch, A. (2007). EEG plus Sonifikation. Sonifikation von EEG-Daten zur Epilepsiediagnostik im Rahmen des Projekts 'SonEnvir'. PhD thesis, Medical University Graz, Graz, Austria.

Warusfel, O. (2002-2003). LISTEN HRTF database. http://recherche.ircam.fr/equipes/salles/listen/.

Wedensky, N. (1883). Die telefonische Wirkungen des erregten Nerven - The Telephonic Effects of the Excited Nerve. Centralblatt für medizinische Wissenschaften, (26).

Wessel, D. (2006). An Enactive Approach to Computer Music Performance. In GRAME, editor, Proc. of 'Rencontres Musicales Pluridisciplinaires', Lyon, France.

Wikipedia (2006a). Gini Coefficient. http://en.wikipedia.org/wiki/Gini coefficient.

Wikipedia (2006b). Magellan. http://en.wikipedia.org/wiki/Magellan.

Wikipedia (2007). Levy skew alpha-stable distribution. http://en.wikipedia.org/wiki/Levy skew alpha-stable distribution.

Williams, S. (1994). Perceptual Principles in Sound Grouping. In Kramer, G., editor, Auditory Display. Addison-Wesley.

Wilson, C. M. and Lodha, S. K. (1996). Listen: A Data Sonification Toolkit. In Proc. Int Conf. on Auditory Display (ICAD), Santa Cruz, CA, USA.

Wilson, K. (1974). Renormalization group theory. Physics Reports, 75(12).

Worrall, D., Bylstra, M., Barrass, S., and Dean, R. (2007). SoniPy: The Design of an Extendable Software Framework for Sonification Research and Auditory Display. In Proc. Int Conf. on Auditory Display (ICAD), Montreal, Canada.

Yeo, W. S., Berger, J., and Wilson, R. S. (2004). A Flexible Framework for Real-time Sonification with SonArt. In Proc. Int Conf. on Auditory Display (ICAD), Sydney, Australia.

Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions. Oxford University Press.

Zouhar, V., Lorenz, R., Musil, T., Zmölnig, J. M., and Höldrich, R. (2005). Hearing Varèse's Poème Électronique inside a Virtual Philips Pavilion. In Proc. Int. Conf. on Auditory Display (ICAD), Limerick, Ireland.

Zweig, G. (1964). An SU(3) Model for Strong Interaction Symmetry and its Breaking. CERN Report Th.401/Th.412, page 8182/8419.

Zweig, S. (1983). Magellan - Der Mann und seine Tat. (Magellan - The Man and his Achievement). Fischer, Frankfurt am Main. (First ed. Vienna 1938).

Zwicker, E. and Fastl, H. (1999). Psychoacoustics-Facts and Models, 2nd Ed. Springer, Berlin.