— The Role of Timing and Intensity in the Production and Perception of Melody in Expressive Piano Performance —

Dissertation submitted for the degree of Doctor of Philosophy at the Faculty of Humanities of the Karl-Franzens-Universität Graz

submitted by

Mag. phil. Werner Goebl

at the Institut für Musikwissenschaft.

First examiner: Univ.-Prof. Dr. Richard Parncutt
Second examiner: Gastprof. PD Dr. Christoph Reuter

2003

Vienna, August 27, 2003. This manuscript was typeset with LaTeX 2ε.

Abstract

This thesis addresses the question of how pianists make individual voices stand out from the background in a contrapuntal musical context, how they realise this within the constraints of the piano keyboard construction, and finally how much each of the expressive parameters employed by the performers contributes to the perception of particular voices. Three different empirical approaches were used to investigate these questions: a study in the area of piano acoustics investigated the temporal properties of three different grand piano actions; a performance study with a Bösendorfer computer-controlled grand piano examined intensity and onset time differences between the principal voice and the accompaniment; and a series of perception studies looked at the relative effect of asynchrony and intensity variation on the perceived salience of individual tones in musical chords and real music contexts.

First, the temporal behaviour of grand piano actions from different manufacturers was investigated under two touch conditions: once with the finger resting on the key surface (legato touch) and once hitting the keys from a certain distance above (staccato touch). A large amount of measurement data from three grand pianos by different piano makers was gathered with an accelerometer setup monitoring key and hammer movements, as well as by recording the sound signal. Selected tones were played by two pianists with the two types of touch. From these multi-channel recordings of over 4000 played tones, discrete readings such as the onset time of the key movement, hammer–string and key–bottom contact times, the instant of maximum hammer velocity, and peak sound level were obtained. Prototypical functions were determined (and approximated by power curves) for travel times (from finger–key to hammer–string contact), key–bottom times, and the instants of maximum hammer velocity.
These varied clearly between the two types of touch, only slightly between the investigated pianos, and not at all between tested keys. However, no effect of touch type was found on peak sound level (dB), indicating that hammer velocity rather than touch determined the tone intensity. Furthermore, the measurement and reproduction accuracy of the two computer-controlled grand pianos used (Yamaha Disklavier DC2IIXG, Bösendorfer SE290) was examined with respect to their reliability for performance research.

The second approach was a performance study in which 22 professional pianists played two excerpts by Frédéric Chopin on a Bösendorfer computer-controlled grand piano. The performance data were analysed with respect to tone onset asynchronies and dynamic differences between melody and accompaniment.


The melody was consistently found to precede the other voices by around 30 ms, confirming findings from previous studies (melody lead). The earlier the onset of a melody tone occurred with respect to the other chord tones, the greater was also its intensity. This evidence supported the velocity artifact hypothesis, which ascribes the melody lead phenomenon to mechanical constraints of the piano keyboard (the harder a tone is hit, the earlier it will arrive at the strings). In order to test this hypothesis, the relative asynchronies at the onset of the keystrokes (finger–key asynchronies) were inferred through the relation between travel time and hammer velocity from the previous study. These key onsets then showed almost no asynchrony between the principal and the other voices. This finding indicated that pianists started the key movements essentially in synchrony; the typical asynchrony patterns (melody lead) were caused by different sound intensities in the different voices. This relationship was modelled to predict melody lead from intensity differences. It was concluded that melody lead can be largely explained by the mechanical properties of the grand piano action and is not necessarily an independent expressive device applied (or not) by pianists for purposes of expression.

In a third approach, the influence of systematic manipulation of the two parameters found in the previous study (relative onset timing and intensity variation) on perceived salience was investigated. In a series of seven experiments, trained musicians judged single tones in dyads and three-tone chords in which the relative onset timing and intensity were systematically manipulated. Two experiments focussed on the threshold beyond which two tones sound asynchronous. With piano tones, this threshold was at 30–40 ms, but changed considerably with the intensity of the two tones.
With the earlier tone much louder, dyads with as much as 55 ms of asynchrony were heard as simultaneous by musicians. Either musicians perceive familiar combinations of asynchrony and intensity difference as more synchronous than unfamiliar combinations, or sensitivity to synchrony is reduced in the melody-lead condition by forward masking. The other experiments examined loudness ratings of chord tones (target) with each of the two or three tones simultaneously manipulated in relative timing and intensity by up to ±55 ms and +30/−22 MIDI velocity units. The experiments involved various types of tone (pure, sawtooth, synthesised and real piano) and musical material (dyads, three-tone chords, sequences of three-tone chords, and a real music excerpt by Frédéric Chopin). Generally, loudness ratings depended mainly on relative intensity and relatively little on timing throughout all experiments. Loudness ratings increased with early onsets (anticipation), but only in conditions in which the target tone was hardly heard (equally loud or softer than the other tones). In these cases, anticipation helped to overcome spectral masking. Melodic streaming of tones in chord progressions enhanced the effect of asynchrony only marginally. The two selected voices of the excerpt by Chopin were perceived as more important when they were either delayed or anticipated, but only in combination with increased intensities.

Zusammenfassung

This thesis investigated how professional concert pianists bring out individual voices in a polyphonic musical context, which possibilities the modern concert grand offers them and which constraints it imposes on them, and which perceptual consequences the expressive means they employ have for listeners. These basic questions were addressed with three different methodological approaches.

In a study of instrumental acoustics, the temporal behaviour of piano actions from three different manufacturers (Yamaha, Steinway, Bösendorfer) was examined under different touch conditions. Five selected keys were struck once from the key surface (legato touch) and once from above (staccato touch). The experimental setup comprised a calibrated microphone and two accelerometers, which registered the movements of key and hammer during the keystroke. Multi-channel recordings of over 4000 played tones were analysed automatically by a computer program written for this purpose. Temporal relationships, such as the duration of the keystroke (from the beginning of the key movement to the impact of the hammer on the string), the moment of maximum hammer velocity, or the instant at which the key touches the key bed, were determined and approximated by prototypical exponential functions. Different types of touch altered these relationships, e.g. the duration of the attack (travel time), far more than manufacturer or pitch did. No effect of touch type on the piano sound could be observed that was independent of the final hammer velocity. Furthermore, the recording and reproduction precision of the two reproducing grands (Yamaha Disklavier DC2IIXG, Bösendorfer SE290) was tested with respect to their usability in performance research.
In a second study, in which 22 concert pianists played two short excerpts from pieces by Chopin on a Bösendorfer computer-controlled grand, it was investigated how temporal and dynamic differences between melody and accompaniment tones are related. Melody tones typically sounded about 30 milliseconds (ms) before the accompanying voices, corroborating results of earlier studies (melody lead). The strong relationship between melody lead and differences in dynamics could be explained by the velocity-artifact hypothesis: the hammer of a vigorously struck key reaches the strings, and produces a tone, correspondingly earlier than one that was struck more softly.


Using the travel-time functions, the starting times of the individual key movements (finger–key contact times) were determined; these then showed no asynchronies any more. It could thus be demonstrated that by factoring out this temporal property of the piano action alone, the greater part of the melody-lead phenomenon can be explained, which therefore cannot be regarded, in this form, as an expressive device independent of dynamics. A melody-lead model was developed that can predict the extent of the asynchrony from the dynamics of the individual tones.

The third approach addressed the effects of asynchrony and dynamic differentiation on perception by musically trained listeners. A series of seven listening experiments dealt with two main questions: first, at what point do musicians perceive two nearly simultaneous sounds as non-simultaneous, and second, how does the temporal displacement of two sounds affect their perceived dynamic prominence (salience)? Two- and three-voiced chords as well as sequences of chords and a short musical example by Chopin served as stimulus material. Musically trained participants judged different sounds (sine, sawtooth, synthesised and acoustic piano) as well as different pitches, in which the tones under test were manipulated temporally by up to ±55 ms and dynamically by up to +30/−22 MIDI velocity units. The asynchrony threshold, at 30–40 ms, was somewhat higher than reported in the literature. It could be considerably higher still when the earlier tone was at the same time also considerably louder than the other. In this melody-lead situation, even asynchronies of 55 ms were heard as simultaneous.
This phenomenon was explained on the one hand by familiarity with piano sounds (musicians detect asynchronies more easily in unfamiliar combinations of asynchrony and relative dynamics) and on the other hand by masking phenomena (in particular forward masking). The second question concerned whether an anticipated or delayed chord tone is perceived as changed in its perceptual dominance. It turned out that the judging musicians oriented themselves mainly by the dynamics of the individual tones and hardly at all by their asynchrony. The latter became relevant only when equally loud or considerably softer tones had to be judged: then, anticipated tones 'escaped' spectral masking and were judged as louder. Repeated chords increased the influence of asynchrony on loudness judgements only marginally (streaming effect). In the musical example, too, no such effect could be demonstrated. Anticipation as well as delay were judged as perceptually reinforcing only in combination with dynamic reinforcement.

Acknowledgements

This work was carried out within the framework of the large-scale research project “Computer-Based Music Research: Artificial Intelligence Models of Musical Expression” at the Austrian Research Institute for Artificial Intelligence (Österreichisches Forschungsinstitut für Artificial Intelligence, ÖFAI), Vienna. This project was financed through the START programme of the Austrian Federal Ministry for Education, Science, and Culture (Grant No. Y99–INF) in the form of a generous research prize to Gerhard Widmer (http://www.oefai.at/music). The ÖFAI acknowledges basic financial support from the Austrian Federal Ministry for Education, Science, and Culture and the Austrian Ministry for Transport, Innovation and Technology. Furthermore, the author acknowledges financial support from the European Union for his research visit to the Department of Speech, Music, and Hearing (TMH) at the Royal Institute of Technology (KTH) in Stockholm (Marie Curie Fellowship, HPMT-GH-00-00119-02). Parts of this work were additionally financed through other European Union projects: the Sounding Object project (SOb, IST-2000-25287, http://www.soundobject.org) and the MOSART IHP network (HPRN-CT-2000-00115) supported the studies to measure and record the Bösendorfer grand piano in Vienna.

Special thanks are due to the Bösendorfer company, Vienna, for providing an SE290 grand piano in excellent condition; to Alf Gabrielsson (Department of Psychology, University of Uppsala), who provided a well-maintained Disklavier for experimental use; to the Department of Speech, Music, and Hearing (TMH) of the Royal Institute of Technology (KTH), Stockholm, for providing the accelerometer equipment for the piano action studies; and to the Acoustics Research Institute of the Austrian Academy of Sciences for generously providing recording equipment for the multiple recording sessions at the Bösendorfer grand piano in Vienna (with special thanks to Werner A. Deutsch and Bernhard Laback). Furthermore, I am indebted to Tore Persson and especially to Friedrich Lachnit, who maintained and serviced the two reproducing pianos with endless patience.

At the outset, I want to thank Gerhard Widmer, the leader of the ÖFAI music group, for his pioneering spirit in guiding our young research group into the adventure of exploring musical expression and surrounding topics, while at the same time leaving the necessary freedom for unconventional ideas. It is to his merit that I got the unique opportunity to work as a musicologist and pianist in an artificial intelligence department. I am grateful to my colleagues Simon Dixon, for his advice in


computer programming and logical thinking (his “Well, isn't there a better way to do this?” saved me weeks of computation time and intellectual meandering); to Elias Pampalk, especially for implementing the Zwicker loudness model in efficient computer code; to Asmir Tobudic; to Emilios Cambouropoulos for his ‘pragmatic approach’ to the use of computers; and to Renee Timmers for giving critical and thus essential advice on the design and interpretation of psychological experiments and their statistical evaluation. I take this occasion to especially thank Robert Trappl, the head of ÖFAI, for his endless patience in securing research funding that allows young researchers to devote their full energy to their work in an enjoyable environment, and for his fascination with music. I would like to express my sincere thanks to my collaborators Roberto Bresin and Alexander Galembo, who shared my fascination with grand pianos and helped me to carry out the time-consuming experimental tests on the inner workings of grand piano actions in Sweden and Vienna. Furthermore, I would like to thank Johan Sundberg for enabling my stay as a guest researcher in Stockholm. I take this opportunity to further thank Anders Askenfelt and Erik Jansson for making their expertise and their equipment for monitoring the various aspects of piano acoustics available to me. Moreover, I want to mention Giampiero Salvi, Erwin Schoonderwaldt, and Anders Friberg for stimulating discussions and helpful hints during my stay in Stockholm. I am grateful to my supervisor Richard Parncutt for his advice in focussing on the more essential research questions and for guiding me through the whole process, from designing the listening tests to writing up this thesis in English. I am indebted to Christoph Reuter for examining this book and giving essential final hints. Finally, I wish to thank Oliver Vitouch for his important statistical advice.
I also have to say a huge ‘Thank you!’ to all the participants who shared their exquisite musical expertise, either by performing on the computer-controlled grand piano or by listening to my unpleasant and awkward stimuli without running away immediately. Last but not least, I thank my parents and my whole family for their (not only moral) support during the last three decades of my education.

Contents

Abstract
Zusammenfassung
Acknowledgements

1 Introduction
   1.1 Background
   1.2 Outline

2 Dynamics and the Grand Piano
   2.1 Introduction
      2.1.1 The acoustics of the piano
      2.1.2 Measurement of dynamics in piano performance
      2.1.3 Aims
   2.2 The piano action as the performer's interface
      2.2.1 Introduction
         Piano action timing properties · Different types of touch
      2.2.2 Method
         Material · Equipment · Calibration · Procedure · Data analysis
      2.2.3 Results and discussion
         Influence of touch
         Timing properties: Travel time · Key–bottom contact relative to hammer–string contact · Time of free flight
         Comparison among tested pianos
         Acoustic properties: Rise time · Peak sound-pressure level
      2.2.4 General discussion
   2.3 Measurement and reproduction accuracy of computer-controlled grand pianos
      2.3.1 Introduction
      2.3.2 Method
      2.3.3 Results and discussion
         Timing accuracy · Dynamic accuracy · Two types of touch
      2.3.4 General discussion
   2.4 A note on MIDI velocity

3 Bringing Out the Melody in Homophonic Music—Production Experiment
   3.1 Introduction
      3.1.1 Background
      3.1.2 Piano action timing properties
   3.2 Aims
   3.3 Method
      3.3.1 Materials and participants
      3.3.2 Apparatus
      3.3.3 Procedure
   3.4 Results
      3.4.1 Relationship between velocity and timing
   3.5 Discussion
   3.6 Finger–key contact estimation with alternative travel time functions
   3.7 A model of melody lead

4 The Perception of Melody in Chord Progressions
   4.1 Introduction
      4.1.1 Perception of melody
      4.1.2 Perception of isolated asynchronies
      4.1.3 Intensity and the perception of loudness and timbre
      4.1.4 Masking
      4.1.5 Stream segregation
   4.2 Aims
   4.3 Perception of asynchronous dyads (pilot study)
      4.3.1 Background
      4.3.2 Method
         Participants · Stimuli · Equipment · Procedure
      4.3.3 Results
         Perception of tone salience (question 1) · Temporal order perception (question 2)
      4.3.4 Discussion
   4.4 Perception of dyads varying in tone balance and synchrony
      4.4.1 Introduction
      4.4.2 Determination of balance baseline (Experiment I)
      4.4.3 Perception of tone salience (Experiment II)
      4.4.4 Asynchrony detection (Experiment III)
      4.4.5 Conclusion
   4.5 Perception of chords and chord progressions varying in tone balance and synchrony (Experiments IV and V)
      4.5.1 Introduction
      4.5.2 Method
      4.5.3 Results and discussion
         Effects of intensity balance and asynchrony · Effects of chord, transposition, and voice
      4.5.4 Conclusion
   4.6 Asynchrony versus relative intensity as cues for melody perception: Excerpts from a manipulated expressive performance (Experiment VI)
      4.6.1 Background
      4.6.2 Method
      4.6.3 Results and discussion
   4.7 Model
      4.7.1 Introduction
      4.7.2 Input of the models
      4.7.3 Results and discussion
   4.8 General discussion

5 Conclusions

Bibliography

A Ratings of Listening Tests
B Curriculum Vitae

Chapter 1

Introduction

1.1 Background

“And here I shall go back to something I said earlier: since the basis of all audible music is singing and since piano literature is full of cantabile, the first and main concern of every pianist should be to acquire a deep, full, rich tone capable of any nuance, with all its countless gradations, vertically and horizontally. An experienced pianist can easily give three or four dynamic nuances simultaneously: for instance

f mp pp p

to say nothing of using horizontally every possibility inherent in the piano's tone.” (Neuhaus, 1973, pp. 67–68, Chapter 3: ‘On Tone’, emphasis in original)

Heinrich Neuhaus, while dwelling upon the singing quality in piano performance, refers exclusively to the dynamic shaping of the tones. However, pianists may alter several expressive parameters to “bring out” the melody in a piano piece, to make it cantabile (singable), to give it singing quality, to let the melody stand out from the background. The most obvious strategy is, as mentioned by Neuhaus, to strike the keys of the principal voice with a slightly firmer blow, so that the melody tones simply sound louder, and to colour the accompaniment tones darker, behind the melody.

Other, more subtle ways to make different voices acoustically more distinguishable include articulation and the use of the right pedal. A melody becomes more cantabile when it is played more legato than the other tones, that is, when all its tones are connected to each other, the previous key being released only when the next is already depressed (finger legato). Finger legato can also be replaced by using the right pedal


that—in addition to linking tones together by raising the dampers from the strings—also introduces more sympathetic vibrations between all sounding strings, resulting in a more complex sound that lets the melody glow over the accompaniment. To additionally reduce the natural decay of the piano tone (cf. Martin, 1947; Repp, 1997a), the left pedal (of a grand piano) can be used, shifting the piano action sideways so that only two strings of the triple-strung tones are struck by the hammer. This slows the decay of the piano tones and thus increases their effective duration (Weinreich, 1977, 1990).

Alongside the above-mentioned expressive devices, another expressive feature has been investigated: the onsets of melody tones can be anticipated or delayed with respect to the other tones of the same chord in the score. The excessive use of asynchronies as an individual expressive freedom recalls old recordings in which renowned pianists often played bass notes up to some hundreds of milliseconds earlier than the other tones (e.g. Josef Pembaur and Harold Bauer, see Hartmann, 1932). Moreover, a melody can be played more freely and independently of an accompaniment that keeps the meter rigidly. This effect is usually called tempo rubato in its earlier sense (Hudson, 1994), a performance practice going back to the Baroque period. However, another effect has been reported in the recent as well as the older literature: melody tones usually sound some 30 ms before the other tones of the same chord (Vernon, 1937; Palmer, 1989, 1996; Repp, 1996a; Goebl, 2001). This effect was called melody lead by Palmer (1996) and is usually accompanied by, and presumably causally related to, differences in tone intensity (Repp, 1996a; Goebl, 2001). The harder a key is actuated by the pianist's finger, the faster the hammer will travel to the strings and the earlier a sound will emerge relative to the beginning of the keystroke.
Simple as this physical constraint is, it is just as important for the performing pianist to coordinate the command to the finger to depress a key with how hard that key is intended to be struck, so that the resulting sound starts at the desired instant in time. The time interval between the beginning of the key movement and the hammer hitting the strings is referred to as travel time (see Chapter 2). A soft tone (piano) takes around 160 ms to reach the strings, whereas a forte keystroke takes only around 25 ms (Askenfelt and Jansson, 1991). These temporal properties can be modified by changing the regulation of the piano action (Askenfelt and Jansson, 1990a,b); they may also differ slightly among piano manufacturers and action designs. Alongside the close relation between hammer velocity and travel time, it is expected that the way the key is actuated by the player influences this relation considerably.

Not only do pianists have to be aware of this temporal peculiarity of the keyboard construction in order to achieve the intended onset timing at varying hammer velocities; it also has to be considered for reproducing devices, such as a Bösendorfer SE or a Yamaha Disklavier, how their actions behave at different intensities. These systems are provided with correction maps that adjust for the different travel times at different hammer (MIDI) velocities. Repp (1996a) reported

that the “prelay function” of his Yamaha Disklavier was not operating, so he had the opportunity to measure the travel time interval as a function of MIDI velocity; he found results similar to those of Askenfelt and Jansson (1991), however with respect to MIDI velocity units and not final hammer velocity in meters per second (as in Askenfelt and Jansson, 1991).

Palmer (1996) argued that melody leads were largely independent of the dynamic differentiation between voices. She considered note onset asynchronies an expressive device of pianists for bringing out individual voices in contrapuntal contexts independently of other expressive cues such as dynamics, articulation, or pedalling. Her conclusions were based on evidence that, e.g., asynchronies were larger in experts' performances than in students', asynchronies decreased in ‘unmusical’ performances, and melody lead increased with voice emphasis. However, Repp (1996a) found—with a more detailed methodology—strong relations between dynamic differences and onset differences across voices. The louder a melody note was played (in comparison to the dynamic level of the other chord tones), the earlier it tended to appear (also in comparison to the timing of the other tones). He explained this interrelationship with the ‘velocity artifact’, referring to the above-mentioned temporal characteristics of the keyboard construction. However, he could not entirely establish the causal relationship, because correlational evidence (as found between dynamic and timing differences) does not prove a causal connection. To him it seemed plausible that “pianists aim for synchrony of finger–key contacts” (Repp, 1996a, p. 3929).

The temporal properties of the grand piano action were described in the literature in fine detail by Askenfelt and Jansson (1990b, 1991, 1992a). However, only exemplary data from a single instrument were reported.
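As a rough illustration, the velocity dependence of the travel time can be sketched as a power curve passed through the two endpoint values quoted above (about 160 ms for a soft piano keystroke and 25 ms for forte; Askenfelt and Jansson, 1991). The hammer velocities assigned to these dynamic levels here (1 and 5 m/s) are illustrative assumptions, not measured values from this thesis:

```python
import math

# Assumed anchor points: (hammer velocity in m/s, travel time in s).
# The velocities are illustrative; the times follow Askenfelt & Jansson (1991).
V_PIANO, T_PIANO = 1.0, 0.160
V_FORTE, T_FORTE = 5.0, 0.025

# Solve t = a * v**b through the two anchor points.
b = math.log(T_FORTE / T_PIANO) / math.log(V_FORTE / V_PIANO)
a = T_PIANO / V_PIANO**b

def travel_time(hammer_velocity):
    """Estimated time (s) from start of key movement to hammer-string contact."""
    return a * hammer_velocity**b

for v in (1.0, 2.0, 3.0, 5.0):
    print(f"hammer velocity {v:.1f} m/s -> travel time {travel_time(v) * 1000:5.1f} ms")
```

Any monotonically decreasing fit through reliable measurements would serve the same purpose; the power-curve form is the one used for the prototypical functions in Chapter 2.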
In order to estimate how much of the above-mentioned melody lead phenomenon can be accounted for by this temporal behaviour of the keyboard construction, more data has to be gathered from different instruments, and the interaction of travel time and hammer velocity studied in finer detail, also with respect to different ways of depressing the key. With reliable data on the travel time characteristics of various pianos and key actuation types, it becomes possible to infer the asynchronies at finger–key level from note onset asynchronies (corresponding to hammer–string contact time differences), that is, how asynchronously pianists started the keystrokes within a chord. Thus, Repp's above-mentioned hypothesis that pianists aim for synchrony at finger–key level can be verified or rejected (Repp, 1996a).

Melody lead (or lag) may render a tone or voice more salient (prominent) than other tones in a chord, independently of the associated dynamic differences (Parncutt and Troup, 2002, pp. 294–296). An early tone onset will initially not be masked by the other chord tones (Rasch, 1978), and according to Bregman and Pinker (1978) asynchronous onsets enable the auditory system to segregate those tones into different melodic streams (voices). Although the presence of the melody lead phenomenon is likely explained by mechanical constraints of the keyboard construction (Repp, 1996a), its perceptual effects may be (even unconsciously) wanted by the performers, so that efforts to overcome it (i.e., to play dynamically differentiated chords without anticipating the louder tones, or even the opposite) would not be worthwhile, simply because the reason for playing a voice louder (and thus earlier) is to make it stand out from the context, to make it cantabile, to impart to it a singing quality.

Apart from the psycho-acoustic relevance of note onset asynchronies, differences in sound intensity, and thus in timbre, may entail psycho-acoustic effects of their own on the perception of a complex chordal sonority. A louder melody tone will also impart a singing quality because it becomes more salient in pitch (Terhardt et al., 1982); there will be less beating between a pair of tones with increasing loudness difference (Terhardt, 1974); and the compound timbre will sound less rough (Parncutt and Troup, 2002).
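Closing this section, the velocity-artifact reasoning above can be sketched numerically: subtracting a velocity-dependent travel time from each tone's hammer–string onset yields an estimate of its finger–key onset. The power-law coefficients and the hammer velocities below are illustrative assumptions, not the functions fitted in this thesis:

```python
# Assumed travel-time law t(v) = A * v**B (v in m/s, t in s); illustrative only.
A, B = 0.160, -1.15

def key_start(onset_s, hammer_velocity):
    """Estimated finger-key onset: hammer-string onset minus travel time."""
    return onset_s - A * hammer_velocity**B

# A loud melody tone sounding 30 ms before a softer accompaniment tone:
melody_key = key_start(0.000, 3.0)   # melody, higher hammer velocity
accomp_key = key_start(0.030, 2.0)   # accompaniment, lower hammer velocity

residual_ms = (accomp_key - melody_key) * 1000
print(f"melody lead at the strings: 30.0 ms; at the keys: {residual_ms:.1f} ms")
```

Under these assumed numbers, the 30 ms melody lead observed at the strings shrinks to a few milliseconds at finger–key level, illustrating how the louder tone's shorter travel time can account for most of the asynchrony.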

1.2 Outline

The central part of this thesis comprises three large chapters (2–4), each representing a different approach to my research question of melody emphasis: an investigation of the acoustics and instrumental characteristics of the grand piano, a performance study, and an experimental evaluation of the perceptual hypotheses regarding the perception of timing and intensity differences in multi-voiced contexts.

In Chapter 2, the acoustics of the piano are discussed with special emphasis on the grand piano action and its typical temporal behaviour in different playing situations. Two prototypical ways of depressing the keys were investigated: with the finger resting on the surface of the key and pressing it down starting from zero key velocity (legato touch), and hitting the key from a certain distance above, thus striking it already at a certain speed (staccato touch). The different behaviour of the grand piano action and the various tone intensities produced with these two kinds of touch were investigated in Section 2.2 (p. 10). Special attention was given to the relationship between the hammer velocity and the time interval between the beginning of the key movement (finger–key contact1) and the sounding tone (hammer–string contact). This function is referred to as the travel time function. Two modern reproducing grand pianos were the subject of investigation in Section 2.3 (p. 36), where their measurement and reproduction reliability and accuracy were determined and evaluated with respect to the usability of such devices for performance research. Section 2.4 (p. 50) briefly discusses the relation between hammer velocity and the sound level or loudness of the resulting tones.

Chapter 3 (p. 57) describes a performance study in which 22 skilled pianists

1The term finger–key contact may be misleading, because with a legato touch the finger is already resting on the key surface and thus touching it, so the finger–key contact point would be much earlier than the start of the key movement. Here, however, it always refers to the onset of the key movement.

played two short excerpts by Frédéric Chopin (from the Etude op. 10, No. 3 and the second Ballade op. 38). The onset asynchronies between the melody and the accompaniment tones were investigated and compared with the differences in tone intensity (in terms of MIDI velocity). This study used data from a Bösendorfer computer-controlled grand piano (Goebl, 1999b) and has been published in Goebl (2001). The findings from Goebl (2001) were revised by applying the results of the more recent measurements from Chapter 2 (i.e., alternative travel time functions). The adjusted results are reported in Section 3.6 (p. 74). Chapter 4 (p. 79) is dedicated to a series of perceptual experiments (mostly with musically trained participants) that investigated the perceptual influence of onset asynchrony on the perceived salience of individual tones. The main questions are: first, what is the threshold for two musical sonorities to be heard as simultaneous or as separate; and second, can anticipation or delay of a tone alter its perceived salience in a chordal context? Seven listening experiments are dedicated to these questions. In a pilot experiment (Section 4.3, p. 87), two equally loud tones with asynchronies of up to ±50 ms were used to investigate the perceived relative loudness of two tones and their order. Different types of tones were used (pure, sawtooth, MIDI-synthesised piano, and real piano) to test whether different attack curves change loudness perception or temporal order identification. Intensity variation was introduced in the next three experiments (Experiments I–III, Section 4.4, p. 95). In Experiment I, participants adjusted the relative level of two simultaneous tones (pure, sawtooth, and piano sound) until they sounded equally loud to them. In Experiment II, they rated the relative loudness of the two tones of dyads with relative timing and intensity simultaneously manipulated by up to ±54 ms and ±20 MIDI velocity units.
In Experiment III, listeners judged whether or not the stimuli of the previous experiment sounded simultaneous. In the last three experiments, the stimulus material was extended to three-tone piano chords, sequences of three-tone piano chords (Experiment IV and Experiment V, see Section 4.5, p. 105), and an excerpt of a piece by Frédéric Chopin (Experiment VI, Section 4.6, p. 118).

Chapter 2

Dynamics and the Grand Piano

2.1 Introduction

In this chapter, the loudness dimension in expressive piano performance is discussed with respect to the acoustics of the grand piano. Emphasis is given to the temporal behaviour of the grand piano action and its consequences for piano performance.

2.1.1 The acoustics of the piano

The acoustics of the piano is a comparatively well investigated topic. A comprehensive overview can be found in the piano chapter of Fletcher and Rossing's book (Fletcher and Rossing, 1998, pp. 352–398). There is a vast number of detailed studies on the various aspects of the acoustics of the piano, covering all steps involved in sound production: the keyboard and the action (Lieber, 1985; Askenfelt, 1991; Askenfelt and Jansson, 1990a,b, 1991), the hammers (Conklin, 1996a; Giordano and Winans II, 2000), the strings (Askenfelt and Jansson, 1992a; Chaigne and Askenfelt, 1994a,b; Conklin, 1996c; Suzuki, 1986; Weinreich, 1977), hammer–string interaction (Hall, 1986, 1987a,b; Suzuki, 1987; Boutillon, 1988; Hall and Askenfelt, 1988), the soundboard (Conklin, 1996b; Giordano, 1997, 1998a,b), sound radiation (Suzuki, 1986; Bork et al., 1995), and the sound and its decay (Knoblaugh, 1944; Martin, 1947; Nakamura, 1989; Taguti et al., 2002). The differences between grand and upright pianos were investigated by Galembo and Cuddy (1997) and Mori (2000).

2.1.2 Measurement of dynamics in piano performance

The dynamics and the timbre of a single piano tone are controlled by a single parameter: the final hammer velocity. As in many instruments, the intensity of a piano tone and its timbre are closely linked: the louder the tone, the higher its sound level and the more partials are involved, causing a brighter sound. However, even a small number of simultaneous tones combined with use of the two pedals entails a virtually unlimited variety of possible sounds and timbres, so that investigating the


dynamics in piano performance is not easy at all. There are two ways to approach the dynamics of the piano: the first is to directly measure the acoustic output of the piano (the sound), and the second is to measure how that sound is produced.

1. Measuring the sound

• In the amplitude of the radiated sound, e.g., from recordings.
  – Physical: sound-pressure level (dB).
  – Perceptual: loudness level (sone); see Moore (1997) and Zwicker and Fastl (1999).
• In the amplitude of the string vibrations; cf. Askenfelt (1990); Askenfelt and Jansson (1990b).

2. Measuring the production of sound

• Movement of the piano hammer, e.g., the (final) hammer velocity (in meters per second), with computer-monitored pianos such as a Bösendorfer SE290 (see Section 2.3, p. 36), Shaffer (1981, 1984), Shaffer et al. (1985), and Shaffer and Todd (1987), or with an optical measurement setup as in Henderson (1936) and Skinner and Seashore (1936).
• MIDI velocity with any MIDI instrument (e.g., a digital piano).
• Movement of the piano key, i.e., the continuous key acceleration (cf. Askenfelt and Jansson, 1990a; future computer-monitored pianos might also measure such parameters). A historic approach used smoked paper and a tuning fork to investigate key movement (Ortmann, 1925).
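The first physical measure listed above, the sound-pressure level, can be computed directly from a recorded pressure signal. A minimal sketch (in Python rather than the Matlab used later in this chapter; the 20 µPa reference pressure and the test signal are standard textbook choices, not values from this thesis):

```python
import numpy as np

def spl_db(pressure, p_ref=20e-6):
    """Sound-pressure level in dB re 20 uPa, from the RMS of the signal."""
    rms = np.sqrt(np.mean(np.square(pressure)))
    return 20.0 * np.log10(rms / p_ref)

# A 1 kHz sine of 1 Pa amplitude has an RMS of 1/sqrt(2) Pa,
# corresponding to roughly 91 dB SPL.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
print(round(spl_db(tone), 1))  # -> 91.0
```

The loudness level in sone would require a full psycho-acoustic model (see Zwicker and Fastl, 1999) and is not sketched here.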

Attempts were made to relate these various ways of determining the dynamics of piano tones to each other. The relation of hammer velocity and peak amplitude was investigated by Palmer and Brown (1991); the relation of MIDI velocity units and sound level in dB by Friberg and Sundberg (1995) and Repp (1996d). Another study tried to infer the loudness of single piano tones within multi-voiced chords by measuring the energy of their fundamentals and first overtones (Repp, 1993b), but did not obtain satisfactory results. The emphasis of performance research in the past two decades was mainly on timing and tempo issues, because tone onsets are easier and more reliably obtainable from music performances. However, several studies focused on dynamics, either by obtaining data from electronic MIDI instruments such as digital pianos or other keyboards (Palmer, 1989; Repp, 1995a), from computer-monitored pianos (Repp, 1993b, 1996d; Palmer, 1996; Tro, 1994, 1998, 2000a,b; Riley-Butler, 2001, 2002), or by measuring and analysing the sound signal of recordings (Truslit, 1938; Repp, 1993a; Gabrielsson, 1987; Nakamura, 1987; Kendall and Carterette, 1990; Namba and Kuwano, 1990; Namba et al., 1991; Repp, 1999; Lisboa et al., 2002).

Using MIDI velocity units to investigate the dynamic dimension of music performance entails certain difficulties. The first is that these units are an arbitrary choice by the MIDI instrument's manufacturer to scale the range of possible dynamics to numbers between zero and 127. They are not comparable between instruments (e.g., between a digital piano and a computer-controlled piano, cf. Friberg, 1995). However, within one instrument MIDI velocity units seem to be able to depict a consistent picture of what the pianist did. In informal experiments with concert pianists on a Yamaha grand piano, Tro (2000a, p. 173) asked pianists to produce repeated tones on one piano key while trying to constantly increase the loudness. The MIDI velocity units output by the device increased almost linearly over a range between 25 and 127 units. On the other hand, even playing back a recorded Bösendorfer SE file on another SE grand piano model will result in obviously distorted dynamic reproduction, due to the different response of the second instrument.
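The non-comparability of velocity units across instruments can be made concrete with two invented velocity-to-level mappings. The coefficients below are purely hypothetical and merely stand in for the instrument-specific response curves that would have to be calibrated individually:

```python
# Hypothetical affine MIDI-velocity-to-sound-level maps for two
# instruments; all coefficients are invented for illustration only.
def level_instrument_a(vel):
    return 40.0 + 0.45 * vel  # assumed response of instrument A (dB)

def level_instrument_b(vel):
    return 52.0 + 0.30 * vel  # assumed response of instrument B (dB)

# The same MIDI velocity maps to clearly different sound levels:
for vel in (30, 64, 100):
    print(vel, level_instrument_a(vel), level_instrument_b(vel))
```

A velocity of 100 would here correspond to 85 dB on one instrument but 82 dB on the other; comparing raw velocity values across devices therefore confounds performer behaviour with instrument response.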

2.1.3 Aims

In this chapter, the above-mentioned problems of the relation between piano mechanics and tone intensity were investigated with an extensive experimental measurement setup. In Section 2.2, the temporal properties of three grand piano actions by different piano manufacturers were investigated with an accelerometer setting. Here, the aim was to provide benchmark functions for performance research and to replicate assumptions from earlier work (see Chapter 3, and Goebl, 2001). Two of the three pianos were computer-controlled. Their recording and reproduction precision was measured in Section 2.3 (p. 36) in order to study the reliability of these instruments for performance research. The relationship between MIDI velocity units and the sound level of the tones produced by a Bösendorfer SE290 computer-controlled grand piano was examined for all 97 tones of the keyboard in Section 2.4 (p. 50).

2.2 The piano action as the performer’s interface

This work was performed at the Department of Speech, Music, and Hearing at the Royal Institute of Technology (KTH/TMH) in Stockholm, Sweden, in close cooperation with Roberto Bresin and Alexander Galembo. Parts of this work have been presented at the Stockholm Music Acoustics Conference (SMAC'03, cf. Goebl et al., 2003).

2.2.1 Introduction

This is an exploratory study on the temporal behaviour of grand piano actions by different piano manufacturers using different types of touch. Large amounts of data were collected in order to determine as precisely as possible the temporal functions of piano actions, such as travel times versus hammer velocity, or key–bottom contact times relative to hammer–string contact. A pianist is able to bring out a wide range of imaginable facets of expression only by varying the manner and the intensity of actuating the 88 keys of the piano keyboard. Since not only the intensity of the keystroke, but also the precise timing of the onset of the tone produced is crucial to expressive performance, it can be assumed that pianists are intuitively acquainted with the temporal properties of the piano action, and that they take them into account while performing expressively. The grand piano action is a highly elaborate and complex mechanical interface, whereby the time and the speed of the hammer hitting the strings are controlled only by varying the manner and the force of striking the keys. The movement of the key is transferred to the hammer via the whippen, on which the jack is positioned so that it touches the roller (knuckle) of the hammer shank. During a keystroke, the tail end of the jack makes contact with the escapement dolly (let-off button, jack regulator), causing the jack to rotate away from the roller and thus breaking the contact between key and hammer. From this moment, the hammer travels with no further acceleration to the strings and rebounds from them immediately ('free flight of the hammer'). The roller comes back to the repetition lever, while the hammer is caught by the back check. For a fast repetition, the jack slides back under the roller when the key is only released half-way, and the action is ready for another stroke.
More precise descriptions of the functionality of grand piano actions can be found in the literature (Askenfelt and Jansson, 1990b; Fletcher and Rossing, 1998, pp. 354–358).
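Travel-time functions of the kind mentioned above (and, per the abstract, approximated by power curves) have the form t = a·v^b, which reduces to linear regression in log-log coordinates. A sketch with invented sample values, loosely in the range of the key-travel times reported in the literature, not the thesis data:

```python
import numpy as np

# Invented travel times (finger-key to hammer-string contact, in ms) at
# several final hammer velocities (m/s) -- for illustration only.
v = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])
t = np.array([160.0, 95.0, 55.0, 41.0, 33.0, 28.0])

# Fit t = a * v**b via least squares in log-log space.
b, log_a = np.polyfit(np.log(v), np.log(t), 1)
a = np.exp(log_a)

# The exponent b comes out negative: the faster the hammer,
# the shorter the travel time.
print(round(a, 1), round(b, 2))
```

Such a fitted function can then be used to correct note-onset data for the velocity-dependent delay of the action, as done in Section 3.6.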

Piano action timing properties

The temporal parameters of the piano action have been described in Askenfelt (1990) and Askenfelt and Jansson (1990b, 1991, 1992a). When a key is depressed, the time from its initial position to the bottom contact ranges from 25 ms (forte, or 5 m/s final hammer velocity, FHV) to 160 ms (piano, or 1 m/s FHV; Askenfelt and Jansson,

1991, p. 2385).1 In a grand piano, the hammer impact times (when the hammer excites the strings) are shifted in comparison to key–bottom contact times. The hammer impact occurs 12 ms before the key–bottom contact for a piano touch (hammer velocity 1 m/s), but 3 ms after the key–bottom contact for a forte attack (5 m/s; Askenfelt and Jansson, 1990a, p. 43). The timing properties of a grand piano action were outlined by these data, but more detailed data were not available (Askenfelt, 1999). The timing properties of the piano action can be modified by changing the regulation of the action. Modifications, e.g., in the hammer–string distance or in the let-off distance (the distance of free flight of the hammer, after the jack is released by the escapement dolly), affect the timing relation between hammer–string contact and key–bottom contact or the time interval of free flight, respectively (Askenfelt and Jansson, 1990b, p. 57). Greater hammer mass in the bass (Conklin, 1996a, p. 3287) influences the hammer–string contact durations (Askenfelt and Jansson, 1990b), but not the timing properties of the action. Another measurement was made by Repp (1996a) on a Yamaha Disklavier on which the “prelay function” was not working.2 This gave him the opportunity to roughly measure a grand piano's timing characteristics in the middle range of the keyboard. He measured onset asynchronies at different MIDI velocities in comparison to a note with a fixed MIDI velocity. The time deviations extended over a range of about 110 ms for MIDI velocities between 30 and 100 and were well fit by a quadratic function (Repp, 1996a, p. 3920). The timing characteristics of electronic keyboards vary across manufacturers and are rarely well documented. Each key has a spring with two electric contacts that define the off-state and the on-state. When a key is depressed, the spring contact is moved from the off-position to the on-position (Van den Berghe et al., 1995, p. 16).
The time difference between the breaking of the off-contact and the making of the on-contact determines the MIDI velocity value; the note onset is registered near the key–bottom contact. There have been several attempts to model piano actions (Gillespie, 1992; Hayashi et al., 1999), also with a view to possible applications in electronic keyboard instruments (Cadox et al., 1990; Van den Berghe et al., 1995). Van den Berghe et al. (1995) performed measurements on a grand piano key with two optical sensors for hammer and key displacement and a strain gauge for key force. Unfortunately, they reported only a single example of their data in one figure. Hayashi et al. (1999) tested one piano key on a Yamaha grand piano. The key was hit with a specially developed key actuator able to produce different acceleration patterns. The displacement of the hammer

1Askenfelt and Jansson (1990b) used a Hamburg Steinway & Sons grand piano, model B (7 ft, 211 cm) for their measurements. 2The “prelay function” compensates for the different travel times of the action at different hammer velocities. In order to prevent timing distortions in reproduction, the MIDI input is delayed by 500 ms. The solenoids (the linear motors moving the keys) are then activated earlier for softer notes than for louder notes, according to a pre-programmed function.

was measured with a laser displacement gauge. They developed a simple model and tested it under two touch conditions (with constant key velocity and with constant key acceleration). Their model predicted the measured data accurately for both conditions.
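The two-contact velocity sensing described above can be sketched as follows. The key-travel distance between the contacts and the velocity scaling are assumptions chosen for illustration, since the actual values differ between manufacturers and are rarely documented:

```python
def midi_velocity_from_contacts(t_off_break, t_on_make,
                                travel_m=0.005, v_max=6.0):
    """Map the time between the two key contacts to a MIDI velocity (1-127).

    The key travels a fixed distance between the two contact points, so a
    shorter traversal time means a faster keystroke. travel_m and v_max
    are illustrative assumptions, not manufacturer specifications.
    """
    dt = t_on_make - t_off_break        # seconds between the two contacts
    key_velocity = travel_m / dt        # mean key speed over the gap (m/s)
    return max(1, int(round(127 * min(key_velocity, v_max) / v_max)))

print(midi_velocity_from_contacts(0.000, 0.010))  # slow stroke -> 11
print(midi_velocity_from_contacts(0.000, 0.001))  # fast stroke -> 106
```

Note that such a sensor only measures the mean key speed over the gap between the contacts, which is one reason why identical MIDI velocities can correspond to somewhat different hammer velocities for different types of touch.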

Different types of touch

There has been an ongoing discussion in the literature as to whether it is only the final velocity of the hammer hitting the strings that influences the tone of the piano (pedalling aside), or whether there is an influence of touch, as pianists frequently claim. In other words: is it possible to produce two isolated piano tones, without using the pedal, with identical final hammer velocities but with perceptually different sounds? The first scientific approach to this question was by Otto Ortmann of the Peabody Conservatory of Music (Ortmann, 1925).3 He approached the “mystery of touch and tone” at the piano through physical investigation. With a piece of smoked glass mounted to the side of a piano key and a tuning fork, he was able to record and study key depression under different stroke conditions. He investigated various kinds of keystrokes (percussive versus non-percussive, different muscular tensions, and positions of the finger). He found different acceleration patterns for non-percussive touch (the finger rests on the surface of the key before pressing it) and percussive touch (an already moving finger strikes the key). The latter starts with a sudden jerk, after which the key velocity decreases for a moment and then increases again. During this period, the finger slightly rebounds from the key (or vice versa), then re-engages the key and “follows it up” (p. 23). The non-percussive touch, in contrast, caused the key to accelerate gradually. He found that these different types of touch provide a fundamentally different kind of key control. The percussive touch required precise control of the very first impact, whereas with non-percussive touch, the key depression needed to be controlled up to the very end. “This means that the psychological factors involved in percussive and non-percussive touches are different” (p. 23). “In non-percussive touches key resistance is a sensation, in percussive touches it is essentially an image” (p.
23, footnote 1). His conclusion was that different ways of touching the keys produced different intensities of tone, but that when the intensity was the same, the quality of the tone must also be the same. “The quality of a sound on the piano depends upon its intensity, any one degree of intensity produces but one quality, and no two degrees of intensity can produce exactly the same quality” (p. 171). The discussion continued in the 1930s with studies that examined the sound of the piano and identified the hammer velocity as the most important factor (Hart et al., 1934; Seashore, 1937; White, 1930). This technical view does not reduce the conceptual variety of the pianists' opportunities for freely and artistically controlling,

3The discussion, however, was certainly not new at that time; see, e.g., Bryan (1913a) and the lively discussion following this contribution (Wheatley, 1913; Heaviside, 1913; Allen, 1913; Morton, 1913; Bryan, 1913b; Pickering, 1913a; Bryan, 1913c; Pickering, 1913b; Bryan, 1913d).

shaping, and altering their performances.

“It is our opinion that the reduction of the process of controlling the tone from the piano to the process of controlling hammer-velocity does not in any way detract from the beauty of the art, since it shows, among other things, what extreme delicacy of control is called for, and, in turn, to what an extent a great artist is able to bring his command over his mental and physical processes to bear upon the task of obtaining al- most infinitesimal varieties of manipulation of a key-board, no one of the 88 members of which can travel through a greater distance than 3/8 inch.” (White, 1930, pp. 364–365)

The other side argued that different types of noise emerge with varying touch (Báron and Holló, 1935; Cochran, 1931). Báron and Holló (1935) distinguished between finger noise (Fingergeräusch) when the finger touches the key (which is absent when the finger velocity is zero on touching the key—in our terminology, legato touch), keybed noise (Bodengeräusch) when the key hits the keybed, and upper noises (Obere Geräusche) when the key is released again (e.g., the damper hitting the strings). As another source of noise they mentioned the pianist's foot hitting the stage floor in order to emphasise a fortissimo passage. In a later study, Báron (1958) advocated a broader concept of tone quality, including all kinds of noise (finger–key, action, and hammer–string interaction), which he argued should be included in concepts of tone characterisation of different instruments (Báron, 1958). More recent studies investigated these different kinds of noise that emerge when the key is struck in different ways (Askenfelt, 1994; Koornhof and van der Walt, 1994; Podlesak and Lee, 1988). The hammer–impact noise (string precursor) arrived at the bridge immediately after hammer–string contact (Askenfelt, 1994) and characterises the attack thump of the piano sound, without which it would not be recognised as such (Chaigne and Askenfelt, 1994a,b). This noise was independent of touch type. The hammer impact noises of the grand piano did not radiate equally strongly in all directions (Bork et al., 1995). As three-dimensional measurements with a two-meter Bösendorfer grand piano revealed, higher noise levels were found horizontally towards the pianist and in the opposite direction, to the left (viewed from the sitting pianist), and vertically towards the ceiling (see also Meyer, 1965, 1978, 1999). Before the string precursor, another noise component could occur: the touch precursor, present only when the key was hit from a certain distance above (staccato touch; Askenfelt, 1994).
It preceded the actual tone by 20 to 30 ms and was much weaker than the string precursor (Askenfelt, 1994). Similar results were reported by Koornhof and van der Walt (1994). The authors called the noise prior to the sounding tone early noise or acceleration noise, corresponding in time to finger–key contact. They performed an informal listening test with four participants. The two types of touch (staccato with the early noise, and legato) could easily be identified by

the listeners, but no longer once the early noise was removed. No further systematic results were reported (Koornhof and van der Walt, 1994). The different kinds of touch also produced different finger–key touch forces (Askenfelt and Jansson, 1992b, p. 345). A mezzo forte attack played staccato typically reached 15 N; very loud staccato attacks showed peaks up to 50 N (fortissimo), and very soft touches went as low as 8 N (piano). With legato touch, finger–key forces of about one third of those of staccato attacks were found, usually with a peak when the key touched the keybed. For a pianissimo tone, the force hardly exceeded 0.5 N. Although measurement tools have improved since the first systematic investigations in the 1920s, no more conclusive results could be obtained as to whether the touch-variant noise components (especially finger–key noise) can be aurally perceived by listeners not involved in tone production.4 It is assumed that the hapto-sensorial feedback to the player influences his or her aural perception of the tone (Askenfelt et al., 1998). The pianist's perception of the tone starts with finger–key contact, while the listener's (aural) perception starts with the excitation of the strings (assuming that other, e.g., visual cues are avoided). This finding concurs with Ortmann (1925), who considered the psychological processes involved in the two types of touch to be essentially different (see above).

2.2.2 Method

The present study aimed to collect a large amount of measurement data from different pianos, different types of touch, and different keys, in order to determine benchmark functions for performance research. The measurement setup with accelerometers was the same as that used by Askenfelt and Jansson (1991), but the data processing procedure was automated with custom computer software in order to obtain a large and reliable data set. Each of the measured keys was equipped with two accelerometers monitoring key and hammer velocity. Additionally, a microphone recorded the sound of the piano tone. With this setup, various temporal properties (travel time, key–bottom time, time of free flight) and acoustic properties (peak sound level, rise time) were determined and discussed.

Material

Three grand pianos by different manufacturers were measured in this study.

1. Steinway grand piano, model C, 225 cm, situated at the Department of Speech, Music, and Hearing at the Royal Institute of Technology (KTH-TMH) in Stockholm, Sweden. Serial number: 516000, built in Hamburg, Germany, approximately 1992 (this particular grand piano was already used in Askenfelt and Jansson, 1992a).

4The hammer–string impact noise is part of the piano tone and is certainly heard; however, this noise component cannot be varied independently of hammer velocity.

2. Yamaha Disklavier grand piano DC2IIXG, 173 cm, situated at the Department of Psychology at the University of Uppsala, Sweden. Serial number: 5516392, built in Japan, approximately 1999 (the Mark II series was issued in 1997 by Yamaha; personal communication with Yamaha Germany, Rellingen).

3. Bösendorfer computer-controlled grand piano SE290, 290 cm, situated at the Bösendorfer company in Vienna; internal number: 290–3, built in Vienna, Austria, 2000. The Stahnke Electronics (SE) system dates back to 1983 (for more information on its development, see Roads, 1986; Moog and Rhea, 1990), but this particular grand piano was built in 2000. The same system used to be installed in an older grand piano (internal number 19–8974, built in 1986, used, e.g., in Chapter 3), but was transferred into a newer one for reasons of instrumental quality.

Immediately before the experiments, the instruments were tuned, and the piano action and—in the case of the computer-controlled pianos—the reproduction unit were serviced. For the Disklavier, this procedure was carried out by a specially trained Yamaha piano technician; at the Bösendorfer company, the company's SE technician took care of this work. The Steinway grand had been regularly maintained by a piano technician of the Swedish National Radio.

Equipment

The tested keys were equipped with two accelerometers: one mounted on the key5 and one on the bottom side of the hammer shank.6 The accelerometer setting (see Figure 2.1) was the same as that used in Askenfelt and Jansson (1991). Each of the accelerometers was connected to an amplifier7 with a built-in hardware integrator, so that their output was velocity in terms of voltage change. A sound-level meter (Ono Sokki LA–210) placed next to the strings of the particular key (at approximately 10 cm distance) picked up the sound. The velocities of the key and the hammer as well as the sound were recorded on a multi-channel digital audio tape (DAT) recorder (TEAC RD–200 PCM data recorder) with a sampling rate of 10 kHz and a word length of 16 bit. The DAT recordings were transferred onto computer hard disk into multi-channel WAV files (with a sampling frequency of 16 kHz).8 Further evaluation of the recorded data was done in the Matlab programming environment with routines developed by the author for this purpose.

5Brüel & Kjær accelerometer type 4393. Mass without cable: 2.4 g; serial number 1190913. 6ENDEVCO accelerometer model 22 PICOMIN. Mass without cable: 0.14 g; serial number 20845. 7Brüel & Kjær charge amplifier type 2635. 8Using an analogue connection from the TEAC recorder to a multi-channel sound card (producer: Blue Waves, formerly Loughborough Sound Images; model PC/C32 using its four-channel A/D module) on a PC running the Windows 2000 operating system.

hammer accelerometer

key accelerometer

Figure 2.1: A Bösendorfer grand piano action with the SE sensors sketched. Additionally, the placement of the two accelerometers is shown. (Figure generated with computer software by the author. Piano action by Bösendorfer, with permission from the company.)

Calibration

The recordings were preceded by calibration tests in order to verify the measured units. The accelerometer amplifiers output AC voltages corresponding to certain measured units (in our case, meters per second) depending on their setting, e.g., 1 V/m/s for the key accelerometer. To calibrate the connection between the TEAC DAT recorder and the computer hard disk, different voltages (between −2 and +2 V DC) were recorded onto the TEAC recorder and measured in parallel by a volt meter. The recorded DC voltages were transferred to computer hard disk as described above, and these values were compared with the values measured by the volt meter. They correlated highly (R2 = 0.9998), with a factor slightly above 2. Before each recording session, the microphone was calibrated with a 1-kHz test tone produced by a sound-level calibrator,9 in order to obtain dB values relative to the hearing threshold.
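This voltage calibration amounts to a linear regression of recorded against measured values. A sketch with invented data (the slope of 2.03 and the noise terms are arbitrary stand-ins for the "factor slightly above 2" found in the actual calibration):

```python
import numpy as np

# Invented calibration data: volt-meter readings (x) versus the values
# recovered from the transferred recordings (y), with a tiny error added.
measured = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
recorded = 2.03 * measured + np.array([0.004, -0.003, 0.002,
                                       -0.001, 0.003, -0.002])

# Least-squares line and coefficient of determination.
slope, intercept = np.polyfit(measured, recorded, 1)
r_squared = np.corrcoef(measured, recorded)[0, 1] ** 2
print(round(slope, 2), round(r_squared, 4))
```

The fitted slope is then used as a scaling factor when converting the transferred sample values back into physical units.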

Procedure

Five keys distributed over the whole range of the keyboard were tested: C1 (MIDI note number 24), G2 (43), C4 (60), C5 (72), and G6 (91).10 The author and his colleague (RB) served as pianists performing the recorded test tones. Each key was hit at as many different dynamic levels (hammer velocities) as possible, with two different kinds of touch: once with the finger resting on the surface of the key (legato touch), and once hitting the key from above, striking it already with a certain speed (staccato touch). In parallel to the accelerometer setting, the grand pianos recorded these test tones

9Brüel & Kjær sound-level calibrator type 4230; test tone: 94 dB, 1 kHz. 10Only three keys were tested on the Steinway piano (C1, C5, G6).

with their internal device onto computer hard disk (Bösendorfer) or floppy disk (Disklavier). For each of the five keys, both players played between 30 and 110 individual tones with each type of touch, so that a sufficient amount of data was recorded. Immediately after each recording of a particular key, the recorded file was reproduced by the grand piano, and the accelerometer data were recorded again onto the multi-channel DAT recorder. The recordings took place in May 2001 (Steinway, Stockholm), June 2001 (Yamaha, Uppsala), and January 2002 (Bösendorfer, Vienna). For the Steinway, 608 individual attacks were recorded; for the Yamaha Disklavier, 932; and for the Bösendorfer, 1023.

Data analysis

In order to analyse the three-channel data files, discrete measurement values had to be extracted from them. Several instants in time were defined as listed below and determined automatically with the help of Matlab scripts prepared for this purpose by the author.

1. The hammer–string contact was defined as the moment of maximum deceleration (minimum acceleration) of the hammer shank (hammer accelerometer), which corresponded well to the physical onset of the sound, and conceptually to the ‘note on’ command in the MIDI file. In mathematical terms, the hammer–string contact was the minimum of the first derivative of the measured hammer velocity.11

2. The finger–key contact was defined as the moment when the key started to move. It was obtained by a simple threshold procedure applied to the key velocity track. In mathematical terms, it was the moment when the (slightly smoothed) key acceleration exceeded a certain threshold, which varied relative to the maximum hammer velocity. Finding the correct finger–key point was not difficult for staccato tones (they typically showed a very abrupt initial acceleration). However, automatically determining the correct moment for soft legato tones was sometimes more difficult and needed manual adaptation of the threshold. When the automatic procedure failed, it failed by several tens of milliseconds—an error easy to discover in exploratory data plots.

3. The key–bottom contact was the instant when the downward travel of the key was stopped by the keybed. This point was defined as the maximum deceleration of the key (MDK). In some keystrokes, the MDK was not the actual keybed contact, but a rebound of the key after the first key–bottom contact. For this reason, the time window for searching for the MDK was restricted to 7 ms before and 50 ms after hammer–string contact. The time window

11This measurement was also used to find the individual attacks in a recorded file. All accelerations below a certain value were taken as onsets. The very rare silent attacks were not captured by this procedure, nor were some very soft attacks.

18 Chapter 2. Dynamics and the Grand Piano

was iteratively modified depending on the maximum hammer velocity until the correct instant was found. The indicator MDK was especially clear and unambiguous when the key was depressed at medium intensity (see Figures 2.2 and 2.3).

4. The maximum hammer velocity (in meters per second) was the maximum value in the hammer velocity track before hammer–string contact.

5. An intensity value was derived by taking the maximum energy (RMS) of the audio signal immediately after hammer–string contact, using an RMS window of 10 milliseconds.
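A rough sketch of the extraction steps above (hammer–string contact as the minimum of the hammer acceleration, finger–key contact via an acceleration threshold scaled by the maximum hammer velocity, and intensity as the peak 10-ms RMS) might look as follows. This is an illustration only, not the author's original Matlab code; the sampling rate, smoothing width, and threshold scaling are assumptions.

```python
import numpy as np

def extract_features(key_vel, hammer_vel, audio, fs):
    """Extract discrete readings from one keystroke (illustrative sketch).

    key_vel, hammer_vel: velocity tracks in m/s, sampled at fs (Hz);
    audio: sound signal at the same rate.
    Returns (finger-key index, hammer-string index, max hammer velocity,
    peak 10-ms RMS intensity).
    """
    # 1. hammer-string contact: minimum of the first derivative of the
    #    hammer velocity (maximum deceleration of the hammer shank)
    hammer_acc = np.gradient(hammer_vel) * fs
    hs = int(np.argmin(hammer_acc))

    # 4. maximum hammer velocity before hammer-string contact
    max_hv = float(np.max(hammer_vel[:hs + 1]))

    # 2. finger-key contact: first instant where the (slightly smoothed)
    #    key acceleration exceeds a threshold that scales with max_hv
    key_acc = np.gradient(np.convolve(key_vel, np.ones(5) / 5, "same")) * fs
    thresh = 5.0 * max_hv          # m/s^2; assumed heuristic scaling
    above = np.flatnonzero(key_acc[:hs] > thresh)
    fk = int(above[0]) if above.size else 0

    # 5. intensity: maximum RMS energy in a 10-ms window after hs
    win = int(0.010 * fs)
    rms = np.sqrt(np.convolve(audio ** 2, np.ones(win) / win, "same"))
    intensity = float(np.max(rms[hs:]))

    return fk, hs, max_hv, intensity
```

In the actual analysis the threshold had to be adapted manually for soft legato tones; a fixed heuristic such as the one above would fail in exactly those cases.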

To inspect the recorded key and hammer velocity tracks and the sound signal, an interactive tool was created that displays one keystroke at a time in three panels, one above the other. The user could click through to the next and the previous keystroke, zoom in and out, and switch the display from velocity to acceleration or displacement. Screen shots of this tool are shown below (see Figure 2.2 and Figure 2.3). The data was checked for errors with the help of this tool.

2.2.3 Results and discussion

Influence of touch To illustrate the difference between the two types of touch recorded (legato and staccato), one example of each is shown in Figure 2.2 and Figure 2.3. These two examples have a similar maximum hammer velocity. The left-hand panels show velocity, those on the right acceleration. Lines indicate finger–key (“fk,” blue dashed line), hammer–string (“hs,” red solid line), and key–bottom contact times (“kb,” green dash-dotted line). In the legato attack (Figure 2.2, with the finger resting on the key surface before depressing it), the key accelerated smoothly and almost constantly (about 8 ms before hammer–string impact there was an interruption in the movement, which could be due to the escapement of the jack). The staccato attack (Figure 2.3) showed a sudden acceleration at the beginning, whereas the hammer started to move up with a certain time delay. The parts of the piano action were compressed by the strong initial impact; only after the inertia of the hammer was overcome did the hammer move up towards the strings. After this initial input, the key almost stopped moving. Shortly before hammer–string impact, it accelerated again, but did not reach its original speed. The acceleration of the key showed two negative peaks, of which the second indicated the moment of key–bottom contact. In some very strong attacks, the first negative peak (maximum deceleration) can surpass the second. For this reason, the key–bottom finding procedure had to be restricted to a certain time window around hammer–string

2.2. The piano action as the performer’s interface 19

[Figure 2.2 plot; numeric annotations: hs−fk: 45.9 ms, kb−hs: −1.9 ms, maxHv: 2.654 m/s, SPL: 98.33 dB. Panel axes: key velocity (m/s) / key acceleration (m/s²), hammer velocity (m/s) / hammer acceleration (m/s²), amplitude (−1/+1); time in ms.]

Figure 2.2: A legato attack played at middle C (C4, 60) on the Yamaha grand piano. Key velocity (upper left panel), key acceleration (upper right panel), hammer velocity (middle left), hammer acceleration (middle right), and the sound signal are displayed. The dashed lines (blue) indicate finger–key contact (“fk”), the solid lines (red) hammer–string contact (“hs”), and the dotted lines (green) represent key–bottom contact (“kb”).

[Figure 2.3 plot; numeric annotations: hs−fk: 28.3 ms, kb−hs: 1.5 ms, maxHv: 2.552 m/s, SPL: 97.41 dB. Axes as in Figure 2.2.]

Figure 2.3: A staccato attack played at middle C (C4, 60) on the Yamaha grand piano. (Annotations as in Figure 2.2.)

Figure 2.4: Travel times (from finger–key to hammer–string contact) against maximum hammer velocity for the three grand pianos (three panels: Steinway C, Yamaha Disklavier, Bösendorfer SE290), different types of touch (legato: “lg,” staccato: “st,” and reproduction by the piano: “rp”), and different keys (from C1 to G6, see legend of upper panel; only C1, C5, and G6 were measured on the Steinway). In the middle panel, travel time data is plotted as reported by Hayashi et al. (1999, p. 3543), for constant key speed (solid line with dots) and for constant key acceleration (solid line). In the bottom panel, the solid line depicts the timing correction curve (TCC) of the older Bösendorfer grand piano (t = 89.16h^−0.570, used in Goebl, 2001), the dash-dotted line that of the newer grand piano (t = 84.27h^−0.562).

contact (see Section 2.2.2). Independently of the type of touch, the hammer–string contact was always the minimum acceleration (middle panel on the right). The key reached the keybed shortly before the hammer–string contact (2 ms) with the legato touch, but 1.5 ms after the hammer–string contact with the staccato touch. The whole attack process (from finger–key to hammer–string) needed 46 ms with the legato touch, but only 28 ms with the staccato touch, although similar hammer velocities were produced. The two attacks displayed in Figures 2.2 and 2.3 sounded indistinguishable to the author (listening informally to the material); their difference in hammer velocity was obviously negligibly small. In some staccato attacks played by one of the two pianists, a clear touch noise of the finger nail hitting the key surface was perceivable in the samples. This noise was absent in the legato keystrokes of that pianist. In these tones, the difference between legato and staccato touch was evident. We have to bear in mind here that the microphone was very close to the strings, a position in which an audience member would never sit in a concert.12 An example of such a staccato tone with nail noise is displayed in Figure 2.19 (p. 46). In the sound signal, the first noisy activation starts shortly after the finger touched the key. It is interesting that the touch noise was so clearly audible in some samples. Was it transmitted through the piano construction to the microphone or simply via the air? Systematic listening tests would have to be performed to discuss the perception of the present samples more conclusively. This remains a topic for future investigation with this material.

Timing properties

The different types of touch result in different acceleration patterns, as illustrated above. Hence, the timing properties of the piano action change with the type of touch. In this section, we discuss some typical measures: travel time, key–bottom time relative to hammer–string impact, hammer–string contact duration, and the time of free flight of the hammer.

Travel time The time interval between finger–key contact and hammer–string impact is defined here as the travel time.13 The travel times of all recorded tones are plotted in Figure 2.4 against hammer velocity separately for different types of touch (indicated by colour), for different keys (denoted by symbol), and for the three grand pianos (different panels). The present data were generally congruent with findings by Askenfelt and Jansson (1991).

12However, in some professional recordings the microphones are placed very close to the piano, so that such finger–key noises become clearly perceivable. 13This terminology might be misleading, because “time” suggests a point in time, although in this case a duration is meant. Terms like “travel time” or “time of free flight” were formed by analogy with the term “rise time” that is commonly used in the acoustic literature (see, e.g., Truax, 1978).

Table 2.1: Power curves of the form t = a · h^b fitted to the travel time data, separately for the types of touch (legato, staccato, reproduction) and pianos. t stands for travel time and h for maximum hammer velocity (see Figure 2.4).

             legato                           staccato                          repro
Steinway     t = 98.57h^−0.7147 (R² = 0.983)  t = 65.19h^−0.7268 (R² = 0.959)   —
Yamaha       t = 89.41h^−0.5959 (R² = 0.965)  t = 57.43h^−0.7748 (R² = 0.969)   t = 63.38h^−0.7228 (R² = 0.990)
Bösendorfer  t = 89.96h^−0.5595 (R² = 0.939)  t = 58.39h^−0.7377 (R² = 0.968)   t = 60.90h^−0.7731 (R² = 0.992)

Some very basic observations can be drawn from this figure. The two pianists were able to produce much higher hammer velocities on all three pianos with a staccato attack (beyond 7 m/s), whereas with a legato attack, the maximum hammer velocities hardly exceeded 4 m/s. There was a small trend towards higher hammer velocities at higher pitches (due to smaller hammer mass, see Conklin, 1996a). The highest velocities on the Yamaha and the Steinway were obtained at G6, but at middle C on the Bösendorfer. The lowest investigated key (C1) showed slightly lower maximal hammer velocities by comparison with the fastest attacks (loudest attacks on the Steinway: C1: 6 m/s versus G6: 6.6 m/s; on the Yamaha: C1: 5.6 m/s versus G6: 6.8 m/s; and on the Bösendorfer: C1: 5.3 m/s versus G6: 5.8 m/s and C4: 6.7 m/s). Since the keys were played by human performers, this variability between keys could be due to the human factor. The travel times ranged from 20 ms to around 200 ms (up to 230 ms on the Steinway) and showed clearly different patterns for the two types of touch. The travel time curves were independent of pitch, although lower keys have a much greater hammer mass than those in the high register (Conklin, 1996a). The data plotted in Figure 2.4 were approximated by power curves of the form t = a · h^b, separately for the type of touch (“lg,” “st,” and “rp”) and the three pianos. The results of these curve fits are listed in Table 2.1. From these numbers, we learn that the travel time curves of the reproducing systems (“rp”) resembled the staccato curves more than the legato curves. A staccato touch needed less time to transport the hammer to the strings than a legato touch, which smoothly accelerated the key (and thus the hammer). The travel times were more spread out when the tones were played legato, indicating a more flexible control of touch in this way of actuating the keys (also reflected in the lower R² values in Table 2.1).
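The power-curve approximation t = a · h^b can be reproduced by ordinary least squares in log–log coordinates, since log t = log a + b · log h. A minimal sketch, with synthetic data generated from the Yamaha legato fit of Table 2.1 standing in for the measured travel times (note that the R² values in Table 2.1 may have been computed on the untransformed data):

```python
import numpy as np

def fit_power_curve(h, t):
    """Fit t = a * h**b by linear regression in log-log space.

    h: maximum hammer velocities (m/s); t: travel times (ms).
    Returns (a, b, r_squared), with R^2 computed in log space.
    """
    log_h, log_t = np.log(h), np.log(t)
    b, log_a = np.polyfit(log_h, log_t, 1)       # slope, intercept
    pred = log_a + b * log_h
    ss_res = np.sum((log_t - pred) ** 2)
    ss_tot = np.sum((log_t - log_t.mean()) ** 2)
    return np.exp(log_a), b, 1.0 - ss_res / ss_tot

# Synthetic check: exact data from the Yamaha legato curve of Table 2.1
h = np.linspace(0.2, 4.0, 50)
t = 89.41 * h ** -0.5959
a, b, r2 = fit_power_curve(h, t)
print(round(a, 2), round(b, 4), round(r2, 3))   # → 89.41 -0.5959 1.0
```

On noise-free synthetic data the fit recovers the generating parameters exactly; on the measured data, the scatter of the legato tones would show up as the lower R² values discussed above.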
On the Steinway, the staccato data showed higher variability, almost as high as the legato data. The Bösendorfer reproducing system (see Chapter 2.3) uses a timing correction similar to the Yamaha Disklavier’s “prelay function” (cf. Repp, 1996a) to correct for the different travel times of tones of different intensity. In order to make the tones sound at the required instant in time, the system has to instruct its solenoids to start to act—that is, pushing the key upwards at its back end—earlier for a softer than for a louder tone. For this purpose, the SE system recalculates the timing characteristics for each key individually by running a calibration program on demand. Among other parameters, the calibration function records the time interval between the key sensor response (2–3 mm below the key’s resting position) and the hammer–string contact (as measured by one of the two trip points at the hammer sensor; for detailed functionality see Chapter 2.3) for seven final hammer velocities (0.32, 0.50, 0.80, 1.28, 2.00, 3.20, 5.12 m/s) and all 97 keys (the Bösendorfer Imperial 290 cm grand piano has nine additional keys in the bass). This data matrix is stored in internal system memory (EEPROM X2816AP). The content of this hardware chip of the SE system in Vienna was transferred into a file twice: once for the older piano (19–8974, measured in 1999), and once for the newer piano (290–3, measured in 2002). The calibration matrices (timing correction matrices, TCM) of the older Bösendorfer (used in Goebl, 2001, cf. Chapter 3) and of the newer grand piano (used in the present study) are plotted in Figure 2.5 and Figure 2.6, respectively.

Figure 2.5: The timing correction matrix (TCM) for the SE built into the older Bösendorfer grand piano (19–8974) as measured in 1999. Each of the seven lines represents measurements for a particular final hammer velocity (0.32 to 5.12 m/s, as labelled on the right-hand side).

The matrices contained irregularities from both the piano action and the electronic playback system. Since the playback system was identical in the two figures and only the piano changed, we can assume that differences between the two matrices were due to the different piano actions (the newer grand piano also possesses a slightly re-designed action; personal communication with Bösendorfer). What can be seen from these data is that travel time does not depend on hammer mass, which becomes much larger in the bass. In the TCM of the newer grand piano,


Figure 2.6: TCM for the same SE system (as in the previous figure) built into a newer Bösendorfer grand piano (290–3, measured in 2002). The bass register with the wrapped strings crossing the middle register strings ranges from C0 (12) to C#2 (37).

the transition from the bass register (with the wrapped strings crossing the middle strings) to the lower middle register can be seen. The strings in the bass register are positioned some centimeters higher than the other strings, so that these keys need to be regulated slightly differently from the rest. However, this register change was not obvious in the TCM of the older piano. These data represent measurements originally collected not for scientific purposes but to internally calibrate a reproducing system. Apart from providing prototypical travel time data (used in Goebl, 2000), the developer of the system (W. Stahnke) did not offer any more specific information on these calibration data. It must be assumed that they also reflect properties of the electronic equipment, or even that they cannot be interpreted at all. Due to this interpretational uncertainty, only the data averaged over the 97 keys were taken into consideration. The power curves fitted to the averaged (seven) data points of the two TCMs are called ‘timing correction curves’ (TCC) in Goebl (2001). They are plotted onto the Bösendorfer data in Figure 2.4. It is evident that both curves were very similar to each other and to the curve obtained from the legato touch. This was somewhat surprising: the Bösendorfer SE typically generates a staccato-like travel time curve at reproduction, yet its measured timing correction curve was more similar to the legato pattern than to the staccato pattern. Nevertheless, the functions used in earlier work (Goebl, 2000, 2001) were replicated with the present measurement


Figure 2.7: Key–bottom contact times relative to the moment of hammer–string contact separately for the three grand pianos (different panels), five keys (different markers), and types of touch (colour). Negative values indicate key–bottom contacts before, positive values contacts after hammer–string.

setup. The impact of different power curve approximations on the interpretation of results found by Goebl (2000, 2001) is studied and discussed in detail in Section 3.6, p. 74. The travel time function of the Yamaha Disklavier was also compared to data reported in the literature. In the middle panel of Figure 2.4, travel time data is plotted as printed in Figure 26 of Hayashi et al. (1999).14 The author transferred the graph of this figure into discrete data with the help of a ruler in order to compare their findings with the present data. The graph with keystrokes at constant speed (solid line with dots in the middle panel of Figure 2.4) was similar to the staccato data, while the graph with keystrokes at constant acceleration more closely resembled the legato data of the Disklavier.
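The prelay logic described above amounts to scheduling each solenoid impulse earlier by the travel time predicted from the timing correction curve. A hypothetical sketch using the TCC of the newer Bösendorfer from the caption of Figure 2.4 (t = 84.27·h^−0.562); the function names are illustrative, not the SE system's actual interface:

```python
def travel_time_ms(h, a=84.27, b=-0.562):
    """Predicted finger-key to hammer-string travel time (ms) for a
    final hammer velocity h in m/s (TCC of the newer Boesendorfer)."""
    return a * h ** b

def solenoid_start_time(sound_onset_ms, h):
    """Schedule the solenoid so that the tone sounds at sound_onset_ms:
    softer tones travel longer and must be started earlier."""
    return sound_onset_ms - travel_time_ms(h)

soft = solenoid_start_time(1000.0, 0.5)   # soft tone, 0.5 m/s
loud = solenoid_start_time(1000.0, 5.0)   # loud tone, 5.0 m/s
print(round(1000 - soft, 1), round(1000 - loud, 1))   # → 124.4 34.1
```

A soft tone at 0.5 m/s must thus be triggered roughly 90 ms earlier than a loud tone at 5 m/s that is meant to sound at the same instant.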

Key–bottom contact relative to hammer–string contact Figure 2.7 displays the key–bottom contact times relative to hammer–string contact (kb_rel = kb − hs). Negative values indicate key–bottom contacts before hammer–string, positive values key–bottom contacts after the hammer hits the strings (see the overview display in Figure 2.9a, p. 29). The keybed was reached by the key up to 35 ms after hammer–string contact in very soft tones (up to 39 ms on the Bösendorfer) and as early as 4 ms before in very strong keystrokes. This finding coincides with Askenfelt and Jansson (1990a,b, see Section 2.2.1), but since much softer tones were measured in the present study (hammer velocities as low as 0.1 m/s), the key–bottom times extended further after hammer–string contact. However, the different types of touch behaved quite differently. Keystrokes produced in a legato manner tended to reach the keybed earlier than keystrokes hit

14Hayashi et al. (1999) used “the 11th key” of a Yamaha grand piano, model C7.

in a staccato manner. This was especially evident for the Bösendorfer and for the Yamaha, but not for the Steinway. Askenfelt and Jansson (1992b, p. 345) stated that the interval between key–bottom and hammer–string contact varies only marginally between legato and staccato touch. With this statement they obviously refer to one of their earlier studies (Askenfelt and Jansson, 1990b), where the investigated grand piano was also a Steinway.15 Power functions were fitted to the data as depicted in Figure 2.7, separately for the two types of touch and the different pianos. They are listed in Table 2.2. Since the data to be fitted also contained negative values on the y axis, power functions of the form kb = a · h^b + c were used. The data were more spread out than in the travel time curves (reflected in the smaller R² values) and showed considerable differences between types of touch, except for this Steinway, where touch did not visibly divide the data. Recall that these data apply to specific instruments and depend strongly on their regulation, so that generalisation to other instruments may be problematic. Askenfelt and Jansson (1990b) considered key–bottom times to be haptically felt by pianists and thus important for the vibrotactile feedback in piano playing. Temporal differences of the order of 30 ms are in principle beyond the temporal order threshold (Hirsh, 1959), but these time differences may be perceived subconsciously, perhaps as the response behaviour of a particular piano. In particular, the different key–bottom behaviour for the different kinds of touch might be judged by pianists as part of the response behaviour of the action (Askenfelt and Jansson, 1992b).

Table 2.2: Power functions of the form kb = a · h^b + c fitted to the data from Figure 2.7, separately for the types of touch and the different pianos. (kb is the key–bottom time relative to hammer–string contact; h the maximum hammer velocity.)

             legato                                    staccato                                  repro
Steinway     kb = 19.09h^−0.3936 − 12.3 (R² = 0.794)   kb = 59.57h^−0.1131 − 51.19 (R² = 0.738)  —
Yamaha       kb = 14.63h^−0.4158 − 11.05 (R² = 0.933)  kb = 10.15h^−0.6825 − 3.743 (R² = 0.893)  kb = 16.2h^−0.1639 − 12.47 (R² = 0.836)
Bösendorfer  kb = 11.59h^−0.4497 − 9.983 (R² = 0.855)  kb = 13.96h^−0.3559 − 10.15 (R² = 0.698)  kb = 10.09h^−1.108 − 2.085 (R² = 0.942)
In addition to the shorter travel time, staccato tones also reached key–bottom later relative to hammer–string contact, so that the tone appears even earlier (and thus louder and more direct) than in a legato keystroke of comparable intensity.
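From the fitted functions in Table 2.2 one can estimate the hammer velocity at which key–bottom and hammer–string contact coincide (kb_rel = 0). For the Bösendorfer legato fit, solving 11.59·h^−0.4497 − 9.983 = 0 numerically gives about 1.4 m/s; a short sketch using bisection:

```python
def kb_rel(h, a=11.59, b=-0.4497, c=-9.983):
    """Key-bottom time relative to hammer-string contact (ms),
    Boesendorfer legato fit from Table 2.2."""
    return a * h ** b + c

def crossover_velocity(lo=0.1, hi=7.0, tol=1e-6):
    """Bisection for the hammer velocity where kb_rel changes sign
    (kb_rel is monotonically decreasing in h for these fits)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kb_rel(mid) > 0:   # key-bottom still after hammer-string
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

h0 = crossover_velocity()
print(round(h0, 2))   # → 1.39
```

Below roughly 1.4 m/s the keybed is reached after the hammer–string contact on this instrument, above it before, consistent with Figure 2.7.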

Time of free flight In order to estimate the time interval after the jack made contact with the escapement dolly (and the hammer travels without any further

15Askenfelt and Jansson (1990b) used a Steinway Model B, #443001, built in Hamburg 1975.


Figure 2.8: Time of free flight of the hammer. Time intervals between the point of maximum hammer velocity and hammer–string contact are plotted against maximum hammer velocity.

acceleration towards the strings) and the sound, the time interval between the point of maximum hammer velocity and hammer–string contact was calculated (the ‘time of free flight’). These intervals are plotted in Figure 2.8 against maximum hammer velocity. Power curves were approximated for these data as well (as listed in Table 2.3). The piano action of this Yamaha grand piano showed two different behaviours for staccato touch at medium intensities (between around 1 and 2 m/s): the maximum hammer velocity occurred at two different instants (Figure 2.8, middle panel). This was accounted for by two separate curve fits (Table 2.3). With escapement, the pianist loses control over the tone. The point of maximum hammer velocity coincided well with the escapement point for medium and hard keystrokes, but was sometimes considerably earlier for soft keystrokes. On this Steinway, the time of free flight was almost zero (i.e., below two milliseconds) beyond hammer velocities of 2 m/s. For this Bösendorfer, the free flight times went below 2 ms at around 2.5 m/s. The same was true for the legato tones and part of the staccato tones of the Disklavier. However, the other cluster of staccato tones showed comparatively larger times of free flight up to velocities of about 4 m/s (see Figure 2.8). This early moment of maximum hammer velocity might be ascribed to the hammer lifting off from the jack because of the strong initial force of a staccato attack, even before the moment of escapement. For all three measured pianos there was a tendency for staccato tones to have longer free flight times (that is, earlier instants of maximum hammer velocity) than legato tones.
The differences are of the order of some milliseconds (e.g., at 0.5 m/s they were 11, 17, and 15 ms for this Steinway, the Yamaha, and the Bösendorfer, respectively, according to the curve fits in Table 2.3). Moreover, the legato data did not exceed 30 ms at this Steinway, barely did so at this Bösendorfer, but did considerably (around 25 data points) at this Yamaha, whereas the staccato data ranged up to 80 ms and more

Table 2.3: Power functions of the form f = a · h^b fitted to the data from Figure 2.8, separately for the types of touch and the different pianos. (f is the time interval of free flight; h the maximum hammer velocity.)

             legato                           staccato                                                                   repro
Steinway     f = 3.73h^−1.6850 (R² = 0.9763)  f = 6.76h^−1.7710 (R² = 0.7139)                                            —
Yamaha       f = 3.72h^−1.7850 (R² = 0.9331)  f_earlier = 18.08h^−1.2880, f_later = 6.71h^−2.1490 (R² = 0.9664/0.9514)   f = 4.04h^−1.1410 (R² = 0.9623)
Bösendorfer  f = 5.04h^−1.1870 (R² = 0.983)   f = 7.64h^−1.83 (R² = 0.9864)                                              f = 17.6h^−1.866 (R² = 0.9895)

in all three pianos. These findings have interesting implications for piano playing. The longer times of free flight with staccato touch suggest that legato touch allows closer control of the attack than a sudden keystroke from above, because the pianist has a longer connection with the hammer and thus longer control over its acceleration. Moreover, with staccato touch the pianist might lose contact with the hammer even before the jack is released by the escapement regulator, because the hammer lifts off from the jack. However, we have to bear in mind that the instant of maximum hammer velocity can be quite different from the moment of escapement for very soft tones; in other words, a pianist can also decelerate deliberately until escapement. That means that an early instant of maximum hammer velocity might also be due to a hesitating keystroke. Further evaluation of the data (e.g., determining the moment of escapement) would clarify these questions; this remains for future investigation. Moreover, the earlier the hammer reaches its maximum velocity, the more energy it loses on its travel towards the strings, and the larger is the difference between the maximum velocity and the velocity at which the hammer hits the strings. Therefore, playing from the key surface is also a more economical way of playing the piano. The very early hammer velocity maxima in the Yamaha’s staccato tones16 reflect especially uneconomical and uncontrolled ways of attack. This conclusion coincides with suggestions from the piano teaching literature (e.g., Gát, 1965), where legato touch is considered more economical and to produce less noise during the attack process.
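The quoted legato–staccato differences at 0.5 m/s can be checked directly against the fits in Table 2.3 (for the Yamaha, the "later" staccato cluster is assumed here):

```python
def free_flight_ms(h, a, b):
    """Time of free flight (ms) from a power fit f = a * h**b (Table 2.3)."""
    return a * h ** b

# (a, b) pairs: legato and staccato fits for each piano, from Table 2.3
fits = {
    "Steinway":     ((3.73, -1.6850), (6.76, -1.7710)),
    "Yamaha":       ((3.72, -1.7850), (6.71, -2.1490)),  # staccato: 'later' cluster
    "Boesendorfer": ((5.04, -1.1870), (7.64, -1.83)),
}
for piano, (lg, st) in fits.items():
    diff = free_flight_ms(0.5, *st) - free_flight_ms(0.5, *lg)
    print(piano, round(diff, 1))
# → Steinway 11.1 / Yamaha 16.9 / Boesendorfer 15.7
```

The computed differences of roughly 11, 17, and 16 ms match the values quoted in the text to within a millisecond.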

Comparison among tested pianos In Figure 2.9a, all power curve approximations reported above (cf. Tables 2.1, 2.2, and 2.3) are plotted in a single display, separately for the type of touch (panels) and the three tested piano actions (line style), against the time (in milliseconds) relative to the hammer–string contact. The

16We found around 20 such tones on our Bösendorfer and 3 on our Steinway.


Figure 2.9: Temporal properties of the three tested grand piano actions. Power curve approximations (cf. Tables 2.1, 2.2, and 2.3) for finger–key contact time, instant of maximum hammer velocity (max. HV), and key–bottom contact time for the three pianos (line style) and the two types of touch (panels), (a) relative to hammer–string contact and (b) relative to finger–key contact. In (b), the different instants in time (instant of maximum hammer velocity, hammer–string contact, key–bottom contact) become visually barely distinguishable.

temporal differences between extremes in intensity were largest for the finger–key times and smallest for the key–bottom times. The differences between the curves of the pianos by different manufacturers were small compared to the differences introduced through different touch. The finger–key curve of this Steinway action was the left-most, except for loud legato tones. Our Steinway’s key–bottom curve was also the right-most of the three actions. Thus, the Steinway action needed more time for the attack operation than the other two pianos, except for very loud legato tones. The most striking difference between the tested piano actions was the early curve of the hammer velocity maxima on the Disklavier (see Table 2.3, p. 28), which was around 20 ms earlier than the other curves. In Figure 2.9b, the same curves are plotted relative to finger–key contact. Although in this display the different curves for maximum hammer velocity, hammer–string, and key–bottom contact are hard to distinguish, it makes clear how close together these three points in time are in comparison with the start of the key acceleration. These data apply only to the tested instruments, and temporal behaviour changes considerably with regulation (especially key–bottom contact and the time of free flight, see Askenfelt and Jansson, 1990b). We do not know how different the temporal properties of other instruments by these three manufacturers would be. The timing properties of the actions can be varied considerably by regulation (see Dietz, 1968; Askenfelt and Jansson, 1990b). Changes in regulation (hammer–string distance, let-off distance) resulted in changes of the key–bottom timing and of the time interval of the hammer’s free flight, respectively, of up to 5 ms (for a medium intensity; Askenfelt and Jansson, 1990b, pp. 56–57).
The differences between piano actions in the present data are approximately of the same order.17 It can be concluded that the temporal behaviour of the tested piano actions by different manufacturers was similar. However, no definitive conclusions can be drawn as to whether these (comparably small) differences in temporal behaviour are crucial for a pianist’s estimation of a piano’s quality, or whether they apply to other instruments by these manufacturers.

Acoustic properties

Rise time The hammer–string contact was defined as the conceptual onset of a tone, which corresponds closely to the physical onset. From perceptual studies we know that the perceptual onset of a tone might be slightly later than its physical onset, depending on the rise time of the tone (Vos and Rasch, 1981a,b). In this paragraph, the rise time characteristics of the pianos were investigated with respect to pitch and intensity. For this purpose, the time interval between hammer–string contact and the maximum of the energy of the sound18 was defined as the rise time of the piano tone.

17Note that all three pianos were maintained and regulated by professional technicians before the measurements, so that all pianos were in concert condition before the tests. 18The RMS was calculated with a fixed window of 10 ms.


Figure 2.10: Rise times of piano tones (from hammer–string to the maximum RMS energy of the sound signal) against maximum hammer velocity, separately for three pianos, five pitches, and two types of touch.

The rise times for the three pianos are plotted in Figure 2.10. The rise times ranged between 4 and 23 ms. The data grouped according to pitch, but did not change with intensity (that is, louder tones did not develop sooner). The lower tones needed up to 25 ms to reach their maximum, whereas high pitches achieved maximum energy already after 5 ms. For some soft tones, there was a tendency towards slightly shorter rise times by comparison with louder tones of the same pitch. Since the rise times were invariant over the whole dynamic range, the perceptual onsets will not change with tone intensity either. The perceptual onsets will, however, be later the lower the pitch. This implies for performance research that the onsets measured at computer-monitored instruments (which correspond well with the physical onset of the tone) have to be delayed for lower pitches. According to the present data, the differences will be at most of the order of 10 ms. These differences, small as they are, might not be crucial for the analysing researcher, but may be essential for automatic transcription systems.

Peak sound-pressure level The peak sound-pressure level (in dB) of all tones is plotted against the maximum hammer velocity in Figure 2.11, separately for piano, pitch, and type of touch. The microphone was always positioned very close to the strings (at about 10 cm distance). Different pitches showed slightly different curves, with a tendency for the lower pitches to have lower peak sound-pressure levels. There was no effect of type of touch: the same hammer velocity resulted in the same sound level independently of the type of touch. Only for the very soft tones on the Bösendorfer did the same maximum hammer velocity result in different sound levels according to the type of touch. In these cases, the maximum hammer velocity was considerably higher than the speed at which the hammer touched the strings for staccato tones, but not for legato tones.


Figure 2.11: Peak sound-pressure level (dB) against maximum hammer velocity (m/s) for different pianos, pitch, and type of touch.

2.2.4 General discussion

This study provided benchmark data on the temporal properties of three different grand pianos under two touch conditions (legato and staccato). Prototypical functions were obtained for travel times, key–bottom times, and the instants of maximum hammer velocity by fitting power curves to the measured data. The temporal properties varied considerably between types of touch, only marginally between pianos, and not at all between the different tested keys. The latter was not surprising, since piano technicians generally aim to adjust a grand piano action so that all keys show similar and consistent behaviour over the whole range of the keyboard. The tested pianos were all maintained and tuned by highly skilled technicians before the experiments.

Different ways of actuating the keys produced different ranges of hammer velocity. Very soft tones could only be achieved with a legato touch, and extremely loud attacks only with a staccato touch. Playing from the keys (legato) did not allow hammer velocities beyond 4–5 m/s, so for some very loud intensities hitting the keys from above was the only possible means. Better tone control was achieved through legato touch, because the time of free flight was shorter than in a staccato keystroke. Additionally, depressing a key in a legato manner caused less touch noise, which is usually regarded as a desirable aesthetic target in piano playing and teaching (cf. e.g., Gát, 1965). The two types of touch (in the present terminology legato and staccato) represent two poles of a variety of different ways to actuate a piano key (late versus early acceleration, hesitating in between, or accelerating from the escapement point). It must be assumed that a professional pianist will (even unconsciously) be able to produce many intermediate gradations of touch between legato and staccato.

The travel times and the key–bottom times changed considerably with the intensity of key depression.
A soft tone may take over 200 ms longer from the first actuation

by the pianist's finger to sound production than a very sudden fortissimo attack. Moreover, travel times and key–bottom times changed considerably with touch: a staccato tone needed around 30–40 ms less from finger–key to hammer–string contact than a legato tone with the same hammer velocity. These findings were not surprising, but the performing artist has to anticipate these changes in temporal behaviour while playing in order to achieve the desired timing of the played tones. Before playing a tone, the pianist has to estimate how long the keystroke will take, both for the desired dynamic level and for the intended way of actuating the key. These complex temporal interactions between touch, intensity, and tone onset are dealt with and applied by the pianist unconsciously; they are established over years of intensive practising and extensive self-listening. Musical situations immediately come to mind in which loud chords tend to come early with pianists at beginner or intermediate level, or in which crescendo passages tend to accelerate in tempo as well, because each keystroke is performed with a harder blow and is thus quicker in order to achieve the crescendo, while the time intervals between finger actions are not correspondingly increased. A keystroke starts for the pianist kinaesthetically with finger–key contact (the acceleration impulse by the finger) and ends at key–bottom, but it starts aurally (for pianist and audience) at (or immediately after) hammer–string contact. Typical dynamics of piano performances (as measured in Chapter 3) at intermediate dynamic levels fall between 40 and 60 MIDI velocity units (0.7 to 1.25 m/s); typical travel times are thus between 80 and 108 ms, varying by as much as about 30 ms. For such keystrokes, the key–bottom times lie between 3.5 and 0.5 ms before hammer–string contact, a range of 3 ms.
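The power-curve approximation of the travel times mentioned above (travel time as a function of hammer velocity) can be sketched as follows. This is an illustrative reconstruction, not the fitting code actually used in the study: it fits t = a·v^b by linear regression in log–log space, and the function name and the example coefficients are assumptions.

```python
import math

def fit_power_curve(velocities, travel_times):
    """Fit travel_time = a * velocity**b via least squares on
    log(t) = log(a) + b*log(v). Returns (a, b). Illustrative sketch."""
    xs = [math.log(v) for v in velocities]
    ys = [math.log(t) for t in travel_times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return math.exp(my - b * mx), b
```

A negative exponent b reproduces the qualitative finding that soft (slow) keystrokes have much longer travel times than loud ones.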
It can be assumed that at such moderate intensity levels (and with a default touch that is likely to be rather legato), the changes in travel times due to varying intensity might not be directly relevant for the player, since they are small and at the threshold of perceivability. Nevertheless, they are sufficiently large to produce the typical melody lead (see Chapter 3). In that typical dynamic range, key–bottom times are even more unlikely to be perceived by the pianist separately from the sound (hammer–string contact), since those temporal differences are there of the order of 1–4 ms. However, the differences between key–bottom and hammer–string contact can be up to 40 ms in extreme cases, which is of the order of or just beyond just-noticeable differences (Askenfelt and Jansson, 1992b, p. 345). Also, as Figure 2.9b makes visually evident, the travel times are far larger than the time differences between the other readings (maximum hammer velocity, hammer–string, key–bottom), so it can be assumed that the pianist (especially in the dynamic middle range) only senses two points in time: the start of the keystroke (finger–key) and its end, which coincides with the beginning of the sound.

Although the piano hammer cannot be controlled any more after the jack has been rotated away by the escapement dolly, pianists do still apply force to the key at the keybed. In piano education, pianists are usually made aware of this fact. Nevertheless, pianists continued to press down the key although it had already arrived at key–bottom (which is certainly after the jack's escapement). This effect was far stronger for amateur

pianists than for expert performers (Parlitz et al., 1998). Experts stopped applying pressure immediately after the key hit the keybed, whilst amateurs continued to apply force. The immediate reduction of force saves energy and allows relaxation and preparation for the next keystroke.

Furthermore, sensorimotor feedback is considered a most important factor for pianists, not only for judging the action's response, but also for judging the piano's tone (Galembo, 1982, 2001). In an extended perception experiment, Galembo (1982) asked a dozen professors from the Leningrad Conservatory of Music to rate the instrumental quality of three grand pianos under different conditions. The participants agreed that the Hamburg Steinway grand piano was superior, followed by the Bechstein grand piano, while a grand piano from the Leningrad piano factory received the lowest quality judgement. In various discrimination tasks, the participants were not able to distinguish between the instruments only by listening to them when they were played by another person behind a curtain (although all claimed to be able to). But they could very well discriminate between the instruments when they played on them blindly or deaf-blindly (Galembo, 1982, 2001). This study implied that the haptic feedback of the piano action to the playing pianist is crucial for the estimation of instrumental quality.

Another important influence on the haptic feedback sensed by the pianist is the room acoustics (Bolzinger, 1995). The same piano action might feel easy to handle in a room with reverberant acoustics, whilst it feels intractable and tiring in a room without any reverberation. Similarly, the timbre of the instrument might be judged differently under changing room acoustics. A pianist is usually not able to separate the influence of room acoustics from the properties of the instrument and directly attributes room acoustics to instrumental properties (Galembo, 2001).
The reported temporal properties of the piano actions were derived from isolated piano tones (without pedal), a situation that virtually never occurs in piano performances. For a new keystroke, the key does not necessarily have to come back to its resting position: due to the double-repetition feature of modern grand piano actions, the hammer is captured by the check and the repetition lever is stopped by the drop screw (Askenfelt and Jansson, 1990b). When the key is released approximately half way (of the approximately 10 mm touch depth), the jack can slip back underneath the roller and another keystroke can be performed. This point is usually some 2–4 mm below the key surface. For such keystrokes, the key can travel only 6–8 mm, so the travel times are expected to be shorter than in a legato key depression from the key's resting position. For such repeated keystrokes, it would also be impossible to calculate or determine a finger–key contact time.

The different kinds of touch present in the study sometimes displayed portions of noise that stemmed from the finger–key interaction and were clearly perceivable. Especially in the staccato tones of one of the playing pianists (RB), the nail hitting the key was audible in the samples and visible in the waveform of the sound. Although this issue was not investigated systematically here (controlled listening

tests and comprehensive analyses of the noise portions in the sound), these findings coincided with results from the literature (Askenfelt, 1994; Báron, 1958; Báron and Holló, 1935; Koornhof and van der Walt, 1994; Podlesak and Lee, 1988). However, it is argued here that the type of touch influences the pianist more through kinaesthetic feedback, through the different times at which the tone is to be expected after hitting the keys, and through the different motor efforts involved, than through the manifold emerging noises, which fade away at a certain distance from the piano (Bork et al., 1995).

Another interesting issue with respect to the reported data is whether there is a relationship between the actions' temporal properties and the instrumental quality of the tested grand pianos. The author's opinion as a pianist was that, of the three grand pianos investigated in this study, the Steinway was qualitatively superior to the other two, although the Bösendorfer was a concert grand piano of high standard. The small Yamaha baby grand was the least interesting instrument, partly due to its size. However, all pianos were of a high mechanical standard, and they were well maintained and tuned. The most convincing feature of the Steinway was (in the author's opinion), apart from its clear tone, the extremely precise action that allowed subtle control over virtually every aspect of touch and tone. It is assumed here that one of the most important features of a 'good' piano is a precise and responsive action.

In the data reported above, some differences between the pianos could be observed that might influence the subjective judgement of instrumental quality. The Steinway showed (1) no effect of touch on the shape of the travel time functions; (2) no effect of touch on key–bottom times; and (3) short time intervals of free flight (already around zero for keystrokes beyond a hammer velocity of 1.5 m/s, whereas this was around 2.5 m/s for the Bösendorfer and above 3 m/s for the Yamaha).
Moreover, the Disklavier showed many very early hammer velocity maxima at velocities between about 1 and 2 m/s, the Bösendorfer some, and the Steinway almost none. Although further evaluative investigations would be required to state hypotheses on the relation between the temporal behaviour of grand piano actions and instrumental quality more conclusively, it seems likely that behaviour that is consistent across types of touch, together with late hammer velocity maxima, is crucial for precise touch control and a positive subjective appreciation of instrumental quality.

2.3 Measurement and reproduction accuracy of computer-controlled grand pianos

This section examined the precision of the two reproducing pianos used in the previ- ous section in order to determine benchmark data for performance research on how reliable those devices are. Parts of this work have already been published (Goebl and Bresin, 2001). A slightly modified version of this section will appear in the Journal of the Acoustical Society of America (Goebl and Bresin, 2003a) and was presented at the Stockholm Music Acoustics Conference (SMAC’03, cf. Goebl and Bresin, 2003b).

2.3.1 Introduction

Current research in expressive music performance mainly deals with piano interpretation because obtaining expressive data from a piano performance is easier than, e.g., from string or wind instruments. Pianists are able to control only a few parameters on their instruments: the tone19 onsets and offsets, the intensity (measured as the final hammer velocity), and the movements of the two pedals.20 Computer-controlled grand pianos are a practical device to pick up and measure these expressive parameters and, at the same time, provide a natural and familiar setting for pianists in a recording situation. Two systems are most commonly used in performance research: the Yamaha Disklavier (Behne and Wetekam, 1994; Palmer and Holleran, 1994; Repp, 1995b, 1996c,a,d, 1997b; Juslin and Madison, 1999; Bresin and Battel, 2000; Timmers et al., 2000; Riley-Butler, 2001, 2002), and the Bösendorfer SE system (Palmer, 1996; Bresin and Widmer, 2000; Goebl, 2001; Widmer, 2001, 2002a,b). Some studies made use of various kinds of MIDI keyboards, which do not provide a natural playing situation for a classical concert pianist because they have a different tactile and acoustic response (e.g., Palmer, 1989; Repp, 1994). Both the Disklavier and the SE system are integrated systems (Coenen and Schäfer, 1992), which means that they are permanently built into a modern grand piano. They are based on the same underlying principle: to measure and reproduce the movements of the piano action, above all the final speed of the hammer before it touches the strings. These devices are not designed for scientific purposes, and their precise functionality is unknown or not revealed by the companies. Therefore, exploratory studies on their recording and playback precision are necessary in order to examine the validity of the collected data.

Both devices have sensors at the same places in the piano action (see Figure 2.1

19 The onset of a sounding tone is very often called "note onset," because of the MIDI world's terminology. In this paper, the terms "tone" and "note" are used synonymously, since we are not talking about musical notation.

20 The middle or sostenuto pedal only prolongs certain tones and is not counted as an individual expressive parameter.

on page 16). There is a shutter mounted on each of the hammer shanks.21 This shutter interrupts an infrared light beam at two points just before the hammer hits the strings: the first approximately 5 mm before hammer–string impact, the second when the hammer crown just starts to contact the strings. These two points in time yield an estimate of the final hammer velocity (FHV). In the case of the Disklavier, no further information about how these data are processed was obtainable. On the Bösendorfer, the time difference between these two trip points is called (by definition) the inverse hammer velocity (IHV) and is stored as such in the internal file format. Since the counter of this infrared beam operates at 25.6 kHz, the final hammer velocity (in meters per second) is FHV = 128/IHV (Stahnke, 2000; Goebl, 2001, p. 572). The timing of the trip point closer to the strings is taken as the note onset time, which has a resolution of 1.25 ms. The Disklavier seems to use the same measuring method for hammer velocity and note onset, but as the company does not distribute any more specific details, this remains speculation. The MIDI files of the Disklavier provided 384 MIDI ticks per 512 820 µs (as defined in the tempo command of the MIDI file), thus a theoretical timing resolution of 1.34 ms. A second set of sensors is placed under the keys to measure when the keys are depressed and released. Again, the exact use of this information in the Disklavier cannot be reconstructed, but the Bösendorfer uses it for releasing the keys correctly (note offsets) and for reproducing silent tones (when the hammer does not reach the strings). The Disklavier used in this study does not reproduce any silent notes at all. The data picked up by the internal sensors are stored in the Disklavier on an internal floppy drive, or externally by using the MIDI out port.
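The relation FHV = 128/IHV follows directly from the geometry and the counter rate: the shutter gate spans the last 5 mm of hammer travel, and the counter runs at 25.6 kHz, so 0.005 m divided by (IHV/25 600) s gives 128/IHV m/s. A minimal sketch of this conversion (the function name is an invention for illustration):

```python
def final_hammer_velocity(ihv_counts, counter_hz=25600, gate_mm=5.0):
    """Bösendorfer SE: final hammer velocity (m/s) from the stored inverse
    hammer velocity (counter ticks between the two shutter trip points).
    FHV = (0.005 m) / (ihv / 25600 Hz) = 128 / ihv. Illustrative sketch."""
    return (gate_mm / 1000.0) * counter_hz / ihv_counts
```

For example, an IHV of 128 counts corresponds to exactly 1 m/s, and 64 counts to 2 m/s.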
The SE system is linked via a special cable to an ISA card of a personal computer running MS-DOS. Internal software controls the recording. The information is stored in standard MIDI format on the Disklavier, and in a special file format on the Bösendorfer (each recording comprises a set of three files with the extensions ".kb" for keyboard information, ".lp" for the loud (right) pedal, and ".sp" for the soft (left) pedal). Although the SE file data are encrypted, the content of the files can be listed with the supplied software and used for analysis.

The reproduction is carried out with linear motors (solenoids) placed under the back of each key. The cores of the coils of the Disklavier have a length of approximately 7 cm, whereas those of the SE system are at least double that length. Pedal measurement and reproduction are not discussed in the present study.

Only a few studies provide systematic information about the precise functionality of these devices. Coenen and Schäfer (1992) tested five different reproduction devices (among them a Bösendorfer SE225 and a Yamaha Disklavier grand piano, DG2RE) on various parameters, but their goal was to evaluate their reliability for compositional use; their main focus was therefore on the production mechanism. They determined practical benchmark data such as scale speed, note repe-

21 On the Disklavier, the hammer shutter is mounted closer to the fixed end of the hammer shank, whereas the SE has its shutter closer to the hammer head (as displayed in Figure 2.1).

tition, note density (the maximum number of notes which can be played simultaneously), minimum and maximum length of tones, and pedal speed. In their tests, the integrated systems (Disklavier, SE) generally performed more satisfactorily than the systems built into an existing piano (Autoklav, Marantz pianocorder). The Bösendorfer, as the most expensive device, had the best results in most of the tasks. Bolzinger (1995) performed some preliminary tests on a Yamaha upright Disklavier (MX-100 A), but his goal was to measure the interdependencies between the pianist's kinematics, performance, and the room acoustics. With his Disklavier, he had the opportunity to play back files and simultaneously record the movements of the piano with the same device using the MIDI out port. In that way, he easily obtained a production–reproduction matrix of MIDI velocity values, showing linear reproducing behaviour only for MIDI velocities between approximately 30 and 85 (Bolzinger, 1995, p. 27). On the Disklavier in the present study, this parallel playback and recording was not possible. Maria (1999) developed a complex methodology to perform meticulous tests on a Disklavier (DS6 Pro), but no systematic or quantitative measurements have been reported so far.

The focus of this study lies on the recording and reproducing accuracy of two computer-controlled grand pianos with respect to properties of the piano action (hammer–string contact, final hammer velocity) and properties of the sounding piano tone (peak sound-pressure level). In addition, we report the correspondence between physical sound properties and their representation as measured by the computer-controlled pianos (MIDI velocity units), in order to provide a benchmark for performance research (see also Palmer and Brown, 1991; Repp, 1993b).
Another issue discussed in the following is the timing behaviour of the grand piano action in response to different types of touch, and their reproduction by a reproducing piano. Selected keys distributed over the whole range of the keyboard were depressed by pianists with many gradations of force and with two kinds of touch: with the finger resting on the surface of the key (legato touch), and with an attack from a certain distance above the keys (staccato touch). These two kinds of touch are described in Askenfelt and Jansson (1991).

2.3.2 Method

The two computer-controlled grand pianos (the Yamaha Disklavier and the Bösendorfer SE290), the experimental setup, and the procedure were the same as in Section 2.2.2. Immediately before the experiments, both instruments were tuned, and the piano action and the reproduction unit were serviced. In the case of the Disklavier, this was done by a specially trained Yamaha piano technician; at the Bösendorfer company, the company's SE technician took care of this work. This method delivered (1) the precise timing (onset) and dynamics of the original recording, (2) the internally stored MIDI file of the Disklavier or its correspondent on the SE device, and (3) the precise timing and dynamics of the reproduction. For data analysis, only a few of the discrete readings obtained in Section 2.2.2 were

[Figure 2.12 about here: two panels (Yamaha Disklavier II, Bösendorfer SE290), delay of MIDI file (ms) against original time (s). Fitted lines: y = −0.143·x − 1.381 (R² = 0.942) and y = −0.053·x + 1.715 (R² = 0.572).]

Figure 2.12: Timing delays (ms) as a function of recorded time (s) between the original recording and the MIDI file as recorded by the computer-controlled grand pianos, for two types of touch: legato ("lg") and staccato ("st"). Negative values indicate that an onset in the MIDI file was earlier than in the original recording. The straight lines are linear fits of the whole data.

used: the hammer–string contact, corresponding to the 'note on' time in the MIDI file; the maximum hammer velocity; the peak sound-pressure level; and the MIDI velocity value as stored in the recorded MIDI files (or in the internal file format of the Bösendorfer SE system). The onset differences between the original recording and the MIDI file, and those between the original recording and its reproduction, were calculated.22 Since the three measurements (original recording, MIDI file, and reproduction) were not synchronised in time by the measurement procedure, their first attacks were defined as being simultaneous. Care was taken that the first tones were always loud attacks in order to minimise the synchronisation error, since the timing error was smaller the faster the attack. If there were soft attacks at the beginning, the files were synchronised by the first loud attack that occurred (hammer velocity over 2 m/s, or 77 MIDI velocity units).
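The synchronisation procedure just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original analysis code: onsets and velocities are parallel lists, the threshold of 77 MIDI velocity units (about 2 m/s) is taken from the text, and the function names are inventions.

```python
def synchronise(onsets_a, velocities_a, onsets_b, velocities_b, threshold=77):
    """Align onset list B to onset list A by declaring their first loud
    attacks (MIDI velocity >= threshold) simultaneous. Illustrative sketch."""
    def anchor(onsets, vels):
        for t, v in zip(onsets, vels):
            if v >= threshold:
                return t
        return onsets[0]  # fallback if no loud attack is found
    shift = anchor(onsets_a, velocities_a) - anchor(onsets_b, velocities_b)
    return [t + shift for t in onsets_b]
```

After this alignment, per-note delays can be computed by subtracting corresponding onsets, as defined in footnote 22.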

2.3.3 Results and discussion

Timing accuracy

In Figure 2.12, the note onset delays of the MIDI file in comparison to the original recording are plotted against the recorded time separately for the two pianos. It is evident that both MIDI files show a constantly decreasing delay over time.

22 delayMIDI = MIDI onset − original onset; delayrepro = reproduced onset − original onset.

[Figure 2.13 about here: two panels (Yamaha Disklavier II, Bösendorfer SE290), residual timing error (ms) against MIDI velocity. Polynomial fits: Disklavier, y = 0.00115·x² − 0.239·x + 11.620 (R² = 0.3968); SE290, y = 8.419·10⁻⁶·x³ − 0.00257·x² + 0.275·x − 8.615 (R² = 0.6928).]

Figure 2.13: The residual timing error (ms) between the MIDI file and the original record- ing as a function of MIDI velocity, as recorded by the computer-controlled pianos. Again, negative values indicate onsets too early in the MIDI data, in comparison to the original file. The data was approximated by polynomial functions.

This constant timing drift in the MIDI file was larger for the SE system than for the Disklavier. The origin of this systematic timing error is as yet unknown, but it is likely that the internal counters of the systems (in the case of the SE system, a personal computer) do not operate at exactly the intended frequency, probably due to a rounding error. This time drift was small (0.0053% or 0.014%, respectively) and negligible for performance research (tempo changes of that order are far below just-noticeable differences, cf. Friberg, 1995). But when such a device has to play in time with, e.g., an audio tape, the synchronisation error will already be perceivable after some minutes of performing.

To illustrate the recording accuracy without this systematic error, the residual timing error (the differences between the fitted lines and the data) is plotted in Figure 2.13, separately for the two pianos, against the recorded MIDI velocity.23 In an earlier conference contribution, a different normalisation method was applied to the same data of the Disklavier (see Goebl and Bresin, 2001). The variance was larger for the Disklavier than for the SE system (Yamaha mean: 1.4 ms, standard deviation (s.d.): 3.8 ms; Bösendorfer mean: 0.2 ms, s.d.: 2.1 ms), but for both pianos the residual timing error showed a trend with respect to the loudness of the recorded tones. The Disklavier tended to record softer tones later than louder ones; the SE showed the opposite trend, but to a smaller extent and with much less variation (Figure 2.13). The data in Figure 2.13 were approximated by polynomial curves;

23 For the SE system, the final hammer velocity needs to be mapped to MIDI velocity values by choosing a velocity map. In the present study, a logarithmic map was always used: MIDIvelocity = 52 + 25 · log2(FHV).
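The logarithmic velocity map of footnote 23, together with its inverse, can be sketched as follows (illustrative helper functions; the formula itself is from the text, the function names are not):

```python
import math

def se_midi_velocity(fhv):
    """Bösendorfer SE logarithmic velocity map used in this study:
    MIDI velocity = 52 + 25 * log2(FHV), FHV in m/s."""
    return 52.0 + 25.0 * math.log2(fhv)

def se_fhv(midi_velocity):
    """Inverse map: final hammer velocity (m/s) from MIDI velocity."""
    return 2.0 ** ((midi_velocity - 52.0) / 25.0)
```

Note that this map reproduces the equivalences quoted in the text: 1 m/s maps to 52 MIDI velocity units, and 2 m/s to 77.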


Figure 2.14: Timing delays (ms) between the original and its reproduction by the computer-controlled piano. (No systematic trend had to be removed.)

the formulas are printed there. The R² values differed between the two pianos: the Disklavier's approximation explained barely 40% of the variance, while for the SE system it was about 70%. The Disklavier's curve fit thus indicated a larger erroneous trend in recording and, in addition, larger variability around that curve.
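The detrending step used above to obtain the residual timing error (subtracting a least-squares line fitted to delay against recorded time) can be sketched as follows; the function name is an invention for this illustration, not the original analysis code:

```python
def residual_timing_error(times_s, delays_ms):
    """Remove the systematic linear drift from onset delays by least squares
    and return the residuals, as done for Figure 2.13. Illustrative sketch."""
    n = len(times_s)
    mx = sum(times_s) / n
    my = sum(delays_ms) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(times_s, delays_ms)) / \
            sum((x - mx) ** 2 for x in times_s)
    intercept = my - slope * mx
    return [y - (slope * x + intercept) for x, y in zip(times_s, delays_ms)]
```

For perfectly linear drift the residuals vanish; what remains in the real data is the velocity-dependent error discussed above.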

The timing delays between the original recording and its reproduction are plotted in Figure 2.14, separately for the two pianos. The systematic timing error of the recording was not observed here, so a display against recorded time (as in Figure 2.12) was not required: evidently, the error in recording was cancelled out by the same error in reproduction. The difference between the two systems became most evident in this display. While the reproduced onsets of the Disklavier differed by as much as +20 and −28 ms (mean: −0.3 ms, s.d.: 5.5 ms) from the actually played onsets, the largest timing error of the SE system rarely exceeded ±3 ms, with a tendency for soft notes to come up to 5 ms too soon (mean: −0.1 ms, s.d.: 1.3 ms). Interestingly, the recording accuracy of the SE system was lower than its reproduction accuracy. Obviously, its internal calibration successfully aims at absolutely precise reproduction. It could also be that the SE takes the first trip point (5 mm before the strings) as the note onset, but calibrates itself correspondingly to overcome this conceptual mistake. However, this assumption is contradicted by information obtained from the SE's developer, Wayne Stahnke (Stahnke, 2000, see also Goebl, 2001).


Figure 2.15: The maximum hammer velocity (m/s) as played by the pianists (x axes) and reproduced by the computer-controlled pianos (y axes). (The diagonal line indicates ideal reproduction.)

Dynamic accuracy The second investigated parameter was dynamics, measured in terms of the speed of the hammer hitting the strings (m/s) or the peak sound-pressure level (dB). We defined the hammer velocity to be the maximum hammer velocity (see above), since this value was easy to obtain automatically from the recorded hammer track. Usually, this value corresponded very well with the velocity of the hammer when it starts to touch the strings (final hammer velocity), but especially for soft notes the maximum hammer speed was larger than the hammer speed at the strings. In such cases the time between escapement (when the hammer loses physical connection to the key, that is, when the jack is catapulted away by the escapement dolly; for more detail see Askenfelt and Jansson, 1990b; Goebl et al., 2003) and hammer–string contact can be as long as 100 ms or more. The actual final hammer velocity was hard to determine from the hammer accelerometer measurements, but the computer-controlled devices measure an average velocity over the last 5 mm of the hammer's travel to the strings (approximately the last 10% of that distance).

In Figure 2.15, the reproduced maximum hammer velocity is plotted against the original maximum hammer velocity. It becomes evident that the Disklavier's solenoids were not able to reproduce hammer velocities above a certain limit. This limit varied slightly between keys: e.g., the G6 (with less hammer mass than hammers at lower pitches) was accelerated up to 3.5 m/s, whereas a C1 (with a comparatively heavy hammer) reached only up to 2.4 m/s. On the SE system, this ceiling effect was not as evident, and there was no obvious effect of pitch as for the Disklavier. Especially in very loud staccato tones, the first impact of the finger hitting the key resulted in a very high peak hammer velocity which decreased significantly until hammer–string


Figure 2.16: Peak sound level (dB) as measured in the tones performed by the pianists (x axes) and reproduced by the computer-controlled pianos (y axes).

contact. The solenoid was not able to reach this high peak hammer velocity (and is not programmed to do so), but it aimed to reproduce the measured final hammer velocity properly (see also Figure 2.18). In this light, the maximum hammer velocity did not seem to be a proper measure. Instead, the peak sound-pressure level (dB) was considered (see Figure 2.16). This display compares acoustic properties of the played tones with those of their reproduction (peak SPL in dB, Figure 2.16). Here, the SE system revealed a much more precise reproducing behaviour over the whole dynamic range than the Disklavier. With the latter, the dynamic extremes flattened out: soft tones were played back too loudly and very loud tones too softly. In Figure 2.17, the relation between MIDI velocity units and peak sound-pressure level is displayed separately for the recording (a) and the reproduction (b). On both instruments, different pitches exhibited different curves: the higher the pitch, the louder the radiated sound at the same MIDI velocity. The reproduction panel (Figure 2.17b) reflects the reproducing limitations of the Disklavier already shown in Figure 2.16.
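The peak-level measure used in these comparisons can be sketched as follows. This is a minimal illustration, not the measurement code: without an absolute calibration of the recording chain, the result is a level in dB relative to full scale, and the hypothetical `calibration_db` offset stands in for the mapping to absolute sound-pressure level that a calibrated microphone chain would provide.

```python
import math

def peak_level_db(samples, calibration_db=0.0):
    """Peak level of a recorded tone in dB. Returns dB relative to full scale
    plus a hypothetical calibration offset. Illustrative sketch only."""
    peak = max(abs(x) for x in samples)
    return 20.0 * math.log10(peak) + calibration_db
```

Since the same chain was used for original and reproduced tones, such relative levels suffice for the comparisons in Figures 2.16 and 2.17.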

Two types of touch

Examples of a legato attack (Disklavier, see Figure 2.18) and a staccato attack (SE, see Figure 2.19) are shown in order to demonstrate the reproducing behaviour of the computer-controlled pianos. In these figures, instantaneous key and hammer velocity are plotted together with the sound signal. In Figure 2.18, a legato attack as played by one of the authors is shown on the left side with its smooth acceleration, and on the right its reproduction by the Disklavier. The Disklavier always hit the key in a staccato

[Figure 2.17 appears here: four panels of peak SPL (dB, 60–110) against MIDI velocity (0–120); columns: Yamaha Disklavier II and Bösendorfer SE290; rows: (a) as played (legato "lg", staccato "st"), (b) reproduction ("rp"); pitches G6 (91), C5 (72), C4 (60), G2 (43), C1 (24).]

Figure 2.17: Peak sound-pressure level (dB) against MIDI velocity as recorded by the computer-controlled pianos. The upper panels show legato touch ("lg") and staccato touch ("st") as played by the pianist (a); the lower ones display the reproduction ("rp") by the computer-controlled pianos (b).

[Figure 2.18 appears here. Annotated values: original attack hs−fk = 36.8 ms, kb−hs = −3.0 ms, max. hammer velocity 3.765 m/s, peak SPL 101.13 dB; reproduction hs−fk = 25.9 ms, kb−hs = 1.6 ms, max. hammer velocity 2.794 m/s, peak SPL 98.53 dB.]

Figure 2.18: A forte attack (C4, MIDI note number 60) played by one pianist (left panel) 'from the key' (legato touch), and its reproduction by the Yamaha Disklavier (right). The upper panels plot key velocity, the middle panels hammer velocity, and the bottom panels the sound signal. The three lines indicate the finger–key contact (start of the key movement, "fk," left dashed line), the key–bottom contact ("kb," dotted line), and the hammer–string contact ("hs," solid line).

manner, with an abrupt acceleration at the beginning of the attack. The parts of the piano action were compressed before their inertia was overcome and the hammer started to move upwards. The solenoid's action resulted in a shorter travel time (the time between finger–key contact ("fk") and hammer–string contact ("hs") was 26 ms instead of 37 ms, see Figure 2.18, upper panels). The travel time difference between production and reproduction was even larger at very soft keystrokes. This could be one reason why soft notes appear earlier in the reproduction by the Disklavier than louder notes. In this particular attack, the difference in peak hammer velocity was clearly audible. When the (final) hammer velocities became similar, the two sounds, independently of how they were produced (legato, staccato, or reproduced), became indistinguishable, as informal listening to the material suggests. Systematic listening tests remain for future work. Furthermore, we cannot tackle here the old controversy as to whether it is only hammer velocity that determines the sound of an isolated piano tone (White, 1930; Hart et al., 1934; Seashore, 1937), or whether the pianist can alter the piano tone with a specific type of touch, so that there are more influencing factors, such as various types of noise emerging from the piano action and the pianist's interaction with it (Báron and Holló, 1935; Báron, 1958; Podlesak

[Figure 2.19 appears here. Annotated values: original attack hs−fk = 16.1 ms, kb−hs = −5.4 ms, max. hammer velocity 5.792 m/s, peak SPL 107.71 dB; reproduction hs−fk = 18.4 ms, kb−hs = −1.6 ms, max. hammer velocity 5.390 m/s, peak SPL 107.70 dB.]

Figure 2.19: A fortissimo attack (C4, MIDI note number 60) played by one pianist (left panel) from a certain distance above the key (staccato touch), and its reproduction by the Bösendorfer SE grand piano (right). The upper panels plot key velocity, the middle panels hammer velocity, and the bottom panels the sound signal. The three lines indicate the finger–key contact (start of the key movement, "fk," left dashed line), the key–bottom contact ("kb," dotted line), and the hammer–string contact ("hs," solid line).

and Lee, 1988; Askenfelt, 1994; Koornhof and van der Walt, 1994). In the context of touch, the author considers the hapto-sensory feedback from the piano to the player as crucial. Through this feedback, the specific touch of one keystroke might influence the performer's options for playing the next keystroke.

A very loud staccato attack is plotted in Figure 2.19, with the original, human attack on the left and its reproduction by the Bösendorfer SE on the right. The point of maximum hammer velocity was 6 ms before hammer–string contact in the original recording, but only 2.5 ms in the reproduction. Although the reproduced maximum hammer velocity was lower (4.6 m/s instead of 5.6 m/s), the reproduced peak SPL was slightly higher than that of the original sound (109.63 dB instead of 108.92 dB). The human player accelerated the key so abruptly that the hammer reached its highest speed well before hitting the strings and, of course, lost energy during its free flight to the strings. Since the reproducing solenoid could not accelerate the key as abruptly as the human player, the hammer reached maximum speed later, and, in this example, the machine performed with less energy loss than the human player.

2.3.4 General discussion

In this study, we measured the recording and reproducing accuracy of two computer-controlled grand pianos (Yamaha Disklavier, Bösendorfer SE) with an accelerometer setup in order to determine their precision for piano performance research. Both devices showed a systematic timing error over time, most likely due to a rounding error in the system clock (the internal hardware of the Disklavier, a common personal computer for the SE). With this linear error removed, the Bösendorfer had a smaller (residual) timing error than the Disklavier, but both exhibited a certain trend with respect to the loudness of the tones. The Disklavier tended to record soft tones too late, whereas the SE had the tendency to record soft tones too early. But within these tendencies, the SE was more consistent. During reproduction, the superior performance of the Bösendorfer became more evident: its timing error was smaller than during recording, whereas the Disklavier's variance increased in comparison to its recording. The important point for performance research is the recording accuracy of these systems. Apart from the systematic error, which only marginally affects the measured tempo value (0.0053% or 0.014%, respectively), the residual timing error (Figure 2.13) was considerable for the Disklavier and smaller for the Bösendorfer. The measurement precision can be improved by subtracting these trends using the polynomial curve approximations displayed in Figure 2.13. To examine reproducing accuracy in the loudness dimension, we used the maximum hammer velocity and the peak sound-pressure level as measures. Maximum hammer velocity did not correspond to the velocity measures captured by the sensors of the two systems. Considering the peak sound levels of the sounding signal, both devices recorded with similar precision. However, the Disklavier system could not reproduce very loud tones properly, most likely due to its smaller solenoids.
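The trend-subtraction step described above can be sketched as follows (a Python sketch; the drift, residual trend, and noise magnitudes are invented stand-ins for the measured errors of Figure 2.13, not the actual data):

```python
import numpy as np

# Synthetic onset-time errors: a linear clock drift plus a slow residual
# trend plus random noise (all values invented for illustration).
rng = np.random.default_rng(0)
t_nominal = np.linspace(0.0, 600.0, 300)           # nominal onset times (s)
error = (1.4e-4 * t_nominal                        # linear clock drift
         + 0.002 * np.sin(t_nominal / 90.0)        # slow residual trend (s)
         + rng.normal(0.0, 5e-4, t_nominal.size))  # random measurement noise
t_measured = t_nominal + error

# Fit a low-order polynomial to the timing error and subtract it, analogous
# to the polynomial curve approximations of Figure 2.13.
x = t_nominal / t_nominal[-1]                      # rescale for conditioning
coeffs = np.polyfit(x, t_measured - t_nominal, deg=3)
corrected = t_measured - np.polyval(coeffs, x)

rms_before = np.sqrt(np.mean((t_measured - t_nominal) ** 2))
rms_after = np.sqrt(np.mean((corrected - t_nominal) ** 2))
# rms_after is far below rms_before: only noise and a small residual remain.
```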
The lower the pitch (and thus the greater the hammer mass), the lower was the maximum sound-pressure level of the Disklavier's reproduction. The reproduction of soft notes was also limited (very soft notes were played back somewhat louder by the Disklavier), because the tested Disklavier prevented very soft tones from being reproduced silently through a minimum-velocity matrix, adjustable via the internal control unit. It was also due to this function that the Disklavier was not able to reproduce silent notes, a crucial feature especially for music of the 20th century. The Bösendorfer exhibited linear reproducing behaviour over the whole dynamic range (from 60 to 110 dB SPL). Another very important criterion of recording and reproducing capability, the two pedals, was not investigated in this study.24 The use of the right pedal has not been investigated extensively to date (apart from Repp, 1996b, 1997b). We did

24We are talking only of the right and the left pedal of grand pianos, since the middle pedal (the sostenuto pedal) only varies the tone length of certain keys depressed during its use, which is recorded and reproduced by simply holding down the corresponding keys for the same time the pedal was depressed.

not have any hypotheses about how pedal recording and reproducing accuracy should be approached. This item remains for future work. Both the Disklavier and the SE system are based on the same underlying principle: to measure and reproduce the movement of the piano action (and the pedals), in particular the final speed of the hammer before touching the strings. This principle is fundamentally different from what a performing artist does when playing expressively. The artist controls his or her finger and arm movements in order to realise a certain mental image of the sound, by continuously listening to the resulting sound and by feeling the hapto-sensory feedback of the keys (Galembo, 1982, 2001). In this way, the performer is able to react to differences in the action, the voicing, the tuning, and the room acoustics, to mention just a few variables that influence the radiated sound. A reproducing piano, on the other hand, aims to reproduce a certain final hammer velocity independently of whether room acoustics, tuning, or voicing have changed since the recording. Even if the reproduction takes place on the same piano and immediately after the recording, the tuning might not be the same anymore, and the mechanical reproduction, as good as it might be, does not result in an identical-sounding performance to the one the pianist played before. This obvious limitation of such devices becomes most evident when a file is played back on a different piano or in a different room. Especially if the damping point (the point of the right pedal travel at which it starts to prevent the strings from oscillating freely) is different on another piano, the reproduction could sound too blurred (too much pedal) or too "dry" (too little pedal). One possible solution to this problem could be a reproducing device with "ears"; in other words, the piano should be able to control its acoustical output via a feedback loop through a built-in microphone.
If put into a different room, the device could check the room acoustics, its pedal settings, and its current tuning and voicing before the playback starts, much the same as a pianist warming up before a concert. Such a system would require a representation of loudness or timbre other than MIDI velocity, indicating at what relative dynamics a certain note was intended to sound in a pianist's performance. As the present study was planned to investigate the usefulness of the two devices in question for performance research, we have to consider the obtained results in the light of practical applications. Although the Bösendorfer is the older system, it generally performed better. The disadvantage of the Bösendorfer is its price, around double the price of a grand piano of that size. Moreover, the SE system is not produced anymore; only about 35 units were sold around the world, and very few to academic institutions (such as Ohio State University, Columbus, USA, or the Hochschule für Musik at Karlsruhe, Germany).25 The Disklavier, on the other hand, is a consumer product, its price is generally lower than the Bösendorfer's (depending on the type of system), and it is therefore more likely to be obtained by an institution.

25The SE system was recently completely re-engineered and was expected to be available commercially from the Bösendorfer company by mid-2002 (Dain, 2002).

The Disklavier measured in this study was certainly not the top model of the Yamaha corporation. Since then, Yamaha has issued the Mark III series and the high-end series called Pro (e.g., the special Pro2000 Disklavier). The latter series uses an extended MIDI format (with a velocity representation using more than 7 bits) and additional measures such as key release velocity to reproduce the way the pianist released a particular key. It can be expected that these newer devices perform significantly better than the tested Mark II grand piano. Since these more sophisticated devices were not available to the authors, or were too far away from the accelerometer equipment, which was too costly to transport, this has to remain a subject for future investigations. This study examined the reliability of computer-controlled pianos for performance research. It showed that not all of the data output by such devices can be blindly relied on. Although the timing data in particular can be listed with around one millisecond of precision, this seemingly high accuracy has to be interpreted by the researcher with caution. Differences of ±10 ms that depend on tone intensity (as found for the recording accuracy of the examined Disklavier) might blur performance data considerably, so that, e.g., a study on tone onset asynchronies as reported in Chapter 3 with a Bösendorfer SE would not have delivered reliable results with a Disklavier such as the one examined in the present study. Strictly speaking, however, each model (e.g., an upright Disklavier) has to be measured and examined individually before its specific accuracy can be determined for the purpose of performance studies.

2.4 A note on MIDI velocity

When a hammer hits the strings with a certain velocity at a certain pitch, it produces a tone with a certain intensity. The same hammer velocity with a different hammer at an adjacent pitch will produce a tone with similar, but not identical, loudness. These differences are due to slightly different regulation of the action, different density of the hammer felt, and different resonances of the strings, the soundboard, and the room acoustics. Although piano technicians try to maintain action, hammers, and tuning so that adjacent tones show similar sound quality and touch, and so that the whole keyboard exhibits consistent behaviour, total equality of tones is not possible to achieve on an acoustic musical instrument. In Figure 2.11 (p. 32) and in Figure 2.17 (p. 44), the same hammer velocity resulted in different peak sound levels at different pitches. Repp (1997a) measured the peak sound level of every second tone in a range of 5 octaves (from C2 to C7) produced by a Yamaha Disklavier baby grand piano26 with 5 different MIDI velocities (from 20 to 100). He found large unsystematic variability from one tone to the next, with slightly higher intensity in the middle register (Repp, 1997a, p. 1880, Fig. 2). These data were similar to measurements from an earlier study of his (Repp, 1993b). In order to obtain a complete picture of the peak sound level behaviour of a grand piano, the Bösendorfer grand piano used in Chapter 3 was examined over the whole range of the keyboard. Computer-generated files instructed the SE system to produce tones of MIDI velocities between 10 and 110 in steps of 2 units for all of its 97 keys (4947 tones in total).27 Each tone lasted for 300 ms and was followed by silence of variable length (longer in the bass, shorter in the middle, longer again in the treble where strings are no longer damped).
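The size of this test set follows directly from its construction, as a quick Python check shows (note numbering as in Figure 2.20, where C0 = 12 and C8 = 108):

```python
# 51 velocity steps (10, 12, ..., 110) for each of the 97 keys of the
# Bösendorfer Imperial (C0 = note 12 up to C8 = note 108 in Figure 2.20).
velocities = list(range(10, 111, 2))
keys = list(range(12, 109))
tones = [(key, vel) for key in keys for vel in velocities]
# len(velocities) == 51, len(keys) == 97, len(tones) == 51 * 97 == 4947
```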
In order to avoid immoderate warming of the linear drives of the reproducing system, the tones were arranged so that the pause between two attacks was kept as long as possible. The microphones (two AKG CK91 positioned in an ORTF setup28) were placed beside the grand piano at the open lid, about 1.5 meters from the strings, and connected to a digital audio tape recorder (Tascam DA–P1 DAT, set to a sampling frequency of 44100 Hz, 16-bit word length, stereo). The recordings were transferred digitally to computer hard disk as WAV files using a "Creative SB Live! 5.1 Digital" soundcard and analysed with the help of Matlab scripts. The signal was transformed into its sone representation with an implementation of Zwicker's loudness model (Zwicker and Fastl, 1999) by Elias Pampalk (similar approaches were used in Pampalk et al., 2002, 2003).29 The audio

26Yamaha Disklavier grand piano, Mark II. An exemplar of the same series was used in the present experiments; see Sections 2.2 and 2.3.
27The Bösendorfer Imperial 290 cm grand piano features 9 additional keys in the bass, so that the lowest key is the C0 (see grey keys in Figure 2.20).
28The two microphones are placed approximately the same distance from each other as the two human ears, at an angle of 120 degrees.
29Another implementation of Zwicker's model was used by Langner et al. (1998). However, that implementation could not be used here due to copyright restrictions.

[Figure 2.20 appears here: two panels (Bösendorfer SE 290−3, Jan 7, 2002) of lines of equal MIDI velocity (10–110) against pitch from C0 (12) to C8 (108); (a) peak sound level (dB, relative), (b) peak loudness (sone).]

Figure 2.20: Lines of equal MIDI velocity against pitch measured at the Bösendorfer SE 290–3, once in terms of dB peak sound level (a), once in terms of sone peak loudness (b). MIDI velocity ranged from 10 to 110 in steps of 2 units. The data were averaged over the two channels of the recording.

signal was converted into the frequency domain and bundled into critical bands according to the Bark scale. After determining spectral and temporal masking effects, the loudness sensation (sone) was computed from the equal-loudness levels (according to Terhardt, 1979), which in turn were calculated from the sound-pressure level in decibels (dB–SPL). The present sone implementation deviates from those used in Pampalk et al. (2002, 2003) in that the calculation of equal-loudness contours was replaced with a model by Terhardt (1979). The loudness envelope was sampled at 11.6 ms intervals according to the window size and sampling rate used (1024 samples at 44100 samples per second with 50% overlap). The onsets were determined from the loudness curve automatically by a simple threshold procedure.30 Peak sound level values (in dB) and peak loudness values were taken for each onset, separately for the two channels. Of the nominally produced 4947 tones, 266 were detected as silent (when the hammer was too slow to hit the strings) or missing.31 The results are displayed in Figure 2.20 in terms of lines of equal MIDI velocity against pitch. In the upper panel, the intensity is plotted in terms of dB peak sound level,32 in the lower one in terms of sone (in both panels, the data were averaged over the two channels of the recording). Every fifth line (every tenth MIDI velocity value) is printed black to ease orientation in the figure. The individual pitches showed considerably different peak sound levels at the same MIDI velocity. No systematic trend over the keyboard could be observed, but the lines of equal MIDI velocity always ran parallel; virtually no crossing of lines occurred. This indicates that the intrinsic properties of a given pitch were consistent over the whole dynamic range. In Figure 2.20b, the sone representation showed a less regular picture. The lines crossed often; the order of MIDI velocity units did not always correspond to the order of sone values.
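The simple threshold procedure used for onset detection (an onset being a loudness increment larger than 0.2 sone, see footnote 30) can be sketched in Python on a synthetic loudness curve; the decay shapes and tone positions below are invented for illustration:

```python
import numpy as np

# Synthetic loudness curve sampled every 11.6 ms: three tones with abrupt
# rises and exponential decays (invented values, for illustration only).
hop = 0.0116                                        # s per loudness frame
loudness = np.zeros(400)
for start, peak in [(50, 8.0), (180, 3.5), (310, 12.0)]:
    loudness[start:start + 60] += peak * np.exp(-np.arange(60) / 15.0)

def detect_onsets(curve, threshold=0.2):
    """Return frame indices where loudness rises by more than `threshold` sone."""
    inc = np.diff(curve)
    onsets = []
    for i in np.flatnonzero(inc > threshold):
        if not onsets or i - onsets[-1] > 5:        # ignore increments within a tone
            onsets.append(int(i))
    return onsets

onsets = detect_onsets(loudness)
onset_times = [i * hop for i in onsets]             # onset times in seconds
```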
There was a trend over the keyboard for low pitches to show smaller sone values than higher pitches. This trend can be explained by the nature of Zwicker's loudness model, which adds up individual loudness per frequency band (Bark). The higher the pitch, the more energy appears in the higher frequency bands, and thus the overall sone values become higher. The sudden peak in the highest octave is likely due to a drop in the equal-loudness contours between around 2700 and 3700 Hz, reflecting a sensitivity of the ear in that region. However, these facts alone cannot explain the shape of the present representation. Going back to the data of the individual channels, only one of the two channels showed this trend over pitch, but not the other one. Since the two microphones

30An onset was defined as a loudness increment larger than 0.2 sone. This simple definition worked stably over the whole range of the keyboard and robustly differentiated between onsets and silent tones.
31Between the C#2 (26) and the Eb (28), all tones below MIDI velocity 40 were missing due to a tape error (90 tones).
32Not calibrated to a reference hearing threshold and thus given as negative level values relative to the sound file's maximum amplitude.

[Figure 2.21 appears here: selected lines of equal MIDI velocity (20, 40, 60, 80) in peak sound level (dB) against pitch from C4 (60) to C6 (84), for channels 1 and 2 of the Jan 7, 2002 and Nov 6, 2001 recordings.]

Figure 2.21: Selected lines of equal MIDI velocity (20, 40, 60, and 80 units) for two channels of two recording sessions (Nov 6, 2001 and Jan 7, 2002) plotted against pitch (from C4 to C6).

pointed in different directions during recording, it is likely that this trend (the higher, the louder) was due to microphone position, sound radiation, and room acoustics. In the present recording, one microphone pointed more towards the treble strings than the other. The microphone pointing to the treble strings captured more of the direct sound, including all high-frequency noise components, especially from the strings close to it, while the other captured more of the indirect sound after reflections from the walls. This might explain why the channel from the microphone pointing towards the treble strings showed an increase of peak loudness over pitch. In addition, the values derived from the signal represent only peak sound-level or peak loudness values. As loudness perception integrates over time during an interval of approximately half a second (cf., e.g., Hall, 2002, p. 119), the overall energy of a single tone might not increase over pitch as displayed in Figure 2.20b. Repp (1997a) found that the variation in peak sound level over the keyboard changed significantly with microphone position. To replicate this finding, the two channels of the recording made on January 7, 2002 were compared with an earlier recording made on November 6, 2001 on the same Bösendorfer SE grand piano.33 In the latter recording session, tones from C4 (60) to C6 (84) were

33The recording equipment was identical to that of the recording session on January 7, 2002. The microphones were placed closer to the strings (approximately 1 meter from the strings at the right-hand side of the piano, as viewed by the sitting pianist).

Table 2.4: Mean correlation coefficients between (eight) lines of equal MIDI velocity (from 20 to 90 in steps of 10 units) of the four sources (two channels of two recording sessions). The displayed coefficients were averaged over 21 (auto-correlation) or 64 coefficients.

                 7.Jan'02 Ch.1   7.Jan'02 Ch.2   6.Nov'01 Ch.1   6.Nov'01 Ch.2
7.Jan'02 Ch.1        0.9546          0.5221         −0.1023          0.0543
7.Jan'02 Ch.2                        0.9708          0.1690          0.2601
6.Nov'01 Ch.1                                        0.8604          0.0759
6.Nov'01 Ch.2                                                        0.8023

produced with all MIDI velocities ranging from 20 to 90 units, again each tone lasting for 300 ms. Selected lines of equal MIDI velocity (20, 40, 60, and 80 MIDI velocity units) from this recording are compared to the more recent recording in Figure 2.21. The lines of the different sources did not run parallel. A tone which had a peak in one channel did not necessarily have a peak in another. To quantify the relations of the four sets of lines to each other, all lines of equal MIDI velocity (from 20 to 90 in steps of 10 units) from the four sources (two channels of two recording sessions) were correlated with each other. The result was a correlation matrix of 32 by 32 coefficients. The mean coefficients for each source are listed in Table 2.4. The mean correlations of the eight lines of equal MIDI velocity with lines of their own group were high, whereas no other combination reached a significant correlation coefficient (except the two channels of the 2002 recording). This finding suggests that the digitised samples of the recorded sound exhibited a consistent intensity pattern when recorded from exactly the same position, but may have a totally different pattern with another microphone position. No ultimate conclusions can be drawn from the peak sound levels of these recordings other than that the intrinsic intensity response of a given piano cannot be derived from them, at least not with these methods. Nevertheless, for the purpose of the study reported in Sections 4.4 and 4.5, the peak sound level for samples from only a single source (one channel) was sufficiently reliable. The perception of dynamics of piano tones can be partly independent of their sound level (Parncutt and Troup, 2002).
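The correlation analysis behind Table 2.4 can be sketched as follows (a Python sketch; the "lines of equal MIDI velocity" are synthetic, constructed so that lines within one channel share an intensity pattern while the two channels do not):

```python
import numpy as np

# Synthetic lines of equal MIDI velocity: each channel has its own pitch
# pattern; individual lines are that pattern plus noise (invented data).
rng = np.random.default_rng(1)
n_pitch, n_lines = 25, 8
pattern_a = rng.normal(0.0, 1.0, n_pitch)     # intensity pattern, channel A
pattern_b = rng.normal(0.0, 1.0, n_pitch)     # different pattern, channel B
lines_a = [pattern_a + rng.normal(0.0, 0.3, n_pitch) for _ in range(n_lines)]
lines_b = [pattern_b + rng.normal(0.0, 0.3, n_pitch) for _ in range(n_lines)]

corr = np.corrcoef(np.vstack(lines_a + lines_b))  # 16 x 16 coefficient matrix

# Mean within-channel (off-diagonal) vs. across-channel coefficients:
within = corr[:n_lines, :n_lines][np.triu_indices(n_lines, k=1)].mean()
across = corr[:n_lines, n_lines:].mean()
# `within` is high; `across` is markedly lower, as in Table 2.4.
```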
Although changes in dynamics usually result in changes both in loudness and in timbre, listeners may rely more on the timbral information to compensate for different loudness levels due to differences in the distance to the source or differences in recording level. Imagine a piano played by someone in the room next to you. Although the level is not as loud as if you were in the room in which the piano is played, you will be able to tell how loud the pianist played (e.g., fortissimo or mezzo forte). Similarly, when listening to a piano recording on a stereo system, you can turn up and down the

volume, and you hear (possibly after a short moment of adaptation) exactly what dynamics, what timbral intensity the piano was played with. To overcome the above-mentioned problems of inferring the dynamic level of a piano from loudness information derived from recorded samples, a perceptual scale of piano dynamics is suggested here that includes timbral models of the piano tone for each pitch over the whole dynamic range, in order to deduce intensity information independently of the sound level of the signal. However, the author is aware of the problems arising from sympathetic vibrations of more than one piano tone at a time and from the use of pedals. Also for this reason, this has to remain for future investigations.

Chapter 3

Bringing Out the Melody in Homophonic Music—Production Experiment

This chapter reports research already published in Goebl (2000, 2001). As reported in the recent literature on piano performance research, the melody, as the most important voice, is not only played louder, but also around 30 ms earlier (melody lead). This effect is generally associated with, and presumably causally related to, differences in hammer velocity between the melody and accompaniment (velocity artifact). The velocity artifact explanation implies that pianists initially strike the keys in synchrony; it is only the different velocities that make the hammers arrive at different points in time. Two pieces by Frédéric Chopin were performed on a Bösendorfer computer-controlled grand piano (SE290) by 22 skilled pianists. The performance data were investigated with respect to the relative tone-onset timing (tone-onset asynchrony) and dynamic differences between the melody tones and the accompaniment. Furthermore, this study examined the asynchronies at the beginning of the key movement (finger–key). These asynchronies were estimated through calculation. For this, Goebl (2000, 2001) used information from an internal computer memory chip of the Bösendorfer SE system in which the system stores internal calibration measurements of how long the hammer of each key needs to travel from its resting position to string contact, in relation to the also-measured final hammer velocity. This information was extracted with the help of the SE developer Wayne Stahnke, who never confirmed the interpretation of that data. Since the first publication, the piano action timing properties, and especially the travel time functions, were studied in detail with an extended measurement setup (as reported in Chapter 2). The results from Goebl (2000, 2001) were adjusted with these more recent travel time functions and are reported in Section 3.6 (p. 74).


3.1 Introduction

Simultaneous notes in the printed score (chords) are not played strictly simultaneously by pianists. An emphasised voice is not only played louder, but additionally precedes the other voices, typically by around 30 ms; this phenomenon is referred to as melody lead (Hartmann, 1932; Vernon, 1937; Palmer, 1989, 1996; Repp, 1996a). It is still unclear whether this phenomenon is part of the pianists' deliberate expressive strategies and used independently of other expressive parameters (Palmer, 1996), or whether it is mostly due to the timing characteristics of the piano action (velocity artifact; Repp, 1996a) and thus a result of the dynamic differentiation of different voices. Especially in chords played by the right hand, high correlations between hammer velocity differences and melody lead times (between melody notes and accompaniment) seem to confirm this velocity artifact explanation (Repp, 1996a). The data used in previous studies, derived mostly from computer-monitored pianos, represent asynchronies at the hammer–string contact points. The present study examined asynchrony patterns at the finger–key contact points as well. These finger–key asynchronies represent what pianists initially do when striking chords. If the velocity artifact explanation is correct, the melody lead phenomenon should disappear at the finger–key level. This means that pianists tend to strike the keys almost simultaneously, and it is only the different dynamics (velocities) that result in the typical hammer–string asynchronies (melody lead).
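The velocity-artifact logic can be made concrete with a toy travel-time model. Assuming, purely hypothetically, that travel time from finger–key to hammer–string contact falls with final hammer velocity as a power function (the shape suggested in Chapter 2; the coefficients below are invented, not the measured ones), two keys struck at exactly the same instant but with different velocities arrive at the strings at different times:

```python
def travel_time_ms(v):
    """Hypothetical power-curve travel time (ms) as a function of final
    hammer velocity (m/s); coefficients are illustrative only."""
    return 90.0 * v ** -0.7

v_melody, v_accomp = 3.0, 1.0            # louder melody, softer accompaniment
lead_ms = travel_time_ms(v_accomp) - travel_time_ms(v_melody)
# Struck in perfect synchrony, the melody hammer still arrives earlier;
# with these toy numbers the asynchrony is on the order of tens of ms.
```

This is exactly the artifact the finger–key analysis is designed to detect: if asynchronies vanish at the finger–key level, the hammer–string melody lead is fully accounted for by the velocity difference.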

3.1.1 Background

In considering note onset asynchronies, one has to differentiate between asynchronies that are indicated in the score (arpeggios, appoggiaturas) and asynchronies that are performed but not specially marked in the score. The latter come in two kinds: (1) the melody precedes other voices by about 30 ms on average (melody lead), or (2) the melody lags behind the other voices. Asynchronies of the second type occur mainly between the two hands and usually show much larger timing differences (over 50 ms). A typical example is a bass note played clearly before the melody (melody lag or bass lead), which is well known from old recordings of piano performances but has been observed in contemporary performances too (Palmer, 1989; Repp, 1996a). Asynchronies of the first type are common within one hand (especially within the right hand, as the melody is often the highest voice), but may also occur between the hands. Note asynchronies have been studied since the 1930s, when Hartmann (1932) and the Seashore group (Vernon, 1937) conducted the first objective investigations of piano performances. Hartmann used piano rolls as a data source and found mostly asynchronies of the second type. Vernon (1937) differentiated between asynchronies within one hand and asynchronies between different hands. For the former he observed melody lead (type 1), whereas the latter mainly showed bass note anticipation (type 2).

In the recent literature, Palmer (1989, 1996) and Repp (1996a) have studied the melody lead phenomenon. Palmer (1989) used electronic keyboard recordings to analyse chord asynchronies among other issues. Six pianists played the beginning of the Mozart Sonata K. 331 and of Brahms’ Intermezzo op. 117/1 (“Schlaf sanft, mein Kind...”). The melody led by about 20 to 30 ms on average; this effect decreased for deliberately ‘unmusical’ performances and for melody voices in the middle of a chord (Brahms op. 117/1). In a second study, melody lead was investigated exclusively (Palmer, 1996). Six pianists played the first section of Chopin’s Prélude op. 28/15 and the initial 16 bars of Beethoven’s Bagatelle op. 126/1 on a Bösendorfer computer-monitored grand piano (SE290, as in the current study). Again, melody lead was found to increase with intended expressiveness, also with familiarity with a piece (the Bagatelle was sight-read and repeated several times), and with skill level (expert pianists showed a larger melody lead than student pianists).

In another study published at the same time, in part with the same music, Repp (1996a) analysed 30 performances by 10 pianists of the whole Chopin Prélude op. 28/15, a Prélude by Debussy, and Träumerei by Schumann on a Yamaha upright Disklavier. To reduce random variation, Repp averaged over the three performances produced by each pianist. He then calculated timing differences between the (right-hand) melody and each other voice, so that asynchronies within the right hand and between the hands could be treated separately. He argued that melody lead could be explained mostly as a consequence of dynamic differences between melody and accompaniment. Dynamic differences (differences in MIDI velocity) were positively correlated with timing differences between the melody and each of the other voices, and these correlations were generally higher for asynchronies within the right hand than for those between the hands.
Palmer (1996) also computed correlations between melody lead and the average hammer velocity difference between melody and accompaniment, but her correlations were mostly non-significant. In her view, the anticipation of the melody voice is primarily an expressive strategy that is used independently from other performance parameters such as intensity, articulation, and pedal use. In a perception test, listeners had to identify the intended melody in a multi-voiced piece by rating different artificial versions: one with intensity differences and melody lead, one with melody lead only, and one without any such differences. Melody identification was good for the original condition (melody lead and intensity difference), but the results in the melody lead condition did not differ much from the results in the neutral condition, especially for non-pianist listeners. Only pianist listeners showed some success in identifying the intended melody from melody leads alone. A condition with intensity differences only was not included (Palmer, 1996, p. 47).

3.1.2 Piano action timing properties

The temporal properties of the piano action were explained in detail in Section 2.2.1 (p. 10) and will not be discussed here again.

3.2 Aims

Almost nothing is known about asynchronies at the finger–key level, because none of the instruments used for acquiring performance data measure this parameter. However, to clarify the origin of melody lead, it is important to consider exactly these finger–key asynchronies. When pianists stress one voice in a chord, do they hit the keys asynchronously, or do their fingers push the keys down at the same time but with different velocities, so that the hammers arrive at the strings at different points in time? To examine this question, it is necessary to determine the finger–key contact times. One possibility might be to observe finger–key contacts with a video camera or by special electronic measurements at the keyboard. In this study, the finger–key contacts were inferred from the time the hammer travels from its resting position to the strings at different final hammer velocities (timing correction curve). With the help of this function, the finger–key contacts could be estimated accurately; moreover, the size of the expected melody lead effect in milliseconds could be predicted from the velocity differences between the voices, assuming simultaneous finger–key contacts.

3.3 Method

3.3.1 Materials and participants

The Etude op. 10, No. 3 (first 21 measures, Figure 3.1) and the Ballade op. 38 (initial section, bars 1 to 45, Figure 3.2) by Frédéric Chopin were recorded on a Bösendorfer SE290 computer-monitored concert grand piano1 by 22 skilled pianists (9 female and 13 male).2 They were professional pianists, graduate students, or professors at the ‘Universität für Musik und darstellende Kunst’ (University of Music and Performing Arts) in Vienna. They received the scores several days before the recording session, but were nevertheless allowed to use the music scores during recording. Their average age was 27 years (the youngest was 19, the oldest 51). They had received their first piano lesson at six and a half years of age on average, and had received piano instruction for a mean of 22 years (standard deviation = 7); 8 of them had already finished their studies; about half of them played more than 10 public concerts per year. After the recording, the pianists were asked to play the initial 9 bars of the Ballade in two additional versions: first with a particularly emphasised highest voice (voice 1, see Figure 3.2) and second with an emphasised third voice (the lowest voice in the upper stave, also played by the right hand, see Figure 3.2). The

1This grand piano is situated at the Bösendorfer company in Vienna (4., Graf-Starhemberggasse 14); it has the internal Bösendorfer number 19–8974 and was built in August 1986 (only pianos that are sold outside the company get serial numbers).
2The recordings were made between January 13 and February 9, 1999 (see Goebl, 1999a,b).


Figure 3.1: Frédéric Chopin. Beginning of the Etude in E major, op. 10, No. 3. The numbers against the note heads are voice numbers (soprano: 1, ..., bass: 7). (Score prepared with computer software by the author following the Paderewski Edition.)

purpose of these special versions was to investigate how pianists change melody lead and the dynamic shaping of the voices when they are explicitly advised to emphasise one particular voice.

All performance sessions were recorded onto digital audio tape (DAT), and the performance data from the Bösendorfer grand piano were stored on a PC’s hard disk. The performances were consistently of a very high pianistic and musical level.3 At the end of the session, the participants filled in a questionnaire. The pianists were not paid for their services.

3All recordings can be downloaded from http://www.ai.univie.ac.at/∼wernerg in MP3 format.


Figure 3.2: Frédéric Chopin. The beginning of the second Ballade op. 38 in F major. The voices are numbered as in Figure 3.1, but the highest voice number is now 5 for the bass. (Score prepared with computer software by the author following the Henle Urtext Edition.)

Figure 3.3: The timing characteristics of a grand piano action: the hammer travel times as a function of final hammer velocity. This timing correction curve (TCC) was fitted to average data derived from an EPROM chip of the Bösendorfer SE system (cf. Figure 2.5 on page 23). The y axis represents the time interval between finger–key contact times (measured 2–3 mm below the key surface) and the hammer–string contact times; the x axis represents final hammer velocity (m/s).
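The abstract of this thesis notes that such travel-time curves can be approximated by power functions. As a concrete illustration, the following sketch fits a curve of the form t = a·v^b by least squares in log–log space. The sample points and the resulting coefficients are hypothetical, chosen only to have the general shape of a TCC; they are not the Bösendorfer calibration data:

```python
import math

# Hypothetical (final hammer velocity in m/s, travel time in ms) pairs with
# the shape of a TCC: slower hammers take longer to reach the strings.
samples = [(0.3, 180.0), (0.5, 130.0), (1.0, 90.0), (2.0, 62.0), (4.0, 43.0)]

# Fit t = a * v**b via linear least squares on log t = log a + b * log v.
n = len(samples)
lx = [math.log(v) for v, _ in samples]
ly = [math.log(t) for _, t in samples]
mx, my = sum(lx) / n, sum(ly) / n
b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
a = math.exp(my - b * mx)

def travel_time(v):
    """Estimated finger-key to hammer-string travel time (ms) at FHV v (m/s)."""
    return a * v ** b
```

With data of this shape, the fitted exponent b comes out negative, so the curve reproduces the defining property of the TCC: the higher the final hammer velocity, the shorter the travel time.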

3.3.2 Apparatus

To provide accurate performance data, a Bösendorfer SE290 Imperial computer-monitored concert grand piano4 was used. The precise functionality of the Bösendorfer SE system is described in Section 2.3.

3.3.3 Procedure

Note onsets and the hammer velocity information were extracted from the performance data. These data were matched to a symbolic score in which each voice was individually indexed, beginning with 1 as the highest voice5 (see Figure 3.1 and Figure 3.2). Wrong notes (substitutions) or missing notes (deletions) were marked as such. The rate of not-played or wrongly played notes was very low for all pianists: 0.43% for the Etude (of ntotal = 9988), 0.69% for the Ballade (of ntotal = 16082), and 1.75% for the two repeated versions of the Ballade (of ntotal = 5764).6

Timing differences and hammer velocity differences between the first voice (melody) and each other voice were calculated separately for all nominally simultaneous

4“SE” stands for Stahnke Electronics; 290 indicates the length of the piano in cm.
5The lowest voice played by the right hand was called 3. If there were three simultaneous notes in the right hand, the middle one was labeled 2. The highest voice played by the left hand was indexed 4, the bass line 5 in the Ballade and 7 in the Etude. Voices 5 and 6 in the Etude occurred only in measures 16 and 17. In the Ballade, there was only one chord (bar 19) with three simultaneous notes in the left hand; here, the two higher notes were labeled 4, the bass 5.
6Additional notes (insertions) that were so soft (or silent) that they did not disturb the performance and were apparently not perceived as mistakes were not counted as errors. In the Etude we observed 181 such notes over the 22 performances (+1.8%), in the Ballade 189 (+1.17%). Similar observations were made by Repp (1996c).

Figure 3.4: Finger–key times calculation procedure. A typical example of a four-voiced chord with melody lead (at hammer–string level, closed circles), and the estimation of its finger–key contact times (open circles) according to the TCC (see Figure 3.3).

events in the score. All missing or wrong notes, as well as chords marked in the score as arpeggio (Ballade) or as appoggiatura (Etude), were excluded.7 The finger–key contact times were calculated for each note by subtracting from the hammer–string impact time the corresponding travel time, which was determined by the TCC (see Figure 3.3). From this, finger–key asynchronies were calculated, again between voice 1 and all other voices, separately for all nominally simultaneous events in the score. The calculation procedure is sketched in Figure 3.4.
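The subtraction step can be sketched as follows. This is a minimal sketch: `travel_time` stands in for the TCC of Figure 3.3 with hypothetical power-curve coefficients, and the chord values are modelled loosely on the example of Figure 3.4 rather than taken from the data:

```python
# Hypothetical TCC: travel time (ms) as a power function of final hammer
# velocity (m/s); the coefficients are illustrative, not fitted values.
def travel_time(v):
    return 89.0 * v ** -0.55

# One four-voiced chord: (voice, hammer-string onset in ms, final hammer velocity).
# The melody (voice 1) is louder and reaches the strings 30-40 ms early.
chord = [(1, 7000.0, 1.00), (2, 7030.0, 0.55), (3, 7035.0, 0.49), (4, 7040.0, 0.47)]

# Estimated finger-key contact = hammer-string onset minus travel time.
finger_key = {voice: t_hs - travel_time(v) for voice, t_hs, v in chord}

# Asynchronies of each voice relative to the melody, at both levels.
t1_hs = chord[0][1]
hs_async = {voice: t_hs - t1_hs for voice, t_hs, _ in chord}
fk_async = {voice: t - finger_key[1] for voice, t in finger_key.items()}
```

With these illustrative numbers, the 30–40 ms hammer–string melody lead shrinks to only a few milliseconds at the finger–key level: the softer accompaniment tones require longer hammer travel times, so near-simultaneous key strikes still produce an early melody at the strings.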

3.4 Results

Figure 3.5 shows the mean velocity profiles (top graphs) as well as the mean asynchrony profiles (bottom graphs) of the 22 performances of the Ballade and the Etude and their overall averages. All pianists played the first voice consistently louder than the other voices; none of the pianists chose another voice to be played as the loudest. The velocity levels of the individual voices were fairly constant in the performances of the Ballade, so averaging over all notes in a voice made sense. For the performances of the Etude, the dynamic climax of bar 17 caused a strong increase in the velocity values. Therefore, in Figure 3.5 the section from bar 14 to 18 was averaged separately and was not included in the overall average. Again, the first voice showed clearly the highest velocity values.

The two bottom graphs in Figure 3.5 show the hammer–string and the finger–key asynchrony profiles for the two pieces. The thicker lines with the standard deviation bars represent the average of the mean asynchrony profiles of the 22 performances (thin lines without symbols). In the hammer–string domain, the melody preceded the other voices, as expected, by about 20–30 ms. In the Ballade, the asynchrony profiles of the individual performances were very similar to each other, and the melody lead was slightly greater relative to the left-hand voices than to the right-hand voices. The individual chord

7The excluded events for the Etude were ([bar number].[relative position in the bar]): 7.75, 8.25, and 21.0; for the Ballade: 18.5, 20.5, 40.0, and 45.0.

Figure 3.5: The individual and mean final hammer velocity (FHV) and asynchrony profiles (with standard deviation bars) of 22 performances for the Etude (left-hand panel) and the Ballade (right). In the top panel, the mean intensity values by pianists and voice are plotted. The thicker lines with squares indicate the average across pianists. In the Etude, bars 14–18 are averaged separately. The profiles at the bottom show the averaged timing delays of voices relative to voice 1. Solid lines represent hammer–string (“hs”) asynchronies, dashed lines inferred finger–key (“fk”) asynchronies. The horizontal bars are standard deviations, computed across individual performers.

profiles for the Etude showed more variability among pianists, especially in the left hand, where the bass voice (7) tended to lead for some pianists (for an example, see below). The asynchronies at the finger–key level (Figure 3.5, dashed lines, average with circles) were consistently smaller than those at hammer–string level. In particular, the melody lead within the right hand was reduced to about zero, whereas the left hand tended to lead the right hand. Two repeated-measures analyses of variance (ANOVA) on the average melody leads for each voice in each performance, with type of asynchrony (hammer–string and finger–key) and voice (2 to 5 in the Ballade and 2 to 7 in the Etude) as within-subject factors, were conducted separately for the two pieces (Etude, Ballade); both showed significant main effects of asynchrony type and significant interactions between type and voice.8

A real outlier was Pianist 3, who played the melody 40–70 ms before the accompaniment, as shown in Figure 3.6. This was a deliberate strategy that Pianist 3 habitually uses to emphasise the melody; in personal communication, he confirmed this habit and called it a personal speciality. His finger–key profiles still showed a melody lead of about 20 ms and more. A similar but smaller tendency was shown by two other pianists. This finding suggests that melody lead can be applied deliberately and used as an expressive device, in addition to a dynamic differentiation, to highlight the melody. We argue here that, when melody lead is used as a conscious expressive device, it should be observable at the finger–key level. This strategy seems to be fairly rare.

The results of the two emphasised versions of the first 9 bars of the Ballade are shown in Figure 3.7. In the top graphs, the mean intensity values are plotted by voice.
In the first voice version (top left graph), the emphasised voice was played louder than in the normal version (mean FHV 1.28 m/s versus 1.01 m/s), while the accompaniment maintained its dynamic range. The melody lead increased up to 40 to 50 ms (Figure 3.7, bottom left graph). When the third voice was emphasised, that voice was played loudest (at about 1.12 m/s FHV on average), with the melody somewhat attenuated (0.84 m/s) and the other voices as usual (top right graph). The third voice led by about 20 ms compared to the first voice, while the left hand lagged by about 40 ms (Figure 3.7). Thus, when pianists are asked to emphasise one voice, they play this voice louder, and the timing difference changes correspondingly.

The first nine bars of the (normal version of the) Ballade were compared with these two special versions (Ballade 1st voice, Ballade 3rd voice) with regard to hammer velocity and melody lead. A repeated-measures ANOVA on the average hammer velocities of each voice in each performance with instruction (normal, 1st, 3rd) and

8The repeated-measures ANOVA for the Ballade: significant effect of type [F(1, 21) = 718.2, p < .001], no significant effect of voice [F(3, 63) = 1.2, p > .05], and a significant interaction between type and voice [F(3, 63) = 112.3, p < .001]; for the Etude: significant effects of type [F(1, 21) = 603.9, p < .001] and voice [F(5, 105) = 5.59, p < .002], and an interaction between type and voice [F(5, 105) = 34.83, p < .001].

voice (1–5) as within-subject factors was conducted. Significant effects of instruction [F(2, 21) = 4.98, p < .05] and voice [F(4, 84) = 466.2, p < .001], and a significant interaction between instruction and voice [F(8, 168) = 88.58, p < .001], indicate that the pianists changed the dynamic shaping of the individual voices significantly. Another repeated-measures ANOVA was conducted on the melody leads averaged for each voice in each performance, again with instruction (normal, 1st, and 3rd) and voice (2–5) as within-subject factors. It showed significant effects of instruction [F(2, 42) = 114.41, p < .001] and voice [F(3, 63) = 24.12, p < .001], and an interaction between instruction and voice [F(6, 126) = 31.29, p < .001].

Figure 3.6: The asynchrony profiles of pianist 3 (with standard deviation bars) at hammer–string contact (solid lines with triangles) and finger–key contact (dashed lines with circles).

3.4.1 Relationship between velocity and timing

Generally, the larger the dynamic differences, the greater the extent of melody lead. The velocity differences between the first voice and the other notes were negatively correlated with the timing differences. The mean correlation coefficients across the 22 pianists are shown in Table 3.1(a), separately for each piece

Figure 3.7: Average velocity and asynchrony profiles of the 22 individual performances in the Ballade’s emphasized melody conditions. On the left-hand side, the first voice was emphasized, on the right, the third voice. The solid lines indicate hammer–string (“hs”) contacts, dashed lines finger–key (“fk”) contacts.

Table 3.1: (a) Mean correlation coefficients, with standard deviations (s.d.), between melody lead and final hammer velocity differences across 22 pianists. nmax indicates the maximum number of note pairs that went into the computation of each correlation (missing notes reduced this number in some individual performances). #r** indicates the number of highly significant (p < 0.01) individual correlations (maximum 22). (b) Mean correlation coefficients, with standard deviations (s.d.), between observed and predicted melody lead across 22 pianists, and the number of highly significant (p < 0.01) correlations of the pianists (#r**).

              Etude             Ballade           Ballade 1st voice   Ballade 3rd voice
              right    left     right    left     right    left       right    left
(a) nmax      126      103      181      269      29       58         29       58
    mean      -0.45    -0.15    -0.42    -0.31    -0.55    -0.29      -0.73    -0.53
    s.d.      0.12     0.20     0.13     0.12     0.17     0.22       0.14     0.17
    #r**      21       2        22       20       16       12         22       18

(b) nmax      126      103      181      269      29       58         29       58
    mean      0.66     0.34     0.58     0.50     0.72     0.55       0.79     0.63
    s.d.      0.10     0.23     0.13     0.13     0.17     0.22       0.11     0.13
    #r**      22       14       22       22       21       21         22       22

and for right-hand (within-hand) and left-hand (between-hand) comparisons.9 The within-hand coefficients were substantially higher than the between-hand coefficients. This suggests a larger independence between the hands than between the fingers of a single hand. Especially for the Etude, almost all of the between-hand coefficients were non-significant (with the exception of two pianists). The coefficients for the special versions were slightly higher than those for the ‘normal’ versions. These correlation coefficients assume a linear relationship between melody leads and the velocity differences. However, the expected effect resulting from the piano action timing properties (velocity artifact) does not represent a linear, but rather an inverse power relation (see Figure 3.3). To test the presence of this effect in the data, the observed timing differences were correlated with the timing differences predicted by the TCC (Table 3.1b). These correlations were generally higher than the correlations between timing differences and final hammer velocity differences. Eighty-seven out of 88 individual coefficients were highly significant for the right

9The negative sign of the correlation coefficients stems from the way timing and velocity differences were calculated and has no relevance for data interpretation: from the onset time of each accompanying note (tn), the onset time of the corresponding melody note (t1) is subtracted (tn − t1), so the melody lead is positive. Similarly, the velocity differences are calculated as vn − v1, which results in negative values. Therefore, the correlation coefficients between melody leads and velocity differences are negative, whereas the coefficients between observed and predicted melody leads are positive.

hand. This result shows that the connection between melody lead and intensity variation is even better explained by the velocity artifact than by a linear correlation, as assumed in previous studies (Palmer, 1996; Repp, 1996a).

Some of the individual left-hand correlation coefficients between observed and predicted melody lead were non-significant in the Etude, but not in the Ballade or in the special versions (Table 3.1b). This reflects not only the generally larger between-hand asynchrony variability, but also the large bass anticipations (the type 2 asynchronies mentioned above) played by some pianists, who clearly struck some bass notes earlier. To illustrate these bass anticipations, the beginning of the Etude performed by pianist 5 is shown in Figure 3.8. In the bottom graph of Figure 3.8, we can observe five bass leads: two are quite small (bars 6 and 7, about 35–40 ms), two are somewhat larger (bars 2 and 8, about 75 ms), and one is huge (bar 9, 185 ms). All bass leads are even larger in the finger–key domain (see Figure 3.8, open symbols). In this example, most of the large bass leads occur at metrically important events. These bass leads are clearly perceivable and often exceed the range of the melody leads.
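The two correlation analyses can be sketched on synthetic data as follows. The chords below are generated under the velocity-artifact assumption (simultaneous finger–key contacts plus motor noise); `travel_time` is a hypothetical power-curve stand-in for the TCC, and all numbers are fabricated for illustration only:

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def travel_time(v):
    """Hypothetical TCC: hammer travel time (ms) at final hammer velocity v (m/s)."""
    return 89.0 * v ** -0.55

random.seed(1)
vel_diff, observed, predicted = [], [], []
for _ in range(100):
    v_mel = random.uniform(0.8, 2.0)   # melody FHV (m/s)
    v_acc = random.uniform(0.3, 0.8)   # accompaniment FHV (m/s)
    pred = travel_time(v_acc) - travel_time(v_mel)   # TCC-predicted lead (ms)
    vel_diff.append(v_acc - v_mel)                   # v_n - v_1, negative
    predicted.append(pred)
    observed.append(pred + random.gauss(0.0, 8.0))   # add motor noise

r_linear = pearson(vel_diff, observed)   # cf. Table 3.1(a): negative
r_tcc = pearson(predicted, observed)     # cf. Table 3.1(b): positive, larger
```

On data generated this way, the linear coefficient comes out negative and the TCC-based coefficient positive and larger in magnitude, mirroring the qualitative pattern of Table 3.1 (though the real data contain far more unexplained variance).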

3.5 Discussion

In this study, a large, high-quality set of performance data was analysed. In addition to measuring asynchronies at the hammer–string impact level, we estimated by calculation the asynchronies at the start of the key acceleration (finger–key level). The hypothesis that melody lead occurs as a consequence of dynamic differentiation was supported in three ways.

1. The consistently high correlations between hammer–string asynchronies and dynamic differences show the overall connection of melody lead and velocity difference. The more the melody is separated dynamically from the accompa- niment, the more it precedes it. These findings replicate Repp’s results (Repp, 1996a).

2. In addition to these findings, the estimated finger–key asynchronies show that, with few exceptions, the melody lead phenomenon disappears at finger–key level. Pianists start to strike the keys almost synchronously, but different velocities cause the hammers to arrive at the strings at different points in time.

3. With the help of the timing correction curve (TCC), melody lead was predicted in milliseconds. The correlations between this predicted and the observed melody lead were even higher than the correlations between velocity differences and melody lead. Differences in hammer velocity account for about half of the variance in asynchronies in the data; the remaining variance could be due to deliberate expression or motor noise.


Figure 3.8: The dynamic profiles (top panel) and the note onset asynchronies (bottom) for the first bars of the Etude op. 10, No. 3 for pianist 5. Top graph: The final hammer velocity (FHV) is plotted against nominal time according to the score. Each voice is plotted separately. The melody is played clearly more loudly than the other voices. The bottom graph shows the time delay of each note relative to onset time of the corresponding melody note (voice 1). The closed symbols represent hammer–string asynchronies, the open symbols the estimated finger–key contact times.

The findings of this study are consistent with the interpretations of Repp (1996a, velocity artifact explanation) rather than those of Palmer (1989, 1996), who regarded melody lead as being produced independently of other expressive parameters (e.g. dynamics, articulation). Of course it remains true that melody lead can help a listener to identify the melody in a multi-voiced music environment. Temporally offset elements tend to be perceived as belonging to separate streams (stream segregation, Bregman, 1990), and spectral masking effects are diminished by asynchronous onsets (Rasch, 1978, 1979, 1988). But in the light of the present data, perceptual segregation is not the main reason for melody lead. Primarily, the temporal shift of the melody is a result of the dynamic differentiation of the voices; both phenomena simply have similar perceptual results, that is, separating melody from accompaniment.

Nevertheless, pianists clearly played asynchronously in some cases. Some bass notes were played before the melody. Bass lead time deviations were usually around 50 ms and extended up to 180 ms in some cases. These distinct anticipations seem to be produced intentionally, although probably without immediate awareness. This bass lead has been well documented in the literature, not only as a habit of an older generation of pianists, but also in some of today’s pianists’ performances (Palmer, 1989; Repp, 1996a).

The case of Pianist 3 suggests that pianists can enlarge the melody lead deliberately if they wish to do so. In this case, melody lead was observable even in the finger–key domain. However, it does not seem possible for pianists to differentiate voices in a chord dynamically without producing melody lead in the hammer–string domain; at least there was no example in the present data demonstrating this. In the examples of deliberately produced asynchronies (bass lead and enlarged melody lead), the extent of the asynchrony usually exceeded 30 ms. Such asynchronies may be regarded as a deliberate expressive device under direct control of the pianists. According to the pianists in this study, they were produced in a somewhat subconscious way (personal communication), but the pianists reported a general awareness of the use of these asynchronies and said that they could suppress them if they wanted to. However, the use of the ‘normal’ melody lead that was produced by all pianists was unconscious: pianists reported that they emphasise one voice by playing it louder, but not earlier (the same was reported by Palmer, 1989, p. 335).

The asynchronies in the finger–key domain were computed using a timing correction curve which provides the time interval from key press to hammer–string impact as a function of final hammer velocity. The key shutter reacts when the key is depressed by about 2 to 3 mm (the touch depth of the key is usually about 9.5 mm, varying slightly across pianos; Askenfelt and Jansson, 1991, p. 2383). Thus, to be precise, the finger–key domain represents the points in time when keys are depressed by 2 to 3 mm. However, almost nothing is known about how keys are accelerated and released in reality. Very precise acceleration measurements by Van den Berghe et al. (1995, p. 17) show that keys are sometimes not released entirely, especially in repetitions. The modern piano action has a double repetition feature that allows a second strike without the key necessarily being released entirely. If the system measured onsets close to the zero position, some onsets would not be detected as such. Nevertheless, the 2 to 3 mm below zero level still gives a good impression of the asynchronies at the start of a key acceleration. For more accurate statements about played and perceived onset asynchronies in piano performance, evaluation of acceleration measurements at different points in the piano action would be necessary.

This study was concerned with the particular properties of the piano. Other keyboard actions (harpsichord, organ) may have similar timing properties as far as the key itself is concerned (a key that is depressed faster reaches the keybed earlier than a slower one), but their actions respond differently due to their different way of producing sound: the harpsichord plucks the strings, and on the organ a pipe valve is opened or closed. Additionally, they do not allow continuous dynamic differentiation like a piano does, and therefore performers may choose timing as a means to separate voices. However, we note a difference in the played repertoire: homophonic textures, like the Chopin excerpts used in this study, are seldom seen in the harpsichord or organ repertoires.

According to Vladimir Horowitz, when accenting a tone within a chord one should “raise the whole arm with as little muscular effort as possible, until the fingers are between three and five inches above the key.
During the up and down movements of the arm, prepare the fingers by placing them in position for the depression of the next group of notes and by holding the finger which is to play the melody-note a trifle lower and firmer than the other fingers which are to depress the remaining keys of the chord.” (Eisenberg, 1928).10 This would suggest that an asynchrony at the key is intended, but Horowitz goes on: “The reason for holding the finger a trifle lower is only psychological in effect; in actual practice, it isn’t altogether necessary. Experience shows that in the beginning it is almost impossible to get a student to hold one finger more firmly than the others unless he is also permitted to hold it in a somewhat different position from the others. Holding it a little lower does not change the quality or quantity of tone produced and does not affect the playing in any way but it does put the student’s mind at greater ease” (Eisenberg, 1928). Like the pianists in the present study, Horowitz is aiming at intensity differences here, not at differences in timing: “The finger which is held a trifle lower and much firmer naturally strikes the key a much firmer blow than do the more relaxed fingers which do not overcome the resistance of the key as easily as does the more firmly held finger. The tone produced by the key so depressed is therefore stronger than the others” (Eisenberg, 1928). These quotes suggest that Horowitz either was unaware of the consequences of his recommendation for onset synchrony, or did not consider onset asynchrony an important goal.

10 This article may be found at http://users.bigpond.net.au/nettheim/horo28.htm.

3.6 Finger–key contact estimation with alternative travel time functions

The timing correction curve (TCC) used in Section 3.3.3 (p. 63) was replicated with results from extensive measurements on the same grand piano in Section 2.2.3 (p. 21). The present section presents finger–key chord profiles inferred through the travel time approximations obtained in Section 2.2.3.

In Figure 3.9, the three travel time approximations listed in Table 2.1 (p. 22) are plotted together with the TCC used in Section 3.3.3 (t = 89.16 · h^(−0.570); see Figure 3.3, p. 63 and Figure 2.4, p. 20). It is evident that the TCC and the legato curve fit were very similar to each other, as were the staccato and the reproduction curve fits (also reflected in the coefficients in Table 2.1). It was surprising that the TCC obtained from the internal calibration function coincided better with the legato approximation than with the staccato or the reproduction curve fits. The internal sensor below the key reacts at about 2 mm key depression (see Figure 2.1, p. 16). Thus the registered TCC would be expected to show travel times shorter than those measured with the accelerometer setup, where the finger–key contact was determined as the beginning of the key movement and thus at 0 mm key depression. However, since there are some uncertainties in obtaining and interpreting the TCC from the Bösendorfer's internal calibration mode, the finger–key approximations displayed in Figures 3.5 and 3.7 were re-calculated with the curve approximations of the legato and the staccato data, using the same procedure as in Section 3.3.3 (see also Figure 3.4, p. 64).
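The re-calculation can be sketched in a few lines: each finger–key instant is estimated by subtracting a velocity-dependent travel time from the measured hammer–string time. This is only an illustrative sketch (the function names are mine), with the TCC coefficients as defaults; the alternative fits of Table 2.1 simply substitute other coefficient pairs.

```python
def travel_time_ms(hammer_velocity, a=89.16, b=-0.570):
    """Finger-key to hammer-string travel time (ms), modelled as a power
    curve of final hammer velocity (m/s); defaults are the TCC coefficients."""
    return a * hammer_velocity ** b

def finger_key_time_ms(hs_time_ms, hammer_velocity, a=89.16, b=-0.570):
    """Estimate the finger-key instant from a measured hammer-string time."""
    return hs_time_ms - travel_time_ms(hammer_velocity, a, b)

# If a loud melody tone (2.0 m/s) and a soft accompaniment tone (0.5 m/s)
# reach the strings together at t = 1000 ms, the slower key must have
# started moving earlier, because its travel time is longer.
melody_fk = finger_key_time_ms(1000.0, 2.0)
accomp_fk = finger_key_time_ms(1000.0, 0.5)
```

With the staccato fit, for instance, the call would be `finger_key_time_ms(hs, fhv, a=58.39, b=-0.7377)`.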

Figure 3.9: Comparison of different travel time approximations. Displayed are the timing correction curve (TCC) as used in Section 3.3.3 and Goebl (2001), and the three power curves fitted onto the legato, the staccato, and the reproduction data of the Bösendorfer SE290-3, as reported in Section 2.2.3 (see Table 2.1, p. 22). The TCC and the legato curve are almost identical, as are the staccato and the reproduction curves. (Axes: hammer velocity, 0–7 m/s, against travel time, 0–250 ms.)


Figure 3.10: Grand average asynchrony profiles for the Etude (left) and the Ballade (right) for 22 pianists. The profiles at hammer–string level (“hs,” diamonds with solid line) and finger–key level (“fk,” circles with dotted line) are identical to those depicted in Figure 3.5 (p. 65). The profile with squares and a dash-dotted line represents finger–key times inferred by the power function of the staccato data; the profile with asterisks and a solid line, those inferred by the legato data (see Figure 3.9).

In Figure 3.10, the grand average chord profiles at finger–key level inferred through the legato and the staccato curve approximations are displayed together with those already shown in Figure 3.7 (p. 68), separately for the Ballade and the Etude. Although there were considerable differences between the legato and the staccato curve (see Figure 3.9), the finger–key profiles were very similar to those inferred through the TCC. The two emphasised melody conditions of the first nine bars of the Ballade showed the same behaviour with the two alternative travel time approximations (Figure 3.11): there too, the 'old' and the 'new' finger–key profiles coincided well. Thus, the basic findings of Goebl (2001) could be replicated here. For the sake of completeness, the correlation coefficients between observed and predicted melody lead as shown in Table 3.1 (p. 69) were also re-calculated, separately for melody leads predicted through the different alternative travel time functions (legato, staccato, reproduction; see Figure 3.9). The mean correlation coefficients


across 22 performances, with standard deviations (s.d.), and the numbers of highly significant correlation coefficients are listed in Table 3.2, separately for the different travel time functions. The maximum number of pairs of observed and predicted melody leads differed between the 22 performances (due to missing or wrong notes), but was identical across the different travel time functions and is thus listed only once in the table. The results of the four calculations are similar. There was one more highly significant coefficient in the Etude's left hand for the staccato and reproduction travel time functions (15 instead of 14), while the Ballade's 3rd voice version showed 21 significant coefficients with the TCC, but only 20 with the other three functions. These minute differences do not affect the evidence given by the data or the conclusions drawn in Section 3.5 (p. 70).

Figure 3.11: Grand average asynchrony profiles for the emphasised melody conditions of the Ballade for 22 pianists. On the left-hand side, the first voice was emphasised; on the right, the third voice. The profiles at hammer–string level (“hs,” diamonds with solid line) and finger–key level (“fk,” circles with dotted line) are identical to those plotted in Figure 3.7 (p. 68). The profile with squares and a dash-dotted line represents finger–key times inferred by the power function of the staccato data; the profile with asterisks and a solid line, those inferred by the legato data (see Figure 3.9).

Table 3.2: The mean correlation coefficients, with standard deviations (s.d.), between observed and predicted melody lead across 22 pianists, and the number of highly significant (p < 0.01) correlation coefficients among the 22 performances (#r**), separately for the different travel time approximations. The TCC data are identical to Table 3.1b (p. 69). nmax indicates the maximum number of note pairs that went into the computation of each correlation (missing or wrong notes reduced this number in some individual performances).

                       Etude         Ballade       Ballade 1st voice   Ballade 3rd voice
                     r.h.  l.h.    r.h.  l.h.      r.h.  l.h.          r.h.  l.h.

  nmax                126   103     181   269        29    58            29    58

  TCC          mean  0.66  0.34    0.58  0.50      0.72  0.55          0.79  0.63
               s.d.  0.10  0.23    0.13  0.13      0.17  0.22          0.11  0.13
               #r**    22    14      22    22        21    21            22    22

  Legato       mean  0.66  0.34    0.58  0.50      0.73  0.55          0.79  0.63
               s.d.  0.10  0.23    0.13  0.13      0.17  0.22          0.11  0.13
               #r**    22    14      22    22        21    20            22    22

  Staccato     mean  0.67  0.34    0.58  0.50      0.72  0.56          0.79  0.63
               s.d.  0.10  0.23    0.13  0.13      0.17  0.22          0.11  0.13
               #r**    22    15      22    22        21    20            22    22

  Reproduction mean  0.67  0.34    0.58  0.50      0.72  0.56          0.79  0.63
               s.d.  0.10  0.23    0.13  0.13      0.17  0.22          0.11  0.13
               #r**    22    15      22    22        21    20            22    22

In fact, the findings derived through the somewhat questionable TCC in Sections 3.1–3.5 and in Goebl (2001) could thus be put on firm ground with data obtained from an entirely different source, with no uncertain procedural steps in between.
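The coefficients in Table 3.2 are plain Pearson correlations between observed and predicted melody leads. The computation can be sketched as follows (the data values below are invented for illustration; the real analysis used the note pairs of each of the 22 performances, and the coefficient pair (a, b) selects the travel time function):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def predicted_lead_ms(fhv_accomp, fhv_melody, a=89.16, b=-0.570):
    """Melody lead (ms) predicted from final hammer velocities (m/s)
    via a power-law travel time function (defaults: TCC coefficients)."""
    return a * fhv_accomp ** b - a * fhv_melody ** b

observed_ms = [12.0, 25.0, 31.0, 8.0]   # invented example values
predicted_ms = [predicted_lead_ms(0.8, v) for v in (1.2, 2.0, 2.5, 1.0)]
r = pearson_r(observed_ms, predicted_ms)
```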

3.7 A model of melody lead

This section briefly describes a model of melody lead according to the velocity artifact hypothesis. The velocity artifact hypothesis assumes that melody lead occurs exclusively because of different intensities of the tones in a chord. The model assumes furthermore that the pianists start to depress the different keys of a chord simultaneously (at finger–key level). Different intensities of the keystrokes result in different travel times, and thus the hammers arrive at the strings at different points in time: the faster a key is depressed, the earlier its hammer arrives at the strings (see Section 2.2.3, p. 21). The model simply takes approximations of the hammers' travel times and infers tone onset asynchronies (melody leads) from the different intensities of the chord tones. Taking the TCC measured by the Bösendorfer system's calibration function as the cause of melody lead (ml, in milliseconds), and the final hammer velocities of the melody (fhv_1, in metres per second) and of an accompanying tone (fhv_n), melody lead is predicted by

ml = 89.16 · fhv_n^(−0.570) − 89.16 · fhv_1^(−0.570).    (3.1)

In this work, the mapping between MIDI velocity units and final hammer velocity (m/s) as measured by the Bösendorfer system was chosen to be

MIDIvel = 52 + 25 · log2(fhv),    (3.2)

thus the model with MIDI velocity units as input (MIDI velocity of the melody MIDI_1, and of the softer accompaniment MIDI_n) is

ml = 89.16 · (2^((MIDI_n − 52)/25))^(−0.570) − 89.16 · (2^((MIDI_1 − 52)/25))^(−0.570).    (3.3)

The alternative travel time functions alter only the coefficients of the power curve fit of the model. To be complete, they are listed below (cf. Table 2.1, p. 22).

1. With the curve fitted to the Bösendorfer legato data:

ml = 89.96 · (2^((MIDI_n − 52)/25))^(−0.5595) − 89.96 · (2^((MIDI_1 − 52)/25))^(−0.5595)    (3.4)

2. With the curve fitted to the Bösendorfer staccato data:

ml = 58.39 · (2^((MIDI_n − 52)/25))^(−0.7377) − 58.39 · (2^((MIDI_1 − 52)/25))^(−0.7377)    (3.5)

3. With the curve fitted to the Bösendorfer reproduction data:

ml = 60.90 · (2^((MIDI_n − 52)/25))^(−0.7731) − 60.90 · (2^((MIDI_1 − 52)/25))^(−0.7731).    (3.6)

Chapter 4

The Perception of Melody in Chord Progressions

This chapter discusses the perceptual side of bringing out the melody in piano performance. From the performance studies in Chapter 3, we learned that pianists play the voice intended to be prominently heard not only louder, but also slightly before the accompaniment (melody lead). In this chapter, this phenomenon is approached from the listener's perspective. The main question is to clarify the influence of relative asynchrony and of variation in tone intensity balance on the perceived salience of different voices in artificial music stimuli and in real music. A further interest of this chapter is whether asynchronies as small as those typically played by pianists are detected as such by listeners.

In the pilot experiment (Section 4.3, p. 87), two equally loud tones with asynchronies of up to ±50 ms are used to investigate the perceived loudness of the two tones (question 1) and their perceived order (question 2). In this pilot experiment, different types of tones are used (pure, sawtooth, MIDI-synthesised piano, and real piano) to test whether different attack curves change loudness perception or temporal order identification. Variation in the balance of the chord tones is added in the next series of three experiments (Experiments I–III, Section 4.4, p. 95). In Experiment I, participants adjust the relative level of two simultaneous tones (pure, sawtooth, and piano sound) until they sound equally loud. In Experiment II, they rate the relative loudness of the two tones of dyads with relative timing and intensity systematically manipulated by up to ±54 ms and ±20 MIDI velocity units. In Experiment III, listeners judge whether or not the stimuli of the previous experiment sound simultaneous. In another series of three experiments, the stimulus material is extended to three-tone piano chords, sequences of three-tone piano chords (Experiment IV and Experiment V, see Section 4.5, p. 105), and to an excerpt of a piece by Chopin (Experiment VI, Section 4.6, p. 118).
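For a sense of how the manipulation range of ±20 MIDI velocity units relates to production, the velocity-artifact model of Section 3.7 predicts the asynchrony that such a velocity difference alone would cause at the strings. A minimal sketch (the function names are mine; the coefficients are the TCC values):

```python
def midi_to_fhv(midi_vel):
    """MIDI velocity -> final hammer velocity in m/s (cf. Eq. 3.2)."""
    return 2 ** ((midi_vel - 52) / 25)

def melody_lead_ms(midi_melody, midi_accomp, a=89.16, b=-0.570):
    """Predicted melody lead (ms) for a louder melody tone (cf. Eq. 3.3)."""
    return a * midi_to_fhv(midi_accomp) ** b - a * midi_to_fhv(midi_melody) ** b

# A melody at MIDI velocity 72 over an accompaniment at 52 is predicted to
# lead by roughly 20-25 ms, i.e. within the timing range manipulated here.
lead = melody_lead_ms(72, 52)
```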


4.1 Introduction

A pianist can “bring out” a melody tone—that is, increase its perceptual salience—either by depressing the key more quickly or by varying its timing relative to the accompaniment. Melody tones typically sound some 30 ms before the other tones of a chord (melody lead; Palmer, 1989, 1996; Repp, 1996a; Goebl, 2001); this effect is generally associated with, and presumably causally related to, differences in hammer velocity between the melody and accompaniment (velocity artifact; Repp, 1996a; Goebl, 2001; see Chapter 3). Independently of why pianists introduce these asynchronies, several perceptual effects are generally referred to in order to explain their psycho-acoustic relevance.

• Masking. A voice anticipated by several milliseconds avoids being (at least partly) masked by the other tones of a chord (Rasch, 1978; see also Section 4.1.4, p. 83).

• Streaming. Auditory events are more likely to be grouped into simultaneities if their onsets are synchronous, and into separate melodies if their onsets are asynchronous (Bregman and Pinker, 1978; Bregman, 1990; Palmer, 1996; see also Section 4.1.5, p. 84).

Apart from the psycho-acoustic effects of asynchrony that will be studied in the following, the intensity differences between melody and accompaniment found in piano performance may in themselves entail psycho-acoustic effects. The louder melody voice takes on a singing quality, because its pitch becomes more salient (cf. Terhardt et al., 1982). Since the roughness of a beating pair of pure tones falls rapidly with increasing amplitude difference (cf. Terhardt, 1974), the timbre of the whole sonority can be expected to become less rough (Parncutt and Troup, 2002, p. 291).

4.1.1 Perception of melody

Under the heading of the “perception of melody,” a wide range of psychological and psycho-acoustic research is summarised. The main focus of this research lies on the various fundamental principles involved when human listeners perceive melodies. They range from, e.g., pitch processing (Burns, 1999) and interval categorisation (Plomp et al., 1973) to recognition of pitch contour and memorisation (Bharucha, 1983; Dowling, 1990; Watkins, 1985). A comprehensive overview was provided by Deutsch (1999b). Music-theoretic approaches to melody perception were provided, e.g., by Meyer (1973), Lerdahl and Jackendoff (1983), and Narmour (1990). The present study is not concerned with all aspects of melody perception. Our focus lies on the perceptual salience of individual melodic lines in multi-voiced musical textures and how it changes when their relative intensity and relative timing are varied.

Different voices in a multi-voiced musical texture exhibit different attentional properties. In the classic-romantic repertoire, the highest voice is very often also the melody (Palmer and Holleran, 1994). Thus, there is a perceptual advantage for the highest-pitched voice (DeWitt and Samuel, 1990) and a disadvantage for middle voices (Huron, 1989; Huron and Fantini, 1989). Evidence also comes from error studies by Palmer and van de Sande (1993) and Repp (1996c): pianists made fewer mistakes in the melody voice than in the accompaniment. Harmonically related errors occurred more frequently in the middle voices (Palmer and van de Sande, 1993) and are less likely to be detected there by listeners (Palmer and Holleran, 1994).

4.1.2 Perception of isolated asynchronies

Much research has been conducted on the perception of asynchronies, especially for the purposes of speech perception. Studies in the psychoacoustic literature used exclusively artificial sounds (pure and complex tones, clicks, bursts); to my knowledge, there is no study of the perception of asynchronies with typical musical stimuli such as musical instrument tones (for an overview, cf. Hirsh and Watson, 1996). The two basic questions are (1) what is the temporal threshold beyond which two almost simultaneous sounds are perceived as asynchronous (detection of asynchrony threshold), and (2) from what amount of asynchrony can the correct order of two sounds be perceptually determined (temporal order threshold, TOT; cf. Pastore et al., 1982).

The detection of asynchrony threshold of tones is very small and lies at the threshold of the human auditory system in general (auditory acuity, Green, 1971). Two clicks (presented to the same ear) were heard as two sounds rather than one single sound from a temporal difference of about 2 ms (Wallach et al., 1949). Under extreme conditions, this threshold was found to be even smaller. Two clicks with different amplitudes could be discriminated with an asynchrony of 0.2 to 1 ms (Henning and Gaskell, 1981), and, as an extreme case, a 0.01 ms (= 10 µs) asynchrony could be detected with 0.01 ms clicks (Leshowitz, 1971). The detection of asynchronous onsets and offsets of individual partials in harmonic complexes was studied by Zera and Green (1993a,b, 1995). Listeners' sensitivity was greater for onsets than for offsets and was of the order of 1 to 2 ms.

The second question concerns the correct detection of the temporal order of two stimuli. In an often-cited study, Hirsh (1959) found the temporal order threshold (TOT) to lie between 15 and 20 ms for pure tones with rise times of the order of 20 ms.
Similar thresholds were obtained with stimuli of different pitch and timbre (clicks, noise). His assumption that this threshold is independent of the acoustical nature of the sound was invalidated by Pastore et al. (1982), who specifically tested the effects of stimulus duration and rise time. They found that the longer the (common) stimulus durations (10–300 ms), the higher the TOTs (4–12 ms); and likewise, the longer the rise times (0.5–100 ms), the higher the TOTs (4–23 ms). The TOT of the condition most similar to a piano tone (300 ms common duration, 25 ms rise time; see Chapter 2) was approximately 13 ms. This threshold corresponds to findings from speech perception, where 20 ms were said to be sufficient to tell the correct order of two stimuli (Rosen and Howell, 1987). Under special conditions, this threshold was found to be even smaller. A TOT of 2 ms was experimentally validated with two tones of a duration of 2 ms (at 1000 and 2000 Hz, respectively; Wier and Green, 1975), and with three tones differing in frequency (Divenyi and Hirsh, 1974).

All the studies reported above dealt with artificial stimuli, mostly pure tones. For real musical situations, these thresholds are presumably far too low. Handel (1993) stated that time differences of up to 30–40 ms were perceived as simultaneous, even if the tones were noticed as beginning at slightly different times (with no determinable order); from 40–80 ms the tones appeared asynchronous, and one tone seemed to precede the other (Handel, 1993, p. 214). Reuter (1995, pp. 31–34) reported that the perceptual 'time smear' (Reuter, 1995, p. 33) lies around 30–80 ms, indicating an integration time of the ear below which events are grouped into one percept (see also Meyer-Eppler, 1949; Winckel, 1952; Roederer, 1973). Similarly, Huron (2001) emphasises that onset differences in real music can be a lot greater than the TOT of 20 ms and still give the impression of a single onset. In his opinion, sounds with gradual attack characteristics will not be heard separately (especially in reverberant environments) until they are more than 100 ms apart (Huron, 2001, p. 39).

4.1.3 Intensity and the perception of loudness and timbre

Sound intensity is a physical quantity measured by instruments in terms of sound level in decibels (dB), whereas loudness as a psycho-acoustic measure refers to what a human listener senses when exposed to a certain sound intensity. The differing perception of individual pure tones by listeners is reflected in the equal-loudness contours (Fletcher and Munson, 1933; Moore, 1997; Zwicker and Fastl, 1999; Yost, 2000; ISO standard 226:1987). The subjective measure is the loudness level, measured in phons (going back to Barkhausen; cf. Zwicker and Fastl, 1999, p. 160): a 40-dB 1-kHz pure tone has a loudness level of 40 phons. Another way to measure loudness is the sone scale. One sone corresponds to the loudness of a 40-dB 1-kHz pure tone; the same tone perceived as twice as loud has about 50 phons (or 50 dB SPL), or 2 sones.

Since in real music we almost never hear isolated pure tones, several approaches have tried to find measures of loudness for complex signals. There were two main methods of adding up loudness values across frequency bands (Hartmann, 1998, p. 73). Stevens (1961) used 26 one-third-octave bands from 40 to 12500 Hz. Zwicker's approach (Zwicker and Fastl, 1999, pp. 220–238) was similar, but based on the idea of summing up neural excitation in critical bands. These models have only been tested on steady-state sounds; in the real world of music, they would be best evaluated with organ sounds (Hall, 1993). There were attempts to implement them for real audio data (Langner et al., 2000; Pampalk et al., 2002; Rauber et al., 2002). Such implementations took temporal and spectral masking into account, as well as the equal-loudness contours. These approaches were also used to analyse expressive performance (Langner and Goebl, 2002; Dixon et al., 2002a,b; Langner and Goebl, in press).

In addition, attempts were made to connect the psycho-physical measures derived from artificial stimuli with listeners' ratings of real music. Loudness estimation of artificial stimuli (pure tones, noise) and real music (in this case pop music) was approximately proportional to their sound level (Fucci et al., 1997, 1999); no significant difference between the two kinds of sounds (artificial versus real music) was found. However, depending on the experimental conditions, there were considerable differences in loudness estimation according to the content of the music presented to the listeners. Loudness estimation varied with preference for musical style (Fucci et al., 1993; Hoover and Cullari, 1992) and peer group (Fucci et al., 1998). Loudness estimation of artificial and real stimuli can be described by power functions similar to those that relate the subjective magnitude of loudness to the physical magnitude of intensity, but the slopes of the functions varied with stimulus condition and musical skill (Geringer et al., 1993).

In acoustic instruments, loudness cannot be varied independently of timbre. At the piano especially, both tone intensity and timbre are controlled by a single parameter: hammer velocity (see Chapter 2). The louder a tone gets, the more partials it involves and thus the brighter the tone colour of the sound becomes (Hall and Askenfelt, 1988; Hall, 2002, pp. 187–194, esp. p. 190).
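The phon/sone relation cited above (1 sone at 40 phon, and a doubling of loudness for every additional 10 phon) can be written compactly. This is the standard textbook approximation, valid only above roughly 40 phon:

```python
def phon_to_sone(loudness_level_phon):
    """Loudness (sones) from loudness level (phons), valid above ~40 phon:
    1 sone at 40 phon, doubling with every additional 10 phon."""
    return 2 ** ((loudness_level_phon - 40) / 10)

# 40 phon -> 1 sone; 50 phon -> 2 sones ("twice as loud"); 60 phon -> 4 sones.
```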

4.1.4 Masking

Masking is a common effect in everyday life: a loud sound prevents a softer one from being heard. You can't hear what your friend says to you while a loud truck is passing by. Similarly, this effect is always present in music. A typical example is chamber music, where the piano tends to be too loud and prevents the singer or the violinist from being properly heard by the audience (for an anecdotal example, cf. Moore, 1979).

In psycho-acoustic terms, masking refers to the same notion, but in a more detailed and elaborate way. There are two types of masking: spectral and temporal. Spectral masking operates basically only within critical bands (Moore, 1997; Zwicker and Fastl, 1999). At moderate to high sound levels, a masker tone disrupts tones with higher frequencies more than tones with lower frequencies (Zwicker and Fastl, 1999, p. 68). A masker also alters the sensation level (the level below which tones are not perceived due to masking) over time (temporal masking; Zwicker and Fastl, 1999, pp. 78–103). After a loud tone or noise, the sensation level remains for several tens of milliseconds as high as during that sound and then fades away continuously (post-masking or forward masking). Surprisingly, a similar effect can be observed in the opposite order: a tone shortly before a loud masker can also be hidden (pre-masking or backward masking). This effect lasts only a few milliseconds (Zwicker and Fastl, 1999, p. 78).

Determining the exact amount of masking in real music stimuli such as a piano chord is not possible with any of the existing models, because the multiple interactions between the various partials of the sounds, which additionally change constantly over time, do not allow precise predictions. In computerised models used for sound compression (such as the MP3 file format) or loudness calculation, masking effects are implemented in a simplified manner. Spectral masking between voices is reduced when one voice is temporally shifted some tens of milliseconds away from the rest, as Rasch (1978) confirmed with a tone detection study that used complex artificial signals.
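While none of this can be computed exactly for a piano chord, the simplified temporal-masking bookkeeping described above can be caricatured: a tone is flagged as potentially masked if its onset falls shortly after a louder sound (forward masking, several tens of milliseconds) or shortly before it (backward masking, a few milliseconds). The window lengths below are illustrative placeholders, not measured values:

```python
def temporally_masked(tone_onset_ms, tone_level_db,
                      masker_onset_ms, masker_level_db,
                      forward_window_ms=50.0, backward_window_ms=5.0):
    """Toy check whether a softer tone falls inside a louder masker's
    temporal masking windows (window lengths are illustrative only)."""
    if tone_level_db >= masker_level_db:
        return False                      # only louder sounds mask softer ones
    delay = tone_onset_ms - masker_onset_ms
    if 0 <= delay <= forward_window_ms:   # tone starts after the masker
        return True
    if -backward_window_ms <= delay < 0:  # tone starts just before the masker
        return True
    return False
```

By this caricature, an accompaniment tone starting 20 ms after a much louder chord would be flagged, whereas the same tone 200 ms later would not.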

4.1.5 Stream segregation

The theory of auditory scene analysis (Bregman, 1990) describes the processes involved when human perception puts individual frequencies together into units (tones) or groups them according to various principles. These are similar to the principles found in visual perception (Gestalt psychology) and include proximity, similarity, and good continuation in time, pitch, and timbre (for an overview, see Deutsch, 1999a). These principles apply when a listener hears a four-voiced fugue and perceives the four voices as separate streams. They also apply when multiple voices are heard in a Bach solo Sonata or Partita (implied polyphony). Another interesting example where these principles can be played with is the Alberti bass (a typical bass figuration of the 18th century, e.g., ||:C–G–E–G:||). When this figure is played very slowly, one voice is perceived. As the tempo increases, the tones split into separate streams of C–E and G–G (fission; cf. van Noorden, 1975). When the tempo increases further, all tones almost merge into a single percept. Perceptual grouping might also be controlled by loudness and timbre as well as by the timing of the four tones.

Asynchrony between simultaneously occurring events is used for the grouping or segregation of streams in order to determine sound sources in auditory scene analysis (Bregman, 1990). In Bregman and Pinker's ABC experiment (Bregman and Pinker, 1978), three pure tones were presented in a cycle: one tone (A) alternated with the other two (B and C), which occurred simultaneously and were thus fused into one complex sound (when they had simultaneous onsets and offsets). Two different streaming interpretations were possible: (1) in the simultaneous condition, B and C usually grouped together (one hears A and a complex sound B+C), or (2) if the frequencies of A and B were close, they could be heard as a stream, with C as a separate event.
When one of the two simultaneous tones (C) was moved in time (so that its onset and offset no longer coincided with B), the second interpretation became more likely. Two perceptual effects compete with each other here (sequential integration and spectral/simultaneous integration; Bregman, 1990, p. 30). Bregman argued further with the old-plus-new heuristic (Bregman, 1990, p. 222), which roughly means that a sound that has already been presented earlier is attributed to that earlier source and filtered out from the new sound.

Parallel to this experiment, Rasch (1979, 1988) argued that the asynchronies observed in ensemble performances, which are of the same order as the melody lead, enable listeners to track voices distinctly. Along the same lines, Huron (1993) showed that J. S. Bach maximises onset asynchrony in the written score of the two-part inventions in order to optimise the perceptual salience of the individual voices and to make every single voice distinctly audible.

4.2 Aims

The basic questions addressed in the following are listed here. They fall into two blocks: the first refers to the perception of asynchrony, the second to the perception of the salience of a tone or voice.

1. Perception of tone asynchronies

• At what amount of asynchrony is a listener able to tell the temporal order with certainty?

• Which asynchronies are detected as such, and in what intensity combinations? It is hypothesised that typical patterns like the melody lead (a louder voice that is also early) are so common that they are not perceived as being asynchronous, in comparison to less familiar combinations of relative timing and intensity difference.

• Does the perception of asynchrony depend on the type of signal with which it is presented (pure or complex artificial sounds versus real piano sounds)?

2. Perception of tone salience

• The role of shifting a tone back and forth in time relative to the other voice(s): does this change the perceived loudness/salience of that tone?

• Is there a difference in perceived salience between an anticipated tone and a delayed one? Could it be that delay attenuates a tone's salience?

• Does a possible effect of asynchrony vary with the type of sound involved (pure, complex, or real piano tones)?

• What is the influence of variation in the tone intensity balance of chords on the perceived salience of a particular tone?

• Is relative intensity a more important perceptual cue than relative onset asynchrony?

• Is the position of a tone in a chord (upper, middle, lower) relevant to its perceived loudness?

• Does streaming enhance the effect of asynchrony and order in comparison to the perception of single tone combinations?

4.3 Perception of asynchronous dyads (pilot study)

This section describes a pilot study on the perception of equally loud dyads that were systematically manipulated in their tone onset synchrony. This work was presented at the 2001 Society for Music Perception and Cognition meeting at Queen's University, Kingston, Ontario, Canada (Goebl and Parncutt, 2001).

4.3.1 Background

As already seen in Section 4.1.2, there are two different tasks to distinguish in the perception of asynchronous onsets. The first is the temporal order task (Hirsh, 1959; Hirsh and Watson, 1996), whose threshold lies around 20 ms. The second is the detection task: deciding whether or not two stimuli (tones, clicks) begin together. This threshold can go down to a few milliseconds, depending on the kind of stimulus.

The first aim of this pilot experiment was to estimate the temporal order threshold in dyads of different tone types. It was hypothesised that different timbres or attack characteristics (from pure tones to real piano sounds) strongly influence the perception of asynchronies and the temporal sensitivity of the listener, such that the temporal order threshold decreases the more artificial the stimulus becomes.

As reported in Chapter 3, the perceptual effects of an anticipated voice in multi-voiced musical contexts may include spectral and temporal masking as well as streaming. The second aim of this pilot study was to examine whether the perceived salience of a particular tone in a dyad varies with its relative onset timing. The research question was whether an anticipated tone is heard as more prominent by listeners than the same tone presented in synchrony with the other tone, even when the asynchronies are very small (below 20 ms). Melody leads (anticipation) were more common than lags (see Chapter 3). The effect of direction (anticipation versus delay) was therefore also investigated here. Does anticipation increase the perceptual salience of a tone and does delay attenuate it, or does asynchrony affect salience perception independently of direction? If temporal masking were an important factor (cf. Section 4.1.4, p. 83), it would have to be hypothesised that a delayed tone is masked more by the earlier tone and thus receives a lower perceptual salience than an anticipated one.
As a last aspect in this pilot, the effect of tone type was examined. Pure tones were expected to entail masking effects to a lesser extent than complex tones and real piano tones.

4.3.2 Method

Participants

The 19 participants were aged between 22 and 37 years. They were divided into two groups according to the duration of playing and the regular study of a musical instrument: 10 were classified as musically trained, with 10 to 21 years of musical instruction (average 16 years), and 9 as musically untrained, with zero to 7 years of

Figure 4.1: The two intervals (octave and seventh) used for the pilot experiment.

playing an instrument (average 3 years). Half of the 10 musicians indicated piano as their main instrument; the others comprised one guitarist, one flautist, one oboist, one violinist, and a composer (who regarded the computer as his instrument). There were 12 male and 7 female listeners. The testing took place in June 2001. Seven were tested in Vienna, the other 12 in Stockholm.

Stimuli

The test design resulted in 88 stimuli: 4 tone types × 2 intervals × 11 asynchronies. The four tone types were pure, harmonic complex with 16 partials (−6 dB per octave), MIDI-synthesised piano, and recorded from a computer-monitored grand piano. The pair of tones in each trial spanned an interval of an octave or a major seventh. The lower note was always C5 (525 Hz), the higher note C6 (1050 Hz) or B5 (991 Hz, see Figure 4.1). Asynchronies varied from −50 ms to 50 ms, in 10-ms steps. A negative sign means that the upper tone preceded the lower, a positive sign that the lower preceded the upper. The tone durations ranged from 300 to 400 ms so that the overlap of the two tones was constant at 350 ms.
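The 88-item design can be enumerated directly from the three factors above. The following sketch is purely illustrative (all identifiers are invented for the example); only the factor levels are taken from the text:

```python
from itertools import product

TONE_TYPES = ["pure", "complex", "midi_piano", "recorded_piano"]
INTERVALS = {"octave": ("C5", "C6"), "major_seventh": ("C5", "B5")}
ASYNCHRONIES_MS = list(range(-50, 51, 10))  # negative: upper tone first

# 4 tone types x 2 intervals x 11 asynchronies = 88 stimuli
stimuli = [
    {"tone_type": t, "interval": i, "asynchrony_ms": a}
    for t, i, a in product(TONE_TYPES, INTERVALS, ASYNCHRONIES_MS)
]
```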

Equipment

The MIDI-synthesised tones were created by using a software synthesiser (Timidity) playing back 22 MIDI files (2 intervals × 11 asynchronies) created with a Matlab script. The MIDI velocity of each of the two notes was arbitrarily set to 80. The acoustic piano stimuli were recorded on a computer-controlled Bösendorfer playing back the same files transferred into the Bösendorfer file format. Two AKG (CK91) microphones (placed approximately one meter from the strings in a 6 by 6 meter room) brought the signal to a Tascam DA–P1 DAT recorder.1 The stimuli were transferred digitally to the hard disk of a PC using a “Creative SB live! 5.1 Digital” soundcard and stored in WAV format (16-bit, 44.1 kHz, stereo). The pure and the complex harmonic (sawtooth) tones were generated with the same computer software. Their loudness was adjusted by the author in order to sound approximately equally loud to the MIDI-synthesised tones. The stimuli were presented to the participants via headphones. All signals were presented diotically (same signal in each ear), except the acoustic piano tones, which were stereo. The experiment was controlled by a computer program that had been developed for this purpose by the author in a Matlab environment.

1The recordings took place in January 2001 at the Bösendorfer company in Vienna.


Figure 4.2: Average answers on the first question (“Which tone is more prominent?”, 1 = upper tone, 0 = lower tone) as a function of asynchrony, separately for the four types of sound (different lines) and for musicians (left panel) and non-musicians (right panel). The grand average is plotted with a solid line and diamonds. The horizontal lines indicate the range of a result not significantly different from chance according to the χ2 distribution. They are plotted separately for the four tone types (dashed) and for the grand average (solid). Negative asynchronies indicate that the upper tone was before the lower, and vice versa.

Procedure

The participants were asked to judge the 88 stimuli on two separate occasions, each with a two-alternative forced-choice paradigm (2AFC). In the first block, they were asked “Which tone is more prominent?”; in the second, the question was “Which tone is earlier?” In both blocks the possible answers were “the upper” or “the lower.” The stimuli were presented in random order within each block. Participants could repeat each stimulus as often as they liked until they were sure about their answer. The question “Which tone is earlier?” was asked after “Which tone is more prominent?” to prevent listeners from guessing that the experiment was about the effect of asynchrony on loudness. After the whole session a short questionnaire was filled in. The session lasted about 20 minutes. The participants did not receive money for their services in this pilot study.

4.3.3 Results

Perception of tone salience (question 1)

The mean ratings on the first question (“Which tone is more prominent?”) are plotted in Figure 4.2, separately for musically trained (musicians) and untrained participants (non-musicians), as well as by type of tone (pure, complex, MIDI-synthesised, and real piano). Additionally, the grand average (also across tone type) is shown


Figure 4.3: Mean ratings on the first question (1 = upper tone, 0 = lower tone) displayed separately by interval (two lines with squares and diamonds) for musicians (left panel) and non-musicians (right panel).

(Figure 4.2). The two horizontal lines indicate the boundaries beyond which the ratings are significantly different from chance (50%) according to the χ2 distribution.2 The complete rating data are listed in Table A.1, p. 164.

It is evident that there was no striking trend in either direction. The rated salience was invariant over asynchronies for both groups. Therefore, there was also no effect of order: anticipation and delay were rated equally. The ratings of the non-musicians showed considerable differences between types of tone. The upper tone was always favoured in the sawtooth sound, the lower one in the Bösendorfer sound, a trend which was also present in the musicians’ ratings, but barely beyond the boundaries of significance. It could be that participants preferred the higher tone in the sawtooth sound because the two tones merged into a single percept and they rated the brightness of the dyad.

A log-linear analysis was performed on the frequency tables with timbre (4), interval (2), musical skill (2), and timing (11) as design variables and rating (2) as response variable. The k factors suggested mainly two-way interactions; the best-fitting

2With n cases, k ratings of “1” (fo = k) and n − k ratings of “0” are observed, while fe = n/2 ratings of “1” and of “0” are expected by chance. The critical χ2 value for one degree of freedom at the 95% level is χ2(1;95%) = 3.84. A rating is called significant when it differs significantly from chance, that is, when

\[
\chi^2 = \sum_{j=1}^{2} \frac{\left(f_{o(j)} - f_{e(j)}\right)^2}{f_{e(j)}}
       = \frac{\left(k - \tfrac{n}{2}\right)^2}{\tfrac{n}{2}}
       + \frac{\left(n - k - \tfrac{n}{2}\right)^2}{\tfrac{n}{2}} > 3.84 . \tag{4.1}
\]

Take the left panel of Figure 4.2 as an example. There are n = 20 ratings (10 musicians and 2 intervals) for each asynchrony and tone type. Either 15 or 5 ratings (of “1”) would be significantly different from 10, that is, either 0.75 or 0.25, respectively. For the grand average over the four tone types, n becomes 80, so the boundaries of significance are 0.61 and 0.39.
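The boundary proportions quoted in the footnote follow directly from Equation 4.1: it reduces to 4(k − n/2)²/n > 3.84, so the smallest significant counts can be computed for any n. A small illustrative Python sketch (the function name is invented here):

```python
import math

def significance_bounds(n, chi2_crit=3.84):
    """Proportions of '1' ratings beyond which a result differs from chance.

    Equation 4.1 reduces to 4*(k - n/2)**2 / n > chi2_crit, i.e.
    |k - n/2| > sqrt(chi2_crit * n) / 2 for the observed count k.
    """
    delta = math.sqrt(chi2_crit * n) / 2
    k_hi = math.floor(n / 2 + delta) + 1   # smallest significant count above chance
    k_lo = math.ceil(n / 2 - delta) - 1    # largest significant count below chance
    return k_lo / n, k_hi / n

# n = 20 ratings per cell gives bounds 0.25 and 0.75, as in the text;
# n = 80 (grand average) gives 0.39 and 0.61 (0.3875 and 0.6125 exactly).
```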

model included a significant (p < 0.01) main effect of musical skill and two-way interactions between rating and either timing or interval, respectively. Three-way interactions were not favoured by this model. These findings support the splitting of the participants into two groups (musicians versus non-musicians) as done in Figure 4.2. Separate log-linear models on the data split by skill gave a similar picture. In each case, the interactions between either interval or timbre and rating were significant (p < 0.01).

The results plotted separately for the two intervals are shown in Figure 4.3. The log-linear model always emphasised the effect of interval on the rating. The upper tone in the octave was always rated as more prominent, whereas for the seventh the two tones were rated approximately equally prominent, with a small tendency towards the lower tone. An explanation for this effect could be that the partials of the lower tone in the octave increase the salience of the higher tone.

Temporal order perception (question 2)

The second question (“Which tone is earlier?”) had a correct answer. As a negative sign in the asynchronies indicated that the upper tone was before the lower, the correct answer was “1.” Similarly, a positive sign entailed a rating of “0” as the correct answer. The rating results as well as the correct answers are plotted in Figure 4.4, separately for type of tone and skill. The complete rating data are listed in Table A.1, p. 164.

The difference between the two groups was striking. While musicians could hardly give any correct answers above chance for asynchronies smaller than 40 ms, non-musicians responded essentially at random. A log-linear analysis with timbre (4), interval (2), musical skill (2), and timing (11) as design variables and rating (2) as response variable was performed on the whole data set. This model yielded only skill as a significant factor. For this reason, the analysis was performed on the two groups (musicians and non-musicians) separately. These models found interactions between rating and interval, and between rating and type of tone, respectively. Only the musicians showed an effect of asynchrony. This again confirmed that the task was simply too difficult for non-musicians.

Musicians could report the correct order at and beyond ±40 ms, with slightly better results for lower–upper patterns (Arpeggio). There are some small differences between the types of tone. For example, the acoustic piano tones were already heard correctly at −30 ms, and the complex tones were judged to be asynchronous at ±10 ms: correctly in the condition with the higher tone leading, falsely in the other. In Figure 4.5, the ratings are plotted separately for the two intervals and the two groups. This display also makes evident that non-musicians could not perform the task correctly; they did not show ratings beyond chance. Musicians again showed a considerable effect of interval for this question.
The Arpeggio condition (lower voice preceding upper) was answered considerably more correctly when the interval was an


Figure 4.4: Averaged answers on the second question (“Which tone is earlier?”, 1 = upper tone, 0 = lower tone) by amount of asynchrony, displayed separately for tone types (lines) and musical skill (musicians left, non-musicians right). The correct result is indicated by a dotted line.


Figure 4.5: Mean ratings on the second question (“Which tone is earlier?”, 1 = upper tone, 0 = lower tone) displayed by interval (two lines with squares and diamonds) for musicians (left panel) and non-musicians (right panel).

octave (diamonds in Figure 4.5), or in the opposite condition when the interval was a seventh (squares in Figure 4.5). Tillmann and Bharucha (2002) used the detection of a 50-ms asynchrony as an indicator for harmonic relatedness in chords consisting of three voices. Their participants performed significantly better (that is, detected this 50-ms asynchrony more often correctly) with a harmonically related prime than with an unrelated one. The present results are difficult to reconcile with these findings. The octave was not rated better in general; on the contrary, its asynchrony was detected more correctly only in the Arpeggio condition. The seventh showed the opposite behaviour. Musicians cannot hear the order at ±20 ms, so they apparently guess on the basis of other cues. What these other cues might be can only be speculated upon here. Participants might tend to mix up the two tasks and rate the tone that sounds less important to them as earlier (which is the upper tone in the case of the seventh, and rather the lower in the case of the octave).

4.3.4 Discussion

This pilot study comprised two questions on dyads with varying onset asynchrony, type of tone, and interval. Although the results obtained were not conclusive due to the limited number of participants, some preliminary findings should be pointed out here. The most fundamental finding was that non-musicians could obviously not judge the stimulus material with sufficient precision. Therefore, only musically trained participants were involved in the remaining experiments (see below).

We found no consistent effect of relative onset timing on the perceived salience of a tone. Furthermore, there was no effect of order, regardless of the tone type: a delayed attack was considered to have the same prominence as an early attack. This casts doubt on the frequently encountered tacit assumption in the music (and especially piano) performance literature that the first onset is perceived as more salient. The different types of tone were judged slightly differently, especially by the non-musicians. The upper tone was rated more prominent with the sawtooth sound, the lower tone with the real piano sound.

Regarding the second question (“Which tone is earlier?”), listeners only consistently reported the correct order of the two onsets for asynchronies greater than about 30–40 ms, again regardless of whether the higher or the lower tone began first. Identification of order barely improved as the sounds became more artificial: the threshold was around 30 ms for pure tones and 40 ms for real piano tones. These thresholds are substantially larger than those reported in the psycho-acoustic literature (see Section 4.1.2), but consistent with findings reported by Handel (1993) and Reuter (1995). These large temporal order thresholds suggest that melody leads are not heard as asynchronous by listeners.
However, it may still be that, although listeners could not tell the correct order below ±40 ms, they nevertheless heard those tones as starting at different times while perceiving them as a single musical event.

To draw firmer conclusions from this pilot study and to be able to interpret the statistical test results, more participants would have been required. The χ2 test

is not applicable for expected frequencies lower than 5 (Bortz, 1999, p. 170). In our case, this would require at least 10 subjects for each sub-group (which was the number of musicians, but not of the non-musicians).

The timing precision of the acoustic piano stimuli depends on the precision of the Bösendorfer computer-controlled piano. According to results reported in Section 2.3.3 (Figure 2.14 on p. 41), the mean timing error in the reproduction is of the order of 3 ms. Still, sympathetic resonances between the two tones could change their loudness and thus bias the ratings. To overcome this and to control timing precision, the stimuli used in the following experiments were created by adding together digital recordings of individual piano tones.

4.4 Perception of dyads varying in tone balance and synchrony

This section reports a set of three experiments conducted in a single test session. This work was first presented at the 7th International Conference on Music Perception and Cognition at the University of New South Wales in Sydney, Australia (Goebl and Parncutt, 2002).

4.4.1 Introduction

In a preliminary experiment on the perception of harmonic dyads (see Section 4.3, and Goebl and Parncutt, 2001), we found no significant difference between the perceptual prominence of a delayed and an anticipated higher tone. In that pilot study, we used two tones of equal intensity. We also found that musically trained participants could report the correct order of two (equally intense) stimuli at asynchronies exceeding about ±40 ms, irrespective of tone complexity.

However, in piano performance, anticipation of an emphasised voice usually occurs in parallel with an increase in intensity of that particular voice, especially when the voices are played by one hand. In the present experiments, we are therefore interested in how the perceptual prominence of individual tones changes when relative timing is varied as well as the intensity balance of the tones. In the present study, we used three different types of tone (pure, complex, piano).

We first asked which relative dynamic level of the tones of a harmonic major-sixth dyad produced an impression of equal loudness or salience (balance adjustment, Experiment I). This was done separately for each listener and for each of three different tone types. Using these data as a baseline, we then investigated the relative perceptual salience of the tones of a harmonic dyad in which relative timing and relative intensity were varied systematically (salience perception, Experiment II). Finally, we investigated listeners’ sensitivity to asynchrony in the context of variation in the tone intensity balance of the dyads (asynchrony detection, Experiment III). The two artificial sounds (pure and sawtooth) were included in the experiment in order to better control the psycho-acoustic effect of masking.
Two pure tones as far apart in pitch as a major sixth should hardly mask each other, while the partials of the two complex tones will fall within the same critical bands and therefore partly mask each other. Hence, the salience of the upper complex tone should be rated higher with increasing asynchrony, because it is then no longer masked by the lower complex tone as it was in synchrony. Moreover, it is hypothesised that masking changes the listeners’ sensitivity to asynchrony. A loud tone coming early will prevent a softer, later-arriving tone from being heard as beginning later. The opposite condition (a soft tone followed by a louder one) will be perceived as more asynchronous.

4.4.2 Determination of balance baseline (Experiment I)

This experiment determined, for each participant, the relative dynamic level at which the tones of a major-sixth dyad sounded equally loud.

Method

Participants

The 26 participants were aged between 23 and 32 years. All were musicians who had been playing their instrument regularly for an average period of 18.9 years. Twenty-three of them had studied their instrument at a post-secondary level for an average period of 8.3 years. They comprised 15 pianists, 5 violinists, 1 singer (a tenor), 3 cellists, 1 double bass player, and 1 composer (who regarded the computer as his main instrument).

Stimuli

Three tone types were used: pure, sawtooth with 16 partials (−6 dB per octave), and piano. To avoid uncontrolled asynchronies and sympathetic vibrations, harmonic dyads of piano tones were created by digital superposition of individual monophonic tone recordings. The MIDI velocity values ranged from 79/31 (higher/lower tone) to 31/79, in increments of ±2 units (79/31, 77/33, 75/35, etc.). The nominal equality was thus 55/55, a typical mezzo forte. The amplitudes of the pure and sawtooth stimuli were similar to those of the recorded piano sounds.3

Five different dyads were presented to the participants, each spanning the musical interval of a major sixth. Three of them comprised piano tones: B4 and G#5 (approx. 494 and 831 Hz), C5 and A5 (523 and 880 Hz), and Db5 and Bb5 (554 and 932 Hz), respectively.4 The other two dyads were synthetic; one comprised two pure tones, the other two sawtooth tones. In both cases the (fundamental) frequencies were 523 and 880 Hz, corresponding to C5 and A5.
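Footnote 4 states that these frequency values are equal-tempered with A4 at 440 Hz; the rounded values above (494, 831, 523, 880, 554, 932 Hz) can be reproduced with the standard equal-temperament formula. The sketch is illustrative only and all names are invented here:

```python
A4_HZ = 440.0

def et_frequency(semitones_above_a4):
    """Equal-tempered fundamental frequency (Hz), relative to A4 = 440 Hz."""
    return A4_HZ * 2.0 ** (semitones_above_a4 / 12.0)

# The three piano dyads, as semitone offsets from A4;
# each spans a major sixth (9 semitones)
dyads = {"B4/G#5": (2, 11), "C5/A5": (3, 12), "Db5/Bb5": (4, 13)}
```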

Equipment

The acoustic piano tones were played on a computer-controlled Bösendorfer SE290 (at every MIDI velocity between 20 and 90)5 and recorded with AKG (CK91) microphones (placed approximately one meter from the strings, in a 6 by 6 meter room) onto a Tascam DA–P1 DAT recorder.6 They were transferred digitally to the hard disc of a PC using a “Creative SB live! 5.1 Digital” soundcard and stored in WAV format (16-bit, 44.1 kHz, stereo). The pure and synthetic complex tones were generated using Matlab software. During the experiment, all sounds

3The relationship between key velocity (in MIDI velocity units) and peak sound level (in dB) for the 1700 single tones played on the Bösendorfer SE290 was approximated by the expression: −77.2 + 26.1 · log10(MIDI velocity) + 5.3 · [log10(MIDI velocity)]2.
4The given frequency values are calculated and correspond to equal temperament with A4 at 440 Hz. The actual frequency of the lowest partial of each tone will be slightly different from these values due to inharmonicity and pitch shifts.
5In this study, the relation between MIDI velocity and hammer velocity (in meters per second) of the Bösendorfer system was set to be: MIDI velocity = 52 + 25 · log2(hammer velocity).
6The recording took place on November 6, 2001 at the Bösendorfer company in Vienna (the same recording as in Section 2.4 on p. 50).
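The two calibration formulas in Footnotes 3 and 5 are straightforward to transcribe. The sketch below restates them directly; the function names are invented here:

```python
import math

def peak_sound_level_db(midi_velocity):
    """Footnote 3: fitted peak sound level (dB) for single tones
    played on the Bösendorfer SE290."""
    x = math.log10(midi_velocity)
    return -77.2 + 26.1 * x + 5.3 * x ** 2

def midi_velocity_from_hammer(hammer_velocity):
    """Footnote 5: MIDI velocity as a function of hammer velocity (m/s)."""
    return 52 + 25 * math.log2(hammer_velocity)
```

For example, a hammer velocity of 1 m/s maps to MIDI velocity 52, and doubling the hammer velocity adds 25 MIDI units.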

Figure 4.6: Experiment I. Intensity difference between simultaneous tones judged to be equally loud, averaged across 26 participants. Error bars denote 95% confidence intervals of the means. Vertical axes: (a) relative to MIDI velocity or equivalent; (b) in dB peak sound level. In each case, a positive value means that the higher tone was more intense than the lower at equal loudness.

were played back via the same sound card and Sennheiser HD 25–1 headphones (diotic presentation: same signal in each ear). The experiments were controlled by a computer program that had been developed in a Matlab environment.

Procedure

In each trial, participants adjusted the level of two simultaneous tones relative to each other until they sounded equally loud. Five trials were presented in a random order that differed from one participant to the next. The relative intensities of the two tones at the start of each trial were also selected at random, from 25 possibilities. Participants first adjusted the relative level of the tones in relatively large increments of ±6 MIDI units (i.e., one tone became 6 units louder while the other became 6 units softer, so that the difference in MIDI velocities changed by 12 in each step). In the second block, the five stimuli were repeated in the same order and adjusted in smaller steps of ±2 MIDI units. Participants were asked to go past the point of equal loudness and return to it from the other side before going on to the next dyad. Each stimulus could be adjusted and repeated for an indefinite period. To test reliability, the entire procedure (coarse followed by fine tuning) was repeated. If the mean difference between the results of the first and second blocks was larger than 6 MIDI velocity units, a third block was run; this happened for 5 of the 26 participants. After all three experiments were completed, a questionnaire was filled in. The whole session lasted between 30 and 70 minutes. Participants were paid 20 Euro for their services.
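The coarse/fine adjustment described above can be made concrete: each adjustment shifts the two levels in opposite directions, so the difference between them changes by twice the step size. A minimal sketch (identifiers invented here):

```python
def adjust(levels, direction, step):
    """Shift the upper tone by +/-step MIDI units and the lower tone oppositely.

    direction = +1 makes the upper tone louder, -1 makes it softer;
    the difference between the two levels changes by 2 * step.
    """
    upper, lower = levels
    return (upper + direction * step, lower - direction * step)

levels = (79, 31)                # one of the 25 possible starting combinations
levels = adjust(levels, -1, 6)   # coarse step: upper softer -> (73, 37)
levels = adjust(levels, +1, 2)   # fine step:   upper louder -> (75, 35)
```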

Results and discussion

The means over all 26 participants’ median adjustments are plotted in Figure 4.6a as (equivalent) MIDI velocity differences between the two tones (see Footnote 3, p. 96). The complete rating data of all participants are listed in Table A.2, p. 168. A positive difference on the y axis indicates that, at

equal loudness, the higher tone had greater intensity or MIDI velocity. The data initially suggest that for all three piano dyads and for the sawtooth dyads, the higher tones had considerably higher levels than the lower tones at equal loudness (salience).

Piano tones with the same hammer velocities do not necessarily have the same peak SPL.7 For example, B4 on our piano samples was always 6 dB more intense than G#5 played with the same MIDI velocity.8 Once the data have been adjusted to account for this (Figure 4.6b), the sound level differences in the piano samples disappeared. Only for the sawtooth sounds was there a significant difference in SPL (of about 6 dB) at equal subjective loudness.

The effect cannot be accounted for by the Fletcher–Munson loudness contours for pure tones, which would predict just the opposite tendency. Instead, the effect may be accounted for by masking between the higher partials. Since lower pure tones generally mask higher pure tones more than the reverse (Moore, 1997), higher harmonic complex tones may need to have greater SPL to be perceived as equally loud as simultaneously sounding lower complex tones with identical temporal and spectral envelopes. The masking effect in the piano tones may have been less prominent due to the spectral and temporal variability of the amplitudes of the partials, and because the spectral slope of each tone depends on both intensity and register. The greater spread in the data for the sawtooth tones is consistent with comments on the final questionnaire to the effect that the sawtooth sounds were the hardest of the three tone types to judge, presumably due to unfamiliarity.

4.4.3 Perception of tone salience (Experiment II)

Method

Equipment and participants were the same as in the previous experiment. Each of the three tone types (pure, sawtooth, and piano) was presented in five intensity combinations and with five degrees of asynchrony, resulting in 3 × 5 × 5 = 75 stimuli. The intensity combinations were +20/−20, +10/−10, 0/0, −10/+10, and −20/+20 MIDI velocity units, relative to the median levels judged to be equally loud in the previous experiment; the baselines were maintained separately for each tone type and for each participant. The asynchronies were −54, −27, 0, 27, and 54 ms (where a negative value indicates that the higher tone began before the lower tone). Regardless of whether the onset was synchronous or asynchronous, the tones always sounded together for a total of 350 ms, and faded out simultaneously. The chosen time differences were typical of melody leads in piano performance. The velocity artifact hypothesis (Repp, 1996a; Goebl, 2001) is based on the simple

7The peak dB value of a piano sample was the maximum value of an RMS-smoothed sound signal. The window size was 10 ms.
8The peak dB values for the same key and the same hammer velocity change strongly with microphone position. The lines of equal MIDI velocity in the second channel of our piano recordings showed a quite different picture than those in the first channel (for a similar discussion see Repp, 1997a, p. 1880, and Section 2.4, p. 50).

Figure 4.7: Experiment II. Mean relative loudness ratings over 25 participants as a function of tone type and asynchrony. Rating scale: 1, lower tone much louder; 4, both tones equally loud; 7, upper tone much louder. The five horizontal lines in each panel correspond to the 5 intensity combinations (MIDI velocity of upper tone relative to lower tone); the three panels correspond to the three tone types (pure, sawtooth, piano). Error bars indicate 95% confidence intervals of the means across participants.

observation that the faster a piano key is depressed, the earlier the hammer arrives at the strings. In a typical modern grand piano, when two tones are struck simultaneously from the key surface, a higher tone that is 20 MIDI velocity units louder than the lower tone typically sounds about 27 ms before the lower tone (cf. the model for melody lead, Section 3.7, p. 78).

Participants indicated which of the two tones sounded louder on a scale from 1 (lower tone much louder) to 7 (higher tone much louder). Equal loudness was indicated by 4. After 13 practice stimuli, the 75 stimuli were presented in a random order that was varied from one listener to the next. Each stimulus could be repeated as often as desired before deciding on a rating.
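The duration scheme for these stimuli follows from the two stated constraints: the tones always overlap for 350 ms and fade out simultaneously, so the leading tone is simply lengthened by the asynchrony. A sketch under exactly those assumptions (names invented here):

```python
OVERLAP_MS = 350  # the two tones always sound together for 350 ms

def tone_durations(asynchrony_ms):
    """Return (upper, lower) tone durations in ms for a given onset asynchrony.

    Negative asynchrony: the higher tone begins first. Since both tones fade
    out simultaneously, the leading tone lasts OVERLAP_MS + |asynchrony|.
    """
    lead = OVERLAP_MS + abs(asynchrony_ms)
    if asynchrony_ms < 0:
        return lead, OVERLAP_MS      # upper tone leads
    return OVERLAP_MS, lead          # lower tone leads (or synchronous onset)
```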

Results and discussion

In the final questionnaire, participants indicated that this experiment was the most difficult of the three. One participant’s results had to be excluded, because he could not perform the task at all (as he indicated in the questionnaire). The mean ratings are plotted in Figure 4.7, separately for tone type (panels), intensity combination (lines), and asynchrony (x axes).9 The complete rating data

9 Due to a programming mistake of the author, the −54 and the −27 ms conditions of the pure tones were omitted and the 0 ms condition was presented three times for the first 17 participants (also in Exp. III). The modified n values are specially indicated in Figure 4.7 and Figure 4.8; the different lengths of the error bars depicting 95% confidence intervals reflect this.

are listed in Table A.3, p. 169.
A repeated-measures analysis of variance was performed on the ratings with timbre (pure, sawtooth, piano), asynchrony (5-fold), and intensity (5-fold) as within-subject factors and instrument (piano, other instrument) as a between-subject factor.10 It revealed no significant difference in the ratings between pianists and musicians with another main instrument [F(1, 23) = 2.17, p = 0.154]. There were significant effects of timbre [F(2, 46) = 6.18, εG.G. = 0.90, padj = 0.0057], asynchrony [F(4, 92) = 3.54, εG.G. = 0.83, padj = 0.0153], and intensity [F(4, 92) = 525.26, εG.G. = 0.46, padj < 0.001]. The two-way interactions between timbre and asynchrony [F(8, 184) = 2.95, εG.G. = 0.65, padj = 0.0140] and between timbre and intensity [F(8, 184) = 22.36, εG.G. = 0.46, padj < 0.001] were significant, whereas the interaction asynchrony × intensity was not [F(16, 368) = 1.85, εG.G. = 0.51, padj = 0.0692]. The three-way interaction of timbre × asynchrony × intensity [F(32, 736) = 1.07, εG.G. = 0.35, padj = 0.3868] was also not significant.
Regarding timbre, the range of judgements was smallest for the pure tones and largest for the piano tones, suggesting that the difference in salience between two simultaneous tones depends on the number of audible harmonics in each tone (probably consistent with the masking hypothesis advanced above).
To evaluate whether anticipation or delay of the tones changed their loudness judgements, linear contrasts were performed on the asynchronies (+1, +1, 0, −1, −1), separately for each intensity combination and for the sawtooth and piano sounds.11 The results of these contrasts are listed in Table 4.1. Only three intensity conditions showed significantly louder ratings for anticipation in comparison to delay; surprisingly, none of them involved the piano tones.
To conclude, in the case of dyads, perceived loudness was primarily controlled by the loudness of the tones presented. Relative timing hardly changed the ratings; only for the sawtooth sounds was there an advantage for anticipated tones over delayed ones. More complex sounds were easier to judge with respect to their loudness than pure tones.
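The Greenhouse–Geisser-adjusted p values (padj) used throughout these analyses rest on an estimate of a sphericity factor ε from the covariance matrix of the repeated measures; the nominal degrees of freedom are multiplied by ε before looking up p. A sketch of the standard estimator (not the author's original code):

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for a subjects x conditions matrix.
    epsilon = tr(S*)^2 / ((k - 1) * tr(S* @ S*)), where S* is the
    double-centered covariance matrix of the k conditions.
    epsilon ranges from 1/(k - 1) (maximal violation of sphericity)
    to 1 (sphericity holds)."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)           # k x k covariance of conditions
    C = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    Sstar = C @ S @ C                        # double-centered covariance
    return np.trace(Sstar) ** 2 / ((k - 1) * np.trace(Sstar @ Sstar))
```

An adjusted test then uses, e.g., F(ε·4, ε·92) instead of F(4, 92), which is what the padj values reported above reflect.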

4.4.4 Asynchrony detection (Experiment III)

At first glance, the psychoacoustic literature suggests that listeners can easily distinguish synchronous from asynchronous dyads: the temporal order threshold (which tone came first?) is around 20 ms (Hirsh, 1959), and the threshold for asynchrony detection (were the tones synchronous?) can be as low as 2 ms (Zera and Green,

10 Missing data were introduced because of a programming mistake of the author (see Footnote 9). They comprised two timing conditions (−54 and −27 ms) of the pure tones for the first 17 participants, thus 17 × 5 × 2 = 170 ratings, which is 8.7% of the data. For the ANOVA, the missing data were interpolated from the ratings of the other participants.
11 The linear contrasts were not performed for the pure tones due to the missing data in the anticipation condition (see Footnote 9).

Table 4.1: Experiment II. Linear contrasts of asynchrony (+1, +1, 0, −1, −1) between anticipation (−55 and −27 ms) and delay (+27 and +55 ms), separately for each intensity combination and two timbre conditions (sawtooth and piano).

              Sawtooth              Piano
Intensity     F         p           F        p
+20/−20       12.0181   0.0021∗∗    0.7830   0.3854
+10/−10        5.4335   0.0289∗     1.4839   0.2355
0/0           17.5392   0.0004∗∗    1.3567   0.2561
−10/+10        3.1571   0.0888      0.1867   0.6697
−20/+20        3.7097   0.0666      1.1382   0.2971
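A contrast of this kind can be sketched as a per-participant contrast score tested against zero; the F in the table is then the square of the resulting t. This is a generic sketch of the technique, not the author's analysis code:

```python
import numpy as np
from scipy import stats

def contrast_test(ratings, weights=(1, 1, 0, -1, -1)):
    """ratings: participants x asynchrony conditions
    (-54, -27, 0, +27, +54 ms). Each participant's ratings are collapsed
    into one weighted score (anticipation minus delay), which is tested
    against zero with a one-sample t test."""
    scores = np.asarray(ratings, dtype=float) @ np.asarray(weights, dtype=float)
    t, p = stats.ttest_1samp(scores, 0.0)
    return t ** 2, p          # F(1, n - 1) = t^2, same p value
```

A positive mean score (and small p) indicates that anticipated tones were rated louder than delayed ones, as in the significant sawtooth conditions of Table 4.1.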

1993b). But in (piano) music, where tones have overlapping spectra and unequal loudness (so that one tone is masked by the other), the thresholds are higher. This experiment set out to measure these higher thresholds while additionally manipulating the intensity balance of the tones.
It was expected that different intensity combinations would clearly influence the asynchrony detection threshold: when a louder tone is followed by a softer tone, the relative timing differences would have to be larger to be detected by listeners than in the opposite condition.

Method The procedure, participants, and stimuli were identical to Experiment II. The only difference was the question: the participants were asked whether or not the two tones were simultaneous, in a two-alternative forced-choice (2AFC) paradigm.

Results and discussion The results are plotted in Figure 4.8 in terms of relative frequencies (ranging from 0 to 1). The expected responses were “yes” (1) for synchrony, and “no” (0) for the four asynchronous conditions. The correct answers are sketched in Figure 4.8 by grey dashed lines. The dotted lines mark the range in which observed frequencies are not significantly different from chance according to the χ2 test (cf. Section 4.3.3, p. 89).12 The complete rating data are listed in Table A.4, p. 170. The synchronous dyads were reliably recognised for the two artificial tone types, independent of relative intensity. But for the piano tones, asynchronous dyads were often heard to be synchronous when the louder tone preceded the softer tone (melody lead). This effect also appeared for the synthetic tones; it was weakest for the pure tones and strongest for the piano sounds. For instance, the +20/−20 condition (first row in Figure 4.8) at −27 ms (sawtooth and piano) was perceived as simultaneous

12 The different n values that were due to the missing data of the two anticipated conditions with pure tones (see Footnote 9, p. 99) warp those lines.

[Figure 4.8 appears here. Question: “Are the two tones simultaneous?”; rows of panels: intensity combinations +20/−20, +10/−10, 0/0, −10/+10, −20/+20; columns: Pure, Sawtooth, Piano; y axes: rated simultaneousness (0–1); x axes: asynchrony (−54 to 54 ms); n = 26 (n = 9 for the two anticipated pure-tone conditions, n = 60 for the triply presented 0 ms pure-tone condition); see caption below.]

Figure 4.8: Experiment III. Mean ratings over 26 participants as a function of asynchrony. The answers could be “yes” (1) or “no” (0). Different rows of panels show the different intensity combinations (relative MIDI velocity upper/lower tone); separate columns of panels show the three tone types. The grey dashed lines in the background indicate the ‘correct’ answer; the two dotted lines denote the boundaries within which the observed frequencies are not significantly different from chance. The missing data of the two anticipated conditions at the pure tones (reflected in the different n values in the left column) warp those lines.

by around 70% of the participants (significantly different from chance), whereas at +27 ms it was heard as asynchronous by almost everyone. This asymmetry was also found at the opposite intensity combination (−20/+20 MIDI velocity units, bottom row) for sawtooth and piano sound, as well as for the +10/−10 condition with piano sounds. Two possible explanations may be advanced for these asymmetries. The first involves familiarity with piano music: either listeners are insensitive to melody lead in piano performance (due to overexposure); or the listeners in this experiment only noticed, and hence correctly identified, asynchrony when a musically unfamiliar combination of relative loudness and timing was presented. Those participants who were also pianists might additionally have associated the characteristic sound of melody lead with the (kinesthetic) sensation of fingers simultaneously striking the key surface. The second explanation is more psychoacoustic in nature: the effect could be due to forward masking. A louder, anticipated tone masks a softer tone by forward masking, which is stronger than backward masking and attenuates the following softer tone for about the same period of time as typical melody leads show (some tens of milliseconds, Zwicker and Fastl, 1999). Simultaneous masking applies especially among the partials of complex tones spanning typical music intervals, consistent with the finding that the observed asymmetry is stronger for complex than for pure tones.
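The chance boundaries used in these comparisons (the dotted lines of Figure 4.8) follow from a χ² test against a 50/50 split: with n binary answers, an observed relative frequency p differs from chance when n(2p − 1)² exceeds the 5% critical value of χ² with one degree of freedom. A sketch, with a hypothetical helper name:

```python
from scipy.stats import chi2

def chance_bounds(n, alpha=0.05):
    """Range of relative yes-frequencies (out of n binary answers) that is
    NOT significantly different from chance (50%) under a chi-square test
    with 1 degree of freedom: chi2 = n * (2p - 1)^2."""
    crit = chi2.ppf(1 - alpha, df=1)        # about 3.84 for alpha = .05
    half_width = (crit / n) ** 0.5 / 2
    return 0.5 - half_width, 0.5 + half_width
```

For n = 26 this gives roughly (0.31, 0.69), so the approximately 70% “simultaneous” responses quoted above fall just outside the chance band.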

4.4.5 Conclusion

These three experiments investigated the perception of dyads manipulated systematically in tone balance and synchrony. The two main questions were (1) whether the relative timing of the tones of the dyads changed their perceptual salience and (2) whether the detection of asynchronies was influenced by an imbalance of tone intensities.
Since the variation of tone balance was new in this series of experiments, it was important to find out what tone intensities engender the subjective impression of equal loudness in two-tone sonorities. Participants adjusted a tone balance to sound equally loud when the individual tones were equal in sound level rather than equal in terms of MIDI velocity.
The perceived salience of the tones was primarily determined by their relative intensity. Relative timing did not change the ratings, except for sawtooth sounds, where anticipated tones received slightly higher salience ratings than delayed tones. Loudness ratings were clearer with more complex sounds: participants used a smaller range of the rating scale with pure tones, but almost the whole range with real piano sounds.
Asynchronies were generally well detected when the two tones were at least 27 ms apart in time. However, there was a strong effect of relative intensity. Asynchronies as large as 55 ms were rated at chance (participants could not tell whether they were synchronous or not) when the first tone was louder than the second. These intensity

combinations (early and loud) corresponded to the typical melody-lead situation. Either the participants were so familiar with these combinations of asynchrony and imbalance that they detected asynchronies only in unfamiliar combinations of relative timing and intensity, or, due to temporal masking, the onset of the weaker second tone was masked by the first, louder tone. These new insights into the perception of asynchronous onsets with variations in loudness seem to explain why pianists are largely unaware of the melody lead (Parncutt and Holming, 2000).

4.5 Perception of chords and chord progressions varying in tone balance and synchrony (Experiments IV and V)

The following three experiments are an extension and continuation of the previous experiments (Goebl and Parncutt, 2002, see Experiments I–III). They use chords (Exp. IV), sequences of chords (Exp. V), and real music (Exp. VI) as stimuli instead of dyads. They were included in a single test session. Experiments IV–VI will be presented at the 5th Triennial ESCOM conference in Hanover, Germany (Goebl and Parncutt, 2003).

4.5.1 Introduction

In these experiments, we again investigated how asynchrony enhances the perceptual salience of individual voices with respect to changes in relative intensity. Findings from the previous experiments supported the hypothesis that changes in intensity were the dominating factor, and that onset asynchrony had only a marginal influence on the perceptual salience of a given tone of a chord. However, those experiments used dyads as stimuli. Real music contexts may well involve more than two voices at a time (e.g., four voices as in the excerpts of Chapter 3, see p. 60).
Listeners' perceptual attention varies with the pitch position of a voice in a chord. As reported in Section 4.1.1 (p. 80), there is empirical evidence that outer voices (soprano or bass) receive greater perceptual attention than inner voices, and that the highest voice (in the classic-romantic repertoire mostly the melody) generally enjoys a perceptual advantage (Huron, 1989; DeWitt and Samuel, 1990; Palmer and van de Sande, 1993; Palmer and Holleran, 1994; Repp, 1996c).
In the following two experiments (IV and V), we used piano triads instead of dyads. We asked musicians to judge the perceived loudness of a particular tone in a triad that was simultaneously manipulated in relative onset timing and intensity balance, by up to ±55 ms and +30/−22 MIDI velocity units. Experiment V had the same design as Experiment IV, except that it used sequences of chords instead of isolated chords. Each chord was repeated five times, giving the impression of a short musical unit in 4/4 metre. The loudness of the individual voices had to be rated as before. With this design, we were able to test whether streaming (introduced by a temporal shift of one voice) changed the perceived salience of a given voice. Moreover, we were able to test whether the direction of the asynchrony (melody lead versus melody lag) had any influence on the perceived salience.
If streaming influenced the salience of an individual voice, we would expect different results than in Experiment IV.
Four research questions were addressed here. First, it was investigated whether the findings from the previous experiments (tone imbalance far more important than asynchrony) could be replicated with three-tone chords and three-tone chord progressions. Second, the influence of the vertical position in the chord on perceptual salience was examined. Third, it was tested whether streaming, as introduced by the use of chord progressions, enhances the perceptual salience of an individual voice. And fourth, it was tested whether the direction of relative timing is crucial for perceptual salience (e.g., anticipation enhancing it while delay attenuates it).

4.5.2 Method

Participants

Experiments IV, V, and VI were included in a single test session. The 26 musically trained listeners comprised 17 pianists and 9 other instrumentalists (violin, violoncello, and acoustic guitar). They had been playing their instruments regularly for an average of 17.9 years (s.d. = 5.5 years). Twenty-three of them had studied their instruments at post-secondary level for an average period of 7.2 years (s.d. = 4.1 years). Their ages ranged from 19 to 35 with an average of 26.5 (s.d. = 4.5).

Stimuli

Two chords consisting of three tones were used. The two upper tones spanned an interval of a major sixth, as in the previous experiments. The bottom tone was either a major sixth or a fifth below the middle tone, so that one chord resulted in a major triad (second inversion) and the other in a minor triad (root position). These two chords also appeared in transpositions one semitone higher and lower. The two chords and their transpositions are shown in Figure 4.9. The transpositions were arranged randomly so that every transposition occurred equally often.
In each chord, one tone (the target) was shifted in time and varied in intensity relative to the other two. The five asynchronies were −55, −27, 0, 27, and 55 ms. The five intensity combinations were [target tone/other two tones]: 30/−12, 15/−5, 0/0, −12/5, and −22/12 MIDI velocity units relative to a medium intensity of 50 MIDI velocity units. These intensity combinations were chosen so that the differences in velocity imply the above-named asynchronies according to the velocity artifact (see Section 3.3.3, p. 63, and the model for melody lead, Section 3.7, p. 78). In addition, the pairs of velocities were intended to produce a sonority whose overall loudness remained approximately constant over the five combinations. The target tone could occur in any of the three voices of the chord (upper, middle, or lower). The listeners’ attention was directed to the target by a priming tone which started

[Music notation not reproduced.] Figure 4.9: Experiments IV and V. The two chord combinations and their transpositions one semitone upwards and downwards.

[Music notation not reproduced.] Figure 4.10: Experiment V. Example of a possible stimulus with a primer for the middle voice. The inter-onset interval of a quarter note is 300 ms (quarter note = 100 beats per minute).

1200 ms before the tested chord and sounded for 600 ms. The intensity of the primer was always held constant at a medium intensity of 50 MIDI velocity units. The test design was therefore: 2 chords (randomly transposed one semitone up or down) × 3 target voices × 5 intensities × 5 asynchronies = 150 stimuli for each block of the experiment.
The stimuli for Experiment V had the same design as in the previous experiment: 2 chords (randomly transposed one semitone up or down) × 3 target voices × 5 intensities × 5 asynchronies. Each chord was repeated five times (see Figure 4.10). The inter-onset interval was 300 ms, except that the last chord came 330 ms after the previous event in order to give the impression of a 4/4 metre. Each chord was faded out shortly before the next chord started (21 samples at a 44100 Hz sampling rate); the last chord sounded its full length (uncut). The sequence of chords sounded fairly natural, like five portamento chords linked together without pedal. Again, the listeners’ attention was guided by an acoustical primer which came 1200 ms before the stimulus, lasted for 600 ms, and was held at a constant intensity of 50 MIDI velocity units. The time interval between the primer and the first chord was equal to the time interval between the first and the last chord (as represented in standard music notation in Figure 4.10).
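The chord-sequence construction just described (five onsets 300/300/300/330 ms apart, 21-sample fade-outs, last chord uncut) can be sketched as follows; this is a hypothetical reconstruction, and the function and variable names are not from the original stimulus software:

```python
import numpy as np

SR = 44100          # sampling rate (Hz)
FADE = 21           # fade-out length in samples, as stated in the text

def build_sequence(chord, iois_ms=(300, 300, 300, 330)):
    """Place five copies of `chord` (a mono sample array) at inter-onset
    intervals of 300 ms (330 ms before the last one). Each chord is cut
    and faded out just before the next onset; the last chord sounds its
    full length."""
    onsets = np.cumsum([0, *iois_ms])                 # onset times in ms
    onsets = (onsets * SR // 1000).astype(int)        # ... in samples
    out = np.zeros(onsets[-1] + len(chord))
    for i, on in enumerate(onsets):
        c = chord.copy()
        if i < len(onsets) - 1:                       # all but the last chord
            c = c[:onsets[i + 1] - on]                # cut at the next onset
            c[-FADE:] *= np.linspace(1.0, 0.0, FADE)  # 21-sample fade-out
        out[on:on + len(c)] += c
    return out
```

Summing the (faded) chords into one buffer mirrors the mixing-on-hard-disk approach described in the Equipment section.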

Equipment The piano sounds were taken from acoustic recordings of tones produced by the B¨osendorfer SE290.13 Each of the 97 keys were played with MIDI velocities from 10 to 11014 in steps of two MIDI velocity units; all tones had a duration of 300 ms in the file. The sounds were transferred digitally onto computer hard disk15 and stored in WAV format (16-bit, 44.1 kHz, stereo). As a result of the adjustment experiment in Section 4.4.2 (p. 96), the intensity of the recorded piano samples was referred to in terms of peak sound level in decibels and not in terms of MIDI velocity units. The mean relation between MIDI velocity units and dB peak sound level of the 1700 recorded tones from the B¨osendorfer was

13 The Bösendorfer SE290 played back a computer-generated file in the Bösendorfer file format triple. The recordings were performed on January 7, 2002 at the Bösendorfer company in Vienna using a TASCAM digital audio tape recorder (DA–P1) and AKG (CK91) stereo microphones with ORTF positioning (see also Section 2.4 on p. 50). For the present study, only the first channel of this recording was included.
14 Using the same velocity map as in Experiments I–III (see Section 4.4): MIDI velocity = 52 + 25 · log2(hammer velocity).
15 Using the digital input of a “Creative SB Live! 5.1 Digital” soundcard.

Figure 4.11: Screen shot of the graphical user interface used for Experiments IV and V. For the participants, the experiments were numbered starting with I. In this figure, the first repetition is displayed (Ia).

approximately

pSL = −77.19 + 26.11 · log10(MIDI vel) + 5.277 · [log10(MIDI vel)]²    (4.2)

(see Section 4.4.2, p. 96). For each stimulus tone, the sample that was closest in peak sound level was chosen from the pool of recorded sample tones. Since the peak sound level increment between one sample and the next louder (or softer) sample was comparably small (see Figure 2.20 on p. 51), the rounding error thus introduced was negligible. All tones were added up on hard disk to avoid sympathetic resonances. The sound samples were all faded out after 500 ms (fade-out time 15 ms) so that the overall duration of each stimulus did not exceed 600 ms.
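The selection of samples by peak sound level can be sketched as follows; the coefficients are read from Eq. (4.2) with decimal points restored, so treat them as a reconstruction, and the function names are hypothetical:

```python
import math

def peak_sound_level(midi_vel):
    """Mean peak sound level (dB) of the recorded samples as a function of
    MIDI velocity -- Eq. (4.2), coefficients as reconstructed."""
    lv = math.log10(midi_vel)
    return -77.19 + 26.11 * lv + 5.277 * lv ** 2

def closest_sample(target_midi_vel, available=range(10, 111, 2)):
    """Pick the recorded sample (MIDI velocities 10..110 in steps of two)
    whose mean peak sound level is closest to the level implied by the
    target velocity."""
    target_db = peak_sound_level(target_midi_vel)
    return min(available, key=lambda v: abs(peak_sound_level(v) - target_db))
```

Because the relation is monotonic and the 2-unit velocity grid corresponds to small dB steps, the rounding error of this nearest-level lookup stays small, as the text notes.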

Procedure A graphical user interface was designed for Experiments IV, V, and VI by the author, a screen shot of which is depicted in Figure 4.11. The computer program guided the users through the experiment. They received instructions and a short training session before each task.

Each stimulus was preceded by an acoustical primer indicating the pitch of the target tone. At the same time, the chord was presented in musical notation with an arrow pointing at the target tone. In these two experiments the participants were asked: “How loud does the target tone/voice sound to you (in comparison to the other tones)?” They answered by clicking on seven radio buttons representing a 7-point scale from 1, “very much softer,” through 4, “equally loud,” to 7, “very much louder” (see Figure 4.11). Each stimulus could be repeated as often as the listeners wished (using the “Play again” button) until they were sure about their judgement. The next stimulus was played when the “Next” button was pressed.
A short training session familiarised the participants with the stimulus material and the graphical user interface. In the training session, only extreme cases (very loud, very soft) were presented. The participants were supervised by the computer: they had to revise their rating when it was opposite to the expected one (e.g., when the target tone was played very softly and they rated it “very much louder”). This feedback loop was introduced to ensure that the participants rated the correct tone in the chord. They had to rate 10 stimuli ‘correctly’ before they could proceed to the actual experiment.
The user interface played the stimuli in random order and stored the stimulus information and the participants' ratings on hard disk. The stimuli were presented diotically (same signal in each ear)16 to the participants via headphones (Sennheiser HD 25–1). The participants completed two blocks of Experiment IV, one before and one after Experiment V. The order of experiments within the session was thus IVa, V, IVb, VI. After having finished the whole listening test, they filled in a questionnaire to indicate their age, musical skill, and feedback about the test.
The whole listening test took between 60 and 90 minutes. The participants were paid 15 Euros for their services.
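The supervised training loop described above (10 correctly rated extreme cases required before the experiment proper) can be sketched as follows; `play` and `get_rating` are hypothetical stand-ins for the original user-interface callbacks, which are not preserved here:

```python
import random

def training_session(play, get_rating, required_correct=10):
    """Minimal sketch of the supervised training loop: only extreme
    stimuli (target very loud or very soft) are presented, and a trial
    counts as correct only when the rating falls on the expected side
    of the scale midpoint (4)."""
    correct = 0
    while correct < required_correct:
        loud_target = random.choice([True, False])   # extreme case only
        play(loud_target)
        rating = get_rating()                        # 1..7 from the listener
        if (loud_target and rating > 4) or (not loud_target and rating < 4):
            correct += 1
        # otherwise the listener must revise: the trial does not count
```

The loop guarantees that a participant who systematically rates the wrong tone never reaches the actual experiment, which is the stated purpose of the feedback.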

4.5.3 Results and discussion

The participants had to rate the 150 stimuli three times: once for Exp. IVa, once for Exp. V, and a third time for Exp. IVb. Thus, they became more and more familiar with the stimulus material. This effect is reflected in the duration of each block and in the number of repetitions of each stimulus.
The participants typically needed about 24 minutes for the first block of Experiment IV, 19 minutes for Experiment V, 18 minutes for the repetition of Exp. IV, and 8 minutes for the last experiment. They heard each stimulus more than two times in the first round of Experiment IV, less than two times in its repetition and in Exp. V, and only slightly more than once (which is the minimum possible) in Experiment VI (see Figure 4.12).17

16 This applies to the stimuli of Experiments IV and V; the stimuli of Experiment VI were presented in stereo (as recorded from the SE grand piano), see Section 4.6, p. 118.
17 Two separate repeated-measures analyses of variance (ANOVAs) on the average number of

[Figure 4.12 appears here: left panel “Repetition,” right panel “Duration”; see caption below.]

Figure 4.12: Experiments IV–VI. Number of stimulus repetitions averaged over all stimuli and participants (left panel) and average duration (right panel) separately for the four experimental blocks. The experimental blocks are sorted in the order of their appearance in the listening test session.

Participants arrived at their judgements faster in Experiment V and were then trained to cope faster (and presumably more accurately) with the second block of Experiment IV. This is consistent with the participants' verbal reports: 12 of them indicated that Experiment V was easier, that the repetition of Exp. IV was easier than its first block but “exhausting” as well, and that Exp. VI was “relaxing” and “a relief” after the artificial stimuli (see Section 4.6, p. 118). We can therefore assume that participants completed Exp. IVb with greater skill after being trained by the repeating chords of Exp. V, but probably also with some noise in their answers due to fatigue and decreasing concentration.

Effects of intensity balance and asynchrony

Experiment IV To investigate the main issues of this experiment—the effects of voice position, asynchrony, and relative intensity on the perceptual salience—two repeated-measures ANOVAs were performed on the ratings of the two repetitions of Experiment IV separately, with voice (upper, middle, lower), asynchrony (5-fold), and intensity (5-fold) as within-subject factors and instrument (pianist, non-pianist) as a between-subject factor. The data for these ANOVAs are listed in Table A.5, p. 177 and Table A.6, p. 178. There was no significant difference between the ratings of the pianists and the ratings of performers of other instruments [F(1, 24) = 3.68, p = 0.067 and F(1, 24) = 0.12, p = 0.73 for IVa and IVb, respectively], nor did any of the interactions between instrument and the other variables achieve significance. Familiarity with piano sounds can therefore be excluded as having an influence on the perception of loudness of piano sounds.
Effects of voice [F(2, 48) = 59.18, εG.G. = 0.86, padj < 0.001; F(2, 48) = 43.08, εG.G. = 0.97, padj < 0.001], asynchrony [F(4, 96) = 11.48, εG.G. = 0.84, padj < 0.001; F(4, 96) = 6.52, εG.G. = 0.86, padj < 0.001], and intensity [F(4, 96) = 495.69, εG.G. = 0.81, padj < 0.001; F(4, 96) = 349.51, εG.G. = 0.45, padj < 0.001] were significant in both blocks of the experiment. The five intensity combinations were rated as expected: the louder the target tone, the louder it was rated. The lowest voice was rated generally higher than the other voices. The range of all ratings was larger when the target tone was in the highest voice. The range of ratings became smaller in the repetition of this experiment.
The two-way interactions were also all significant,18 even when corrected according to the Greenhouse–Geisser correction for violations of sphericity. The three-way interaction of voice × asynchrony × intensity was also significant for both blocks of Experiment IV [F(32, 768) = 3.82, εG.G. = 0.39, padj < 0.001; F(32, 768) = 3.72, εG.G. = 0.33, padj < 0.001]. These interactions are plotted separately for the two blocks in Figure 4.13a/b.

17 (continued) repetitions and on the average duration by experiment (4a, 4b, 5, and 6) revealed significant effects of experiment [F(3, 23) = 19.28, p < 0.001 and F(3, 23) = 89.93, p < 0.001, respectively]. Bonferroni post-hoc tests confirm that all means are significantly different from each other except the means of Experiments 4b and 5 (for both dependent variables). These two variables relate to each other: the more repetitions a listener wishes to hear, the longer it will take to finish the experiment.

Experiment V As in the previous experiment, a repeated-measures ANOVA was conducted on the ratings of Experiment V with voice (upper, middle, lower), asynchrony (5-fold), and intensity (5-fold) as within-subject factors and instrument (pianist, non-pianist) as a between-subject factor. Again, there was neither a significant effect of instrument [F(1, 24) = 3.37, p = 0.079], nor were any of the interactions between instrument and the other factors significant. No systematic effect of instrument could be observed; thus, the participants did not rate differently whether they played piano or another instrument.
The effects of voice [F(2, 48) = 107.72, εG.G. = 0.96, padj < 0.001], asynchrony [F(4, 96) = 33.63, εG.G. = 0.57, padj < 0.001], and intensity [F(4, 96) = 639.15, εG.G. = 0.52, padj < 0.001] were all significant, as well as the two-way interactions between the repeated-measures factors.19 Again, the three-way interaction of voice × asynchrony × intensity was significant [F(32, 768) = 3.57, εG.G. = 0.35, padj < 0.001]. It is plotted in Figure 4.14.
All effects were similar to the previous experiment. In contrast to Exp. IV, the range of ratings was larger in all voices, although the lowest voice still did not receive

18 There were significant two-way interactions of voice × asynchrony [F(8, 192) = 4.80, εG.G. = 0.66, padj < 0.001; F(8, 192) = 5.98, εG.G. = 0.63, padj < 0.001], voice × velocity [F(8, 192) = 14.53, εG.G. = 0.60, padj < 0.001; F(8, 192) = 10.23, εG.G. = 0.59, padj < 0.001], and asynchrony × velocity [F(16, 384) = 2.93, εG.G. = 0.58, padj < 0.001; F(16, 384) = 4.08, εG.G. = 0.57, padj < 0.001] (always for IVa and IVb, respectively).
19 There were significant two-way interactions of voice × asynchrony [F(8, 192) = 2.50, εG.G. = 0.57, padj = 0.040], voice × velocity [F(8, 192) = 15.67, εG.G. = 0.48, padj < 0.001], and asynchrony × velocity [F(16, 384) = 6.37, εG.G. = 0.48, padj < 0.001].

[Figure 4.13 appears here: (a) Experiment IVa and (b) Experiment IVb. Panels: Upper voice, Middle voice, Lower voice; lines: intensity combinations 80/38, 65/45, 50/50, 38/55, 28/62; y axes: perceived salience of target tone (1–7); x axes: asynchrony (−55 to 55 ms); see caption below.]

Figure 4.13: Experiment IVa/b. Mean ratings over 26 participants, separately for the two blocks of the experiment (a/b), different voices (panels), intensity combinations (different markers), and asynchronies (x axes). The error bars denote 95% confidence intervals of the means. The asterisks between adjacent temporal conditions indicate a significant difference between them according to Bonferroni post-hoc tests (∗ p < 0.05, ∗∗ p < 0.01).

[Figure 4.14 appears here: Experiment V. Panels: Upper voice, Middle voice, Lower voice; lines: intensity combinations 80/38, 65/45, 50/50, 38/55, 28/62; y axes: perceived salience of target tone (1–7); x axes: asynchrony (−55 to 55 ms); see caption below.]

Figure 4.14: Experiment V. Mean ratings over 26 participants, separately for different voices (panels), intensity combinations (different markers), and asynchronies (x axis). Error bars denote 95% confidence intervals.

very soft ratings. The effect of asynchrony appears to yield more interpretable results and will therefore be discussed in the following.

Post-hoc comparisons In order to evaluate whether the temporal effects in the data reflected significant trends, Bonferroni post-hoc tests were performed (on the three-way interactions of the three repeated-measures ANOVAs reported above). Significant differences between temporally adjacent conditions are indicated in Figure 4.13 and Figure 4.14 with asterisks (∗ p < 0.05, ∗∗ p < 0.01). Only a few of the adjacent timing conditions were significantly different from each other, so no conclusive interpretations can be drawn from these tests.
In order to test whether anticipation (−55 and −27 ms) changed the ratings in comparison to delay (+27 and +55 ms), those asynchrony conditions were linearly contrasted with each other (asynchrony: +1, +1, 0, −1, −1), separately for each intensity combination in each voice and each block of the two experiments (5 × 3 × 3 = 45 contrasts). The results of these contrasts are listed in Table 4.2.
In Experiment IV, 5 (block a) and 6 (block b) of these contrasts were significant; in Experiment V, this number increased to 10. Thus, delayed tones tended to sound softer than anticipated tones of equal asynchrony and intensity. This trend was stronger in the streaming experiment (V). However, the effect was quite inconsistent. In two cases, the post-hoc comparisons revealed significant effects in the opposite direction: in Experiment IVa, middle voice (38/55), and in Experiment IVb, upper voice (50/50), the +27 ms condition was rated significantly louder than the corresponding simultaneous condition.

Table 4.2: Linear contrasts of asynchrony (+1, +1, 0, −1, −1) between anticipation (−55 and −27 ms) and delay (+27 and +55 ms), separately for each intensity combination in each voice and each block of the two experiments (IVa/b and V). (∗ p < 0.05, ∗∗ p < 0.01)

              Upper voice          Middle voice         Lower voice
Intensity     F        p           F        p           F        p
Exp. IVa
  80/38       0.2279   0.6479      0.0660   0.7994      3.3539   0.0795
  65/45       0.3179   0.5781      1.9761   0.1726      7.9464   0.0095∗∗
  50/50       0.0667   0.7983     17.0903   0.0004∗∗    0.0020   0.9649
  38/55       4.1516   0.0528      0.0238   0.8786      7.8970   0.0097∗
  28/62      10.2152   0.0039∗∗    0.2282   0.6372      6.1162   0.0209∗
Exp. IVb
  80/38       0.0156   0.9005      6.3255   0.0190∗     0.0064   0.9368
  65/45       6.5128   0.0175∗     0.2971   0.5907      0.0612   0.8067
  50/50       1.2158   0.2811      9.1419   0.0059∗∗    0.1064   0.7471
  38/55       9.4499   0.0052∗∗    5.2994   0.0303∗     0.6748   0.4195
  28/62      23.9241   0.0000∗∗    3.6664   0.0675      0.6839   0.4164
Exp. V
  80/38       4.9993   0.0349∗     2.9992   0.0961      2.5730   0.1218
  65/45       8.9578   0.0063∗∗   23.0551   0.0000∗∗   15.7706   0.0006∗∗
  50/50      10.1842   0.0039∗∗    1.7277   0.2011     32.0123   0.0000∗∗
  38/55      36.2793   0.0000∗∗   34.1422   0.0000∗∗    3.1109   0.0905
  28/62      10.4704   0.0035∗∗   10.5604   0.0034∗∗    0.2784   0.6026
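The contrast procedure behind Table 4.2 can be illustrated with a small sketch (a hedged illustration, not the original analysis code; the toy ratings are invented): the weights (+1, +1, 0, −1, −1) are applied to each participant's ratings across the five asynchrony levels, and the mean contrast score is tested against zero, giving F = t² with n − 1 degrees of freedom.

```python
import numpy as np

def linear_contrast(ratings, weights=(1, 1, 0, -1, -1)):
    """One-sample test of a within-subject linear contrast.

    ratings: (n_subjects, n_conditions) array of mean ratings per
    condition (here the five asynchrony levels -55, -27, 0, +27, +55 ms).
    Returns (F, t), where F = t**2 with n_subjects - 1 degrees of freedom.
    """
    w = np.asarray(weights, dtype=float)
    scores = ratings @ w                      # contrast score per subject
    n = scores.size
    t = scores.mean() / (scores.std(ddof=1) / np.sqrt(n))
    return t ** 2, t

# invented toy data: 4 subjects, delayed tones rated softer than anticipated
r = np.array([[4.0, 3.9, 3.5, 3.1, 3.0],
              [4.2, 4.0, 3.6, 3.2, 3.1],
              [3.8, 3.7, 3.4, 3.0, 2.9],
              [4.1, 3.9, 3.5, 3.1, 3.0]])
F, t = linear_contrast(r)
```

A positive t indicates that anticipated tones were rated louder than delayed ones, which is the direction of the trend reported above.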

Effects of chord, transposition, and voice

Experiment IV  The previous section examined the influence of tone balance, asynchrony, and position in the chord on the perceptual salience. In this section, the effects of the two types of chords, the three transpositions, and the position of the target tone within the chord on the listeners' ratings were investigated. These independent variables were introduced to check for intensity effects of the individual samples involved in these experiments. Since the tones for this (and the next) experiment were chosen from the pool of sampled sounds with respect to their peak sound level (dB) and not according to the MIDI velocity that produced them, it was evaluated here whether the participants rated the loudness of the target tones more with respect to peak sound levels or more with respect to MIDI velocity values. In Figure 2.20 on p. 51, some tones showed (partly considerably) higher sound levels at all dynamic levels compared to others. If a particular sample (A) showed a considerably lower peak sound level for the same MIDI velocity than an adjacent one (B), listeners could have rated sample A louder, because the sample involved in the test was produced by a higher MIDI velocity value than the sample from the adjacent note (B).

A repeated-measures analysis of variance was conducted on the ratings with


Figure 4.15: Experiment IV. Perceived salience (ratings) as an interaction of voices (lines), chords (x axes), and transpositions (panels). Ratings are averaged over the two repetitions of Experiment IV. Error bars denote 95% confidence intervals.

repetition (IVa, IVb), target voice (upper, middle, lower), chord (major, minor), and transposition (−1, 0, +1) as within-subject factors.20 It revealed significant21 effects of voice [F(2, 50) = 61.1, εG.G. = 0.93, padj < 0.001], chord [F(1, 25) = 56.99, εG.G. = 1.00, p < 0.001], and transposition [F(2, 50) = 10.74, εG.G. = 0.84, padj < 0.001], but no significant effect of repetition [F(1, 25) = 0.05, εG.G. = 1.00, p = 0.83]. The interaction of interest between voice, chord, and transposition was highly significant [F(4, 100) = 18.04, εG.G. = 0.89, padj < 0.001]. It is plotted in Figure 4.15.

The three independent variables (chord, transposition, and target voice) interacted significantly, indicating that participants perceived the individual tones that produced the tested sonorities as differently loud on average. Participants did not rate the two repetitions of this experiment differently. It can be seen that the lower voice was always rated loudest except for the minor chord transposed one semitone upwards; in that condition it was heard as equally loud as the others. It is not clear whether the different ratings were context effects (e.g., attention attracted to the highest tone in the context at transposition +1) or due to different subjective intensities of the different piano samples.

Experiment V Similarly to the previous experiment, the effects of voice, chord and transposition were evaluated for Experiment V. A repeated-measures ANOVA was performed on the ratings with target voice (upper, middle, lower), chord (major,

20The other factors (intensity and asynchrony) were averaged out for this analysis in order to reduce the degrees of freedom.
21The adjusted p values are computed according to the Greenhouse-Geisser correction. The corrected degrees of freedom are not reported.
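The Greenhouse-Geisser correction mentioned in the footnotes can be sketched as follows (an illustrative implementation, not the software used for this analysis; the toy data are invented). Epsilon is estimated from the double-centred covariance matrix of the condition scores, and the ANOVA degrees of freedom are multiplied by it before the p value is looked up.

```python
import numpy as np

def gg_epsilon(data):
    """Greenhouse-Geisser epsilon for one within-subject factor.

    data: (n_subjects, k_conditions) matrix of scores. The returned
    epsilon ranges from 1/(k - 1) (maximal violation of sphericity)
    to 1 (sphericity holds).
    """
    k = data.shape[1]
    S = np.cov(data, rowvar=False)           # k x k sample covariance
    # double-centre: subtract row and column means, add grand mean
    Sc = S - S.mean(axis=0) - S.mean(axis=1)[:, None] + S.mean()
    return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))

# invented scores of 6 subjects under 3 conditions
data = np.array([[1., 2., 3.],
                 [2., 4., 3.],
                 [3., 5., 7.],
                 [4., 7., 6.],
                 [5., 9., 10.],
                 [6., 10., 12.]])
eps = gg_epsilon(data)
```

For three conditions, a nominal F(2, 50) test would then be evaluated with the corrected degrees of freedom (2ε, 50ε).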


Figure 4.16: Experiment V. Perceived salience (ratings) as an interaction of voices (lines), chords (x axes), and transpositions (panels). Ratings averaged over the two repetitions of Experiment IV. Error bars denote 95% confidence intervals.

minor), and transposition (−1, 0, +1) as within-subject factors.22 It revealed significant effects of voice [F(2, 50) = 90.44, εG.G. = 0.92, padj < 0.001], chord [F(1, 25) = 4.97, εG.G. = 1.00, p = 0.035], and transposition [F(2, 50) = 11.14, εG.G. = 0.82, padj < 0.001]. As in Experiment IV, the interaction between voice, chord, and transposition was highly significant [F(4, 100) = 18.4, εG.G. = 0.61, padj < 0.001]. It is plotted in Figure 4.16. The results were very similar to the previous experiment. Participants gave different average loudness ratings to the different tones of the chords, although the samples of Experiments IV and V were identical (see above).

In both Figure 4.15 and Figure 4.16, the lower voice generally received louder ratings, as already reported earlier. But especially chord 2 (the minor chord) was rated softer when it was transposed one semitone upwards. This trend might be due to the attribution of intensity: the lower voice in chord 2 transposed one semitone upwards (note number 58) shows a higher peak sound level than the lower voice of chord 1, transposition +1 (note number 56). This tendency was reflected in the data, that is, the lower voice in chord 2 was rated softer than the lower voice in chord 1 (always transposition +1). On the other hand, although the upper and the middle voice were the same tones in the two chords (again transposition +1), they were rated considerably differently in Experiment V (Figure 4.16). It could be that this was an effect of chord type, explainable by the stability of the chord tone within the chord (once the third, once the fifth). Since these findings were not the main focus of this study, further evaluations cannot be advanced here.

To summarise this examination of the effects of chord, transposition, and voice: participants rated differently along these independent variables. The effect of voice

22The other factors (intensity and asynchrony) were averaged out for this analysis in order to reduce degrees of freedom.

(lower voice received higher salience) was replicated from the analyses performed above. The effects and interactions of chord and transposition cannot be entirely explained here. One possible explanation was the attribution of intensity through peak sound level instead of MIDI velocity. Especially in the case of chord 2, transposition +1, and lower voice, there was a possible explanation in Figure 2.20, p. 51 (see above). The question of whether MIDI velocities or peak sound levels better describe the intensities of recorded sound samples is further examined and discussed in Section 4.7, p. 127.

4.5.4 Conclusion

To conclude, the main cue for the perceived loudness of a tone or voice was intensity; the effect of temporal shifting was relatively small and inconsistent. Synchrony became relevant only when intensity was absent as a cue (voices equally loud) or when the target tone/voice was very soft. In the latter case, anticipation helped to overcome the spectral masking that occurred when the tones were simultaneous. Lower tones or voices were generally rated higher than upper voices. This finding might also be explained by spectral masking: lower tones mask higher tones more than vice versa. There was a small trend regarding the direction of asynchrony: an early tone received slightly higher salience ratings than a delayed one. The ratings of the musicians were independent of whether piano was their main instrument or not. The already small temporal effects (of Experiment IV) were marginally reinforced through streaming in Experiment V.

4.6 Asynchrony versus relative intensity as cues for melody perception: Excerpts from a manipulated expressive performance (Experiment VI)

4.6.1 Background

This is the last of a series of experiments on the perceptual salience of individual voices in multi-voiced musical contexts. The aim here is to test and replicate previous findings in a real music situation. Manipulated performance files with an artificially added pedal track were played back on the Bösendorfer computer-controlled grand piano and presented to listeners via headphones. The listeners had to judge the relative loudness of two selected voices that were varied in intensity (in terms of MIDI velocity units) and asynchrony (shifted back and forth in time) as in the previous experiments.

This experiment was inspired by and designed similarly to Palmer's fourth experiment (Palmer, 1996, Exp. 4, pp. 46–51). Palmer used the theme of the last movement of Beethoven's piano sonata op. 109 and tested four melodic interpretations of it (exaggerated lower voice, lower voice, upper voice, and exaggerated upper voice) performed by one pianist. The four melodic interpretations resulted in increases in melody lead and intensity of the two voices in question: bass and soprano (see Palmer, 1996, p. 41). She presented these four versions to musically trained listeners in three conditions: (1) with all of these cues removed (without timing and intensity), (2) with timing only (intensity removed), and (3) with timing and intensity (original performances). She did not test a condition with intensity variations only (timing removed), which would possibly have affected her findings significantly. The listeners indicated which of the two voices in question was the melody as intended by the performer on a 6-point scale from 1 ("Very sure it's the upper voice") to 6 ("Very sure it's the lower"). She found no difference between conditions 1 and 2, except for expert pianists, who detected the condition with melody lead only (2) slightly better than the neutral one (1).
In the current experiment, all combinations of asynchrony (melody lead and lag) and intensity differences were examined. From the previous experiments, we learned that asynchrony hardly changed the perceptual salience of a voice, regardless of whether it was shifted forwards or backwards in time. The hypothesis in this experiment was the same: temporally shifting the melody is only a minor cue for detecting it as the melody (most salient voice), while differences in intensity (timbre) are the main criterion. A short excerpt (around 20 seconds) of a piece by Chopin was used for the experiment. In addition to the melody (the highest voice), a middle voice was manipulated, that is, a voice that is not particularly emphasised in a normal musical interpretation (see also Chapter 3, p. 57). The artificial versions were computed from a single performance, with all other cues such as articulation, pedalling, and expressive timing held constant over all stimulus conditions.


Figure 4.17: Average velocity profile (top panel) and inter-onset intervals (IOIs, bottom panel) of the melody (first voice) of the initial bars of Chopin’s Ballade op. 38 as performed by Pianist 5 against score time in bars (see Section 3.3.1, p. 60). These profiles served as baselines for the artificially generated performances of Experiment VI. The numbers in the score against some note heads indicate voice numbers. (Note that the IOI graph just displays the time intervals between pairs of adjacent melody tones in milliseconds without any correction regarding their nominal length in the score.)
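The IOI values in the bottom panel of Figure 4.17 are simply the time differences between adjacent melody onsets; a minimal sketch (illustrative only, not the original analysis code):

```python
def inter_onset_intervals(onsets_ms):
    """Inter-onset intervals (IOIs): time differences between
    adjacent note onsets, in milliseconds, without any correction
    for the notes' nominal length in the score."""
    return [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]

# invented onsets of three successive melody notes
iois = inter_onset_intervals([0.0, 500.0, 1250.0])
```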

4.6.2 Method

Stimuli

The first 9 bars of Chopin's Ballade op. 38 (F major) were chosen as the test excerpt. As the two possible melodic interpretations to be tested, the first and the third voice were selected (see voice numbering in Figure 4.17). To avoid artificial-sounding performances, one expressive performance of this piece was taken and modified in order to control the experimental conditions. The timing profile stemmed from the expressive timing of the melody (highest voice) of Pianist 5's performance of that piece.23 The intensity profile was calculated from the dynamic profile of the melody of the same performance (in terms of MIDI velocity units), but reduced in loudness by half of the average distance between melody and accompaniment. These profiles are plotted in Figure 4.17. They served as the baseline from which the test stimuli were calculated and then played back on the Bösendorfer computer-controlled grand piano.

23The recording session is described in detail in Section 3.3.1, p. 60. This performance was chosen because it was highly rated in informal listening tests by several musically trained listeners.
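The derivation of one stimulus from this baseline can be sketched as follows (an illustrative reconstruction; the Note structure, function name, and toy values are assumptions, not the original software). A target voice is shifted in time by the asynchrony value and offset in MIDI velocity; all other notes stay unchanged:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Note:
    onset_ms: float
    midi_velocity: int
    voice: int        # 1 = melody ... 5 = lowest accompaniment voice

def make_stimulus(baseline, target_voice, vel_offset, async_ms, start_ms=0.0):
    """Derive one stimulus: shift the target voice in time and add a
    MIDI-velocity offset, leaving everything else unchanged.
    Manipulations apply only from start_ms on (bar 2 in the experiment)."""
    out = []
    for n in baseline:
        if n.voice == target_voice and n.onset_ms >= start_ms:
            n = replace(
                n,
                onset_ms=n.onset_ms + async_ms,
                midi_velocity=min(127, max(1, n.midi_velocity + vel_offset)),
            )
        out.append(n)
    return out

# invented three-note baseline; emphasise and anticipate the melody (voice 1)
base = [Note(0.0, 60, 1), Note(2000.0, 60, 1), Note(2000.0, 50, 3)]
stim = make_stimulus(base, target_voice=1, vel_offset=12,
                     async_ms=-55, start_ms=1000.0)
```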

(Figure 4.18 shows three bars per voice for the conditions 'Normal', '1st emphasised', and '3rd emphasised'; y axis: MIDI velocity units; x axis: voices 1–5.)

Figure 4.18: Average dynamics of the first 9 bars of the Ballade performed by 22 pianists (see Chapter 3) separately for the different voices. Error bars denote standard errors of the means. The mapping between MIDI velocity units and (final) hammer velocity (m/s) is as in Section 4.4.2, p. 96.

The two designated melodies were shifted back and forth in time; however, other than in the previous experiments, the melodies were only increased in loudness, not decreased. The increments in MIDI velocity were obtained from measurements of 22 expressive performances of that piece (see Chapter 3, p. 57). The average dynamic levels in MIDI velocity units are plotted in Figure 4.18 separately for the different voices (compare with Figures 3.5, p. 65 and 3.7, p. 68, respectively). The melody (upper voice) in the 'normal' condition (without any specific instruction) was played 12 MIDI units louder than the middle voices (voices 2 and 3), and 24 MIDI units louder in the emphasised condition. When the middle voice (voice 3) was to be played strongly emphasised, it was played only 20 MIDI units louder than the other middle voices. The left hand (voices 4 and 5) was always about 10 MIDI units softer than the middle voices. According to these data, the following loudness combinations were chosen for the melody voice (0, +12, +24 MIDI velocity units) and for the middle voice (0, +10, +20). The accompaniment (voices 4 and 5) was set constantly to −10 MIDI velocity units. All these velocity values were relative to the expressive loudness profile of Pianist 5 (see Figure 4.17). In parallel, the timing was calculated relative to the timing profile displayed in Figure 4.17. The manipulations began at the beginning of the second bar (the opening unison octaves were not manipulated).

Thus, the experimental design was as follows: 2 voices (upper and middle) × 3 loudness combinations (0, +12/+10, +24/+20 MIDI velocity units) × 5 asynchronies (−55, −27, 0, 27, 55 ms) = 30 combinations. To reduce the number of stimuli, only orthogonal and diagonal combinations of asynchrony and intensity were included, resulting in 22 combinations (see Figure 4.19). The two combinations without asynchrony and loudness variation were identical, but were left in the design for symmetry reasons. The duration of one stimulus was 22 seconds. The duration of each note was set to 75% of the corresponding IOI.

Figure 4.19: Experiment VI: Test design schema. Two melodic interpretations: upper voice (triangles) and middle voice (circles), three velocity levels of the 'melodies', and five asynchronies. Only combinations on the axes and their diagonals are included in the experiment. The grey area sketches the test design of Palmer's 4th experiment (Palmer, 1996). However, her design is not directly comparable to the present one, because she had 4 interpretations × 3 conditions.

To give an impression of legato throughout the whole excerpt, an artificial pedal track was added to the computed MIDI files. The pedal was programmed to be released with the onset of each chord and re-depressed sufficiently long (150 ms) before the corresponding note off, except if the harmony remained constant (see Figure 4.20). The locations of pedal change were determined by the author and occurred in parallel with changes in harmony. This kind of pedalling is called syncopated or legato pedalling (Repp, 1997b). The individual pedal changes (represented in MIDI control values between 0 and 100)24 were modelled by a sine curve from 1/2π to 3/2π for a pedal press and from 3/2π to 5/2π for a pedal release, within a time period of 160 or 240 ms per change, respectively. This time period was informally taken from expressive

24In standard MIDI, the conceptual range of the right pedal is from 0 (released) to 127 (fully depressed). The Bösendorfer system has 256 steps, from 0 to 255. In our case, it was sufficient to press the pedal up to a value of 100.
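The half-sine pedal transitions (press over 1/2π to 3/2π, release over 3/2π to 5/2π of a sine period) can be sketched as follows. This is an illustrative reconstruction, not the original software; in particular, the sampling rate of the control stream is an assumption:

```python
import math

def pedal_ramp(start, end, duration_ms, rate_hz=500):
    """Half-sine pedal transition between two controller values.

    Uses theta in [pi/2, 3*pi/2]; reversing start/end yields the same
    value sequence as the 3/2*pi to 5/2*pi release branch. rate_hz
    (control values per second) is an assumed sampling rate.
    """
    n = max(2, round(duration_ms / 1000 * rate_hz))
    values = []
    for i in range(n):
        theta = math.pi / 2 + math.pi * i / (n - 1)   # pi/2 .. 3*pi/2
        frac = (1 - math.sin(theta)) / 2              # 0 .. 1, monotone
        values.append(round(start + (end - start) * frac))
    return values

press = pedal_ramp(0, 100, 160)     # pedal press over 160 ms
release = pedal_ramp(100, 0, 240)   # pedal release over 240 ms
```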


Figure 4.20: Piano-roll display of an excerpt of a stimulus with the first voice shifted forwards in time (−55 ms) and increased in velocity (+24 MIDI velocity units, represented here by a darker colour). The individual MIDI velocity values are printed on top of each tone. The continuous line indicates the artificial (right) pedal track, where 0 means released and 127 fully depressed.

performances of the same piece. Asynchrony and intensity variation always started in the second bar, so that the introductory octaves on C remained unchanged in all conditions.

The generated MIDI files were converted into the Bösendorfer file format triple (".kb," ".lp," and ".sp" files) and played back on that device.25 The acoustic recording was accomplished with a setup and equipment identical to those in Section 4.5.2. The microphones were placed at an imagined player's position in front of the keyboard.

Procedure  The experiment was carried out with the same graphical user interface as in the previous experiments (see Figure 4.21). After a short training period in which the participants became familiar with the stimuli (3 stimuli had to be rated 'correctly' before proceeding to the experiment), they heard the 22 stimuli in random order with the same headphones as above (Sennheiser HD 25–1), but in full stereo quality. Due to the ORTF recording technique, a very elaborate spatial impression emerges for the listener. The participants saw the music score of the Chopin excerpt with the two voices (voices 1 and 3) marked in colour (red and blue). They were asked to judge the prominence of the two voices by answering the question: "Which voice attracts your attention more?" Answers ranged from 1 ("very much the lower one") to 7 ("very much the upper one") via 4 ("the two melodic interpretations sound equally important to me"). The background colour of the text boxes varied

25The recording took place on January 9, 2003 at the Bösendorfer company in Vienna on the 290–3 SE grand piano.

Figure 4.21: Experiment VI: Screenshot of the graphical user interface used for this experiment (the numbering of the experiments in the experimental session differed from the numbering in this thesis). The two voices are indicated by colour (red, blue); the rating scale correspondingly blends between these two colours.

according to the colours used for indicating the two voices. Each stimulus could be repeated an unlimited number of times.

4.6.3 Results and discussion

As reported in Section 4.5.3 (see also Figure 4.12, p. 110), the participants indicated that this experiment was the easiest, due to its comparatively natural-sounding stimulus material. They needed 8:10 minutes (s.d. = 2:15 minutes) to accomplish it, while repeating each stimulus 1.23 times on average (s.d. = 0.22). However, two participants (cello, piano) found this experiment to be the most difficult of the test session.

The mean ratings are plotted in Figure 4.22 separately for the two voices (panels), three intensity combinations (different markers), and five asynchronies (x axes). The asterisks next to the error bars indicate significant differences from the neutral rating according to t-tests for single means (∗ p < 0.05, ∗∗ p < 0.01). The conditions with the two voices equally loud were rated significantly lower than a rating of


Figure 4.22: Experiment VI. Mean ratings over 26 participants, separately for the two voices (panels), intensity combinations (different markers), and asynchronies (x axes). Error bars denote 95% confidence intervals. The asterisks next to the error bars mark significant differences from a rating of 4 ("The two melodic interpretations sound equally important to me") according to t-tests for single means (∗ p < 0.05, ∗∗ p < 0.01). Results of Bonferroni post-hoc tests between adjacent asynchrony conditions are marked by asterisks or 'n.s.' (not significant). Non-adjacent asynchrony conditions are marked if they differed significantly (here, only in the right panel).

four ("They sound equally important to me") in the simultaneous conditions and when the middle voice appeared earlier than the upper voice. As expected, all the other intensity conditions differed significantly from a rating of four. It is unclear whether the middle voice was in fact louder than the upper voice (even though the two were equally loud in terms of MIDI velocity units) or whether musically trained participants expected the upper voice (melody) to be louder in this musical context and therefore considered the middle voice louder when this expectation was violated.

The intensity combination with the two voices equally loud was tested in a combined two-way repeated-measures ANOVA with asynchrony (5) and voice (2) as within-subject factors and instrument (pianist versus non-pianist) as a between-subject factor. There was a significant effect of voice [F(1, 24) = 25.6, εG.G. = 1.00, p < 0.001] and a significant interaction between voice and asynchrony [F(4, 69) = 8.36, εG.G. = 0.91, padj < 0.001], but no significant effect of asynchrony alone [F(4, 96) = 2.04, εG.G. = 0.85, padj = 0.1065], nor of instrument [F(1, 24) = 1.8, p = 0.1923].26 Post-hoc tests revealed that all differences in rating

26None of the (2-way or 3-way) interactions between instrument and asynchrony or voice reached statistical significance: voice × instrument [F(1, 24) = 3.61, p = 0.0696], asynchrony × instrument [F(4, 96) = 0.61, p = 0.6507], voice × asynchrony × instrument [F(4, 96) = 0.21, p = 0.9318].

in the first voice did not differ significantly from each other. In the middle voice, only the −55 ms and the −27 ms conditions showed a significant difference from the +27 ms condition, and the −55 ms from the +55 ms condition (as sketched in Figure 4.22, right panel). All adjacent asynchrony conditions were non-significant in the middle-voice condition.

As in the previous experiments, linear contrasts were calculated on anticipation and delay (asynchrony: +1, +1, 0, −1, −1) for the two voices separately. Both showed significant effects: the upper voice [F(1, 24) = 6.48, p = 0.0178] and the middle voice [F(1, 24) = 36.18, p < 0.001] showed significantly different ratings between anticipation and delay. In both voices, anticipation enhanced and delay attenuated the loudness ratings when the two voices were equally loud.

Due to the reduced experimental design, four repeated-measures analyses of variance were conducted on the ratings with asynchrony (3) as a within-subject factor and instrument as a between-subject factor, separately for the intensity combinations +12/+10 and +24/+20. The results of the two ANOVAs concerning the upper-voice conditions (see Figure 4.22, left panel) yielded effects of asynchrony,27 but no effects of instrument.28 Post-hoc tests according to Bonferroni showed the delayed voices to be rated significantly higher than the simultaneous conditions, but not the opposite (see Figure 4.22). The anticipation of the melody was not rated significantly differently from the simultaneous condition.

On the other hand, the results of the two ANOVAs concerning the middle-voice conditions (Figure 4.22, right panel) yielded neither significant effects of asynchrony,29 nor any effects of instrument.30 If the middle voice was equally loud in terms of MIDI velocity units, it tended to be rated as more prominent than the (equally loud) upper voice. Its anticipation could enhance its prominence only in comparison to the delayed conditions.
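The t-tests for single means against the neutral rating of 4, used throughout this section, can be sketched as follows (a minimal illustration with invented ratings, not the original analysis):

```python
import math

def one_sample_t(ratings, mu0=4.0):
    """t statistic and degrees of freedom for testing whether the
    mean rating differs from the neutral value mu0."""
    n = len(ratings)
    mean = sum(ratings) / n
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    return (mean - mu0) / math.sqrt(var / n), n - 1

# invented ratings of one condition from 5 listeners
t, df = one_sample_t([3.0, 3.5, 2.5, 3.0, 3.5])
```

A negative t indicates that the condition was rated below the neutral point, i.e., towards the lower voice on the 1–7 scale.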
But if the hammer velocity of the middle voice was increased, the already small effects of asynchrony faded away; then it was only the loudness of an already dominant middle voice that controlled its perceptual salience.

During the setup of the experiment, the author followed average measurement results of several recordings (see Figure 4.18, p. 120). While listening to the stimuli as played back by the Bösendorfer, the left hand always sounded very loud. This

27Effects of asynchrony for the +12 condition [F(2, 23) = 3.64, p < 0.05] and for the +24 condition [F(2, 23) = 3.6, p < 0.05].
28Effect of instrument in the +12 condition [F(1, 24) = 0.35, p = 0.56] and in the +24 condition [F(1, 24) = 0.45, p = 0.51]. Only in the +12 MIDI velocity units condition, the interaction between asynchrony and instrument revealed that pianists rated the simultaneous condition slightly lower than the others in comparison to the non-pianists.
29Effects of asynchrony for the +10 condition [F(2, 23) = 1.02, p = 0.38] and for the +20 condition [F(2, 23) = 0.14, p = 0.87].
30Effect of instrument in the +10 condition [F(1, 24) = 2.03, p = 0.17] and in the +20 condition [F(1, 24) = 0.55, p = 0.47]. Only in the +10 MIDI velocity units condition, there was a significant interaction between asynchrony and instrument [F(2, 23) = 4.22, p < 0.05], but post-hoc tests did not disclose any significant effects to interpret.

corresponded also to what participants reported in personal communication after the experiment or in the questionnaire. They sometimes expected the upper voice to sound even louder, whereas they found the middle voice always too dominant and the left hand too strong. This was surprising, considering that the middle voice was emphasised only in steps of 10 MIDI velocity units instead of the steps of 12 units used for the upper voice. In addition, the ratings were not at all asymmetric (e.g., going only down to 2, but up to 7). Outer voices tended to receive greater perceptual attention (Palmer and van de Sande, 1993; Palmer and Holleran, 1994; Repp, 1996c), but they also required expressive emphasis to fulfil the perceptual expectations of musically trained listeners. Evidence was given here only for an upper voice; the behaviour and expectancies for a bass voice might be very different. The loudness of all the voices followed the average velocity profile at a fixed distance (in terms of MIDI velocity units). A human performer is certainly more flexible in the timbral shaping of the individual voices than this coarse algorithm. The pedal, used continuously in the stimuli, might also be responsible for reinforcing the lower voices more than the upper ones. It might be interesting for future performance-rendering approaches that lower voices need much less emphasis than those of higher pitch, especially when the right pedal is involved.

It can be summarised that the effects of asynchrony were small compared to the effects of intensity. When intensity was missing as a cue, anticipation could lead to a slightly enhanced perception of a voice (in our data more in the middle than in the upper voice), but when the voices were played louder, asynchrony became a minor cue (especially in the middle voice). Still, there were effects of asynchrony in the data.
(1) When the two voices were equally loud, anticipation increased the ratings significantly in comparison to delay in both the upper and the middle voice. (2) In contrast to the findings of the previous experiments, delay was taken as a cue for attraction more than anticipation in conditions with an emphasised upper voice. The typical melody lead condition (−27 ms and +12 MIDI velocity units in the upper voice) was not rated significantly differently from the simultaneous condition, but the corresponding delay condition was. No analogous effect was observed in the middle voice.

This experiment took the opposite way of obtaining the stimulus material from Palmer (1996). She reduced original expressive performances by a professional pianist by excluding particular cues stepwise. It can be assumed that other cues such as articulation and pedalling also varied with the different melodic interpretations and thus also served as cues for melody detection. In the present study, the 'expressive' cues to be tested were added to a prototypical expressive baseline (derived from one professional pianist). With this procedure, it was possible to exclude all other possibly influencing factors such as articulation and pedalling. Nevertheless, small trends of timing could still be found in the present study.

4.7 Model

4.7.1 Introduction

In this section, we compare the results of the last three experiments, which featured different experimental designs, in order to develop a comprehensive theory that explains the data. The final question of this study is to evaluate the relative influence of each of the varied expressive cues on the listeners' ratings. To this end, multiple regression models were fitted separately to the data of Experiments IVa, IVb, V, and VI. The stimuli of Experiments IV and V shared the same design; Experiment VI was slightly different. The models developed in this section assume that the independent variables related directly to the perception of salience by the participants, as represented in their ratings.

4.7.2 Input of the models The input of the models consisted of the independent variables by which the stimuli were created and the responses given by the participants. For Experiments IVa, IVb, and V, voice (1: upper, 2: middle, 3: lower), (signed) asynchrony (−55, −27, 0, 27, 55 ms), and intensity (five levels from 1: 28/62 to 5: 80/38 MIDI velocity units, see Figure 4.13, p. 112 and Figure 4.14, p. 113) were the relevant variables by which the stimuli were created. One research question was whether or not anticipation of tones had the same effect on their perceived salience as delay. Therefore, another independent variable was introduced that referred to the amount of asynchrony independently of its direction (unsigned asynchrony); it was the absolute value of asynchrony (55, 27, 0 ms). The intensity values in these experiments were chosen from the database of sound samples according to their peak sound level (dB) and not according to the MIDI velocity that produced a given tone. This decision was based on the results of Experiment I, where listeners adjusted piano tone pairs to sound equally loud when they were roughly equal in peak sound level.31 In order to evaluate how much the hammer velocities contributed to the listeners’ ratings, the (absolute) MIDI velocities of the rated tones were introduced as an alternative independent variable for dynamics (MV, with values between 0 and 127). The two independent variables referring to the dynamics of the stimuli were examined in separate models per experiment. Multiple regression models were fitted separately onto the ratings32 of Experiments IVa, IVb, and V with voice, asynchrony, unsigned asynchrony, and either

31 In the stimulus material, a certain velocity combination was sometimes composed of tones produced by slightly different MIDI velocities on different pitches; e.g., 50/50/50 MIDI velocity units, corresponding to −17.6/−17.6/−17.6 dB–pSPL, were realised at f#–d#–b (chord 1, transposition −1, see Figure 4.9, p. 106) with 56/64/68 MIDI velocity units, at g–e–c with 72/58/68, and at ab–f–db with 84/66/78.
32 The ratings were averaged across the two chord types, but not across participants, in order to include the entire between-subject variance.

intensity or MIDI velocity (MV) as predictor variables, resulting in six separate regression models for these experiments. For Experiment VI the model design was slightly different. Voice (1: upper, 2: middle), signed asynchrony (as before), unsigned asynchrony (as before), and intensity (1: 0, 2: +12/+10, 3: +24/+20 MIDI velocity units, see Figure 4.22, p. 124) served as independent variables.33 The ratings of Experiment VI (Ro), ranging from 1 (“the lower voice attracts my attention more”) to 7 (“the upper”), with the midpoint 4 (“they both sound equally loud to me,” see Figure 4.21, p. 123), were transformed for the model so that perceived equality of the two voices corresponded to zero on the rating scale and the maximum perceived attraction to a specific voice was 3, irrespective of voice [Rating new = abs(Rating old − 4)]. This transformation was made to account for the bidirectionality of the rating scale. Thus, the models had the form

Rating = I + Σ_{i=1}^{4} B_i · V_i,    (4.3)

with I being the intercept, B_i the individual coefficients for the independent variables34 as listed in Table 4.3, and V_i the variables as described above.
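The model of Equation (4.3) can be sketched as an ordinary least-squares fit. This is a minimal illustration assuming synthetic stand-in data; the ratings, coefficient values, and variable names below are invented, not the thesis data:

```python
import numpy as np

# Minimal sketch: fit the four-predictor linear model of Eq. (4.3),
# Rating = I + sum_i B_i * V_i, by ordinary least squares.
# All data below are synthetic placeholders, not the thesis ratings.

rng = np.random.default_rng(0)
n = 200

voice = rng.integers(1, 4, n)                        # 1: upper, 2: middle, 3: lower
signed_async = rng.choice([-55, -27, 0, 27, 55], n)  # ms
unsigned_async = np.abs(signed_async)                # ms, direction-independent
intensity = rng.integers(1, 6, n)                    # five intensity levels

# Simulate ratings dominated by intensity, as in the thesis results.
rating = (0.7 + 0.22 * voice - 0.0015 * signed_async
          + 0.0024 * unsigned_async + 0.98 * intensity
          + rng.normal(0, 0.3, n))

# Design matrix: intercept column plus the four predictors.
X = np.column_stack([np.ones(n), voice, signed_async, unsigned_async, intensity])
coef, *_ = np.linalg.lstsq(X, rating, rcond=None)

pred = X @ coef
r2 = 1 - np.sum((rating - pred) ** 2) / np.sum((rating - rating.mean()) ** 2)
print(coef.round(3), round(r2, 3))
```

With real data, the design matrix would hold one row per individual rating, and standardised β values could be obtained by z-scoring the columns before fitting.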

4.7.3 Results and discussion The results of the multiple regression models with intensity as the fourth independent variable are listed in Table 4.3a, the results for those models with MIDI velocity in Table 4.3b.

Experiments IVa, IVb, and V

Intensity All models for Experiments IVa, IVb, and V explained more than 76% of the variance in the rating data. In these six models, most of the variance was explained by the two intensity variables (with β values of 0.87–0.90, while the largest β value of the other independent variables was 0.11). The models involving MIDI velocity as the fourth independent variable showed slightly higher R²s for Exp. IVa and Exp. V in comparison to those involving intensity (although the difference might be insignificant), but almost equal values in Exp. IVb. Thus, in Exps. IVa and V, the MIDI velocity numbers explained the results slightly better than a simple numbering of the five intensity conditions. This was not true for the model fitted onto the ratings of Exp. IVb. However, the two loudness variables were highly correlated with each other (r = 0.97**), suggesting that attributing intensity either by peak sound level or by MIDI velocity made little difference to the results.

33 In this experiment, intensity was controlled by MIDI velocity anyway, so no alternative variable had to be introduced.
34 Experiments IVa, IVb, and V used intensity in one set of models and MIDI velocity in the other as the fourth independent variable.

Table 4.3: Results of the multiple regression models fitted onto the rating data of Experiments IVa/b, V, and VI (see pp. 112 and 113). The fourth independent variable was either intensity (a) or MIDI velocity (b). Highly significant independent variables (p < 0.01) are indicated by **.

(a)
                      Exp. IVa               Exp. IVb               Exp. V                 Exp. VI
N                     1950                   1950                   1950                   572
F                     F(4, 1945) = 1861.0    F(4, 1945) = 1607.7    F(4, 1945) = 2278.4    F(4, 567) = 185.7
p                     < 0.001                < 0.001                < 0.001                < 0.001
R²                    0.793                  0.768                  0.824                  0.567

Vi                         β         B           β         B           β         B           β         B
Intercept                       0.6768**              1.0718**             −0.0074**             −0.7178**
1 Voice^a              0.1104    0.2119**    0.1130    0.1977**    0.1138    0.2492**    0.1293    0.2587**
2 Signed asynchrony   −0.0373   −0.0015**   −0.0339   −0.0015**   −0.0792   −0.0037**   −0.0533   −0.0014
3 Unsigned asynch.     0.0312    0.0024**    0.0189    0.0013      0.0259    0.0022**    0.0932    0.0043**
4 Intensity^b          0.8822    0.9774**    0.8681    0.8469**    0.8968    1.1341**    0.7315    0.8785**

^a For Exp. IVa, IVb, and V: 1, 2, 3; for Exp. VI: 1, 2.
^b For Exp. IVa, IVb, and V: 1, 2, 3, 4, 5; for Exp. VI: 1, 2, 3.

(b)
                      Exp. IVa               Exp. IVb               Exp. V
N                     1950                   1950                   1950
F                     F(4, 1945) = 2031.5    F(4, 1945) = 1562.3    F(4, 1945) = 2499.8
p                     < 0.001                < 0.001                < 0.001
R²                    0.807                  0.763                  0.837

Vi                         β         B           β         B           β         B
Intercept                      −0.3412**              0.0238              −1.3233**
1 Voice                0.0964    0.1849**    0.1108    0.1938**    0.1086    0.2379**
2 Signed asynchrony   −0.0096   −0.0004     −0.0071   −0.0003     −0.0632   −0.0029**
3 Unsigned asynch.     0.0395    0.0030**    0.0321    0.0022**    0.0104    0.0009
4 MIDI velocity        0.8907    0.0579**    0.8656    0.0532**    0.9043    0.0698**

These results do not suggest that participants tended to rate the loudness of the stimuli according to their underlying MIDI velocity, because the differences in the R² values were too small to draw any conclusions from them. Moreover, the model of Experiment IVb did not show a larger R² value for MIDI velocity. The evidence of Experiment I, where participants adjusted the perceptual equal-intensity baseline of two simultaneous tones according to their peak sound levels, remains more convincing. Nevertheless, it can be concluded that in all four models intensity explained the major part (86–90%) of the ratings.

Unsigned asynchrony The models for Experiments IVa, IVb, and V revealed unsigned asynchrony as a significantly contributing variable in four out of six models, independently of whether the target tone was before or after the other tones. This indicated that the further the target tone was shifted away from the chord, the louder it was rated (according to the positive sign of the B values). The importance of this variable, however, changed considerably from model to model. With MIDI velocity, it was significant for Exps. IVa/b, but not for Exp. V, while with intensity the picture was different (significant for Exps. IVa and V, but not for Exp. IVb). This inconsistent behaviour across the different intensity conditions means that no definite conclusion is possible for this independent variable.

Signed asynchrony Asynchrony was significant in all models that included intensity as an independent variable, but explained only a very small portion of the ratings (3–7%, see Table 4.3). The negative sign of β denoted an increase in perceived salience for anticipated target tones and an attenuation for delayed ones. The slope of this effect was very small: a shift of 55 ms changed the ratings by only 0.083 in Exps. IVa/b and 0.204 in Exp. V. The models with MIDI velocity as the dynamic reference exhibited asynchrony as significantly contributing in Exp. V, but not in Exps. IVa/b. Although the two model approaches delivered diverse results, they were not contradictory. Both approaches confirmed that the streaming introduced in Experiment V increased the effect of the relative timing of the target tone, though only to a very small degree.

Voice The position of the target tone in the chordal context played a slightly more important role. In the models of Experiments IVa, IVb, and V, the lower voice received louder ratings (e.g., the ratings of the lowest voice were higher by 0.42 in Exp. IVa and by 0.5 in Exp. V in comparison to the highest voice for the first set of models). This finding coincided with the masking hypothesis: lower tones tend to mask higher tones more than the other way around, so that, assuming the tones were equally loud, the higher tones are perceived as softer than they would be when presented alone.

Experiment VI The model fitted onto the data of Experiment VI also favoured intensity as the most important contributing variable (with a β value of 0.73 versus 0.13 for voice). Moreover, voice as well as unsigned asynchrony were important predictors of the listeners’ ratings. The independent variable voice had a positive B value. This indicated that the voice term contributed 0.2587 to the prediction for voice = 1 (upper voice) and 0.5174 for voice = 2 (middle voice), with the effect that the other three (or at least the other two significant) independent variables had a greater (about 25%) impact on the ratings in the middle voice than in the upper voice. This could mean that listeners are more sensitive to the same amount of expressive change in relative timing and intensity in the middle voice than in the upper voice, where they usually expect these expressive variations. The positive coefficient for unsigned asynchrony denoted that, irrespective of direction, asynchrony helped to attract listeners’ attention to a particular voice. The further the voice was shifted away from the accompaniment, the more attracting it was rated by the participants. However, the B value of this effect was small. The regression model of this experiment did not specify signed asynchrony as a significantly contributing independent variable. This result was somewhat surprising, because especially in a realistic musical context the effect of streaming and asynchrony was expected to be more prominent than in the preceding experiments (cf. Section 4.6, p. 118, as well as Palmer, 1996, Exp. 4).
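To illustrate how the coefficients of Table 4.3a turn stimulus settings into predictions, the sketch below evaluates the Exp. VI model for hypothetical inputs. Only the coefficient values are taken from the table; the helper function and the example settings are assumptions for illustration:

```python
# Sketch: evaluate the Exp. VI regression of Table 4.3a.  The B values are
# copied from the table; predict_attraction and its example inputs are
# hypothetical, chosen only to illustrate the arithmetic.

B = {"intercept": -0.7178, "voice": 0.2587,
     "signed_async": -0.0014, "unsigned_async": 0.0043,
     "intensity": 0.8785}

def predict_attraction(voice, signed_async_ms, intensity_level):
    """Predicted folded rating abs(rating - 4) for Exp. VI (voice 1: upper, 2: middle)."""
    return (B["intercept"]
            + B["voice"] * voice
            + B["signed_async"] * signed_async_ms
            + B["unsigned_async"] * abs(signed_async_ms)
            + B["intensity"] * intensity_level)

# The voice term adds 0.2587 for the upper voice and 0.5174 for the middle,
# so the middle-voice prediction lies 0.2587 above the upper-voice one.
upper = predict_attraction(1, -55, 2)
middle = predict_attraction(2, -55, 2)
print(round(middle - upper, 4))   # 0.2587
```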

4.8 General discussion

This chapter focussed on two basic questions: the first investigated what amount of asynchrony can be detected as such by listeners and whether this threshold depends on the type of tone involved. The second tackled the influence of relative onset timing on the perceptual salience of particular tones in chordal musical textures. In a series of three experimental sessions with a total of seven experiments, these questions were investigated under various conditions. Asynchrony perception was tested in the pilot experiment and Experiment III, while the other question of the pilot experiment, as well as Experiments II, IV, V, and VI, focussed on salience perception. Listeners could tell the correct order of two tones of a piano dyad for asynchronies greater than 30–40 ms. This threshold decreased marginally the more artificial the stimuli became (synthesised piano, sawtooth, pure tones, cf. Section 4.3, p. 87), and it was somewhat larger than found in other studies with artificial stimuli (e.g., Hirsh, 1959). The more striking result came from Experiment III (Section 4.4.4, p. 100), where the two tones of the dyads were also manipulated in intensity. In this experiment, participants tended to perceive the two tones as synchronous even with asynchronies as large as 55 ms, provided that the earlier tone was also louder. This asymmetry was stronger for more complex sounds (sawtooth, real piano) than for pure tones. It is still unclear whether masking phenomena (i.e., forward masking) and/or familiarity with piano sounds were responsible for this asymmetry in the ratings. Listeners noticed and detected asynchrony only in unfamiliar combinations of relative intensity and asynchrony (e.g., an early and soft tone).
But if familiarity with piano sounds were important for this effect, why did the non-pianists in this study not rate these stimuli significantly differently, since they were expected to have less acquaintance with piano sounds and with typical combinations of loudness and timing? The other explanation is more psychoacoustic in nature: listeners assign simultaneity to early–loud combinations simply because they cannot, or can only hardly, hear the onset of the second and weaker tone. This effect is best explained by forward masking (Zwicker and Fastl, 1999), in which a louder tone raises the hearing threshold for a following (softer) tone within a time interval that is comparable to the asynchrony of typical melody leads (some tens of milliseconds). However, masking phenomena, especially in real piano sounds, are too complex to be predicted by existing models for the present stimulus material. Therefore, it was difficult to make any more precise statements about the extent of temporal and spectral masking that occurred in the present stimuli. To conclude, in the light of this experiment the melody lead phenomenon, in which the onsets of a more intense melody temporally precede the onsets of the softer accompaniment, has to be interpreted differently. Recall that Hirsh (1959) found asynchronies of the order of 20 ms to be easily perceived as asynchronous; even the order of the two tones could be determined by the listeners at that threshold. With the results of Experiment III, it seems now evident that asynchronies of the order of some 30 ms—as the melody lead phenomenon typically exhibits—are not heard as asynchronous by musically trained listeners. The second question focussed on how perceived salience may be altered by relative onset asynchrony. Five experiments were exclusively dedicated to this question. Their stimulus material became acoustically and musically more and more complex: it developed from equally loud dyads, over dyads with intensity variation, three-tone chords, and sequences of three-tone chords, to a short excerpt of a piece by Chopin. The fundamental result of all five experiments was the same: asynchrony had only a small and inconsistent effect on the perceived salience of the tone or voice in question. There were some effects in some experiments, but sometimes they contradicted results found in the preceding experiments. Generally, the perceived salience depended primarily on differences in the relative intensity of the tones, while asynchrony altered it only marginally and sometimes inconsistently. Effects of relative onset timing became relevant when the target tones or voices were softer than the other chord tones, so that they were masked by them. Masking occurred when the tones were simultaneous and when the softer target tone came late. In these cases, early tones (anticipation) helped to overcome masking. This masking explanation was also consistent with the finding that the perceived salience was greatest (thus masking attenuation lowest) for a lower target tone and weakest for a higher target tone (a lower tone masks a higher one more than the other way around; cf. also the linear contrasts between anticipation and delay as listed in Table 4.2, p. 114). In general, anticipation of the tones slightly increased their salience ratings while delay attenuated them. This trend was also reflected in the models fitted onto the rating data.
The asymmetry between early and late was greater, and thus the slope of the line of best fit steeper, in Experiment V than in Experiment IV (see Table 4.3, p. 129). This difference can be explained by the chord repetitions introduced in Experiment V. The finding was consistent with the streaming hypothesis as advanced by Bregman and Pinker (1978); however, its effect was small relative to the effect of intensity. On the basis of this explanation, it was surprising that this effect did not become stronger with the real music excerpt by Chopin in Experiment VI. There, delay as well as anticipation tended to enhance perceptual salience about equally (an effect of unsigned asynchrony). It might be that in a real music situation other performance parameters such as articulation or pedalling play an important role in specifying a melody voice. Thus, asynchrony would help to perceptually separate the voices, but only in conjunction with the other performance parameters. The stimuli for this last experiment were deliberately based on a single expressive performance by one pianist, that is, all other performance parameters were held constant over the stimulus conditions (see Section 4.5.2, p. 106). This procedure was the opposite of Palmer’s approach, which removed individual performance parameters (timing, intensity) from different expressive performances with different melodic intentions (cf. Palmer, 1996, Exp. 4). She found that pianists identified the correct melodic intention slightly better than non-pianists when the intensity cues were removed from the stimulus performance (timing only). Since in her stimuli expressive parameters such as articulation and pedalling varied across conditions, this weak effect does not necessarily depend on relative onset timing. The present study, with a design focussed exclusively on the investigated parameters, showed that relative timing alone did not have a consistent effect on perceived salience. It follows that asynchrony might play an important role in expressing the musical intentions of performers only in combination with the other performance parameters mentioned above (in order to explain the results of Palmer, 1996). One difficulty in the present experiments was the issue of how to measure the loudness of piano sounds. An intensity baseline adjustment experiment (Experiment I, p. 96) showed that participants perceived simultaneous tone pairs as sounding equally loud when they were equal in their peak sound level, but not when equal in their original MIDI velocities. As the measurements described in Section 2.4 revealed, the connection between MIDI velocities and the peak sound levels of the resulting recorded tones varied strongly with pitch. This finding, together with the results of Experiment I, led to the decision to select the samples for Experiments IV and V according to their peak sound levels. This had the consequence that a certain intensity combination showed different MIDI velocity values on different pitches. Results from the regression models suggest that the absolute MIDI velocity values explained the results about as well as the numbered intensity combinations did. This issue requires more profound perceptual and acoustic investigation. It is still unclear why the peak sound level varies so much from one tone to the next and why it changes so strongly with microphone position.
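One plausible way to operationalise such a peak-level measure is sketched below: the maximum short-window RMS level of a tone in dB relative to full scale. The window length, the dBFS reference, and the synthetic test tone are all assumptions for illustration and do not reproduce the calibrated dB–pSPL measure used in the thesis:

```python
import numpy as np

# Sketch of a peak-level measure: maximum RMS level over short windows,
# expressed in dB relative to full scale (dBFS).  Window length and
# reference are assumptions, not the thesis's calibrated dB-pSPL measure.

def peak_level_db(signal, sr=44100, window_ms=10):
    """Maximum windowed RMS level of `signal` in dBFS."""
    win = max(1, int(sr * window_ms / 1000))
    n = len(signal) // win
    frames = np.asarray(signal[: n * win]).reshape(n, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return 20 * np.log10(rms.max() + 1e-12)

sr = 44100
t = np.arange(sr) / sr
# A decaying 440 Hz tone with peak amplitude 0.5 as a stand-in piano tone.
tone = 0.5 * np.sin(2 * np.pi * 440 * t) * np.exp(-3 * t)
print(round(peak_level_db(tone, sr), 1))
```

A calibrated measure would additionally depend on microphone position and an absolute SPL reference, which is exactly the difficulty noted above.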
Furthermore, it needs to be examined how listeners perceive intensity at the piano and how this perception relates to acoustic measures. A possible solution would be to introduce a perceptual model of intensity perception that involves intensity as well as the timbral information that is prototypical for the piano sound. It might be that the effects of asynchrony were so fragile that they were obscured by the above-mentioned uncertainty in assigning intensity to the stimulus material. As the questions regarding perceived tone salience were formulated in terms of “how loud” particular tones sounded to the listeners, it might be that listeners rated loudness only and tried to cancel out any effects of asynchrony. Some participants indicated after the experiments that it was easy to assign loudness ratings to the various tones as long as they sounded together. For future research, it might therefore be better to ask “how transparent,” “how distinct,” or “how singing” individual voices sound. Another approach could be to ask how many tones can be detected in a four- or five-tone chord (cf., e.g., DeWitt and Crowder, 1987). These points raise the question of whether alterations in relative onset timing result in an enhanced salience of a particular voice or in an increased salience of more than one voice at a time (corresponding to the concept of multiplicity, that is, the number of tones simultaneously noticed in a chordal sonority, Parncutt, 1989, p. 92). The parallel increase of the individual saliences of the voices could lead to a more transparent sonority in which each voice can be tracked distinctly by a listener (Rasch, 1979, 1988). This coincides with Huron’s finding that J. S. Bach maximised onset asynchrony in his polyphonic compositions in order to enhance the perceptual salience of the individual voices (Huron, 1993, 2001).

Chapter 5

Conclusions

This thesis addressed the question of how pianists make individual voices stand out from the background in a contrapuntal musical context, how they realise this with respect to the constraints of piano keyboard construction, and how much each of the expressive parameters used by the performers contributes to the perception of these particular voices. These basic questions were approached from three different methodological directions, represented as the three major parts of this thesis (Chapters 2–4). First, in a piano acoustics study a vast amount of data was gathered from three grand pianos produced by different piano makers. These data were collected with an accelerometer setup monitoring key and hammer movements under different touch conditions. This study explored the relationship between the duration of the keystroke (travel time) and the dynamics of the produced tone (in terms of hammer velocity). This relation reflects a simple mechanical constraint: the faster a key is depressed, the shorter the hammer’s travel time to the strings, and thus the louder the produced sound. This basic relation (travel time in ms versus hammer velocity in m/s) was approximated by a power curve and used for the analysis of the data collected in the second approach (the melody lead study). The temporal characteristics (travel times, key–bottom contact times, instants of maximum hammer velocity) of the measured grand piano actions varied only marginally among the investigated pianos, hardly at all between different keys, but greatly with the type of touch. When a tone of a certain intensity (hammer velocity) is played, it takes around 30–40 ms less time to produce a sound when the key is hit from a certain distance above (staccato touch) than with a keystroke “from the keys” (legato touch). This finding demonstrated the complexity of what pianists need to be (even unconsciously) aware of when aiming for a desired expressive timing with tones of different intensities and types of touch.
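The travel time versus hammer velocity relation can be sketched as a power-curve fit, obtained as a straight line in log–log coordinates. The sample points below are invented for illustration and are not the measured data of this study:

```python
import numpy as np

# Sketch: approximate the travel time (ms) versus final hammer velocity (m/s)
# relation by a power curve t = a * v**b, fitted as a straight line in
# log-log coordinates.  The data points are invented stand-ins; the thesis
# fits curves of this form to measured accelerometer data.

velocity = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0])            # m/s
travel_time = np.array([160.0, 95.0, 55.0, 42.0, 34.0, 29.0])  # ms

b, log_a = np.polyfit(np.log(velocity), np.log(travel_time), 1)
a = np.exp(log_a)

def predicted_travel_time(v):
    """Travel time in ms predicted by the fitted power curve."""
    return a * v ** b

print(round(a, 1), round(b, 3))
```

The negative exponent encodes the mechanical constraint stated above: larger hammer velocities (louder tones) go with shorter travel times.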
Other findings confirmed assertions from the piano education literature. Depressing the keys from the key surface reduces finger–key noise and thus produces a cleaner sound (cf. Gát, 1965). Moreover, legato touch allows closer control of the tone, because the instants of maximal hammer velocity, corresponding closely to the points in time when the hammer loses contact with the jack, are later—and thus the time intervals of free flight shorter—relative to a staccato touch of the same intensity. On the other hand, very loud tones can only be achieved with keystrokes from a certain distance above the keys (staccato touch). Although these studies on the temporal behaviour of three different grand piano actions delivered a huge amount of data, the research is still in progress. A preliminary attempt was made to infer subjective judgements of playability from the temporal behaviour of the grand piano actions. To reach more definite conclusions on the quality of pianos and on how piano actions should be adjusted, more investigations have to be performed. Research of this kind has not adequately integrated the vast knowledge of piano makers and piano technicians (see, e.g., Dietz, 1968) into the research process. Further studies should take advantage of their extensive experience and (sometimes anecdotal) knowledge in order to examine the effective impact of their work (tuning, intonation, regulation) on the playability of grand pianos. For example, Askenfelt and Jansson (1990b, 1991, 1992a) worked in close co-operation with well-regarded piano technicians from the National Swedish Radio and the Swedish Academy of Music. Research in instrumental acoustics, performance research, and the wide empirical expertise of piano makers and technicians should no longer be separate knowledge areas, but interacting fields of interest that mutually benefit from each other. Researchers in instrumental acoustics could systematically investigate the processes of piano tuning, action regulation, and hammer intonation by examining the effect of each individual adjustment made by a technician. Some of the technicians’ explanations of craft habits might prove questionable or untrue; others might be confirmed.
Such a co-operation could lead to a better and more complete understanding of the complex processes involved in expressive piano performance. The second approach of this thesis was a performance study in which 22 professional pianists played two excerpts by Frédéric Chopin on a Bösendorfer computer-controlled grand piano. The performance data were analysed with respect to tone onset asynchronies and dynamic differences between the principal voice and the accompaniment. The melody was found to precede the other voices by around 30 ms, confirming findings from previous studies (melody lead, cf. Vernon, 1937; Palmer, 1989, 1996; Repp, 1996a). The earlier a melody tone appeared relative to the other chord tones, the louder that tone was. This evidence supported the velocity artifact hypothesis (Repp, 1996a), which ascribed the melody lead phenomenon to mechanical constraints of the piano keyboard (the louder a tone is hit, the earlier it will arrive at the strings). In order to test this hypothesis, the relative asynchronies at the onset of the keystrokes (finger–key asynchronies) were inferred through the travel time–hammer velocity relation from the previous study. Key onset differences between the principal and the other voices then showed almost no asynchrony. This finding suggests that pianists started the key movements basically in synchrony; the typical asynchrony patterns (melody lead) were caused by the different sound intensities in the different voices. It was concluded that melody lead can be largely explained by the mechanical properties of the grand piano action rather than seen as an independent expressive device that is applied (or not) by pianists for purposes of expression (Palmer, 1996). In a further evaluation of the piano action data collected in the first part of this thesis, the recording and reproducing capabilities of the two computer-controlled pianos were investigated—an issue never before tackled in performance research, although of crucial importance. It revealed that the recording accuracy in timing was ±3 ms for the Bösendorfer SE290 and +20/−28 ms for the Yamaha Disklavier (see Section 2.3, p. 36). This suggests that a performance study examining an effect of the order of some 30 ms would not have been possible with a Disklavier such as the one used in the accuracy study. It could be that the results of, e.g., Repp (1996a) were blurred, if the Disklavier used in that study (an MX100A upright piano) had properties comparable to the one used here. However, this consideration remains speculation until measurements are performed on Repp’s device as well. As reported before (p. 40), the recording accuracy could be enhanced by eliminating the trend over tone intensity using a polynomial curve fit (see Figure 2.13, p. 40). Although the performance study on melody lead (Chapter 3) generated convincing evidence for the role of the mechanical constraints in the genesis of melody lead, its perceptual relevance still had to be studied in detail. The third approach of this study involved psychological experiments with judgements by trained musicians to investigate how the systematic manipulation of the two parameters investigated in the previous study (relative onset timing and variation in the tone balance of the chords) altered the perception of individual tones in a multi-voiced musical context. In a series of seven experiments, two main issues were addressed.
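The travel-time correction used to infer finger–key asynchronies can be sketched as follows; the power-curve constants here are invented stand-ins for the fitted relation, not the values from the acoustics study:

```python
# Sketch of the melody-lead correction: subtract a velocity-dependent travel
# time from each hammer-string onset to estimate when the key movement began.
# The power-curve constants (95.0, -0.75) are hypothetical stand-ins.

def travel_time_ms(hammer_velocity):
    """Assumed power-curve travel time (ms) for a given hammer velocity (m/s)."""
    return 95.0 * hammer_velocity ** -0.75

def finger_key_onset(hammer_string_onset_ms, hammer_velocity):
    """Estimated key onset: hammer-string onset minus the travel time."""
    return hammer_string_onset_ms - travel_time_ms(hammer_velocity)

# A melody tone 30 ms early and louder (4 m/s) versus an accompaniment tone
# at 0 ms played at 1.5 m/s: the inferred key onsets are nearly synchronous,
# because the louder tone's shorter travel time absorbs most of the lead.
melody_key = finger_key_onset(-30.0, 4.0)
accomp_key = finger_key_onset(0.0, 1.5)
print(round(melody_key - accomp_key, 1))
```

Under these assumed constants, the 30 ms hammer–string lead shrinks to a key-onset difference of only a few milliseconds, which is the qualitative pattern reported above.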
The first was the threshold for the perception of asynchronies between two or three almost simultaneous tones. This threshold was at around 30–40 ms for piano tones and was almost independent of tone type (pure, sawtooth, synthesised piano, and real piano), somewhat larger than typical values reported in the literature (Hirsh, 1959). However, it was strongly dependent on the sound levels of the tones involved. Asynchronies as large as 55 ms were judged as simultaneous by musically trained listeners when the earlier tone was also considerably louder than the later tone. This finding may be explained either by familiarity with piano music (only unfamiliar combinations of relative timing and intensity are detected as asynchronous by listeners) or by forward masking (the loud and early tone attenuates the sensation level of a subsequent softer tone for some tens of milliseconds). The second investigative direction examined how the manipulation of relative onset timing and intensity altered the perceived salience of individual tones. Five experiments investigated the perception of dyads, three-tone chords, and sequences of three-tone chords, with and without intensity manipulation, as well as an excerpt of real music by Chopin. In all experiments, the effects of relative onset timing were relatively small, although many factors were manipulated throughout the experiments (type of tone, interval, pitch, position of the target tone). The perceived salience depended primarily on the intensity relations of the stimuli. Only when intensity was absent as a cue, or when the rated tone (target tone) was softer than the rest, did melody lead help the softer and thus masked tone to be heard. This effect was stronger when the target tone was in the upper or middle voice and weakest with the target tone in the bass. Apart from these masking effects, streaming (Bregman and Pinker, 1978) barely changed the loudness ratings in the experimental condition with repeated chords in comparison to the experiment without repeated chords. According to Bregman’s theory of auditory scene analysis (Bregman and Pinker, 1978; Bregman, 1990), early tone onsets should help to group those tones into a melodic stream (stream segregation) in a chordal music context; however, this effect was weak in the present study. In the real music excerpt by Chopin, a delayed voice attenuated and an early voice (anticipation) enhanced the loudness ratings, but only in conditions with dynamically balanced voices. When intensity variation between the voices was included, delay as well as anticipation enhanced the loudness ratings—in contrast to the findings from the previous experiments.

These unclear effects of asynchrony raise several questions. If anticipation and delay do not clearly affect loudness ratings, why do we find the principal instrument leading by a comparable amount of time in various instrumental ensembles other than the piano (Rasch, 1979, 1988)? Previous research found an early voice to catch the attention because it comes first and is, for a fraction of a second—though hardly perceivable as such—not masked by any other sound. On the other hand, these ensemble asynchronies may be explained by the simple fact that the leading instrument also leads the ensemble and thus appears some tens of milliseconds ahead of the others (just as the beat of some conductors, especially of large orchestras, is often visibly ahead of the orchestra). However, it might be that asynchronies do not enhance the salience of certain voices, that is, their likelihood of being heard as individual voices by listeners (Parncutt, 1989), but instead increase the salience of all voices. In other words, relative timing differences may render a multi-voiced context more transparent. With respect to the present listening experiments, it might be that asking whether a tone or voice gets louder or softer with a time shift was not the right question; it might have been better to ask about the transparency, singing quality, or expressivity of a voice, or even about the number of voices perceptually immediately present to the listeners.

In an interview study by Parncutt and Holming (2000), university students of piano performance were largely unaware of the melody lead phenomenon. This is perhaps unsurprising: first, melody lead occurs automatically with dynamic differentiation between voices, and second, it is hardly detected as sounding asynchronous at all. This would explain why we find many statements in the pedagogic literature on how to shape chords timbrally or how to emphasise single voices exclusively with reference to tone intensity, but almost never with reference to small timing changes. To complement the quotes by Horowitz (Eisenberg, 1928, cf. p. 73) and Neuhaus (1973, cf. p. 1), an excerpt from an interview by Konrad Wolff with the pianist Alfred Brendel is cited here.

“(...) at the beginning of the ‘Waldstein’ Sonata you have four-voiced chords. If you play them in the manner recommended in the book1 (the soprano and bass leading and the middle voices slightly in the background) you will get a great deal of clarity but a totally wrong atmosphere. The atmosphere of this beginning is pianissimo misterioso (...)” “In the case of the ‘Waldstein’, it is not daylight but dawn, I would say, not bright energy but mystery – even within the strict rhythmic pulse – and for me that tips the balance in favour of the inner voices. I play the inner voices slightly stronger than the outer voices. That makes the chord sound softer. This is an important matter. If the outer voices are played louder than the inner voices it does not sound pianissimo, no matter how soft you try to play them. The inner voices, in certain positions, give the dolce character, the warmth.” (Brendel, 1990, p. 241, emphasis in original.)

Both pianists, Brendel and Schnabel, referred to how softly or strongly certain keys should be played, under the tacit assumption that chords are played synchronously, or at least without any reference to tone onset synchronisation. They refer to the overall intensity impression, the timbre of the chord, and its “character.” According to the findings from the present studies (velocity artifact hypothesis), the two inner voices would appear some milliseconds earlier in Brendel’s performance (because his inner voices are louder) than in Schnabel’s performance, in which the opposite would be the case. It might be that the dynamic impression and the character of a chord depend at least as much on the relative timing as on the intensity of its tones. However, the relation between the relative timing and intensity balance of a chord and its perceived timbre must remain a topic for future investigation.

In the production study, we found that pianists start the key movement basically simultaneously, based on evidence inferred from a travel time approximation. However, special playing techniques might entail an early onset of the key movement for an emphasised tone. In this context, instructions by Alfred Cortot for the Chopin Etude op. 10, No. 3 should be considered.

“A definitive rule must be followed without fail while practising this polyphonic technique: i.e. the weight of the hand should lean towards the fingers which play the predominant musical part, and the muscles of the fingers playing an accessory part should be relaxed and remain limp.” (Cortot, 1915, p. 20)

Cortot regarded as one of the main “difficulties to overcome” in that piece the “intense expressiveness imparted by the weaker fingers and the particular position of the hand arising therefrom” (Cortot, 1915, p. 20). With the hand skewed towards the “weaker fingers,” it is possible that, with this particular way of realising the task, the key movement for the emphasised tone starts somewhat before the other fingers depress the accompaniment keys. Verbal descriptions of playing techniques are difficult for somebody else to implement and realise on the piano, because the bodily awareness of muscular and motor processes differs greatly between players. This is usually overcome during a piano lesson by demonstrating playing techniques directly to the student; it is, however, difficult to achieve through books. To address the issue of asynchrony at the finger–key level more conclusively, video recordings of performing pianists from different pianistic schools, or other dedicated studies, would be required. Brendel once mentioned deliberate asynchrony as a means to shape the tone balance of a chord in a special way. In the following passage, he suggests delaying a middle voice (which he regarded as the most meaningful voice) in order to increase its perceptual salience.

1The purpose of the interview was to discuss the teaching of Artur Schnabel as published by Wolff (1979).

“To my ears, the sound of thirds and sixths should often be on the dark side. That means that the lower voice in Schubert and Brahms has to be at least as prominent and expressive as the main voice – particularly in minor keys. If I listen to the slow movement of Schubert’s B flat major Sonata, at least the thought that the inner voice is the most meaningful is valuable to me, even if it is not louder than the soprano. Maybe it comes just a split second after the soprano and thus draws imperceptibly a little attention to itself.” (Brendel, 1990, p. 244)

So, Brendel aims for two equally loud tones, with the lower voice brought slightly more into the foreground by delaying its onset. Such a condition was tested in Experiment VI (Section 4.6, p. 118). There, listeners judged the importance of the two voices to be equal to that in the simultaneous condition, but the delayed middle voice lost perceptual salience in comparison to the opposite condition (early middle voice). This Brendel quote suggests, first, that a dark chord timbre develops when the lower (middle) voice is at least as loud as the upper voice. This corresponds with my experience in rendering the stimulus performances for Experiment VI, where the lower voices needed to be strongly reduced in order not to cover the melody. Secondly, it could be that Brendel rotates his hand, as suggested by Parncutt and Holming (2000), to realise what he intended (delay despite equally loud voices). And thirdly, Brendel suggests the delay of a middle voice as a delicate expressive means of emphasising a single voice. While listening to a recording of Brendel playing the mentioned movement,2 the dark timbre of the two melody voices becomes immediately apparent. However, the mentioned delay of the middle voice could not be heard by the author. Since it is still impossible with present signal processing methods to reliably determine the individual onsets within a piano chord, we cannot further verify Brendel’s statements with this recording, or determine whether he changed his mind after 18 years.3

All these considerations lead to the question of whether it is possible at all to play chords with differently shaded individual intensities but without the melody lead effect, thus consciously cancelling out the mechanical constraints of the keyboard construction. To achieve this, the softer tones of a chord would have to be depressed slightly before the stronger keystrokes in order to achieve synchrony at the strings. Parncutt and Troup (2002) suggest lifting the finger that plays the louder tone to a certain distance above the keys while setting the other fingers in motion; the lifted finger then gains on the others at the strings (Parncutt and Troup, 2002, p. 296). There is no reason why such a technique should not be learnable; the question is only whether it is worthwhile. Perhaps, if young students practised in this way regularly and thereby sharpened their perception of asynchronies, they would be able to use asynchrony more consciously as an expressive device. And it could be that, under special performance conditions, absolutely simultaneous chords might sound like an interesting expressive alternative. A possible application for investigating the melody lead phenomenon further would be an electronic keyboard that compensates for the velocity artifact, so that a sound starts later the harder the key is depressed.4 How would pianists react to such altered acoustical feedback? Would they be able to notice that melody lead is missing although they play the melody louder? Would the resulting piano sound seem strange to them because the supposedly “singing” quality is missing (cf. Dunsby, 1996, pp. 67–73)?

2The second movement (Andante sostenuto) from Schubert’s last piano sonata in B flat major, D. 960, Philips Classics, 456 573-2, recorded on June 25, 1997, live at the Royal Festival Hall in London.
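The velocity artifact and such a compensating keyboard can be sketched numerically. The following is a minimal illustration only: the power-curve constants are hypothetical placeholders, not the values fitted to the measurement data in Chapter 2, and the function names are invented for this sketch.

```python
# Illustrative sketch of the velocity artifact and a compensating delay.
# The power-curve constants (a, b) are hypothetical placeholders, not
# the parameters fitted in Chapter 2.

def travel_time_ms(velocity):
    """Finger-key to hammer-string travel time in milliseconds, modelled
    as a decreasing power function of MIDI velocity (1-127): faster
    (louder) keystrokes reach the strings sooner."""
    a, b = 200.0, -0.55  # illustrative parameters only
    return a * velocity ** b

def melody_lead_ms(v_melody, v_accomp):
    """Asynchrony at the strings when both key movements start at the
    same instant; a positive value means the louder melody tone sounds
    earlier (the melody lead artifact)."""
    return travel_time_ms(v_accomp) - travel_time_ms(v_melody)

def compensation_delay_ms(velocity, v_ref=1):
    """Delay an electronic keyboard would have to add to each note so
    that every keystroke 'arrives' as late as the softest possible one,
    cancelling the melody lead artifact."""
    return travel_time_ms(v_ref) - travel_time_ms(velocity)
```

On this toy model, a melody struck at velocity 80 against an accompaniment at velocity 50 would lead by roughly 5 ms, and the compensating delay grows monotonically with velocity, so louder notes are held back longer.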
Although the melody lead phenomenon, as a small part of the spectrum of tone onset asynchrony effects and possibilities on the piano, was thoroughly investigated in this thesis from three different sides (piano acoustics, performance practice, perceptual experiments), many issues remain for future research. Especially in the case of onset asynchronies, it would be fruitful for piano students, piano teachers, and researchers to collaborate and share their knowledge, not only because this issue has not yet been conclusively examined, but also because not all neighbouring effects have been clarified entirely.

3The interview with Konrad Wolff dates back to 1979.

4Such a device would have to deal with the temporal properties of an electronic keyboard: its travel time, lasting from finger–key contact to the note-on command, might differ from the key–bottom or hammer–string contact times measured in Chapter 2.

Bibliography

Allen, F. J. (1913), “Pianoforte touch,” Nature 91(2278), 424–425.

Askenfelt, A. (Ed.) (1990), Five Lectures on the Acoustics of the Piano (Publications issued by the Royal Swedish Academy of Music, Vol. 64, Stockholm).

Askenfelt, A. (1991), “Measuring the motion of the piano hammer during string contact,” Speech, Music, and Hearing. Quarterly Progress and Status Report 1991(4), 19–34.

Askenfelt, A. (1994), “Observations on the transient components of the piano tone,” in Proceedings of the Stockholm Music Acoustics Conference (SMAC’93), July 28–August 1, 1993, edited by A. Friberg, J. Iwarsson, E. V. Jansson, and J. Sundberg (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 79, pp. 297–301.

Askenfelt, A. (1999), personal communication.

Askenfelt, A., Galembo, A., and Cuddy, L. L. (1998), “On the acoustics and psy- chology of piano touch and tone,” Journal of the Acoustical Society of America 103(5 Pt. 2), 2873.

Askenfelt, A. and Jansson, E. V. (1990a), “From touch to string vibrations,” in Five Lectures on the Acoustics of the Piano, edited by A. Askenfelt (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 64, pp. 39–57.

Askenfelt, A. and Jansson, E. V. (1990b), “From touch to string vibrations. I. Timing in grand piano action,” Journal of the Acoustical Society of America 88(1), 52–63.

Askenfelt, A. and Jansson, E. V. (1991), “From touch to string vibrations. II. The motion of the key and hammer,” Journal of the Acoustical Society of America 90(5), 2383–2393.

Askenfelt, A. and Jansson, E. V. (1992a), “From touch to string vibrations. III. String motion and spectra,” Journal of the Acoustical Society of America 93(4), 2181–2196.


Askenfelt, A. and Jansson, E. V. (1992b), “On vibration and finger touch in stringed instrument playing,” Music Perception 9(3), 311–350.

Behne, K.-E. and Wetekam, B. (1994), “Musikpsychologische Interpretationsforschung: Individualität und Intention,” in Musikpsychologie. Empirische Forschungen, ästhetische Experimente, edited by K.-E. Behne, G. Kleinen, and H. de la Motte-Haber (Noetzel, Wilhelmshaven), vol. 10, pp. 24–32.

Bharucha, J. J. (1983), Anchoring Effects in Melody Perception: The Abstraction of Harmony from Melody, Ph.D. thesis, Harvard University, Cambridge, USA.

Bolzinger, S. (1995), Contribution à l’étude de la rétroaction dans la pratique musicale par l’analyse de l’influence des variations d’acoustique de la salle sur le jeu du pianiste, Unpublished Ph.D. thesis, Institut de Mécanique de Marseille, Université Aix-Marseille II, Marseille.

Bork, I., Marshall, H., and Meyer, J. (1995), “Zur Abstrahlung des Anschlaggeräusches beim Flügel,” Acustica 81, 300–308.

Bortz, J. (1999), Statistik für Sozialwissenschaftler (Springer, Berlin, Heidelberg, New York), 5th revised ed.

Boutillon, X. (1988), “Model for piano hammers: Experimental determination and digital simulation,” Journal of the Acoustical Society of America 83(2), 746–754.

Bregman, A. S. (1990), Auditory Scene Analysis. The Perceptual Organization of Sound (The MIT Press, Cambridge, Massachusetts).

Bregman, A. S. and Pinker, S. (1978), “Auditory streaming and the building of timbre,” Canadian Journal of Psychology 32, 19–31.

Brendel, A. (1990), Music Sounded Out. Essays, Lectures, Interviews, Afterthoughts (Robson Books, London).

Bresin, R. and Battel, G. U. (2000), “Articulation strategies in expressive piano performance,” Journal of New Music Research 29(3), 211–224.

Bresin, R. and Widmer, G. (2000), “Production of staccato articulation in Mozart sonatas played on a grand piano. Preliminary results,” Speech, Music, and Hearing. Quarterly Progress and Status Report 2000(4), 1–6.

Báron, J. G. (1958), “Physical basis of piano touch,” Journal of the Acoustical Society of America 30(2), 151–152.

Báron, J. G. and Holló, J. (1935), “Kann die Klangfarbe des Klaviers durch die Art des Anschlages beeinflußt werden?” Zeitschrift für Sinnesphysiologie 66(1/2), 23–32.

Bryan, G. H. (1913a), “Pianoforte touch,” Nature 91(2271), 246–248.

Bryan, G. H. (1913b), “Pianoforte touch,” Nature 91(2281), 503–504.

Bryan, G. H. (1913c), “Pianoforte touch,” Nature 92(2297), 292–293.

Bryan, G. H. (1913d), “Pianoforte touch,” Nature 92(2302), 425.

Burns, E. M. (1999), “Intervals, scales, and tuning,” in The Psychology of Music, edited by D. Deutsch (Academic Press, San Diego), 2nd ed., pp. 215–264.

Cadoz, C., Lisowski, L., and Florens, J.-L. (1990), “A modular feedback keyboard design,” Computer Music Journal 14(2), 47–51.

Chaigne, A. and Askenfelt, A. (1994a), “Numerical simulations of piano strings. I: A physical model for a struck string using finite differences methods,” Journal of the Acoustical Society of America 95(2), 1112–1118.

Chaigne, A. and Askenfelt, A. (1994b), “Numerical simulations of piano strings. II: Comparisons with measurements and systematic exploration of some hammer–string parameters,” Journal of the Acoustical Society of America 95(3), 1631–1640.

Cochran, M. (1931), “Insensitiveness to tone quality,” Australian Journal of Psychology 9, 131–134.

Coenen, A. and Schäfer, S. (1992), “Computer-controlled player pianos,” Computer Music Journal 16(4), 104–111.

Conklin, H. A. (1996a), “Design and tone in the mechanoacoustic piano. Part I. Piano hammers and tonal effects,” Journal of the Acoustical Society of America 99(6), 3286–3296.

Conklin, H. A. (1996b), “Design and tone in the mechanoacoustic piano. Part II. Piano structure,” Journal of the Acoustical Society of America 100(2), 695–708.

Conklin, H. A. (1996c), “Design and tone in the mechanoacoustic piano. Part III. Piano strings and scale design,” Journal of the Acoustical Society of America 100(3), 1286–1298.

Cortot, A. (Ed.) (1915), Chopin. 12 Studies Op. 10. Student’s Edition (Éditions Salabert, Paris).

Dain, R. (2002), “The engineering of the concert piano,” Ingenia 12(May), 20–39, published online at http://www.pianosonline.co.uk/.

Deutsch, D. (1999a), “Grouping mechanisms in music,” in The Psychology of Music, edited by D. Deutsch (Academic Press, San Diego), 2nd ed., pp. 299–348.

Deutsch, D. (1999b), “The processing of pitch combinations,” in The Psychology of Music, edited by D. Deutsch (Academic Press, San Diego), 2nd ed., pp. 349–411.

DeWitt, L. A. and Crowder, R. G. (1987), “Tonal fusion of consonant musical intervals: The oomph in Stumpf,” Perception and Psychophysics 41(1), 73–84.

DeWitt, L. A. and Samuel, A. G. (1990), “The role of knowledge-based expectations in music perception: Evidence from musical restoration,” Journal of Experimental Psychology: General 119(2), 123–144.

Dietz, F. R. (1968), Steinway Regulation. Das Regulieren von Flügeln bei Steinway (Verlag Das Musikinstrument, Frankfurt am Main).

Divenyi, P. L. and Hirsh, I. J. (1974), “Identification of temporal order in three-tone sequences,” Journal of the Acoustical Society of America 56(1), 144–151.

Dixon, S. E., Goebl, W., and Widmer, G. (2002a), “Real Time Tracking and Visualisation of Musical Expression,” in Proceedings of the Second International Conference on Music and Artificial Intelligence (ICMAI2002), Edinburgh, edited by C. Anagnostopoulou, M. Ferrand, and A. Smaill (Springer, Berlin et al.), pp. 58–68.

Dixon, S. E., Goebl, W., and Widmer, G. (2002b), “The Performance Worm: Real time visualisation based on Langner’s representation,” in Proceedings of the 2002 International Computer Music Conference, Göteborg, Sweden, edited by M. Nordahl (The International Computer Music Association, San Francisco), pp. 361–364.

Dowling, W. J. (1990), “Expectancy and attention in melody perception,” Psychomusicology 9(2), 148–160.

Dunsby, J. (1996), Performing Music: Shared Concerns (Clarendon Press, Oxford).

Eisenberg, J. (1928), “Noted Russian pianist urges students to simplify mechanical problems so that thought and energy may be directed to artistic interpretation,” The Musician 1928(June), 11, available electronically at http://users.bigpond.net.au/nettheim/horowitz/horo28.htm.

Fletcher, H. and Munson, W. A. (1933), “Loudness, its definition, measurement and calculation,” Journal of the Acoustical Society of America 5, 82–108.

Fletcher, N. H. and Rossing, T. D. (1998), The Physics of Musical Instruments (Springer, New York, Berlin), 2nd ed.

Friberg, A. (1995), A Quantitative Rule System for Musical Performance, Ph.D. thesis, Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm.

Friberg, A. and Sundberg, J. (1995), “Time discrimination in a monotonic, isochronous sequence,” Journal of the Acoustical Society of America 98(5), 2524–2531.

Fucci, D., Harris, D., Petrosino, L., and Banks, M. (1993), “The effect of preference for rock music on magnitude-estimation scaling behavior in young adults,” Perceptual and Motor Skills 76(3, Pt 2), 1171–1176.

Fucci, D., Kabler, H., Webster, D., and McColl, D. (1999), “Comparisons of magnitude estimation scaling of rock music by children, young adults, and older people,” Perceptual and Motor Skills 89, 1133–1138.

Fucci, D., McColl, D., and Petrosino, L. (1998), “Factors related to magnitude estimation scaling of complex auditory stimuli: Aging,” Perceptual and Motor Skills 87(3, Pt 1), 836–838.

Fucci, D., Petrosino, L., McColl, D., Wyatt, D., and Wilcox, C. (1997), “Magnitude estimation scaling of the loudness of a wide range of auditory stimuli,” Perceptual and Motor Skills 85, 1059–1066.

Gabrielsson, A. (1987), “Once again: The Theme from Mozart’s Piano Sonata in A Major (K.331),” in Action and Perception in Rhythm and Music, edited by A. Gabrielsson (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 55, pp. 81–103.

Galembo, A. (1982), “Quality evaluation of musical instruments (in Russian),” Technical Aesthetics 5, 16–17.

Galembo, A. (2001), “Perception of musical instrument by performer and listener (with application to the piano),” in Proceedings of the International Workshop on Human Supervision and Control in Engineering and Music, September 21–24, 2001 (University of Kassel, Kassel, Germany), pp. 257–266, http://www.engineeringandmusic.de/.

Galembo, A. and Cuddy, L. L. (1997), “Large grand versus small upright pianos: Factors of timbral difference,” Journal of the Acoustical Society of America 102(5 Pt. 2), 3107.

Geringer, J. M., Fucci, D., Harris, D., Petrosino, L., and Banks, M. (1993), “Loudness estimations of noise, synthesizer, and music excerpts by musicians and nonmusicians,” Psychomusicology 12(1), 22–30.

Gillespie, B. (1992), “Dynamical modeling of the grand piano action,” in Proceedings of the International Computer Music Conference (ICMC’1992) (International Computer Music Association, San Francisco), pp. 77–80.

Giordano, N. (1997), “Simple model of a piano soundboard,” Journal of the Acous- tical Society of America 102(2), 1159–1168.

Giordano, N. (1998a), “Mechanical impedance of a piano soundboard,” Journal of the Acoustical Society of America 103(4), 2128–2133.

Giordano, N. (1998b), “Sound production by a vibrating piano soundboard: Exper- iment,” Journal of the Acoustical Society of America 104(3, Pt. 1), 1648–1653.

Giordano, N. and Winans II, J. P. (2000), “Piano hammers and their force compression characteristics: Does a power law make sense?” Journal of the Acoustical Society of America 107(4), 2248–2255.

Goebl, W. (1999a), “Analysis of piano performance: towards a common performance standard?” in Proceedings of the Society of Music Perception and Cognition Conference (SMPC99) (Northwestern University, Evanston, Illinois, USA).

Goebl, W. (1999b), Numerisch-klassifikatorische Interpretationsanalyse mit dem “Bösendorfer Computerflügel”, Magisterarbeit, Institut für Musikwissenschaft, Universität Wien, Wien, available electronically at http://www.oefai.at/~wernerg/.

Goebl, W. (2000), “Skilled piano performance: Melody lead caused by dynamic differentiation,” in Proceedings of the 6th International Conference on Music Perception and Cognition (ICMPC6), Aug 5–10, 2000, edited by C. Woods, G. Luck, R. Brochard, F. A. Seddon, and J. A. Sloboda (Keele University, Department of Psychology, Keele, UK), pp. 1165–1176.

Goebl, W. (2001), “Melody lead in piano performance: Expressive device or arti- fact?” Journal of the Acoustical Society of America 110(1), 563–572.

Goebl, W. and Bresin, R. (2001), “Are computer-controlled pianos a reliable tool in music performance research? Recording and reproduction precision of a Yamaha Disklavier grand piano,” in Workshop on Current Research Directions in Computer Music, November 15–17, 2001, edited by C. L. Buyoli and R. Loureiro (Audiovisual Institute, Pompeu Fabra University, Barcelona, Spain), pp. 45–50.

Goebl, W. and Bresin, R. (2003a), “Measurement and reproduction accuracy of computer-controlled grand pianos,” Journal of the Acoustical Society of America 114, in press.

Goebl, W. and Bresin, R. (2003b), “Measurement and reproduction accuracy of computer-controlled grand pianos,” in Proceedings of the Stockholm Music Acoustics Conference (SMAC’03), August 6–9, 2003, edited by R. Bresin (Department of Speech, Music, and Hearing, Royal Institute of Technology, Stockholm, Sweden), vol. 1, pp. 155–158.

Goebl, W., Bresin, R., and Galembo, A. (2003), “The piano action as the performer’s interface: Timing properties, dynamic behaviour, and the performer’s possibilities,” in Proceedings of the Stockholm Music Acoustics Conference (SMAC’03), August 6–9, 2003, edited by R. Bresin (Department of Speech, Music, and Hearing, Royal Institute of Technology, Stockholm, Sweden), vol. 1, pp. 159–162.

Goebl, W. and Parncutt, R. (2001), “Perception of onset asynchronies: Acoustic piano versus synthesized complex versus pure tones,” in Meeting of the Society for Music Perception and Cognition (SMPC2001), August 9–11, 2001 (Queen’s University, Kingston, Ontario, Canada), pp. 21–22.

Goebl, W. and Parncutt, R. (2002), “The influence of relative intensity on the perception of onset asynchronies,” in Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney (ICMPC7), Aug. 17–21, 2002, edited by C. Stevens, D. Burnham, G. McPherson, E. Schubert, and J. Renwick (Causal Productions, Adelaide), pp. 613–616.

Goebl, W. and Parncutt, R. (2003), “Asynchrony versus intensity as cues for melody perception in chords and real music,” in Proceedings of the 5th Triennial ESCOM Conference, September 8–13, 2003, edited by R. Kopiez, A. C. Lehmann, I. Wolther, and C. Wolf (Hanover University of Music and Drama, Hanover, Germany).

Green, D. M. (1971), “Temporal auditory acuity,” Psychological Review 78(6), 540–551.

Gát, J. (1965), The Technique of Piano Playing (Corvina, Budapest), 3rd ed.

Hall, D. E. (1986), “Piano string excitation in the case of small hammer mass,” Journal of the Acoustical Society of America 79(1), 141–147.

Hall, D. E. (1987a), “Piano string excitation II: General solution for a hard narrow hammer,” Journal of the Acoustical Society of America 81(2), 535–546.

Hall, D. E. (1987b), “Piano string excitation III: General solution for a soft narrow hammer,” Journal of the Acoustical Society of America 81(2), 547–555.

Hall, D. E. (1993), “Musical dynamic levels of pipe organ sounds,” Music Perception 10(4), 417–434.

Hall, D. E. (2002), Musical Acoustics (Brooks/Cole, Pacific Grove, CA), 3rd ed.

Hall, D. E. and Askenfelt, A. (1988), “Piano string excitation V: Spectra for real hammers and strings,” Journal of the Acoustical Society of America 83(4), 1627–1638.

Handel, S. (1993), Listening. An Introduction to the Perception of Auditory Events (MIT-Press, Cambridge, Massachusetts, London, UK).

Hart, H. C., Fuller, M. W., and Lusby, W. S. (1934), “A precision study of piano touch and tone,” Journal of the Acoustical Society of America 6, 80–94.

Hartmann, A. (1932), “Untersuchungen über das metrische Verhalten in musikalischen Interpretationsvarianten,” Archiv für die gesamte Psychologie 84, 103–192.

Hartmann, W. M. (1998), Signals, Sound, and Sensation, Modern Acoustics and Signal Processing (Springer, New York).

Hayashi, E., Yamane, M., and Mori, H. (1999), “Behavior of piano-action in a grand piano. I. Analysis of the motion of the hammer prior to string contact,” Journal of the Acoustical Society of America 105(6), 3534–3544.

Heaviside, O. (1913), “Pianoforte touch,” Nature 91(2277), 397.

Henderson, M. T. (1936), “Rhythmic organization in artistic piano performance,” in Objective Analysis of Musical Performance, edited by C. E. Seashore (The University Press, Iowa City), vol. IV of University of Iowa Studies in the Psychology of Music, pp. 281–305.

Henning, G. B. and Gaskell, H. (1981), “Monaural phase sensitivity with Ronken’s paradigm,” Journal of the Acoustical Society of America 70(6), 1669–1673.

Hirsh, I. J. (1959), “Auditory perception of temporal order,” Journal of the Acoustical Society of America 31, 759–767.

Hirsh, I. J. and Watson, C. S. (1996), “Auditory psychophysics and perception,” Annual Review of Psychology 47, 461–484.

Hoover, D. M. and Cullari, S. (1992), “Perception of loudness and musical preference: Comparison of musicians and nonmusicians,” Perceptual and Motor Skills 74(3, Pt 2), 1149–1150.

Hudson, R. (1994), Stolen Time: The History of Tempo Rubato (Clarendon Press, Oxford).

Huron, D. B. (1989), “Voice denumerability in polyphonic music of homogeneous timbres,” Music Perception 6, 361–382.

Huron, D. B. (1993), “Note-onset asynchrony in J. S. Bach’s two part inventions,” Music Perception 10(4), 435–444.

Huron, D. B. (2001), “Tone and voice: A derivation of the rules of voice-leading from perceptual principles,” Music Perception 19(1), 1–64.

Huron, D. B. and Fantini, D. (1989), “The avoidance of inner-voice entries: perceptual evidence and musical practice,” Music Perception 9, 93–104.

Juslin, P. N. and Madison, G. (1999), “The role of timing patterns in recognition of emotional expression from musical performance,” Music Perception 17(2), 197–221.

Kendall, R. A. and Carterette, E. C. (1990), “The communication of musical expression,” Music Perception 8, 129–164.

Knoblaugh, A. F. (1944), “The clang tone of the pianoforte,” Journal of the Acoustical Society of America 16(1), 102.

Koornhof, G. W. and van der Walt, A. J. (1994), “The influence of touch on piano sound,” in Proceedings of the Stockholm Music Acoustics Conference (SMAC’93), July 28–August 1, 1993, edited by A. Friberg, J. Iwarsson, E. V. Jansson, and J. Sundberg (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 79, pp. 302–308.

Langner, J. and Goebl, W. (2002), “Representing expressive performance in tempo-loudness space,” in ESCOM 10th Anniversary Conference on Musical Creativity, April 5–8, 2002 (Université de Liège, Liège, Belgium), CD-ROM.

Langner, J. and Goebl, W. (in press), “Visualizing expressive performance in tempo-loudness space,” Computer Music Journal .

Langner, J., Kopiez, R., and Feiten, B. (1998), “Perception and Representation of Multiple Tempo Hierarchies in Musical Performance and Composition: Perspectives from a New Theoretical Approach,” in Controlling Creative Processes in Music, edited by R. Kopiez and W. Auhagen (P. Lang: Schriften zur Musikpsychologie und Musikästhetik, Frankfurt a. M.), vol. 12, pp. 13–35.

Langner, J., Kopiez, R., Stoffel, C., and Wilz, M. (2000), “Realtime analysis of dynamic shaping,” in Proceedings of the 6th International Conference on Music Perception and Cognition (ICMPC6), Aug 5–10, 2000, edited by C. Woods, G. Luck, R. Brochard, F. A. Seddon, and J. A. Sloboda (Keele University Department of Psychology, Keele, UK), pp. 452–455.

Lerdahl, F. and Jackendoff, R. (1983), A Generative Theory of Tonal Music (MIT Press, Cambridge (Mass.), London).

Leshowitz, B. (1971), “Measurement of the two-click threshold,” Journal of the Acoustical Society of America 49(2, Pt. 2), 462–466.

Lieber, E. (1985), “On the possibilities of influencing piano touch,” Das Musikinstrument 34, 58–63.

Lisboa, T., Zicari, M., and Eiholzer, H. (2002), “Mastery through imitation,” in ESCOM 10th Anniversary Conference on Musical Creativity, April 5–8, 2002 (Université de Liège, Liège, Belgium), CD-ROM.

Maria, M. (1999), “Unschärfetests mit hybriden Tasteninstrumenten,” in Global Village – Global Brain – Global Music. KlangArt Kongreß 1999, edited by B. Enders and J. Stange-Elbe (Osnabrück, Germany).

Martin, D. W. (1947), “Decay rates of piano tones,” Journal of the Acoustical Society of America 19(4), 535–541.

Meyer, J. (1965), “Die Richtcharakteristik des Flügels,” Das Musikinstrument 14, 1085–1090.

Meyer, J. (1978), Acoustics and the Performance of Music (Verlag Das Musikinstrument, Frankfurt am Main, Germany).

Meyer, J. (1999), Akustik und musikalische Aufführungspraxis (Bochinsky, Germany), 4th ed.

Meyer, L. B. (1973), Explaining Music: Essays and Explorations (University of California Press, Berkeley, CA).

Meyer-Eppler, W. (1949), Elektrische Klangerzeugung. Elektronische Musik und synthetische Sprache (Dümmler, Bonn).

Moog, R. A. and Rhea, T. L. (1990), “Evolution of the keyboard interface: The Bösendorfer 290 SE recording piano and the Moog multiply-touch-sensitive keyboards,” Computer Music Journal 14(2), 52–60.

Moore, B. C. J. (1997), An Introduction to the Psychology of Hearing (Academic Press, San Diego, CA), 4th ed.

Moore, G. (1979), Am I Too Loud? Memoirs of an Accompanist (Hamish Hamilton, London).

Mori, T. (2000), Ein Vergleich der qualitätsbestimmenden Faktoren von Klavier und Flügel, Diss., TU Carolo-Wilhelmina, Braunschweig (Verlagsgruppe Mainz, Wissenschaftsverlag, Aachen).

Morton, W. B. (1913), “Pianoforte touch,” Nature 91(2280), 477.

Nakamura, I. (1989), “Fundamental theory and computer simulation of the decay characteristics of piano sound,” Journal of the Acoustical Society of Japan 10(5), 289–297.
Nakamura, T. (1987), “The communication of dynamics between musicians and listeners through musical performance,” Perception and Psychophysics 41(6), 525–533.

Namba, S. and Kuwano, S. (1990), “Continuous multi-dimensional assessment of musical performance,” Journal of the Acoustical Society of Japan 11(1), 43–51.

Namba, S., Kuwano, S., Hatoh, T., and Kato, M. (1991), “Assessment of musical performance by using the method of continuous judgement by selected description,” Music Perception 8, 251–276.

Narmour, E. (1990), The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model (University of Chicago Press, Chicago).

Neuhaus, H. (1973), The Art of Piano Playing (Barrie & Jenkins, London).

Ortmann, O. (1925), The Physical Basis of Piano Touch and Tone (Kegan Paul, Trench, Trubner; J. Curwen; E. P. Dutton, London, New York).

Palmer, C. (1989), “Mapping musical thought to musical performance,” Journal of Experimental Psychology: Human Perception and Performance 15(12), 331–346.

Palmer, C. (1996), “On the assignment of structure in music performance,” Music Perception 14(1), 23–56.

Palmer, C. and Brown, J. C. (1991), “Investigations in the amplitude of sounded piano tones,” Journal of the Acoustical Society of America 90(1), 60–66.

Palmer, C. and Holleran, S. (1994), “Harmonic, melodic, and frequency height influences in the perception of multivoiced music,” Perception and Psychophysics 56(3), 301–312.

Palmer, C. and van de Sande, C. (1993), “Units of knowledge in music performance,” Journal of Experimental Psychology: Learning, Memory, and Cognition 19(2), 457–470.

Pampalk, E., Rauber, A., and Merkl, D. (2002), “Content-based organization and visualization of music archives,” in Proceedings of the 10th ACM International Conference on Multimedia (ACM, Juan les Pins, France), pp. 570–579.

Pampalk, E., Widmer, G., and Chan, A. (2003), “A New Approach to Hierarchical Clustering and Structuring of Data with Self-Organizing Maps,” Intelligent Data Analysis Journal 8(2), in press.

Parlitz, D., Peschel, T., and Altenmüller, E. (1998), “Assessment of dynamic finger forces in pianists: Effects of training and expertise,” Journal of Biomechanics 31(11), 1063–1067.

Parncutt, R. (1989), Harmony. A Psychoacoustical Approach (Springer, Berlin).

Parncutt, R. and Holming, P. (2000), “Is scientific research on piano performance useful for pianists?” in Poster presentation at the 6th International Conference on Music Perception and Cognition (ICMPC6), Aug. 5–10, 2000, edited by C. Woods, G. Luck, R. Brochard, F. A. Seddon, and J. A. Sloboda (Keele University, Psychology Department, Keele, UK), pp. 412–413.

Parncutt, R. and Troup, M. (2002), “Piano,” in The Science and Psychology of Music Performance. Creative Strategies for Teaching and Learning, edited by R. Parncutt and G. McPherson (University Press, Oxford, New York), pp. 285–302.

Pastore, R. E., Harris, L. B., and Kaplan, J. K. (1982), “Temporal order identification: Some parameter dependencies,” Journal of the Acoustical Society of America 71(2), 430–436.

Pickering, S. (1913a), “Pianoforte touch,” Nature 91(2283), 555–556.

Pickering, S. (1913b), “Pianoforte touch,” Nature 92(2302), 425.

Plomp, R., Wagenaar, W. A., and Mimpen, A. M. (1973), “Musical interval recognition with simultaneous tones,” Acustica 29, 101–109.

Podlesak, M. and Lee, A. R. (1988), “Dispersion of waves in piano strings,” Journal of the Acoustical Society of America 83(1), 305–317.

Rasch, R. A. (1978), “The perception of simultaneous notes such as in polyphonic music,” Acustica 40, 21–33.

Rasch, R. A. (1979), “Synchronization in performed ensemble music,” Acustica 43, 121–131.

Rasch, R. A. (1988), “Timing and synchronization in ensemble performance,” in Generative Processes in Music: The Psychology of Performance, Improvisation, and Composition, edited by J. A. Sloboda (Clarendon Press, Oxford), pp. 70–90.

Rauber, A., Pampalk, E., and Merkl, D. (2002), “Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarities,” in Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR’02) (IRCAM – Centre Pompidou, Paris, France), pp. 71–80.

Repp, B. H. (1993a), “Music as motion: A synopsis of Alexander Truslit’s ‘Gestaltung und Bewegung in der Musik’,” Psychology of Music 21, 48–72.

Repp, B. H. (1993b), “Some empirical observations on sound level properties of recorded piano tones,” Journal of the Acoustical Society of America 93(2), 1136–1144.

Repp, B. H. (1994), “On determining the basic tempo of an expressive music performance,” Psychology of Music 22, 157–167.

Repp, B. H. (1995a), “Acoustics, perception, and production of legato articulation on a digital piano,” Journal of the Acoustical Society of America 97(6), 3862–3874.

Repp, B. H. (1995b), “Expressive timing in Schumann’s “Träumerei”: An analysis of performances by graduate student pianists,” Journal of the Acoustical Society of America 98(5), 2413–2427.

Repp, B. H. (1996a), “Patterns of note onset asynchronies in expressive piano performance,” Journal of the Acoustical Society of America 100(6), 3917–3932.

Repp, B. H. (1996b), “Pedal timing and tempo in expressive piano performance: A preliminary investigation,” Psychology of Music 24(2), 199–221.

Repp, B. H. (1996c), “The art of inaccuracy: Why pianists’ errors are difficult to hear,” Music Perception 14(2), 161–184.

Repp, B. H. (1996d), “The dynamics of expressive piano performance: Schumann’s “Träumerei” revisited,” Journal of the Acoustical Society of America 100(1), 641–650.

Repp, B. H. (1997a), “Acoustics, perception, and production of legato articulation on a computer-controlled grand piano,” Journal of the Acoustical Society of America 102(3), 1878–1890.

Repp, B. H. (1997b), “The effect of tempo on pedal timing in piano performance,” Psychological Research 60(3), 164–172.

Repp, B. H. (1999), “A microcosm of musical expression: II. Quantitative analysis of pianists’ dynamics in the initial measures of Chopin’s Etude in E major,” Journal of the Acoustical Society of America 105(3), 1972–1988.

Reuter, C. (1995), Der Einschwingvorgang nichtperkussiver Musikinstrumente (P. Lang, Frankfurt am Main).

Riley-Butler, K. (2001), “Comparative performance analysis through feedback technology,” in Meeting of the Society for Music Perception and Cognition (SMPC2001), August 9–11, 2001 (Queen’s University, Kingston, Ontario, Canada), pp. 27–28.

Riley-Butler, K. (2002), “Teaching expressivity: An aural–visual feedback–replication model,” in ESCOM 10th Anniversary Conference on Musical Creativity, April 5–8, 2002 (Université de Liège, Liège, Belgium), CD-ROM.

Roads, C. (1986), “Bösendorfer 290 SE computer-based piano,” Computer Music Journal 10(3), 102–103.

Roederer, J. G. (1973), Introduction to the Physics and Psychophysics of Music (Springer, New York, Heidelberg, Berlin).

Rosen, S. and Howell, P. (1987), “Is there a natural sensitivity at 20 ms in relative tone-onset-time continua? A reanalysis of Hirsh’s (1959) data,” in The Psychophysics of Speech Perception, edited by M. E. H. Schouten (Martinus Nijhoff Publishing, Dordrecht, Netherlands), vol. X, pp. 199–209.

Seashore, C. E. (1937), “Piano touch,” Scientific Monthly, New York 45, 360–365.

Shaffer, L. H. (1981), “Performances of Chopin, Bach and Bartók: Studies in motor programming,” Cognitive Psychology 13, 326–376.

Shaffer, L. H. (1984), “Timing in solo and duet piano performances,” Quarterly Journal of Experimental Psychology: Human Experimental Psychology 4, 577–595.

Shaffer, L. H., Clarke, E. F., and Todd, N. P. M. (1985), “Metre and rhythm in piano playing,” Cognition 20, 61–77.

Shaffer, L. H. and Todd, N. P. M. (1987), “The interpretative component in musical performance,” in Action and Perception in Rhythm and Music, edited by A. Gabrielsson (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 55, pp. 139–152.

Skinner, L. and Seashore, C. E. (1936), “A musical pattern score of the first movement of the Beethoven sonata, opus 27, No. 2,” in Objective Analysis of Musical Performance, edited by C. E. Seashore (University Press, Iowa), vol. IV of Studies in the Psychology of Music, pp. 263–279.

Stahnke, W. (2000), personal communication.

Stevens, S. S. (1961), “The measurement of loudness,” Journal of the Acoustical Society of America 33, 1577–1585.

Suzuki, H. (1986), “Vibration and sound radiation of a piano soundboard,” Journal of the Acoustical Society of America 80(6), 1573–1582.

Suzuki, H. (1987), “Model analysis of a hammer-string interaction,” Journal of the Acoustical Society of America 82(4), 1145–1151.

Taguti, T., Ohtsuki, K., Yamasaki, T., Kuwano, S., and Namba, S. (2002), “Quality of piano tones under different tone stoppings,” Acoustical Science and Technology 23(5), 244–251.

Terhardt, E. (1974), “On the perception of periodic sound fluctuations (roughness),” Acustica 30, 201–213.

Terhardt, E. (1979), “Calculating virtual pitch,” Hearing Research 1(2), 155–182.

Terhardt, E., Stoll, G., and Seewann, M. (1982), “Pitch of complex signals according to virtual-pitch theory: Tests, examples, and predictions,” Journal of the Acoustical Society of America 71(3), 671–678.

Tillmann, B. and Bharucha, J. J. (2002), “Effect of harmonic relatedness on the detection of temporal asynchronies,” Perception and Psychophysics 64(4), 640–649.

Timmers, R., Ashley, R., Desain, P., and Heijink, H. (2000), “The influence of musical context on tempo rubato,” Journal of New Music Research 29(2), 131–158.

Tro, J. (1994), “Perception of micro dynamical variation in piano performance,” in Proceedings of the Stockholm Music Acoustics Conference (SMAC’93), July 28– August 1, 1993, edited by A. Friberg, J. Iwarsson, E. V. Jansson, and J. Sundberg (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 79, pp. 150–154.

Tro, J. (1998), “Micro dynamics deviation as a measure of musical quality in piano performances?” in Proceedings of the 5th International Conference on Music Perception and Cognition (ICPMC5), August 26–30, 1998, edited by S. W. Yi (Western Music Research Institute, Seoul National University, Seoul, Korea).

Tro, J. (2000a), “Aspects of control and perception,” in Proceedings of the COST–G6 Conference on Digital Audio Effects (DAFX–00), December 7–9, 2000, edited by D. Rocchesso and M. Signoretto (Università degli Studi di Verona, Dipartimento Scientifico e Tecnologico, Verona, Italy), pp. 171–176.

Tro, J. (2000b), “Data reliability and reproducibility in music performance measurements,” in Proceedings of the Seventh Western Pacific Regional Acoustics Conference (WESTPRAC–VII), October 3–5, 2000 (The Acoustical Society of Japan, Kumamoto, Japan), pp. 391–394.

Truax, B. (1978), Handbook for Acoustic Ecology, vol. 5 of World Soundscape Project (A.R.C. Publications, Vancouver, B.C.), first ed.

Truslit, A. (1938), Gestaltung und Bewegung in der Musik (Chr. Friedrich Vieweg, Berlin-Lichterfelde).

Van den Berghe, G., De Moor, B., and Minten, W. (1995), “Modeling a grand piano key action,” Computer Music Journal 19(2), 15–22.

van Noorden, L. (1975), Temporal Coherence in the Perception of Tone Sequences, Doctoral dissertation, Institute for Perception Research, Eindhoven University of Technology, Eindhoven, The Netherlands.

Vernon, L. N. (1937), “Synchronization of chords in artistic piano music,” in Objec- tive Analysis of Musical Performance, edited by C. E. Seashore (University Press, Iowa), vol. IV of Studies in the Psychology of Music, pp. 306–345.

Vos, J. and Rasch, R. A. (1981a), “The perceptual onset of musical tones,” in Music, Mind and Brain. The Neuropsychology of Music, edited by M. Clynes (Plenum Press, New York, London), pp. 299–319.

Vos, J. and Rasch, R. A. (1981b), “The perceptual onset of musical tones,” Per- ception and Psychophysics 29(4), 323–335.

Wallach, H., Newman, E. B., and Rosenzweig, M. R. (1949), “The precedence effect in sound localization,” American Journal of Psychology 62, 315–336.

Watkins, A. J. (1985), “Scale, key, and contour in the discrimination of tuned and mistuned approximations to melody,” Perception and Psychophysics 37(4), 275–285.

Weinreich, G. (1977), “Coupled piano strings,” Journal of the Acoustical Society of America 62, 1474–1484.

Weinreich, G. (1990), “The coupled motion of piano strings,” in Five Lectures on the Acoustics of the Piano, edited by A. Askenfelt (Publications issued by the Royal Swedish Academy of Music, Stockholm), vol. 64, pp. 73–81.

Wheatley, C. W. C. (1913), “Pianoforte touch,” Nature 91(2275), 347–348.

White, W. B. (1930), “The human element in piano tone production,” Journal of the Acoustical Society of America 1, 357–367.

Widmer, G. (2001), “Using AI and machine learning to study expressive music performance: Project survey and first report,” AI Communications 14(3), 149–162.

Widmer, G. (2002a), “In search of the Horowitz factor: Interim report on a musical discovery project,” in Proceedings of the 5th International Conference on Discovery Science (DS’02), Lübeck, Germany (Springer, Berlin).

Widmer, G. (2002b), “Machine discoveries: A few simple, robust local expression principles,” Journal of New Music Research 31(1), 37–50.

Wier, C. C. and Green, D. M. (1975), “Temporal acuity as a function of frequency difference,” Journal of the Acoustical Society of America 57(6), 1512–1515.

Winckel, F. (1952), Klangwelt unter der Lupe. Aesthetisch-naturwissenschaftliche Betrachtungen (Hesse, Berlin, Wunsiedel).

Wolff, K. (1979), Interpretation auf dem Klavier. Was wir von Schnabel lernen (The Teaching of Artur Schnabel) (R. Piper & Co. Verlag, München, Zürich).

Yost, W. A. (2000), Fundamentals of Hearing (Academic Press, San Diego).

Zera, J. and Green, D. M. (1993a), “Detecting temporal asynchrony with asynchronous standards,” Journal of the Acoustical Society of America 93(3), 1571–1579.

Zera, J. and Green, D. M. (1993b), “Detecting temporal onset and offset asynchrony in multicomponent complexes,” Journal of the Acoustical Society of America 93(2), 1038–1052.

Zera, J. and Green, D. M. (1995), “Effect of signal component phase on asynchrony discrimination,” Journal of the Acoustical Society of America 98(2, Pt 1), 817–827.

Zwicker, E. and Fastl, H. (1999), Psychoacoustics. Facts and Models, Springer Series in Information Sciences Vol. 22 (Springer, Berlin, Heidelberg), second updated ed.

Appendix A: Ratings of Listening Tests

• Pilot study — Questions 1–2, ratings for musicians and non-musicians . . . Table A.1, p. 164

• Experiment I — Adjustment ratings by timbre and chord . . . Table A.2, p. 168

• Experiment II — Ratings by instrument, timbre, asynchrony, and intensity . . . Table A.3, p. 169

• Experiment III — Ratings by instrument, timbre, asynchrony, and intensity . . . Table A.4, p. 170

• Experiment IVa — Ratings by instrument, voice, asynchrony, and intensity . . . Table A.5, p. 177

• Experiment IVb — Ratings by instrument, voice, asynchrony, and intensity . . . Table A.6, p. 178

• Experiment V — Ratings by instrument, voice, asynchrony, and intensity . . . Table A.7, p. 179

• Experiment VI — Ratings by instrument, voice, intensity, and asynchrony . . . Table A.8, p. 180


Table A.1: Pilot experiment (Section 4.3, p. 87). Frequencies of ratings (1: “the upper”, 0: “the lower”) separately for timbre (1: pure, 2: complex, 3: MIDI, 4: samples recorded from the Bösendorfer SE290), interval (8: octave, 7: seventh), and relative timing (in ms) for the two questions (Question 1: “Which tone is more prominent?”, Question 2: “Which tone is earlier?”), with musicians’ (Mus) and non-musicians’ (Non-mus) rating frequencies in separate columns.

[Table body omitted: the rating-frequency columns were corrupted in this copy and could not be reliably reconstructed.]

Table A.2: Experiment I (Section 4.4.2, p. 96). Pairs of MIDI velocity units adjusted by the 26 participants (P), separately for five different conditions involving three tone types (pure, sawtooth, and real piano) and three chords (B4/G#4, C5/A5, Db5/Bb5). Every participant had to give at least two adjustments, some of them three, because their previous two adjustments were too inconsistent.

P    Pure C5/A5   Sawtooth C5/A5   Piano B4/G#4   Piano C5/A5   Piano Db5/Bb5
1    51/59   61/49   69/41   67/43   65/45
1    53/57   61/49   71/39   65/45   63/47
2    49/61   49/61   65/45   59/51   65/45
2    45/65   45/65   61/49   57/53   59/51
3    49/61   65/45   69/41   63/47   65/45
3    47/63   63/47   67/43   65/45   65/45
4    53/57   77/33   63/47   61/49   61/49
4    47/63   75/35   65/45   55/55   55/55
5    63/47   67/43   69/41   61/49   69/41
5    57/53   69/41   65/45   65/45   63/47
6    59/51   71/39   69/41   65/45   63/47
6    59/51   77/33   63/47   63/47   51/59
6    65/45   79/31   67/43   65/45   63/47
7    55/55   75/35   71/39   63/47   63/47
7    55/55   69/41   61/49   63/47   63/47
8    57/53   73/37   63/47   63/47   61/49
8    63/47   71/39   69/41   65/45   63/47
9    55/55   59/51   63/47   61/49   59/51
9    51/59   49/61   63/47   59/51   57/53
10   53/57   75/35   71/39   75/35   69/41
10   53/57   75/35   73/37   69/41   71/39
11   47/63   77/33   65/45   65/45   57/53
11   59/51   47/63   63/47   65/45   61/49
11   67/43   71/39   69/41   65/45   63/47
12   49/61   49/61   65/45   65/45   61/49
12   51/59   57/53   63/47   65/45   59/51
13   41/69   59/51   71/39   63/47   65/45
13   49/61   51/59   61/49   55/55   61/49
13   57/53   57/53   61/49   55/55   59/51
14   49/61   59/51   63/47   61/49   63/47
14   51/59   55/55   65/45   63/47   51/59
14   53/57   63/47   69/41   63/47   55/55
15   65/45   79/31   77/33   71/39   73/37
15   61/49   77/33   77/33   75/35   69/41
16   45/65   73/37   71/39   63/47   65/45
16   49/61   67/43   69/41   65/45   57/53
17   57/53   55/55   65/45   65/45   49/61
17   55/55   55/55   67/43   65/45   61/49
18   39/71   47/63   59/51   67/43   37/73
18   63/47   43/67   59/51   43/67   41/69
18   49/61   55/55   61/49   53/57   49/61
19   53/57   63/47   69/41   67/43   69/41
19   57/53   65/45   69/41   71/39   73/37
20   55/55   59/51   67/43   65/45   63/47
20   55/55   59/51   67/43   65/45   65/45
21   53/57   63/47   67/43   63/47   67/43
21   55/55   55/55   67/43   65/45   63/47
22   53/57   73/37   63/47   59/51   65/45
22   53/57   67/43   63/47   61/49   65/45
23   55/55   69/41   69/41   63/47   61/49
23   55/55   67/43   67/43   65/45   61/49
24   53/57   59/51   63/47   63/47   61/49
24   53/57   55/55   65/45   63/47   63/47
25   53/57   65/45   63/47   55/55   61/49
25   51/59   63/47   63/47   63/47   59/51
26   59/51   55/55   67/43   63/47   65/45
26   55/55   61/49   67/43   63/47   63/47
Table A.3: Experiment II (Section 4.4.3, p. 98). Loudness ratings (“Which of the two tones is louder?”, from 1: the lower to 7: the upper) by instrument (instr, 1: piano, 0: other instruments), timbre (pure, sawtooth, samples recorded from the Bösendorfer SE290), asynchrony (−54, −27, 0, +27, +54 ms), and velocity combinations (v1: +20/−20, v2: +10/−10, v3: 0/0, v4: −10/+10, v5: −20/+20 MIDI velocity units). The missing data was left blank for the first 17 participants in the pure tone condition at −27 ms (see Section 4.4.3, Footnote 9 on p. 99), while the data for the simultaneous pure-tone condition was averaged over three ratings.

[Table body omitted: the rating columns of this rotated table were corrupted in this copy and could not be reliably reconstructed.]

Table A.4: Experiment III (Section 4.4.4, p. 100). Frequencies of asynchrony detection ratings (“Are the two tones simultaneous?”, 1: simultaneous, 0: asynchronous) separately for instrument (1: piano, 0: other instruments), timbre (1: pure tones, 2: sawtooth tones, 3: samples recorded from the Bösendorfer SE290), relative timing (in ms), and velocity combinations (v1: +20/−20, v2: +10/−10, v3: 0/0, v4: −10/+10, v5: −20/+20 MIDI velocity units).

[Table body omitted: the instrument/timbre/timing/velocity/rating columns were corrupted in this copy and could not be reliably reconstructed.]

Table A.5: Experiment IVa (Figure 4.13a, p. 112). Loudness ratings averaged over two chords for 26 participants (P), separately for voice (upper, middle, lower), asynchrony (−55, −27, 0, +27, +55 ms), velocity combinations (v1: 80/38, v2: 65/45, v3: 50/50, v4: 38/55, v5: 28/62 MIDI vel. units), and instrument (instr, 1: piano, 0: other instrument).

[Table body omitted: the rating columns of the upper-, middle-, and lower-voice panels were corrupted in this copy, and the table is truncated mid-row; the data could not be reliably reconstructed.]
5 6.5 25 0 2.5 3.5 4 6 7 3 3.5 4 5 6.5 2 3.5 4.5 4.5 5.5 2.5 3 4.5 5 7 2 4 5 6 5 261 3 44.55.573.53.54.56.573 43.56.572.53.53.556.533.54.557 178 Appendix A. Ratings

Table A.6: Experiment IVb (Figure 4.13b, p. 112). Mean ratings averaged over two chords. Labelling as in Table A.5, p. 177.

[Mean ratings for the upper, middle, and lower voices at asynchronies of −55, −27, 0, +27, and +55 ms, for each of the 26 participants (P), by instrument (instr) and condition (v1–v5). The numerical values are not recoverable from this copy.]

Table A.7: Experiment V (Figure 4.14, p. 113). Mean ratings averaged over two chords. Labelling as in Table A.5, p. 177.

[Mean ratings for the upper, middle, and lower voices at asynchronies of −55, −27, 0, +27, and +55 ms, for each of the 26 participants (P), by instrument (instr) and condition (v1–v5). The numerical values are not recoverable from this copy.]

Table A.8: Experiment VI (Section 4.6, p. 118). Ratings of the 26 participants (P) by instrument (instr, 1: piano, 0: other instruments), voice (upper, middle), velocity combination (+0, +10, +20 MIDI velocity units, MV), and asynchrony (−55, −27, 0, +27, +55 ms).

[The numerical values of Table A.8 are not recoverable from this copy.]

Appendix B

Curriculum Vitae

Name: Werner Goebl
Address: Bonygasse 29/2, A-1120 Vienna, Austria
For current contact details, please refer to my webpage http://www.oefai.at/~werner.goebl.

I was born in Klagenfurt, Austria, on September 12, 1973.

1979–1983 Primary school (Volksschule) in Pettendorf near Regensburg, Germany.
1983–1991 Secondary school ("Musisches Gymnasium" BG III) in Salzburg, Austria.
June 1991 Graduation with distinction (Matura mit Auszeichnung) at BG III Salzburg in German, Latin, Mathematics, Music, and Physics.
1990–1995 University of Music "Mozarteum," Salzburg, and University of Music, Vienna: piano pedagogy, degree awarded in June 1995 (Instrumental- und Gesangspädagogik, IGP I).
1993–1999 University of Vienna: major in musicology, with Psychology, Sociology, and History as electives.
December 1999 Mag. phil. with the Master's thesis "Numerisch-klassifikatorische Interpretationsanalyse mit dem 'Bösendorfer Computerflügel'," University of Vienna.
1994–2000 Piano performance studies (Konzertfach) at the University of Music, Vienna (class of Prof. Noel Flores).
January 2000 Concert diploma (1. Diplom).


Since February 2000 Researcher at the Austrian Research Institute for Artificial Intelligence in the project "Computer-Based Music Research: Artificial Intelligence Models of Musical Expression" with Prof. Gerhard Widmer.
Since October 2000 PhD student at the Institut für Musikwissenschaft, Karl-Franzens-Universität Graz, with Prof. Richard Parncutt as supervisor.
April–July 2001 Guest researcher at the Department of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden.
Since October 2002 Piano chamber-music studies with Prof. Avo Kouyoumdjian at the University of Music, Vienna.