Visualizing Expressive Performance in Tempo–Loudness Space

Jörg Langner* and Werner Goebl†

*Musikwissenschaftliches Seminar, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany, [email protected]
†Austrian Research Institute for Artificial Intelligence (ÖFAI), Freyung 6/VI, 1010 Vienna, Austria, [email protected]

Computer Music Journal, 27:4, pp. 69–83, Winter 2003. © 2003 Massachusetts Institute of Technology.

The previous decades of performance research have yielded a large number of very detailed studies analyzing various parameters of expressive music performance (see Palmer 1997 and Gabrielsson 1999 for an overview). A special focus was given to expressive piano performance, because the expressive parameters are relatively few (timing, dynamics, and articulation, including pedaling) and comparatively easy to obtain. The majority of performance studies concentrated exclusively on one of these parameters, and in most cases this parameter was expressive timing.

In our everyday experience, we never listen to one of these parameters in isolation as it is analyzed in performance research. Certainly, the listener's attention can sometimes be guided more to one particular parameter (e.g., the forced stable tempo in a Prokofieff Toccata or the staccato–legato alternation in a Mozart Allegro), but generally the aesthetic impression of a performance results from an integrated perception of all performance parameters, and it is influenced by other factors such as body movements and the socio-cultural background of a performer or a performance as well. It can be presumed that the different performance parameters influence and depend on each other in various and intricate ways. (For example, Todd 1992 and Juslin, Friberg, and Bresin 2002 provide modeling-based approaches.) Novel research methods could help us analyze expressive music performances in a more holistic way to tackle these questions.

Another problem of performance analysis is the enormously large amount of information the researcher must deal with, even when investigating, for example, only the timing of a few bars of a single piece. In general, it remains unclear whether the expressive deviations measured are due to deliberate expressive strategies, musical structure, motor noise, imprecision of the performer, or even measurement errors.

In the present article, we develop an integrated analysis technique in which tempo and loudness are processed and displayed at the same time. Both the tempo and loudness curves are smoothed with a window size corresponding ideally to the length of a bar. These two performance parameters are then displayed in a two-dimensional performance space on a computer screen: a dot moves in synchrony with the sound of the performance, and the trajectory of its tail describes geometric shapes that are intrinsically different for different performances. Such an animated display seems to be a useful visualization tool for performance research. The simultaneous display of tempo and loudness allows us to study interactions between these two parameters, by themselves or with respect to properties of the musical score.

The behavior of the algorithm and the insights provided by this type of display are illustrated with performances of two musical excerpts by Chopin and Schubert. In the first case study, two expert performances and a professional recording by Maurizio Pollini are compared; in the second case study, an algorithmic performance generated by a basic performance model is contrasted with Alfred Brendel's performance of the same excerpt. These two excerpts were chosen because articulation is constant throughout each excerpt (legato), so the analysis can concentrate on tempo and dynamics.

Method

Our visualization requires two main steps of processing. The first step involves data acquisition, either from performances made on special recording instruments such as MIDI grand pianos or directly from conventional audio recordings (i.e., commercial compact discs). Second, the gathered data must be reduced (smoothed) over a certain time window corresponding to a certain granularity of display.


Timing Data

The timing information of expressive performances in MIDI format has the advantage that each onset is clearly defined, although the precision of some computer-monitored pianos is not much higher than that of timing data obtained from audio recordings (for a Yamaha Disklavier, see Goebl and Bresin 2001). Still, each performed onset must be matched to a symbolic score of the given piece so that the onsets at the track level can be automatically determined (i.e., score–performance matching; see Heijink et al. 2000 and Widmer 2001). The track level is a unit of score time (e.g., quarter note, eighth note) that defines the resolution at which tempo changes are measured. The track level is usually faster than the beat indicated by the time signature; in the Chopin Etude, for example, it is the sixteenth note. From this, the tempo curves (in beats per minute relative to the notated beat) are computed.

Timing information from audio recordings was obtained using an interactive software tool for automatic beat detection (Dixon 2001a, 2001b). The software analyzes the audio data, finds the onsets of as many of the musical notes as possible, and proposes a possible beat track by displaying the beats as vertical lines over the amplitude envelope of the audio signal. The user has the opportunity to adjust false track times. The system also provides audio feedback in the form of a percussion track (playing at the track times) mixed with the original sound. After the user corrects some of the errors made by the system, the remainder of the piece can be automatically retracked, taking the corrections into account. When the user considers the track times to be correct, they can be saved to disk in a text format. With this tool, timing data of audio recordings can be gathered relatively quickly. The typical error of ±20 msec is sufficiently precise for the present purpose (see also Goebl and Dixon 2001).
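To make the tempo computation concrete, here is a minimal sketch in Python (our own illustration, not the authors' software; the function and variable names are ours) that derives a tempo curve from a list of track times:

```python
import numpy as np

def tempo_curve(track_times, tracks_per_beat):
    """Tempo in beats per minute relative to the notated beat.

    track_times: onset times (sec) of successive track-level events
    tracks_per_beat: e.g., 4 if the track level is the sixteenth note
        and the notated beat is the quarter note
    """
    track_times = np.asarray(track_times, dtype=float)
    ioi = np.diff(track_times)            # inter-track intervals (sec)
    bpm = 60.0 / (ioi * tracks_per_beat)  # one beat spans tracks_per_beat tracks
    return track_times[:-1], bpm          # tempo assigned to each interval start
```

For the Chopin Etude, with sixteenth-note tracks and a quarter-note beat, tracks_per_beat would be 4.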
Loudness Data

For both MIDI and audio data sources, the loudness information was always taken from a recording of the MIDI file or from the audio file itself, respectively; it would have been very difficult to model an overall loudness curve based on MIDI information alone. A MATLAB implementation of Zwicker's loudness model (Zwicker and Fastl 2001) was used to convert the audio file (in pulse-code-modulated WAV format) into its loudness envelope in sones (Pampalk, Rauber, and Merkl 2002). First, the audio signal was converted into the frequency domain and bundled into critical bands according to the Bark scale. After spectral and temporal masking effects were determined, the loudness sensation (in sones) was computed from the equal-loudness levels (phons), which in turn were calculated from the sound pressure levels in decibels (dB SPL). The loudness envelope was sampled at 11.6-msec intervals according to the window size and sampling rate used (1,024 samples at 44,100 samples per second with 50% overlap). A similar implementation was used in earlier studies (Langner et al. 2000; Langner 2002; Langner and Goebl 2002). The advantage of using a loudness measure instead of sound level is discussed by Langner (2002, pp. 29–30).

From the loudness envelope, one loudness value was taken for each tracked point in time for further data processing. This single loudness value for each track time was taken as the maximum value within a window extending half an inter-track interval before and after the corresponding track time. Because the track grid could miss important loud notes that did not coincide with a track time, this windowing procedure ensures that a neighboring louder note is taken as the loudness value of that particular track time. The procedure is particularly important for low track rates.
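The windowed-maximum selection described in the last paragraph can be sketched as follows (again our own Python illustration; the loudness envelope is assumed to be given as paired time/sone arrays, and the handling of the first and last track times is our assumption):

```python
import numpy as np

def loudness_at_tracks(env_times, env_sones, track_times):
    """Pick one sone value per track time: the maximum of the loudness
    envelope within half an inter-track interval before and after it."""
    env_times = np.asarray(env_times)
    env_sones = np.asarray(env_sones)
    track_times = np.asarray(track_times, dtype=float)
    iti = np.diff(track_times)
    half_before = np.concatenate(([iti[0]], iti)) / 2.0  # edges reuse the
    half_after = np.concatenate((iti, [iti[-1]])) / 2.0  # adjacent interval
    values = np.empty(len(track_times))
    for j, t in enumerate(track_times):
        sel = (env_times >= t - half_before[j]) & (env_times <= t + half_after[j])
        values[j] = env_sones[sel].max() if sel.any() else np.nan
    return values
```

This way, a loud note falling between two track times is credited to the nearer track time instead of being missed.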


Data Reduction

Both tempo and loudness data were smoothed using overlapping Gaussian windows. We refer to the window size as the time from the left to the right point of inflection (turning point) of the Gaussian window (in sec), corresponding to two standard deviations (i.e., 2σ). A smoothed data point y at a certain time t is determined as in Equation 1, where x(t) is the unsmoothed data with sampling period F (the frame period in sec); k was set to the integer nearest to 3σ/F, since for larger values the exponential term is vanishingly small.

y(t) = \frac{\sum_{i=-k}^{k} x(t + iF)\, e^{-(iF)^2 / (2\sigma^2)}}{\sum_{i=-k}^{k} e^{-(iF)^2 / (2\sigma^2)}}    (1)

For the current study, we usually took a window size corresponding to the average performed duration of a bar, resulting typically in a window size of around 3 sec (see Tables 1 and 2). The choice of window size is arbitrary and can be set by the investigator. For example, a window size corresponding to the length of a quarter note will bring out more local phenomena of a performance, whereas a smoothing window at the four-bar level will show only very long-term developments.

Table 1. Window sizes used for smoothing the performances of the Chopin Etude

    Performer     Window Size (sec)
    Pianist 9     2.486 (bar)
    Pianist 18    2.896 (bar)
    M. Pollini    3.212 (bar)
                  1.606 (quarter note = half bar)

The window sizes correspond to the mode duration of a performed bar or of a performed quarter note, respectively. The mode duration was the most frequently occurring inter-onset interval (quantized to 10 msec; see Goebl and Dixon 2001).
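Equation 1 is a normalized discrete Gaussian weighting, and it is compact to implement. The sketch below is our Python rendering (the authors worked in MATLAB); the per-point normalization in the denominator reproduces Equation 1 exactly, including at the edges of the series:

```python
import numpy as np

def gaussian_smooth(x, frame_period, window_size):
    """Smooth evenly sampled data x (sampling period frame_period, in sec)
    as in Equation 1; window_size = 2*sigma, the distance between the
    Gaussian window's two inflection points (in sec)."""
    x = np.asarray(x, dtype=float)
    sigma = window_size / 2.0
    k = int(round(3.0 * sigma / frame_period))  # weights beyond 3 sigma vanish
    i = np.arange(-k, k + 1)
    w = np.exp(-(i * frame_period) ** 2 / (2.0 * sigma ** 2))
    num = np.convolve(x, w, mode="same")                # weighted sums
    den = np.convolve(np.ones(len(x)), w, mode="same")  # sums of weights used
    return num / den
```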
Two-Dimensional Display

The smoothed data is displayed in a two-dimensional space of tempo (x axis) versus loudness (y axis). This visualization is conceptualized to work as an animation over time: a red dot moves in synchrony with the music, leaving behind it a trajectory. To elaborate the impression of time (the third dimension in the display), the trajectory of the initial red dot fades out and decreases in size over time. This is meant to evoke the impression of a three-dimensional virtual space in which the trajectory moves towards the viewer.

The current dot of the display can show high-level score information, such as the current bar number. To indicate some types of structural properties of the score, the current dot is enlarged and changed in color at phrase boundaries.

Snapshots of this visualization technique as well as the animations are implemented in the MATLAB environment. The animations were saved as QuickTime movies using a routine freely available on the Internet (Slaney 1999). The frame rate was chosen to be 0.1 sec (i.e., 10 frames per sec). The examples discussed below can be downloaded as QuickTime videos from www.oefai.at/~wernerg/animations.
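The fading, shrinking tail takes only a few lines of plotting code. The following is our own Python/Matplotlib approximation of the display described above (the original runs in MATLAB; the tail length, point sizes, and fade curve are our assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

def draw_frame(ax, now, times, tempo_bpm, loudness_sones, tail_sec=8.0):
    """Draw one animation frame: the trajectory up to time `now`,
    with older points drawn smaller and fainter than recent ones."""
    times = np.asarray(times)
    age = now - times
    vis = (age >= 0) & (age <= tail_sec)
    fade = 1.0 - age[vis] / tail_sec       # 1 = current point, 0 = oldest visible
    rgba = np.zeros((fade.size, 4))
    rgba[:, 0] = 1.0                       # red dots
    rgba[:, 3] = 0.1 + 0.9 * fade          # older points are fainter
    ax.scatter(np.asarray(tempo_bpm)[vis], np.asarray(loudness_sones)[vis],
               s=4.0 + 36.0 * fade, c=rgba, edgecolors="none")
    ax.set_xscale("log")                   # tempo axis is logarithmic
    ax.set_xlabel("Tempo (beats per minute)")
    ax.set_ylabel("Loudness (sones)")

# One frame: fig, ax = plt.subplots(); draw_frame(ax, 30.0, t, bpm, sones)
```

Calling draw_frame once per 0.1 sec of performance time and saving the frames would reproduce the 10-frames-per-second animation described above.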

Case Study I: Chopin's E-Major Etude

To exemplify the properties of the visualization technique, several performances of the initial 21 bars of Chopin's Etude in E Major, Op. 10, No. 3 were chosen. Two of them were recordings on a Bösendorfer SE290 computer-controlled grand piano made for a previous study (Pianists 9 and 18 from Goebl 2001; the timing information comes in a MIDI-like format, and the pianists were asked to perform only until bar 21), and one is a professional recording by Maurizio Pollini (1985). The Chopin Etude has a very homogeneous texture: sixteenth notes are omnipresent; therefore, the track level was set to the sixteenth-note level. In Figure 1, the raw tempo and loudness curves of the three performances of the Etude at the sixteenth-note level are plotted against time. The thick lines represent the smoothed data curves with a smoothing window corresponding to an average performed bar, or two quarter notes. The exact window sizes are printed in Table 1. Figure 2 shows a musical score of the excerpt.

The display of tempo and loudness curves in Figure 1 illustrates the second of the two main steps in data processing mentioned above: data reduction via smoothing.

Figure 1. Tempo and loudness curves of the initial 21 bars of the Chopin Etude in E Major, Op. 10, No. 3, as played by Pianist 9 (top), Pianist 18 (middle), and Maurizio Pollini (bottom). The figure shows tempo curves (left) and loudness curves (right) at the sixteenth-note level (lines with dot markers) and smoothed at the bar level (thick line) against time in sec. The faint vertical lines indicate bar lines, with the corresponding numbering on top. In the bottom panels (Pollini), the data were also smoothed at the half-bar level (quarter-note level, dotted lines).

These smoothed curves are the basis for the animations. In Figure 3, snapshots of such animations are shown. The instant where the animation was stopped is either shortly after the beginning of bar 14 (left column) or shortly before the beginning of bar 21 (right column). The thicker black disks indicate bar lines.

Many interesting observations emerge from this combined display, revealing both the similarities and the differences between the performances. The most fundamental commonality between the three performances is that the expressive trajectory tends to move to the lower-left side of the space at the phrase boundaries at the beginnings of bars 6, 9, and 14, which corresponds to a slowing down and a decrescendo towards the ends of phrases. An exception is Pianist 9, who ignores the phrase boundaries at bars 6 and 14. In the snapshots of Figure 3, the beginnings of bar 6 are hidden by the trajectories; however, the reported effects can be seen in Figure 1 or in the QuickTime movies available online. Another striking similarity between the pianists is that at the beginning of a phrase, the performers mostly increase the tempo first and then the loudness. This leads repeatedly to the phenomenon that the tempo apex occurs before the loudness apex, as can be seen in all three performances (see Figure 3, left column). Moreover, we can observe a certain tendency towards a counterclockwise movement of the expressive trajectories, found in all three performances.
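The counterclockwise tendency can also be checked numerically. One simple statistic (our suggestion, not part of the authors' toolkit) is the signed area swept by the trajectory, computed with the shoelace formula; a positive value indicates predominantly counterclockwise motion:

```python
import numpy as np

def rotation_sense(tempo_bpm, loudness_sones):
    """Signed area of the (log-tempo, loudness) trajectory.
    Positive: predominantly counterclockwise; negative: clockwise."""
    x = np.log(np.asarray(tempo_bpm, dtype=float))  # log axis, as in the display
    y = np.asarray(loudness_sones, dtype=float)
    return 0.5 * np.sum(x[:-1] * np.diff(y) - y[:-1] * np.diff(x))
```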


On the other hand, the differences between the performances are striking at first glance. As mentioned above, Pianist 9 is not much concerned with slowing down at phrase boundaries, except at the end of bar 8. Moreover, he starts the large crescendo from measure 14 onward while still in the middle of the space, whereas both Pianist 18 and Mr. Pollini begin this development closer to the lower-left corner.

The visually prominent loops resulting from the build-up from measure 14 onward also reveal counterclockwise shapes for all three pianists. These trajectories reflect the strong crescendo and the ritenuto at the apex of that phrase in bar 17, as indicated in the score. Pianist 18's ritenuto at the apex is the strongest, but Pianist 18 then speeds up again while decreasing volume, a tendency that is only weakly present in Mr. Pollini's and Pianist 9's performances.

The artistry of Mr. Pollini comes out when the concept of his interpretation is taken into consideration: throughout the first 14 bars, he remains very soft and avoids larger tempo changes, sparing his expressive energies for the coming outburst. Another difference between the famous pianist and the two others is that Pollini does not slow down as much at the end of the section. He planned, of course, to play the whole Etude and not only the first 21 measures, as the other pianists did (see also Figure 1).

The shape of a trajectory depends very much on the size of the smoothing window applied to the data. With the average performed bar duration as the size of the Gaussian window between the two turning points, we eliminate all fluctuations within a bar; only larger developments of tempo and dynamics remain in the display. Figure 4 displays the same performance by Maurizio Pollini with the smoothing window shrunk by a factor of 2, so that it corresponds to the quarter-note level (half bar; see also Figure 1, bottom panels, dotted line). Many more loops appear, and the extent of the performance space used is larger. For example, the late ff chord in bar 17 and its delayed succeeding note (as plotted in Figure 1, left bottom panel) clearly appear in that display, but this very local phenomenon was smoothed out with bar-level windowing. This raises the general question of whether we should speak of a slowing down when one or two notes appear delayed, rather than of a local expressive deviation.

It is also important to point out that smoothing introduces some artifacts. Smoothing can change the location and extent of turning points and peaks in the data. All three examples in Figure 3 suggest that the performed dynamic peak does not coincide with the notated climax in bar 17. This is true for Pianists 9 and 18, but not for Mr. Pollini, as can be verified in Figure 1: Mr. Pollini plays the ff chord loudest, while the two others do not. The quarter-note smoothing condition (Figure 1, lower left panel) shifts Mr. Pollini's dynamic peak towards the correct position, but the performance trajectory still suggests a decrescendo before the climax in bar 17. When changes in the data do not occur symmetrically, smoothing will shift the turning point. Therefore, analysis with this visualization technique should be accompanied by conventional data displays for detailed analysis.

Case Study II: Schubert's G-Flat-Major Impromptu

The main strength of the introduced display is to elucidate relations between tempo and dynamics. Exactly this relationship was modeled by Neil Todd in a very simple way: "[T]he faster the louder, the slower the softer" (Todd 1992). Windsor and Clarke (1997) used this model to test how much of the expressive variation of a real performance could be explained and what remains unexplained. They used Schubert's G-flat-major Impromptu (D. 899, No. 3) as a test piece, as shown in Figure 5. The only input to the Todd model is the grouping structure of the musical score according to Lerdahl and Jackendoff (1983). Windsor and Clarke (1997) produced their preferred algorithmic performance using different parameters for timing and dynamics and called it the hybrid performance. It was fitted to a human performance by varying the parameters of the Todd model at five different phrase levels by trial and error. The chosen values are AP(1,1,1,2,4) for timing and AP(4,8,8,1,1) for dynamics; the five numbers refer to the model parameters, beginning with the highest phrase level (8-bar, 4-bar, 2-bar, 1-bar, and half-bar). Thus, in this hybrid performance, timing is governed by the smallest phrase units (half-bar level), whereas the dynamic contours are shaped more by higher phrase levels (especially the 4- and 2-bar phrase structure; see Windsor and Clarke 1997, p. 141). The exact smoothing values we used are shown in Table 2.


Figure 2. Frédéric Chopin: Etude in E Major, Op. 10, No. 3, measures 1–21. The score was prepared with computer software by the authors after the Paderewski Edition.


Figure 3. Expression trajectories over bars 1–14 (left column) and 1–21 (right column) of the beginning section of Chopin's Etude, Op. 10, No. 3, performed by Pianist 9 (top), Pianist 18 (middle), and Maurizio Pollini (bottom). The trajectories of the first 14 bars can still be seen in the right-panel snapshots as very faint lines. The x axes represent the tempo of a quarter note in beats per minute (on a logarithmic scale), and the y axes represent loudness measured in sones. The larger points represent the end of the excerpt, whereas instants further in the past appear smaller. The more prominent black circles indicate the beginnings of new bars. The climax of bar 17 is marked ff (right column).



In Figure 6, the first 16 bars of this algorithmic performance are contrasted with a professional performance by Alfred Brendel (1997). The hybrid performance shows basically a diagonal line that reflects the "faster-louder" rule proposed by Todd. However, it is not a perfect diagonal, as one might have expected, for two reasons. First, in this hybrid performance, different parameters for timing and intensity were chosen. Second, the relationship between timing and dynamics, which is linear in terms of MIDI velocity units, is no longer linear when dynamics are expressed in terms of peak loudness units (in sones), as measured from the audio file rendered from the MIDI file using a standard software synthesizer.
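The second point can be made concrete with the standard phon-to-sone relation used in Zwicker-type loudness models: loudness in sones doubles for every 10-phon increase above 40 phon, so equal steps on a level scale map onto unequal steps in sones. (A minimal illustration; the synthesizer's velocity-to-level mapping is not modeled here.)

```python
import numpy as np

def phon_to_sone(phon):
    """Standard conversion, valid for levels of roughly 40 phon and above."""
    return 2.0 ** ((np.asarray(phon, dtype=float) - 40.0) / 10.0)

print(phon_to_sone([40, 50, 60, 70]))  # [1. 2. 4. 8.]: equal phon steps,
                                       # exponentially growing sone steps
```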


Figure 3. Continued.


Figure 4. Expression trajectories over bars 1–14 (left) and 1–21 (right) of the beginning section of Chopin's Etude, Op. 10, No. 3, performed by Maurizio Pollini. The trajectories of the first 14 bars are still observable in the right figure as very faint lines. The x axes represent the tempo of a quarter note in beats per minute (on a logarithmic scale), and the y axes represent loudness measured in sones. The larger points represent the end of the excerpt, whereas instants further in the past appear smaller. The more prominent circles indicate the beginnings of new bars.



Still, the difference between the two performances is striking: Mr. Brendel divides the first eight bars into two phrases of equal length (Figure 6, upper right panel). Then he spans a larger arch over the second eight bars (bars 9–16), where he only briefly holds back his tempo increase, approximately at the middle of bar 10. Todd's model, on the other hand, responds obediently to the symmetrical grouping structure. The asymmetric phrase concept of Mr. Brendel's interpretation becomes even more lucid with two-bar-level smoothing (Figure 6, bottom panels). At this level, almost no trajectory movement other than a strict diagonal can be found in the Todd performance. (To remark that Mr. Brendel mysteriously immortalized his initials, A. B., in the two-bar smoothing would not belong in a scientific discourse!)

This comparison of a performance model with a professional performance was conducted not to prove Todd's model too simple (it was deliberately designed to be simple), but to demonstrate the behavior of the two-dimensional display on real and artificial data and its strength in highlighting relations between the two performance parameters displayed.

Conclusions and Future Work

A novel approach for an integrated visualization of two performance parameters—tempo and loudness—was introduced, and its properties and behavior were demonstrated with examples from several expressive performances. This approach involves data acquisition from audio recordings and MIDI files, data reduction by continuous smoothing over a certain time window, and a two-dimensional display that can be viewed on a computer screen as an animation or as a snapshot of a particular excerpt of the performance.

The display shows both the tempo and the dynamic shaping of a performance and elucidates the interaction between these two parameters. This visualization is not only useful for scientific research; it is also intuitively comprehensible for musicians and audiences. It may therefore serve as a visible link between performance research and performance practice.

The number of performances analyzed so far is comparably small. However, two of the tendencies observed with this method are so striking that we decided to present them as preliminary results. First, the pianists of the investigated performances tend to approach the climax of a phrase by increasing tempo first and loudness slightly later. Second, they shape tempo and loudness within a phrase in a way that makes the expression trajectories move counterclockwise. We found this phenomenon also at smaller time windows (e.g., at the eighth-note level of the Chopin Etude), to a striking degree in all three performances. We observed similar tendencies for performances of compositions by Chopin, Bach, and Beethoven as well; detailed analyses will be reported in future studies. These insights seem to be new for performance research and had not been expected, although similar tendencies can be found in the data plots of earlier performance studies (e.g., Gabrielsson 1987).

The findings of the present study imply some preliminary hypotheses that remain subject to further investigation. Pianists tend to increase loudness at tempo maxima, to decrease tempo at loudness maxima, to decrease loudness at tempo minima, and to increase tempo at loudness minima. This observation possibly mirrors a deep principle of the shaping of tempo and loudness. Further investigation will help to clarify the validity and scope of this principle.

Smoothing leads to a reduction of the amount and complexity of data. Another advantage of smoothing is that the choice of the window size determines a specific granularity of the data display. Expressive microstructure (e.g., the systematic variations in the Siciliano rhythm reported by Gabrielsson 1987) is the lowest level on which timing can be studied; larger performance trends and developments become clear at higher levels of abstraction.

The abstraction level in the performance display can be set by the smoothing window size.


Figure 5. Franz Schubert's Impromptu in G-flat Major, D. 899, No. 3, first 16 bars. The score was prepared with computer software by the authors after Editio Musica Budapest.


Figure 6. The first 16 bars of Franz Schubert's Impromptu in G-flat Major, D. 899, No. 3, as "performed" by the Todd model (left column) and Alfred Brendel (right column). The x axis displays the tempo of a half note in beats per minute (because the time signature of this piece is 4/2). The axes of the figures have the same scale but show different ranges. In the top panels, the snapshot was stopped shortly after the beginning of bar 9; in the middle and bottom panels, the snapshot was stopped immediately after the beginning of bar 16. The upper two pairs of figures display data smoothed at the bar level. In the bottom panels, the 16 bars of the same performances were smoothed at the two-bar level.



Multi-level displays of performance data try to avoid the choice of a particular time horizon. Examples of this type of display are Dynagrams (Langner et al. 2000) and Oscillograms (Langner, Kopiez, and Feiten 1998; Langner 2002). However, they only deal with a single performance parameter at a time.

The smoothing procedure can be supported by a perceptual hypothesis: the perceived tempo of an expressive performance is more stable than the tempo computed from the played note onsets. This hypothesis is supported by recent research. In a preference task, listeners preferred a slightly smoothed beat track to an unsmoothed one when listening to an expressive piano performance with the beat track to be rated sounding as a series of metronome clicks in parallel with the music (Cambouropoulos et al. 2001).


Figure 6. Continued.


Listeners also tend to underestimate local timing deviations when asked to tap along with expressive performances (Dixon and Goebl 2002). These findings, although still preliminary, support the smoothing from a perceptual point of view. Still, it remains to be investigated which smoothing windows (type and size) correspond best to a perceptual or mental representation of an expressive performance, and whether the same integrating mechanisms apply to the continuous perception of dynamics.

Table 2. The window sizes used for smoothing the performances of the Schubert Impromptu

    Performance   Window Size (sec)
    Hybrid        2.896 (bar)
                  5.797 (two bars)
    A. Brendel    3.690 (bar)
                  7.380 (two bars)

As mentioned earlier, smoothing always involves some artifacts in the data display. Peaks of larger developments can occur at different positions in the smoothed display than in the data, and the magnitude of a loudness climax, for example, appears smaller after smoothing. The larger the smoothing window, the more the smoothed display deviates from the data. These effects are obvious and could be reduced by using a window size that varies with the musical phrase structure. In any case, the informed investigator will be able to cope with these side effects.

The trajectories of this visualization technique evoke a visual impression of gestural motion. They might be associated with the motion pictures of Truslit (1938; see Repp 1993), which are synoptic pictures of an inner sensation of motion by either the performer or the listener. Affinities can also be found to shapes coming from Manfred Clynes's sentograph (e.g., Clynes 1983). The similarities of both Truslit's and Clynes's shapes to the expression trajectories may be seen as merely incidental: they both reflect perceptual properties of expressive music performance, whereas our two-dimensional display, to the contrary, pictures measured performance data only. Yet these similarities possibly reflect a deeper connection.


What they do have in common is an affinity for the phenomenon of motion, which is very meaningful for both the production and the perception of music.

We envisage a range of future applications of this technique. Besides the described analysis of large numbers of performances by famous musicians, it could also be used in music teaching to clarify certain high-level developments in students' performances. Especially for this purpose, a real-time implementation would be very helpful. A first implementation of a real-time system was made by Simon Dixon (Dixon, Goebl, and Widmer 2002), who used a real-time tempo-tracking algorithm that processes audio data directly, without using any higher-level knowledge of the musical structure.

The idea of the two-dimensional space could also be reversed, using it as an interactive control for music performance. The user moves the head of the trajectory with the computer mouse, and the performance unfolds according to the mouse movement in the space. A first prototype of such a performance-control system was implemented in Java by students at the Austrian Research Institute for Artificial Intelligence in Vienna. A similar implementation was developed using a different two-dimensional control space as a user interface for controlling morphing between different emotional states in synthesized expressive performances (Canazza et al. 2000). Their control space is called a perceptual expressive space and spans a space between adjectives like "hard," "soft," "light," and "heavy."

Further work is needed to understand the meaning of certain repeating shapes of the trajectories. The analyzed performances could be rated by expert listeners in great detail; that is, they could be asked to indicate which sections they like and which they dislike. These qualitative data could be used to evaluate the meaning of the corresponding shapes of the expressive trajectories (Langner and Goebl in preparation). Furthermore, it would be interesting to compare the performance trajectories with the emotional responses of musically trained listeners, for example by using Emery Schubert's method of two-dimensional emotional space (Schubert 1996, 1999). Another approach could be to quantitatively analyze the two-dimensional time series as such with modern artificial-intelligence methods (using self-organizing maps; see Pampalk, Widmer, and Chan in press). Such a method would allow one to pinpoint pianist-specific performance characteristics over large data sets and to identify intrinsic peculiarities of particular famous pianists.

Acknowledgments

This research was supported in part by project Y99–INF, sponsored by the Austrian Federal Ministry of Education, Science, and Culture (BMBWK) in the form of a START Research Prize. The Austrian Research Institute for Artificial Intelligence acknowledges basic financial support from the Austrian Federal Ministry for Education, Science, and Culture as well as from the Austrian Ministry for Transport, Innovation, and Technology. Thanks are due to Luke Windsor and Eric Clarke for providing the Todd hybrid-performance MIDI file of the Schubert Impromptu, and to Simon Dixon, Elias Pampalk, Giampiero Salvi, Asmir Tobudic, and Gerhard Widmer for providing generous help during work on this article. Finally, we are grateful to Roberto Bresin and an anonymous reviewer for their comments on an earlier draft.

References

Brendel, A. 1997. (Recorded 1972.) Schubert: The Complete Impromptus; Moments Musicaux. London: Philips Classics, 456 061–2.

Cambouropoulos, E., et al. 2001. "Human Preferences for Tempo Smoothness." In H. Lappalainen, ed. Proceedings of the VII International Symposium on Systematic and Comparative Musicology, III International Conference on Cognitive Musicology. Jyväskylä, Finland: University of Jyväskylä, pp. 18–26.

Canazza, S., et al. 2000. "Audio Morphing Different Expressive Intentions for Multimedia Systems." IEEE Multimedia 7(3):79–83.

Clynes, M. 1983. "Expressive Microstructure in Music, Linked to Living Qualities." In J. Sundberg, ed. Studies of Music Performance. Stockholm: Royal Swedish Academy of Music, pp. 76–181.


Dixon, S. E. 2001a. "Automatic Extraction of Tempo and Beat from Expressive Performances." Journal of New Music Research 30(1):39–58.

Dixon, S. E. 2001b. "An Interactive Beat Tracking and Visualisation System." Proceedings of the 2001 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 215–218.

Dixon, S. E., and W. Goebl. 2002. "Pinpointing the Beat: Tapping to Expressive Performances." In C. Stevens, D. Burnham, G. McPherson, E. Schubert, and J. Renwick, eds. Proceedings of the 7th International Conference on Music Perception and Cognition, Sydney 2002. Adelaide, Australia: Causal Productions, pp. 617–620.

Dixon, S. E., W. Goebl, and G. Widmer. 2002. "Real Time Tracking and Visualisation of Musical Expression." In C. Anagnostopoulou, M. Ferrand, and A. Smaill, eds. Proceedings of the 2nd International Conference on Music and Artificial Intelligence. Berlin: Springer, pp. 58–68.

Gabrielsson, A. 1987. "Once Again: The Theme from Mozart's Piano Sonata in A Major (K. 331)." In A. Gabrielsson, ed. Action and Perception in Rhythm and Music. Stockholm: Royal Swedish Academy of Music, pp. 81–103.

Gabrielsson, A. 1999. "Music Performance." In D. Deutsch, ed. Psychology of Music, 2nd ed. San Diego: Academic Press, pp. 501–602.

Goebl, W. 2001. "Melody Lead in Piano Performance: Expressive Device or Artifact?" Journal of the Acoustical Society of America 110(1):563–572.

Goebl, W., and R. Bresin. 2001. "Are Computer-Controlled Pianos a Reliable Tool in Music Performance Research? Recording and Reproduction Precision of a Yamaha Disklavier Grand Piano." In C. L. Buyoli and R. Loureiro, eds. Workshop on Current Research Directions in Computer Music. Barcelona, Spain: Audiovisual Institute, Pompeu Fabra University, pp. 45–50.

Goebl, W., and S. E. Dixon. 2001. "Analyses of Tempo Classes in Performances of Mozart Sonatas." In H. Lappalainen, ed. Proceedings of the VII International Symposium on Systematic and Comparative Musicology, III International Conference on Cognitive Musicology. Jyväskylä, Finland: University of Jyväskylä, pp. 65–76.

Heijink, H., et al. 2000. "Make Me a Match: An Evaluation of Different Approaches to Score–Performance Matching." Computer Music Journal 24(1):43–56.

Juslin, P. N., A. Friberg, and R. Bresin. 2002. "Toward a Computational Model of Expression in Performance: The GERM Model." Musicae Scientiae Special Issue 2001–2002:63–122.

Langner, J. 2002. Musikalischer Rhythmus und Oszillation. Eine theoretische und empirische Erkundung. Frankfurt: P. Lang.

Langner, J., and W. Goebl. 2002. "Representing Expressive Performance in Tempo–Loudness Space." ESCOM 10th Anniversary Conference on Musical Creativity. Liège, Belgium: Université de Liège.

Langner, J., and W. Goebl. In preparation. "Aesthetic Performance Evaluations in the Context of Performance Research: An Assessment of Two-Dimensional Performance Trajectories."

Langner, J., R. Kopiez, and B. Feiten. 1998. "Perception and Representation of Multiple Tempo Hierarchies in Musical Performance and Composition: Perspectives from a New Theoretical Approach." In R. Kopiez and W. Auhagen, eds. Controlling Creative Processes in Music. Frankfurt: P. Lang, pp. 13–35.

Langner, J., et al. 2000. "Real-Time Analysis of Dynamic Shaping." In C. Woods et al., eds. Proceedings of the 6th International Conference on Music Perception and Cognition. Keele, UK: Keele University, Department of Psychology, pp. 452–455.

Lerdahl, F., and R. Jackendoff. 1983. A Generative Theory of Tonal Music. Cambridge, Massachusetts: MIT Press.

Palmer, C. 1997. "Music Performance." Annual Review of Psychology 48:115–138.

Pampalk, E., A. Rauber, and D. Merkl. 2002. "Content-Based Organization and Visualization of Music Archives." Proceedings of the 10th ACM International Conference on Multimedia. Juan les Pins, France: ACM, pp. 570–579.

Pampalk, E., G. Widmer, and A. Chan. In press. "A New Approach to Hierarchical Clustering and Structuring of Data with Self-Organizing Maps." Intelligent Data Analysis Journal 8(2).

Pollini, M. 1985. (Recorded 1972.) Chopin: Etudes. Hamburg: Deutsche Grammophon Gesellschaft, 413 794–2.

Repp, B. H. 1993. "Music as Motion: A Synopsis of Alexander Truslit's 'Gestaltung und Bewegung in der Musik'." Psychology of Music 21:48–72.

Schubert, E. 1996. "Measuring Temporal Emotional Response to Music Using the Two-Dimensional Emotion Space." In B. Pennycook and E. Costa-Giomi, eds. Proceedings of the 4th International Conference for Music Perception and Cognition. Montreal: Faculty of Music, McGill University, pp. 263–268.

Schubert, E. 1999. "Measuring Emotion Continuously: Validity and Reliability of the Two-Dimensional Emotion-Space." Australian Journal of Psychology 51(3):154–165.


Slaney, M. 1999. "QuickTime Tools for MATLAB." Interval Technical Report 1999–066. Palo Alto, California: Interval Research Corporation.

Todd, N. P. M. 1992. "The Dynamics of Dynamics: A Model of Musical Expression." Journal of the Acoustical Society of America 91(6):3540–3550.

Truslit, A. 1938. Gestaltung und Bewegung in der Musik. Berlin-Lichterfelde: Chr. Friedrich Vieweg.

Widmer, G. 2001. "Using AI and Machine Learning to Study Expressive Music Performance: Project Survey and First Report." AI Communications 14(3):149–162.

Windsor, W. L., and E. F. Clarke. 1997. "Expressive Timing and Dynamics in Real and Artificial Musical Performances: Using an Algorithm as an Analytical Tool." Music Perception 15(2):127–152.

Zwicker, E., and H. Fastl. 2001. Psychoacoustics: Facts and Models, 2nd ed. Berlin: Springer-Verlag.

