Visualizing Expressive Performance in Tempo–Loudness Space
Jörg Langner
Musikwissenschaftliches Seminar, Humboldt-Universität zu Berlin
Unter den Linden 6, 10099 Berlin, Germany
[email protected]

Werner Goebl
Austrian Research Institute for Artificial Intelligence (ÖFAI)
Freyung 6/VI, 1010 Vienna, Austria
[email protected]

The previous decades of performance research have yielded a large number of very detailed studies analyzing various parameters of expressive music performance (see Palmer 1997 and Gabrielsson 1999 for an overview). A special focus was given to expressive piano performance, because the expressive parameters are relatively few (timing, dynamics, and articulation, including pedaling) and comparatively easy to obtain. The majority of performance studies concentrated on one of these parameters exclusively, and in most of these cases that parameter was expressive timing.

In our everyday experience, we never listen to one of these parameters in isolation as it is analyzed in performance research. Certainly, the listener's attention can sometimes be guided more to one particular parameter (e.g., the forced stable tempo in a Prokofieff Toccata or the staccato–legato alternation in a Mozart Allegro), but generally the aesthetic impression of a performance results from an integrated perception of all performance parameters, and it is influenced by other factors such as body movements and the socio-cultural background of a performer or a performance as well. It can be presumed that the different performance parameters influence and depend on each other in various and intricate ways. (For example, Todd 1992 and Juslin, Friberg, and Bresin 2002 provide modeling-based approaches.) Novel research methods could help us analyze expressive music performances in a more holistic way to tackle these questions.

Another problem of performance analysis is the enormously large amount of information the researcher must deal with, even when investigating, for example, only the timing of a few bars of a single piece. In general, it remains unclear whether the expressive deviations measured are due to deliberate expressive strategies, musical structure, motor noise, imprecision of the performer, or even measurement errors.

In the present article, we develop an integrated analysis technique in which tempo and loudness are processed and displayed at the same time. Both the tempo and the loudness curves are smoothed with a window size corresponding ideally to the length of a bar. These two performance parameters are then displayed in a two-dimensional performance space on a computer screen: a dot moves in synchrony with the sound of the performance. The trajectory of its tail describes geometric shapes that are intrinsically different for different performances. Such an animated display seems to be a useful visualization tool for performance research. The simultaneous display of tempo and loudness allows us to study interactions between these two parameters, either by themselves or with respect to properties of the musical score.

The behavior of the algorithm and the insights provided by this type of display are illustrated with performances of two musical excerpts by Chopin and Schubert. In the first case study, two expert performances and a professional recording by Maurizio Pollini are compared; in the second, an algorithmic performance generated from a basic performance model is contrasted with Alfred Brendel's performance of the same excerpt. These two excerpts were chosen because articulation is constant throughout each excerpt (legato), so the analysis can concentrate on tempo and dynamics.

Method

Our visualization requires two main processing steps. The first step involves data acquisition, either from performances made on special recording instruments such as MIDI grand pianos or directly from conventional audio recordings (i.e., commercial compact discs). Second, the gathered data must be reduced (smoothed) over a certain time window corresponding to a certain granularity of display.

Timing Data

The timing information of expressive performances in MIDI format has the advantage that each onset is clearly defined, although the precision of some computer-monitored pianos is not much higher than that of timing data obtained from audio recordings (for a Yamaha Disklavier, see Goebl and Bresin 2001). Still, each performed onset must be matched to a symbolic score of the given piece so that the onsets at the track level can be determined automatically (i.e., score-performance matching; see Heijink et al. 2000 and Widmer 2001). The track level is a unit of score time (e.g., quarter note, eighth note) that defines the resolution at which tempo changes are measured. The track level is usually faster than the beat indicated by the time signature; in the Chopin Etude, for example, it is the sixteenth note. From the track times, the tempo curves (in beats per minute relative to the notated beat) are computed.

Timing information from audio recordings was obtained using an interactive software tool for automatic beat detection (Dixon 2001a, 2001b). The software analyzes the audio data, finds the onsets of as many of the musical notes as possible, and proposes a possible beat track by displaying the beats as vertical lines over the amplitude envelope of the audio signal. The user can then adjust false track times. The system also provides audio feedback in the form of a percussion track (playing at the track times) mixed with the original sound. After the user corrects some of the errors made by the system, the remainder of the piece can be retracked automatically, taking the corrections into account. When the user considers the track times to be correct, they can be saved to disk in a text format. With this tool, timing data of audio recordings can be gathered relatively quickly; the typical error of ±20 msec is sufficiently precise for the present purpose (see also Goebl and Dixon 2001).
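To make the tempo computation concrete, the following MATLAB sketch derives a tempo curve from a vector of track times. This is our own minimal illustration, not the authors' published code; the variable names and toy data are hypothetical, and the subdivision of four sixteenth-note track units per quarter-note beat follows the Chopin example above.

    % Sketch: tempo curve from track-level onset times (hypothetical names).
    trackTimes   = [0 0.31 0.60 0.92 1.25 1.54]; % track onsets in sec (toy data)
    unitsPerBeat = 4;                 % sixteenth-note track level, quarter-note beat

    iti        = diff(trackTimes);             % inter-track intervals (sec)
    beatDur    = iti * unitsPerBeat;           % duration of one notated beat (sec)
    tempo      = 60 ./ beatDur;                % tempo in beats per minute
    tempoTimes = trackTimes(1:end-1) + iti/2;  % place each value mid-interval

    plot(tempoTimes, tempo, '.-');
    xlabel('Time (sec)'); ylabel('Tempo (BPM)');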
Loudness Data

For both MIDI and audio data sources, the loudness information was always taken from a recording of the MIDI file or from the audio file itself, respectively. It would have been very difficult to model an overall loudness curve from MIDI information alone. A MATLAB implementation of Zwicker's loudness model (Zwicker and Fastl 2001) was used to convert the audio file (in pulse-code modulated WAV format) into its loudness envelope in sones (Pampalk, Rauber, and Merkl 2002). First, the audio signal was converted into the frequency domain and bundled into critical bands according to the Bark scale. After spectral and temporal masking effects were determined, the loudness sensation (in sones) was computed from the equal-loudness levels (in phons), which in turn were calculated from the sound pressure levels (in dB SPL). The loudness envelope was sampled at 11.6-msec intervals, which follows from the analysis parameters used: a 1,024-sample window at 44,100 samples per second with 50% overlap advances by 512 samples, and 512/44,100 ≈ 11.6 msec. A similar implementation was used in earlier studies (Langner et al. 2000; Langner 2002; Langner and Goebl 2002). The advantage of using a loudness measure instead of sound level is discussed by Langner (2002, pp. 29–30).

From the loudness envelope, one loudness value was taken at each tracked point in time for further data processing. This single loudness value for each track time was taken as the maximum value within a window extending half an inter-track interval before and after the corresponding track time. Because the track grid could otherwise miss important loud notes that do not coincide exactly with a track time, this windowing procedure takes a neighboring louder note as the loudness value of that particular track time. The procedure is particularly important for low track rates.
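The windowed-maximum rule can be sketched in MATLAB as follows. This is our reading of the procedure, not the published implementation; loudEnv, envTimes, and trackTimes are hypothetical names for the sone envelope, its time stamps, and the track times. (For reference, Zwicker-type models relate loudness N in sones to loudness level L in phons by N = 2^((L - 40)/10) for L >= 40 phon.)

    % Sketch: one loudness value per track time, taken as the maximum of the
    % sone envelope within half an inter-track interval on either side.
    n           = numel(trackTimes);
    loudAtTrack = zeros(1, n);
    for i = 1:n
        if i > 1                             % lower window edge
            lo = trackTimes(i) - (trackTimes(i) - trackTimes(i-1)) / 2;
        else
            lo = envTimes(1);                % first track time: clip to envelope start
        end
        if i < n                             % upper window edge
            hi = trackTimes(i) + (trackTimes(i+1) - trackTimes(i)) / 2;
        else
            hi = envTimes(end);              % last track time: clip to envelope end
        end
        sel = envTimes >= lo & envTimes <= hi;
        loudAtTrack(i) = max(loudEnv(sel));  % a loud note just off the grid is
    end                                      % still captured by its track time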
Data Reduction

Both tempo and loudness data were smoothed using overlapping Gaussian windows. We define the window size as the time (in sec) between the left and right points of inflection (turning points) of the Gaussian window, which corresponds to two standard deviations (i.e., 2σ). (A MATLAB sketch of this smoothing is given below, after the description of the display.)

Table 1. Window sizes used for smoothing the performances of the Chopin Etude

Performer     Window Size (sec)
Pianist 09    2.486 (bar)
Pianist 18    2.896 (bar)
M. Pollini    3.212 (bar); 1.606 (quarter note = half bar)

The window sizes correspond to the mode duration of a performed bar or of a quarter note, respectively. The mode duration is the most frequently occurring inter-onset interval (quantized to 10 msec; see Goebl and Dixon 2001).

To suggest time as a third dimension in the display, the trajectory of the initially red dot fades out and decreases in size over time. This is meant to evoke the impression of a three-dimensional virtual space in which the trajectory moves towards the viewer.

The current dot of the display can show high-level score information, such as the current bar number. To indicate certain structural properties of the score, the current dot is enlarged and changes color at phrase boundaries.

Snapshots of this visualization technique as well as the animations are implemented in the MATLAB environment. The animations were saved as QuickTime movies using a routine freely available on the Internet (Slaney 1999). The frame interval was chosen to be 0.1 sec (i.e., 10 frames per second).
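Returning to the Data Reduction step, here is a minimal sketch of the Gaussian smoothing under our reading of the window definition: the window size w spans the two inflection points of the Gaussian, so w = 2σ and σ = w/2. The function name and interface are our own assumptions.

    function ySm = gaussSmooth(t, y, w)
    % Smooth the series y (tempo or loudness values), sampled at times t
    % (in sec), with overlapping Gaussian windows of size w (sec), w = 2*sigma.
    sigma = w / 2;
    ySm   = zeros(size(y));
    for k = 1:numel(t)
        g      = exp(-0.5 * ((t - t(k)) / sigma).^2);  % Gaussian weights
        ySm(k) = sum(g .* y) / sum(g);                 % normalized weighted mean
    end
    end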
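Finally, the animated display itself might be sketched as follows: the current dot traverses the smoothed tempo-loudness trajectory while its tail shrinks and fades, drawn at one frame per 0.1 sec as described above. The tail length, the fading scheme, and all names are our own assumptions, and tempoSm and loudSm are assumed to be the smoothed curves interpolated onto a common 0.1-sec grid; the authors' routines and the QuickTime export (Slaney 1999) are not reproduced here.

    % Sketch: animated dot with a fading, shrinking tail in tempo-loudness space.
    fps     = 10;          % one frame every 0.1 sec, as in the text
    tailLen = 30;          % number of past frames shown in the tail (assumption)
    figure; hold on;
    axis([min(tempoSm) max(tempoSm) min(loudSm) max(loudSm)]);
    xlabel('Tempo (BPM)'); ylabel('Loudness (sones)');
    for k = 1:numel(tempoSm)
        cla;                                       % redraw each frame from scratch
        for j = max(1, k - tailLen):k
            age = (k - j) / tailLen;               % 0 = current dot, 1 = oldest
            plot(tempoSm(j), loudSm(j), '.', ...
                 'MarkerSize', 14 * (1 - 0.8 * age), ... % tail shrinks with age
                 'Color', [1 age age]);                  % and fades from red to white
        end
        drawnow; pause(1 / fps);  % frames could instead be written to a movie file
    end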