Real-Time Music Tracking Using Multiple Performances As a Reference
Total Page:16
File Type:pdf, Size:1020Kb
REAL-TIME MUSIC TRACKING USING MULTIPLE PERFORMANCES AS A REFERENCE Andreas Arzt, Gerhard Widmer Department of Computational Perception, Johannes Kepler University, Linz, Austria Austrian Research Institute for Artificial Intelligence (OFAI), Vienna, Austria [email protected] ABSTRACT e.g. [9, 19, 20]), is to start from a symbolic score represen- tation (e.g in the form of MIDI or MusicXML). Often, this In general, algorithms for real-time music tracking di- score representation is converted into a sound file using a rectly use a symbolic representation of the score, or a syn- software synthesizer. The result is a ‘machine-like’, low- thesised version thereof, as a reference for the on-line align- quality rendition of the piece, in which we know the time of ment process. In this paper we present an alternative ap- every event (e.g. note onsets). Then, a tracking algorithm proach. First, different performances of the piece in ques- is used to solve the problem of aligning the incoming live tion are collected and aligned (off-line) to the symbolic performance to this audio version of the score – thus, the score. Then, multiple instances of the on-line tracking al- problem of real-time music tracking can be treated as an gorithm (each using a different performance as a reference) on-line audio to audio alignment task. are used to follow the live performance, and their output is In this paper we follow a similar approach, but instead combined to come up with the current position in the score. of using the symbolic score directly, we propose to first As the evaluation shows, this strategy improves both the automatically align a recording of another performance of robustness and the precision, especially on pieces that are the same piece to the score. Then, we use this automati- generally hard to track (e.g. pieces with extreme, abrupt cally annotated ‘score performance’ as the new score rep- tempo changes, or orchestral pieces with a high degree of resentation for the on-line tracking process (for the related polyphony). Finally, we describe a real-world application, task of off-line performance to performance alignment see where this music tracking algorithm was used to follow a e.g. [18]). Our motivation for this is twofold. First of world-famous orchestra in a concert hall in order to show all, we expect the quality of the features to be higher than synchronised visual content (the sheet music, explanatory if they were computed from a synthesised version of the text and videos) to members of the audience. score. Also, in a performance a lot of intricacies are en- coded that are missing in the symbolic score, including 1. INTRODUCTION (local) tempo and loudness changes. In this way we im- Real-time music tracking (or, score following) algorithms, plicitly also take care of special events like trills, which which listen to a musical performance through a micro- normally are insufficiently represented in a symbolic score phone and at any time report the current position in the representation. musical score, originated in the 1980s (see [8, 24]) and As will be seen in this paper, this approach proves to still attract a lot of research [4, 6, 11, 15, 17, 21, 23]. In be promising, but the results also depend heavily on which recent years this technology has already found use in real- performance was chosen as a reference. To improve the world applications. Examples include Antescofo 1 , which robustness we further propose a multi-agent approach (in- is actively used by professional musicians to synchronise a spired by [25], where a related strategy was applied to off- performance (mostly solo instruments or small ensembles) line audio alignment), which does not depend on a single with computer realised elements, and Tonara 2 , a music performance as a reference, but takes multiple ‘score per- tracking application focusing on the amateur pianist and formances’ and aligns the live performance to all these ref- running on the iPad. erences simultaneously. The output of all agents is com- A common approach in music tracking, and also for bined to come up with the current position in the score. As the related task of off-line audio to score alignment (see will be shown in the evaluation, this extension stabilises our approach and increases the alignment accuracy. 1 repmus.ircam.fr/antescofo 2 tonara.com The paper is structured as follows. First, in Section 2 we give an overview on the data we use to evaluate our music tracker. For comparison, we then give results of the origi- c Andreas Arzt, Gerhard Widmer. nal tracking algorithm that our approach is based on in Sec- Licensed under a Creative Commons Attribution 4.0 International Li- tion 3. In Section 4 we present a tracking strategy based on cense (CC BY 4.0). Attribution: Andreas Arzt, Gerhard Widmer. “Real-time Music Tracking using Multiple Performances as a Reference”, off-line aligned performances, which shows promising but 16th International Society for Music Information Retrieval Conference, unstable results. Then, in Section 5 we propose a multi- 2015. agent strategy, which stabilises the tracking process and 357 358 Proceedings of the 16th ISMIR Conference, M´alaga, Spain, October 26-30, 2015 ID Composer Piece Name # Perf. Groundtruth CE Chopin Etude Op. 10 No. 3 (excerpt) 22 Match CB Chopin Ballade Op. 38 No. 1 (excerpt) 22 Match MS Mozart 1st Mov. of Sonatas KV279, KV280, KV281, 1 Match KV282, KV283, KV284, KV330, KV331, KV332, KV333, KV457, KV475, KV533 RP Rachmaninoff Prelude Op. 23 No. 5 3 Manual B3 Beethoven Symphony No. 3 1 Manual M4 Mahler Symphony No. 4 1 Manual Table 1. The evaluation data set. Error CE CB MZ RP B3 M4 Error CE CB MZ RP B3 M4 0.05 0.33 0.33 0.55 0.45 0.42 0.23 0.05 0.92 0.87 0.93 0.75 0.54 0.38 0.25 0.96 0.92 0.97 0.90 0.84 0.71 0.25 0.99 0.97 0.99 0.97 0.93 0.86 0.50 0.99 0.96 0.98 0.96 0.91 0.83 0.50 1 0.97 1 0.99 0.96 0.94 0.75 1 0.98 0.99 0.98 0.94 0.87 0.75 1 0.98 1 0.99 0.97 0.97 1.00 1 0.98 0.99 0.98 0.95 0.91 1.00 1 0.98 1 1 0.98 0.98 Table 2. Results for the original on-line tracking algo- Table 3. Results for the off-line alignments. The results rithm. The results are shown as proportion of correctly are shown as proportion of correctly aligned pairs of time aligned pairs of time points (note times or downbeat times, points (note times or downbeat times, respectively), for dif- respectively), for different error tolerances (in seconds). ferent error tolerances (in seconds). For instance, the first For instance, the first number in the first row means that for number in the first row means that for the Chopin Etude the Chopin Etude the alignment was performed for 33% of the alignment was performed for 92% of the notes with an the notes with an error smaller than or equal to 0.05 sec- error smaller than or equal to 0.05 seconds. onds. processed fully automatically. These will act as replace- improves the results for all test pieces. Next, we compare ments for the symbolic scores. We collected 7 additional the results of the previous chapters to each other (Section performances for each piece in the dataset. We made an 6). Finally, we describe a real-life application of our algo- exception for the excerpts of the Ballade and the Etude by rithm at a world-famous concert hall, where it was used to Chopin, as we already have 22 performances of those. We track Richard Strauss’ Alpensinfonie (see Section 7). thus reused these performances accordingly, randomly se- lected 7 additional performances for each performance in the evaluation set, and treated them in the same way as 2. DATA DESCRIPTION the other additional data (i.e. we did not use any part of To evaluate a real-time music tracking algorithm, a collec- the ground truth, everything was computed automatically tion of annotated performances is needed. Table 1 gives when they were used as a ‘score performance’). We also an overview on the data that will be used throughout the took care not to use additional performances of the same paper. It is important to note that the dataset includes two performer(s) that occur in our evaluation set. orchestral pieces (symphonies by Beethoven and Mahler), which in our experience are difficult challenges for mu- 3. STANDARD MUSIC TRACKING BASED ON A sic tracking algorithms, due to their high polyphony and SYMBOLIC SCORE REPRESENTATION complexity. The table also indicates how the ground truth was compiled. For the Chopin Ballade and Etude, and for Our approach to music tracking is based on the standard the Mozart piano sonatas we have access to accurate data dynamic time warping (DTW) algorithm. In [10] exten- about every note onset (‘matchfiles’) that was played, as sions to DTW were proposed that made it applicable for these were recorded on a computer-monitored grand piano on-line music tracking: 1) the path is computed in an in- (see [12] and [26] for more information about this data). cremental way, and 2) the complexity is reduced to being For the Prelude by Rachmaninoff as well as for the Sym- linear in the length of the input sequences.