
Proceedings ICMC|SMC|2014, 14–20 September 2014, Athens, Greece

Bassline Pitch Prediction for Real-Time Performance Systems

Andrew Robertson
Centre for Digital Music, School of Electronic Engineering and Computer Science
Queen Mary University of London
[email protected]

ABSTRACT

This paper presents a methodology for predicting the pitch of notes by utilising their metrical position within the bar. Our system assumes two separate audio channels for drums and bass. We make use of onset detection and beat tracking algorithms to determine the onset times for each and the beat locations. A monophonic bass track is analysed for repetitive structures relative to the beat grid, enabling the system to make a prediction of the pitch of each bass note prior to any pitch-related analysis. We present an analysis on a small collection of studio recordings.

1. INTRODUCTION

When humans participate in live music performance, an element of prediction is required. Where there is a musical score, performance requires each player to simultaneously predict the desired playing time for each specified musical event and enact the requisite motor actions ahead of time to schedule each note onset. Where music is improvised, the underlying structure of the music must be anticipated in order to compose phrases that will work in the context of the performance. Whilst signal processing can provide information about auditory events after the fact, Collins [1] regards the ability of human musicians to predict and anticipate on a variety of time-scales as crucial to their skill in performing.

Vercoe and Puckette [2] proposed the listen-perform-learn model as an integral design in their score following system. Raphael [3] makes use of Bayesian graphical models to learn a performer's timing tendencies from a series of rehearsals. To participate in improvised music to a steady beat, such as rock jams, a player is required to predict key and chord changes ahead of time, and yet this is unproblematic for human musicians, partly on the basis that these tend to involve the repetition of a set sequence of chords [4].

Prediction plays an important role in the development of real-time beat trackers, which tend to operate on the basis that musical events occur more often on the beat than not. Dixon and Gouyon [5] distinguish between predictive and descriptive beat tracking behaviour. The former anticipates the beat ahead of time, whereas the latter labels the beat having observed the audio. Real-time beat trackers generally function in a causal predictive manner. Ellis's [6] model uses dynamic programming to calculate the beat times and has been used in a real-time context [7] [8] in which beat predictions are continually updated on the basis of new observed information.

Computational modelling of composition has proved a difficult task [9]. Pearce and Wiggins [10] examine how Markov models or n-grams have been used to model the statistical expectation of melodic parts. Assayag and Dubnov [11] propose the use of the factor oracle, used in string matching tasks, to analyse harmonic and melodic sources within music improvisation. Stark and Plumbley [12] make predictions of the values of future beat-synchronous chroma vectors by matching recent observations in the sequence to similar occurrences in the past. Stowell and Plumbley [13] used a predictive schema for real-time classification of a human percussive vocal in the "beatbox style", whereby a fast reaction is made to classify audio using a provisional classification whilst a more reliable delayed decision is made afterwards. This system produces audio (such as a kick or snare sound) based on the initial classification at low latency after the onset is detected, but makes changes to the audio output where this initial classification appears erroneous.

In this paper, we investigate a related problem of real-time bass pitch prediction based on musical position. For live performance, onset detection methods [14] [15] can reliably indicate the presence of a new note event with a short latency. However, the determination of pitch requires the use of pitch tracking techniques, such as the yin algorithm [16], which would impose a larger latency than the threshold for successful networked performance, measured to be approximately 30 msec [17]. A predictive schema might be used to overcome such latency issues.

The predicted output could be used in the synthesis of audio parts or in augmenting the performance with lighting or visuals. Currently, synchronised computer accompaniment can be unreliable, particularly when reliant on beat trackers, which are vulnerable to errors such as skipping ahead or behind by a beat [18]. By following the bass part, a beat tracker might be able to recognise when this occurs and autocorrect. The proposed schema of beat-synchronous bassline analysis makes it comparatively simple to follow the structural sections within a song, thereby bringing about an interactive system that is capable of following an expected performance.

Copyright: © 2014 Andrew Robertson et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

[Figure 1 (flowchart): bass and room audio feed an onset detector, pitch tracker and beat tracker; timestamped and pitched onset events are quantised against the beat times and passed to a pattern analyser/predictor, which outputs pitch predictions.]
Figure 1. Overview of the algorithm. Pitched events are analysed whilst onset events are labelled according to predictions.

[Figure 2 (two plots): score at lag against lag (beats), 0 to 30.]
Figure 2. Scores of correlation of quantised pitched events at a range of lags for the songs The Radio's Prayer I (top) and Idiots Dance (bottom).

2. METHODOLOGY

Our proposed system uses beat tracking to provide an approximation of the beat grid, which can be used to transform event times into quantised musical time. We assume that there is a dedicated bass stream, typically through the use of a D.I. box or an insert on the mixing desk, and also a general audio stream suitable for beat tracking, such as a mono mix of all channels or a room microphone on the drums. A selection of real-time beat tracking algorithms have been developed that might be suitable for such a task, including B-Keeper [19], a specialised drum tracking system, IBT∼ [20], a multi-agent system based on Dixon's Beatroot [21], and btrack∼ [7] and beatseeker∼ [8], which use autocorrelation. The proposed algorithm detects new bass events using an onset detector and finds their musical position relative to the beat grid. By estimating the optimal lag at which the bassline repeats, we can make a reasonable causal prediction for the pitch using previous pitch tracking estimates. A similar system has been proposed by Stark and Plumbley [12] for the prediction of beat-synchronous chroma sequences. An overview of the algorithm is shown in Figure 1.

2.1 Event Quantisation

From a standard onset detector, such as those proposed by Bello et al. [15], we obtain event times from a bass signal with low latency, typically under 10 msec. In practice, we compensate for latency by first detecting the onset using peak thresholding on the onset detection function, and then using a second stage that determines the precise point at which the energy change in the signal is greatest. This is done iteratively by dividing the recent audio buffer into segments, choosing the segment which has the highest energy, then repeating the process until the buffer segment contains just one sample and an exact sample position is chosen.

We then wish to specify the musical position of the event relative to the beat grid. Let our sequence of beat times be {b_0, b_1, b_2, b_3, ...}. Then for an event at time t, we find n such that b_n <= t < b_{n+1}. The musical position of the event, γ(t), in beats from the start, is given by

    γ(t) = n + (t − b_n) / (b_{n+1} − b_n).        (1)

If we wish to quantise this position to an eighth note, triplet, sixteenth or sixteenth note triplet relative to the beat times, we can divide the beat into twelve and, instead of the fractional term, find the value m for which n + m/12 is closest to γ(t) and meets the condition that 3 divides m or 4 divides m. For pitch detection, we use the yin algorithm [16] on the segment of audio that immediately follows the detected onset. Our implementation used a real-time version of the pYIN algorithm by Mauch and Dixon [22] with a frame size of 8192 to provide sufficient resolution in the lower frequencies, although the probabilistic component of their approach was not required. This results in a frequency for each note event, which we then round to the closest MIDI pitch, p(t).
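The two-stage onset refinement described in Section 2.1 might be sketched as follows. This is an offline illustration, not the authors' implementation: the function name, the four-way subdivision and the use of segment energy (as a stand-in for the greatest energy change) are our assumptions.

```python
def refine_onset(buffer, segments=4):
    """Locate a precise sample position near a detected onset by
    recursively subdividing the buffer and keeping the most
    energetic segment until a single sample remains."""
    start, end = 0, len(buffer)
    while end - start > 1:
        size = max(1, (end - start) // segments)
        # energy of each candidate segment; keep the strongest
        best = max(range(start, end, size),
                   key=lambda s: sum(x * x for x in buffer[s:min(s + size, end)]))
        start, end = best, min(best + size, end)
    return start
```

For example, a buffer that is silent apart from a spike at sample 10 resolves to that sample after two subdivision passes.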

- 481 - Proceedings ICMC|SMC|2014 14-20 September 2014, Athens, Greece

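Equation 1, the twelve-division quantisation and the rounding to MIDI pitch from Section 2.1 can be illustrated in a short sketch. Function names are hypothetical, and the paper's system performs these steps in real time; here they are shown offline, assuming the event time falls within the tracked beat sequence.

```python
from bisect import bisect_right
import math

def musical_position(t, beats):
    """Equation 1: position of event time t, in beats from the start.
    Assumes beats is sorted and beats[0] <= t < beats[-1]."""
    n = bisect_right(beats, t) - 1          # largest n with beats[n] <= t
    return n + (t - beats[n]) / (beats[n + 1] - beats[n])

# subdivision indices m (out of twelve per beat) where 3 divides m or 4 divides m
ALLOWED = [m for m in range(13) if m % 3 == 0 or m % 4 == 0]

def quantise(gamma):
    """Snap a beat position to the nearest allowed twelfth of a beat."""
    n = math.floor(gamma)
    m = min(ALLOWED, key=lambda m: abs((gamma - n) - m / 12))
    return n + m / 12

def freq_to_midi(f):
    """Round a detected frequency in Hz to the closest MIDI pitch."""
    return round(69 + 12 * math.log2(f / 440.0))
```

With beats at 0, 1 and 2 seconds, an event at t = 1.27 has position 1.27 and quantises to 1.25 (m = 3, a sixteenth after the beat); 110 Hz rounds to MIDI pitch 45 (A1).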
2.2 Repeated Structure Detection

To detect repeated structures, we look for the optimal shift at which the same bass part is played. This is particularly common in rock, blues and pop, although relatively rare in free improvisation and classical music. For a sequence of pitched events at quantised positions relative to the beat grid, we compare it to the same sequence shifted by integer multiples of beats. At each possible lag, measured in beats, we calculate a score using a method similar to cross-correlation.

Let p_n be the MIDI pitch of the note at quantised musical position n, obtained from the onset time using Equation 1 and quantising. Our mean correlation value for each lag is given by

    score(lag) = (1/N) Σ_n δ_{p_n, p_{n−lag}}        (2)

where n is the musical position relative to the beats, p_n is the pitch found at beat position n if such an event exists, N is the total number of quantised events found at the given lag that could be compared, and δ_{p_n, p_{n−lag}} is 1 when the pitches match and 0 otherwise. Figure 2 shows the scores across a range of lags for two different songs. Where the song's bassline repeats every eight beats (top), we see a series of peaks at multiples of this lower lag, whereas in the bottom plot, the bassline only repeats after sixteen beats. Our optimal lag, used in the causal prediction of bass pitches, is that with the highest score up to a maximum of 32 beats. A weighting system can be used to give preference to a longer or shorter number of beats.

2.3 Pattern Prediction

We made use of two simple methods for predicting the pitch of each bass note. The first, which we shall refer to as the naive method, involves looking back at the observed pitches at successive multiples of our optimal lag and choosing the pitch of the first observed note event found at a corresponding position. For example, if the optimal lag is 8 beats, then we first look to see if there was a bass note observed exactly 8 beats prior to the one under consideration. Where such an event is found, we predict our current bass note to have the same pitch as this previous observation. Where one is not, we look for a note 16 beats prior, and so on.

However, this does not take into account that there may be changes between song sections and that bass line phrases might vary within the same section. To try to exploit our knowledge of transitions made within previous sections, we make an additional constraint that the note found in the same corresponding metrical position has an immediately preceding note that matches the most recently observed note in both pitch and corresponding metrical position. Thus we ensure that the first order Markov transition in pitch from the preceding note to our predicted note is identical to that observed for the corresponding pair of notes used to make this prediction.

3. EVALUATION

To evaluate our two methods, we chose some studio multitrack recordings that allowed the loading of a room audio track and a direct bass track. We made use of the Vamp implementation of an offline beat tracker (http://vamp-plugins.org/rdf/plugins/qm-vamp-plugins#qm-barbeattracker) based on the method of Davies et al. [23] and Ellis [6]. For each note, we counted whether the predicted pitch matched the observed pitch. We have also included the statistic for when the notes matched but differed by an octave. This octave difference is not likely to be due to the pitch detection method (which is reliable for a direct channel of monophonic bass), but rather due to differences in the bass part being played. The evaluation is scored on all bass notes for which the yin pitch detection algorithm outputs a result, regardless of metrical position.

Where there is a section change in a song, the naive method naturally results in poor scores, since the bass part no longer matches what was occurring at the previous lag. The results are shown in Table 1. The first order Markov method, which requires that the bass part used for the prediction has a matching previous note, fares a little better in general. On rock material, we tend to observe correct bass pitch predictions approximately 50% to 70% of the time, with the result dependent upon structural changes in the song and the degree of improvisation in the bassline part.

4. DISCUSSION

These methods provide a simple and effective way to predict bass pitches in real-time, particularly when there is a repeating part or section. The main difficulty for these methods is where the piece consists of a sequence of transitions between higher level structures. In this case, we might exploit the failure of the algorithm to correctly predict the pitch for successive note events to recognise that a new section has begun, thereby generating section-specific predictions.

Liang et al. [24] propose a methodology for scheduling synchronous events across a network where there is an unknown latency in the communication of messages. Assuming access to a shared clock, such as the atomic clock, each component of the computer performance system sends the system time for the musical event and the tempo as a triple (beat position, system time, tempo), thus enabling an accurate prediction of future beat times even when there is significant latency in the network. By contrast, a beat tracker or score follower that simply sends a message on each beat is rendered highly unreliable in such a scenario. Their method for communicating beat times could be employed here to allow the integration of bassline analysis and prediction within a networked performance.

Our method makes use of a drum mix and a separate bass channel, thereby allowing analysis of bass events at a higher semantic level, with metrical position and associated pitches. A potential application here is the learning of stylistic trends in bass line playing from a corpus of multitrack recordings, thereby making it suited to the computational creativity task of automatic generation of a bassline in a given style.
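Equation 2 and the choice of optimal lag from Section 2.2 might be sketched as follows, assuming quantised positions are stored as integer twelfths of a beat mapped to MIDI pitches (this storage format, and the function names, are our assumptions):

```python
def lag_score(notes, lag):
    """Equation 2: mean pitch agreement between the quantised note
    sequence and itself shifted by `lag` whole beats.
    `notes` maps quantised position (in twelfths of a beat) to MIDI pitch."""
    shift = 12 * lag
    pairs = [(p, notes[pos - shift]) for pos, p in notes.items()
             if pos - shift in notes]
    if not pairs:
        return 0.0
    return sum(p == q for p, q in pairs) / len(pairs)

def optimal_lag(notes, max_lag=32):
    """Whole-beat lag with the highest score, up to a maximum of 32 beats."""
    return max(range(1, max_lag + 1), key=lambda lag: lag_score(notes, lag))
```

For a riff that repeats exactly every eight beats, lag_score peaks (at 1.0) at lag 8 and its multiples, mirroring the peaks in Figure 2.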

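The naive and first-order Markov methods of Section 2.3 could be sketched as below, under the same assumed representation (positions in twelfths of a beat, lag in whole beats); the function names and the `None` fallback when no match is found are our own choices:

```python
def predict_naive(notes, pos, lag, max_beats=32):
    """Naive method: pitch of the first note found at the corresponding
    metrical position at successive multiples of the optimal lag."""
    for back in range(12 * lag, 12 * max_beats + 1, 12 * lag):
        if pos - back in notes:
            return notes[pos - back]
    return None

def predict_markov(notes, pos, prev_pos, lag, max_beats=32):
    """First-order Markov method: additionally require the candidate's
    preceding note to match the most recently observed note (at prev_pos)
    in metrical position and pitch."""
    for back in range(12 * lag, 12 * max_beats + 1, 12 * lag):
        cand, prev = pos - back, prev_pos - back
        if cand in notes and prev in notes and notes[prev] == notes[prev_pos]:
            return notes[cand]
    return None
```

The Markov variant thus only predicts a pitch when the transition from the previous note has been seen before at the corresponding position.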

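The per-note scoring behind Table 1 (correct, octave and wrong, as percentages) might be computed as follows; treating "octave" as pitch-class equality of the two MIDI numbers is our assumption:

```python
def tally_predictions(pairs):
    """Classify (predicted, observed) MIDI pitch pairs into the three
    categories reported in Table 1, returned as percentages."""
    counts = {"correct": 0, "octave": 0, "wrong": 0}
    for predicted, observed in pairs:
        if predicted == observed:
            counts["correct"] += 1
        elif predicted % 12 == observed % 12:   # same pitch class, different octave
            counts["octave"] += 1
        else:
            counts["wrong"] += 1
    return {k: 100.0 * v / len(pairs) for k, v in counts.items()}
```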
Song                  | lag | Naive method             | First Order Markov
                      |     | Correct / Octave / Wrong | Correct / Octave / Wrong
----------------------|-----|--------------------------|-------------------------
Down The Line I       |  8  | 43.4 /  5.8 / 50.8       | 53.9 /  6.5 / 39.6
Down The Line II      | 16  | 49.2 /  3.3 / 47.5       | 65.0 /  5.6 / 29.5
Motorcade             | 16  | 45.4 /  5.6 / 49.1       | 61.1 /  2.8 / 36.1
The Radio's Prayer I  |  8  | 41.5 /  7.0 / 51.5       | 55.6 /  6.4 / 38.1
The Radio's Prayer II | 16  | 54.1 /  4.5 / 41.5       | 70.1 /  4.5 / 25.5
Diamond White         | 16  | 64.1 / 12.3 / 23.6       | 62.6 /  8.2 / 29.2
Generic Rock Jam I    | 16  | 48.2 / 17.6 / 34.3       | 52.8 /  9.9 / 37.3
Generic Rock Jam II   | 16  | 54.6 / 15.9 / 29.5       | 46.9 / 14.4 / 38.7
Generic Rock Jam III  | 16  | 71.9 / 12.3 / 15.9       | 73.2 / 11.9 / 14.9
Upside Down Dues      | 16  | 69.3 / 14.6 / 16.1       | 74.1 / 11.7 / 14.3
Orange Crush          | 16  | 74.9 /  3.3 / 21.9       | 74.0 /  2.1 /  2.4

Table 1. Percentage results for the note predictions across several examples from a small corpus of live recordings.

The code for the evaluation study is available online at the Sound Software website (https://code.soundsoftware.ac.uk/projects/bassline-prediction/repository).

5. CONCLUSION

We have presented a method of beat-synchronous analysis of bass parts that enables us to make predictions about the likely pitch based on the note's position within the musical structure. A beat tracking algorithm is used to determine the metrical position of quantised bass onset events, which are ascribed a pitch using standard pitch detection methods for monophonic audio. To predict bass pitches in a causal manner, we find the optimal lag at which the bass part repeats. Two prediction methods are presented. The first chooses the most recent pitch observed at a corresponding metrical position according to this lag. The second method requires that the note at a corresponding metrical position has a previous note that matches the most recently observed note in metrical position and pitch, thereby preserving the first order Markov transition between the previous note and the current predicted note event.

The output of the prediction algorithm might be used to recognise section changes within a piece. The algorithm could then be improved through the recognition and learning of higher level structure for each song. If a rehearsal version of the song was available, further analyses might improve predictions made within each section. Our prediction is currently based solely on position within the most repetitive musical structure, and this might be improved through the use of Markov techniques or string matching techniques to learn likely phrases.

One potential application is use within synchronisation systems for rock performance. At present, such systems tend to rely solely on beat tracking to adjust their tempo to match that of live musicians. By monitoring the bass parts played, such systems could be made more reliable and recover from errors where they get ahead or behind the correct beat. There is significant potential to develop further analysis techniques for multitrack audio channels of group improvisations, studio recordings and rehearsals, which might lead to new creative systems for interaction and composition.

Acknowledgments

Andrew Robertson is supported by a Royal Academy of Engineering and EPSRC Research Fellowship. Thanks to Matthew Davies for providing feedback on the paper.

6. REFERENCES

[1] N. Collins, "Towards autonomous agents for live computer music: Realtime machine listening and interactive music systems," Ph.D. dissertation, Centre for Science and Music, Faculty of Music, University of Cambridge, 2006.

[2] B. Vercoe and M. Puckette, "Synthetic Rehearsal: Training the Synthetic Performer," in Proceedings of the 1985 International Computer Music Conference, 1985, pp. 275–278.

[3] C. Raphael, "Aligning music audio with symbolic scores using a hybrid graphical model," Machine Learning, vol. 21, no. 4, pp. 360–370, 2006.

[4] J. Pressing, Improvisation: Methods and Models. Oxford University Press, 1988, pp. 129–178.

[5] S. Dixon, "Match: A music alignment tool chest," in Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR-05), 2005, pp. 492–497.

[6] D. P. W. Ellis, "Beat tracking by dynamic programming," Journal of New Music Research, vol. 36, no. 1, pp. 51–60, 2007.

[7] A. M. Stark, M. E. P. Davies, and M. D. Plumbley, "Real-time beat-synchronous analysis of musical audio," in Proceedings of the 12th International Conference on Digital Audio Effects (DAFx-09), Italy, 2009, pp. 299–304.


[8] A. Robertson, M. E. P. Davies, and A. M. Stark, "Real-time percussive beat tracking," in Proceedings of the 53rd AES International Conference on Semantic Audio, London, United Kingdom, 2014.

[9] D. Conklin, "Music generation from statistical models," in Proceedings of the AISB 2003 Symposium on Artificial Intelligence and Creativity in the Arts and Sciences, 2003, pp. 30–35.

[10] M. Pearce and G. Wiggins, "Improved methods for statistical modelling of monophonic music," Journal of New Music Research, vol. 33, no. 4, pp. 367–385, 2004.

[11] G. Assayag and S. Dubnov, "Using factor oracles for machine improvisation," Soft Computing, vol. 8, no. 9, pp. 604–610, 2004.

[12] A. M. Stark and M. D. Plumbley, "Performance following: Real-time prediction of musical sequences without a score," IEEE Transactions on Audio, Speech and Language Processing, vol. 20, no. 1, pp. 190–199, 2012.

[13] D. Stowell and M. D. Plumbley, "Delayed decision-making in real-time beatbox percussion classification," Journal of New Music Research, vol. 39, no. 9, pp. 203–213, 2010.

[14] P. M. Brossier, J. P. Bello, and M. D. Plumbley, "Fast labelling of notes in music signals," in Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain, 2004, pp. 331–336.

[15] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. Sandler, "A tutorial on onset detection in music signals," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, Part 2, pp. 1035–1047, 2005.

[16] A. de Cheveigné and H. Kawahara, "YIN, a fundamental frequency estimator for speech and music," The Journal of the Acoustical Society of America, vol. 111, no. 4, pp. 1917–1930, 2002.

[17] T. Mäki-Patola and P. Hämäläinen, "Latency tolerance for gesture controlled continuous sound instrument without tactile feedback," in Proceedings of the 2004 International Computer Music Conference, 2004, pp. 409–416.

[18] R. B. Dannenberg, "Toward automated holistic beat tracking, music analysis and understanding," in Proceedings of the International Conference on Music Information Retrieval, 2005, pp. 366–373.

[19] A. Robertson and M. D. Plumbley, "Synchronizing sequencing software to a live drummer," Computer Music Journal, vol. 37, no. 2, pp. 46–60, 2013.

[20] J. Oliveira, F. Gouyon, L. G. Martins, and L. P. Reis, "IBT: A real-time tempo and beat tracking system," in Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR 2010), Utrecht, 2010, pp. 291–296.

[21] S. Dixon, "Automatic extraction of tempo and beat from expressive performances," Journal of New Music Research, vol. 30, no. 1, pp. 39–58, 2001.

[22] M. Mauch and S. Dixon, "pYIN: A fundamental frequency estimator using probabilistic threshold distributions," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014), 2014, pp. 659–663.

[23] M. E. P. Davies and M. D. Plumbley, "Context-dependent beat tracking of musical audio," IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 3, pp. 1009–1020, 2007.

[24] D. Liang, G. Xia, and R. B. Dannenberg, "A framework for the coordination of media," in Proceedings of the International Conference on New Interfaces for Musical Expression, 2011, pp. 167–172.
