A Polyphonic Pitch Tracking Embedded System for Rapid Instrument Augmentation
Total Page:16
File Type:pdf, Size:1020Kb
A polyphonic pitch tracking embedded system for rapid instrument augmentation Rodrigo Schramm Federico Visi Department of Music Institute of Systematic Musicology Universidade Federal do Rio Universität Hamburg / Grande do Sul / Brazil Germany [email protected] [email protected] André Brasil Marcelo Johann Department of Music Institute of Informatics Universidade Federal do Rio Universidade Federal do Rio Grande do Sul / Brazil Grande do Sul / Brazil [email protected] [email protected] ABSTRACT (e.g. instrument shape, modes of interaction, aesthetic, This paper presents a system for easily augmenting poly- etc.) while others are extended. The augmentation may phonic pitched instruments. The entire system is designed focus on the interaction between the performer and the in- to run on a low-cost embedded computer, suitable for live strument, on how sound is generated, or both. performance and easy to customise for different use cases. Many hardware devices such as tiny computers, micro- The core of the system implements real-time spectrum fac- controllers, and a variety of sensors and transducers have torisation, decomposing polyphonic audio input signals into become more accessible. In addition, plenty of software music note activations. New instruments can be easily added and source libraries to process and generate sounds [12] are to the system with the help of custom spectral template freely available. All these resources create a powerful toolset dictionaries. Instrument augmentation is achieved by re- for building DMIs. placing or mixing the instrument's original sounds with a Instrument makers can manipulate audio with the aid of large variety of synthetic or sampled sounds, which follow many digital signal processing algorithms and libraries [15, the polyphonic pitch activations. 5, 7, 16, 14]. These programming tools can operate on single (or multiple) audio channels, performing tasks such as pitch shifting, pitch detection, distortion, delay, reverb, EQ, etc. Author Keywords However, detection and decomposition of polyphonic audio multi-pitch activations, polyphonic augmented instruments, in real time is still a challenging task, especially on low- embedded Linux computers cost embedded systems. This imposes some limitations to the design of new DMIs, requiring complex signal process- CCS Concepts ing techniques, and/or the use of multi-channel transducers (e.g. hexaphonic guitar pickups) placed on the body of the •Applied computing ! Sound and music computing; musical instrument for polyphonic source separation. •Hardware ! Digital signal processing; •Human- We present a method for rapid and easy replacement of centered computing ! Mixed / augmented reality; electric-acoustic sounds on augmented polyphonic instru- ments1. The proposed technique implements a machine 1. INTRODUCTION learning method based on spectrogram factorisation [18], which can detect the multi-pitch activations from the poly- Designing a Digital Music Instrument (DMI) is a very com- phonic content. It runs in real-time on low-cost embedded plex task. Many aspects must be planned to achieve a Linux computers2, with low latency, using a single mono in- successful development as revealed by a recent study [11]. put channel. This means that the technique is suitable for Morreale and McPherson analysed many of these aspects the design of new instruments that require only one trans- through an extensive survey with 70 DMI expert makers. ducer, or it can be applied to augmenting traditional in- Among the makers' answers are concerns about the func- struments with mono channel output (e.g. electric-acoustic tionality, aesthetic and craftsmanship aspects, familiarity, guitar). Moreover, the solution is portable and has low set- the simplicity of interaction, set-up time, portability, low up time. latency, modularity and ownership. Several approaches implement some kind of cross-correlation Augmented instruments can be a bridge between tradi- or template matching in order to estimate the similarity be- tionally established musical instruments and novel digital tween the input audio signal and a pre-learned pattern [19, interfaces. They can naturally address some of the above 13]. These methods might work well to estimate the acti- aspects so that original characteristics might be preserved vation of pre-learned timbre features, however, they cannot distinguish the concurrent multi-pitch content present in the audio signal. Pre-learned templates have been explored in the Music Licensed under a Creative Commons Attribution 1The term polyphonic, in this paper, means the presence of 4.0 International License (CC BY 4.0). Copyright multiple concurrent pitched note activations at same time. remains with the author(s). 2In this work we have tested our system on a Raspberry NIME’18, June 3-6, 2018, Blacksburg, Virginia, USA. Pi-3 Model B (https://www.raspberrypi.org) 120 Multi-pitch Performance mode Sound activations PLCA synthesis Single audio Polyphonic channel Time/Frequency electric-acoustic representation Dictionary instrument Training mode Template extraction Figure 1: System pipeline. Information Retrieval (MIR) field for the task of automatic 2.1 Polyphonic Pitch Activations music transcription based on spectrogram factorisation. Im- Given an audio input in the frequency domain representa- provements have been achieved in the recent years using Ω×T tion, we want to decompose the spectrum X!;t 2 R techniques based mainly on Non-Negative Matrix Factori- where ! denotes log-frequency and t time. In this model, sation (NMF) [17, 10] and Probabilistic Latent Component X!;t is approximated by a bivariate probability distribution Analysis (PLCA) [1, 2]. Additionally, these methods allow P (!; t), which is in turn decomposed as: the integration of music language models that can guide the X factorisation convergency to meaningful results. A draw- P (!; t) = P (t) P (!js; f; p)Pt(sjp)Pt(fjp)Pt(p) (1) back of these solutions is the high computational cost in- s;f;p volved to process the algorithm. Usually, automatic music transcription algorithms are designed to achieve better gen- where P (t) is the spectrogram energy. Similar to [4], the eralisation and ideally perform well with many kinds (or variable p 2 f40; :::; 108g denotes pitch in linear MIDI scale, classes) of musical instruments. In this paper, we go the f 2 f1; :::; 5g denotes the pitch deviation from 12-tone equal opposite way. Since the target is only one particular musi- temperament in 20 cent resolution. The range of the possi- cal instrument, we propose a simplification in the factorisa- ble values of f depends on the spectrogram resolution. Vari- tion process in favour of the usability of the system by the able s 2 f1; :::Sg denotes a specific part of the instrument instrument maker. body. For example, the factorisation applied to the spectro- The spectrogram factorisation in our approach is guided gram of an electric guitar with six strings would ideally use by a timbre dictionary, which is built from audio samples of S = 6. The unknown model parameters Pt(sjp), Pt(fjp), the target musical instrument. The method allows a rapid and Pt(p) are estimated through the iterative expectation- customisation and extension by adding new dictionaries. maximization (EM) algorithm [9]. The model parameters It is worth to note that this approach does not aim at are randomly initialised, and the EM algorithm iterates over creating a new audio to MIDI conversion technique. Multi- 30 iterations. Details of how the expectation and maximiza- pitch detection is still an open problem in MIR [3] research tion steps are computed can be found in [4]. and we do expect many false positiveand false negativepitch At the end of the spectrogram factorisation, Pt(p) gives activations. This is the reason why this approach imple- the multi-pitch activations (amplitudes of pitch components) ments a dedicated sound generation stage based on table- in semitone scale, while Pt(fjp) estimates the respective based Numerically Controlled Oscillators (NCO)3, which, pitch deviations, being useful to detect possible small tune in fact, explores the multi-pitch activation to create new deviations or pitch bends. Pt(sjp) gives the contribution of sounds with polyphonic content, embracing spurious pitches each instrument part s for each pitch activation p. or harmonic estimates. Figure 1 illustrates the system pipeline. Details of this system are explained in Section 2 and exper- 2.1.1 Pitch Resolution × Computational Cost iments are described in Section 3. Conclusions and future This approach uses the constant Q-transform for the fre- work are drawn in Section 4. quency representation. The log-frequency resolution (bins per octave) of the constant Q-transform can be tuned in 2. SYSTEM DESIGN order to control the computational need of the algorithm, regulating the tradeoff between low computation and pitch The core of this work is based on multi-pitch detection of the precision. This means that if there is not enough computa- polyphonic content generated by the musical instrument. In tional power, we can reduce the number of frequency bins particular, our system looks for multi-pitch activations es- per octave (e.g. one template per semitone, f 2 f1g). On timated by a spectrogram factorisation algorithm based on the other hand, if we want more precision on the fundamen- the PLCA technique. We follow the implementation pro- tal frequency estimate of each pitch detection and there is posed by Benetos and Weide [4], which uses a fixed dictio- sufficient computational power in the embedded computer, nary with spectral templates. Analogously, in our approach then we can increase the number of frequency bins (e.g. 5 the input spectrogram is factorised into pitch activations, bins per octave give us a 20 cent resolution, f 2 f1; :::; 5g). pitch deviations, and instrument parts (the system focus Another important factor that might affect the real-time on only one instrument, but we might want to distinguish performance is the total extent that variable p can span. As identical pitches originated from distinct parts of the instru- a design decision, the instrument maker can constrain an ment).