@Let@Token Live Orchestral Piano, a System for Real-Time
Total Page:16
File Type:pdf, Size:1020Kb
Live Orchestral Piano, a system for real-time orchestral music generation Leopold´ Crestel Philippe Esling IRCAM - CNRS UMR 9912 IRCAM - CNRS UMR 9912 [email protected] [email protected] ABSTRACT spectral components, but can lead to a unique emerging timbre, with phenomenon such as the orchestral blend [4]. This paper introduces the first system performing auto- Given the number of different instruments in a symphonic matic orchestration from a real-time piano input. We cast orchestra, their respective range of expressiveness (tim- this problem as a case of projective orchestration, where bre, pitch and intensity), and the phenomenon of emerging the goal is to learn the underlying regularities existing be- timbre, one can foresee the extensive combinatorial com- tween piano scores and their orchestrations by well-known plexity embedded in the process of orchestral composition. composers, in order to later perform this task automati- This complexity have been a major obstacle towards the cally on novel piano inputs. To that end, we investigate a construction of a scientific basis for the study of orchestra- class of statistical inference models based on the Restricted tion and it remains an empirical discipline taught through Boltzmann Machine (RBM). We introduce an evaluation the observation of existing examples [5]. framework specific to the projective orchestral generation Among the different orchestral writing techniques, one task that provides a quantitative analysis of different mod- of them consists in first laying an harmonic and rhythmic els. We also show that the frame-level accuracy currently structure in a piano score and then adding the orchestral used by most music prediction and generation system is timbre by spreading the different voices over the various highly biased towards models that simply repeat their last instruments [5]. We refer to this operation of extending a input. As prediction and creation are two widely differ- piano draft to an orchestral score as projective orchestra- ent endeavors, we discuss other potential biases in evalu- tion [6]. The orchestral repertoire contains a large num- ating temporal generative models through prediction tasks ber of such projective orchestrations (the piano reductions and their impact on a creative system. Finally, we provide of Beethoven symphonies by Liszt or the Pictures at an an implementation of the proposed models called Live Or- exhibition, a piano piece by Moussorgsky orchestrated by chestral Piano (LOP), which allows for anyone to play the Ravel and other well-known composers). By observing an orchestra in real-time by simply playing on a MIDI key- example of projective orchestration (Figure1), we can see board. To evaluate the quality of the system, orchestrations that this process involves more than the mere allocation of generated by the different models we investigated can be notes from the piano score across the different instruments. found on a companion website 1 . It rather implies harmonic enhancements and timbre ma- nipulations to underline the already existing harmonic and 1. INTRODUCTION rhythmic structure [3]. However, the visible correlations between a piano score and its orchestrations appear as a Orchestration is the subtle art of writing musical pieces fertile framework for laying the foundations of a computa- for the orchestra, by combining the properties of various tional exploration of orchestration. instruments in order to achieve a particular sonic rendering [1,2]. As it extensively relies on spectral characteristics, Statistical inference offers a framework aimed at auto- orchestration is often referred to as the art of manipulating matically extracting a structure from observations. These instrumental timbres [3]. Timbre is defined as the property approaches hypothesize that a particular type of data is which allows listeners to distinguish two sounds produced structured by an underlying probability distribution. The objective is to learn the properties of this distribution, by arXiv:1609.01203v2 [cs.LG] 18 May 2017 at the same pitch and intensity. Hence, the sonic palette of- fered by the pitch range and intensities of each instrument observing a set of those data. If the structure of the data is is augmented by the wide range of expressive timbres pro- efficiently extracted and organized, it becomes then possi- duced through the use of different playing styles. Further- ble to generate novel examples following the learned dis- more, it has been shown that some instrumental mixtures tribution. A wide range of statistical inference models have can not be characterized by a simple summation of their been devised, among which deep learning appears as a promising field [7,8]. Deep learning techniques have been 1 https://qsdfo.github.io/LOP/ successfully applied to several musical applications and neural networks are now the state of the art in most music Copyright: c 2017 Leopold´ Crestel et al. This is information retrieval [9–11] and speech recognition [12, an open-access article distributed under the terms of the 13] tasks. Several generative systems working with sym- Creative Commons Attribution 3.0 Unported License, which permits unre- bolic information (musical scores) have also been success- stricted use, distribution, and reproduction in any medium, provided the original fully applied to automatic music composition [14–18] and author and source are credited. automatic harmonization [19]. from unseen piano scores. We investigate a class of mod- Piano els called Conditional RBM (cRBM) [24]. Conditional score models implement a dependency mechanism that seems adapted to model the influence of the piano score over Orchestration the orchestral score. In order to rank the different mod- els, we establish a novel objective and quantitative evalua- tion framework, which is is a major difficulty for creative French horns and systems. In the polyphonic music generation field, a predictive task with frame-level accuracy is commonly Orchestra used by most systems [15, 17, 25]. However, we show Trumpets score that this frame-level accuracy is highly biased and maxi- Trombones mized by models that simply repeat their last input. Hence, Tuba we introduce a novel event-level evaluation framework and benchmark the proposed models for projective orchestra- tion. Then, we discuss the qualitative aspects of both the Figure 1. Projective orchestration. A piano score is pro- models and evaluation framework to explain the results ob- jected on an orchestra. Even though a wide range of or- tained. Finally, we selected the most efficient model and chestrations exist for a given piano score, all of them will implemented it in a system called Live Orchestral Piano share strong relations with the original piano score. One (LOP). This system performs real-time projective orches- given orchestration implicitly embeds the knowledge of the tration, allowing for anyone to play with an orchestra in composer about timbre and orchestration. real-time by simply playing on a MIDI keyboard. The remainder of this paper is organized as follows. The first section introduces the state of the art in conditional Automatic orchestration can actually cover a wide range models, in which RBM, cRBM and Factored-Gated cRBM of different applications. In [20, 21], the objective is to (FGcRBM) models are detailed. Then, the projective or- find the optimal combination of orchestral sounds in or- chestration task is presented along with an evaluation frame- der to recreate any sonic target. An input of the system work based on a event-level accuracy measure. The mod- is a sound target, and the algorithm explores the different els are evaluated within this framework and compared to combination using a database of recorded acoustic instru- existing models. Then, we introduce LOP, the real-time ments. The work presented in [22] consists in modifying projective orchestration system. Finally, we provide our the style of an existing score. For instance, it can gener- conclusions and directions of future work. ate a bossa nova version of a Beethoven’s symphony. To our best knowledge, the automatic projective orchestration 2. CONDITIONAL NEURAL NETWORKS task has only been investigated in [23] using a rule-based approach to perform an analysis of the piano score. Note In this section, three statistical inference models are de- that the analysis is automatically done, but not the alloca- tailed. The RBM, cRBM and FGcRBM are presented by tion of the extracted structures to the different instruments. increasing level of complexity. Our approach is based on the hypothesis that the statisti- cal regularities existing between a corpus of piano scores 2.1 Restricted Boltzmann Machine and their corresponding orchestrations could be uncovered through statistical inference. Hence, in our context, the 2.1.1 An energy based model data is defined as the scores, formed by a series of pitches The Restricted Boltzmann Machine (RBM) is a graphi- and intensities for each instrument. The observations is a cal probabilistic model defined by stochastic visible units set of projective orchestrations performed by famous com- v = fv1; ::; vnv g and hidden units h = fh1; ::; hnh g. posers, and the probability distribution would model the The visible and hidden units are tied together by a set of set of notes played by each instrument conditionally on the weights W following the conditional probabilities corresponding piano score. n It might be surprising at first to rely solely on the sym- Xh bolic information (scores) whereas orchestration is mostly p(vi = 1jh) = σ(ai + Wijhj) (1) defined by the spectral properties of instruments, typically j=1 n not represented in the musical notation but rather conveyed Xv in the signal information (audio recording). However, we p(hj = 1jv) = σ(bj + Wijvi) (2) make the assumption that the orchestral projection performed i=1 by well-known composers effectively took into account the where σ(x) = 1 is the sigmoid function. subtleties of timbre effects. Hence, spectrally consistent 1+exp(−x) The joint probability of the visible and hidden variables orchestrations could be generated by uncovering the com- is given by posers’ knowledge about timbre embedded in these scores.