Jamming with Plunderphonics: Interactive Concatenative Synthesis of Music Jean-Julien Aucouturier Francois Pachet

JOURNAL OF NEW MUSIC RESEARCH 1 Jamming With Plunderphonics: Interactive Concatenative Synthesis Of Music Jean-Julien Aucouturier Francois Pachet high energy high energy low energy low energy tom-toms Abstract— tom-toms no tom-toms some cymbals This paper proposes to use the techniques of Concate- and cymbals native Sound Synthesis in the context of real-time Music Interaction. We describe a system that generates an audio track by concatenating audio segments extracted from pre-existing musical files. The track can be controlled in real-time by specifying high-level properties (or con- t straints) holding on metadata about the audio segments. A constraint-satisfaction mechanism, based on local search, selects audio segments that best match those constraints at any time. We describe the real-time aspects of the system, notably the asynchronous adding/removing of constraints, and report on several constraints and controllers designed Database for the system. We illustrate the system with several appli- cation examples, notably a virtual drummer able to interact with a human musician in real-time. Fig. 1. Illustration of metadata-driven concatenative synthesis: an audio track is generated as a continuous concate- Index Terms—musaicing, concatenative synthesis, inter- nation of audio samples, which are selected in a database action, real-time, constraint satisfaction according to some criteria holding on their metadata. I. INTRODUCTION Most Text-To-Speech (TTS) systems today are able to synthesize text typed in by a user in real-time, through based on metadata such as energy, presence of certain drum a grapheme-to-phoneme transcription of the sentences to sounds, etc. utter ([5]). Such systems typically rely on Concatenative We propose a constraint-satisfaction algorithm to control Sound Synthesis (CCS), a paradigm which uses a database the high-level properties of the generated audio track, such of samples, called units, and a unit selection algorithm that as its energy or its continuity, and a real-time mechanism to finds the sequence of units that best match a target sound or allow constraints to be modified at any time. Constraint sat- phrase. TTS systems are completely user-driven in the sense isfaction programming (CSP) is a paradigm for solving dif- that they only produce responses to the user input (text), ficult combinatorial problems, particularly in the finite do- without any possibility for a predetermined strategy. main. In this paradigm, problems are represented by vari- Inspired by the success of TTS, CSS is gaining more ables having a finite set of possible values, and constraints and more attention in the field of music, as recently re- represent properties that the values of variables should have viewed in [22]. While some commercial systems like Synful’s in solutions. CSP is a powerful paradigm because it lets RPM Synthesizer ([8]) also build up on the idea of purely the user state problems declaratively by describing a pri- user-driven synthesis, a number of experimental systems, ori the properties of its solutions and use general-purpose like John Oswald’s historical Plunderphonics effort ([10]) algorithms to find them. There have been numerous ap- or more recent semi-automatic systems ([7], [21]), propose plications of CSP to music, e.g. for automatic generation a generative, compositional approach where the system pro- of playlists of music titles [1], automatic harmonization [14] duces musical textures according to some prescribed target. and spatialization [13]. For our “Musaicing” system [24], Along these lines, we introduced in [24] the concept of musi- we introduced the idea of using CSP to generate audio se- cal mosaics (“Musaicing”), which reconstructs a given piece quences of sound samples, with high-level constraints hold- of music using sound samples extracted from other pieces. ing on the metadata of the samples. The work presented in While musically more interesting than user-driven synthesis this paper is a real-time, interactive extension of Musaicing. tools, such generative systems are intrinsically static, and Figure 2 shows an overview of the principal components unable to adapt to real-time user input. of the system. The concatenation engine is composed of a This paper describes a real-time interactive music sys- CSP solver component which is responsible for the contin- tem based on concatenative synthesis, which is an attempt uous solving of the constraint problem, and a player com- to find a middle point between the purely user-driven and ponent which is responsible for the rendering of the con- purely generative approaches. We propose a system able to tinuous concatenation of the successive solutions found by generate an audio track by concatenating audio segments, or the solver. This concatenation engine can be controlled in samples, which can be controlled in real-time by high-level real-time by a set of controller components, which can mod- properties holding on their metadata (possibly automati- ify the constraint problem asynchronously (following the ar- cally extracted). The typical sample used in the system is a rows labeled “1” in Figure 2), and may react to information few beats’ audio extract from a given piece of music, which received from both the solver (“2”) and the player (“3”). corresponds to a musical bar (or measure), and can there- The paper is organized as follows. Section II and III de- fore be looped while preserving a feeling of steady beat and scribe the concatenation engine. Section II presents the metric. Figure 1 illustrates such a continuous concatenation CSP solver which is at the core of the system: an object- of audio samples, which are being selected from a database oriented implementation of a local search constraint satisfaction technique, called adaptive search [4]. Section III Jean-Julien Aucouturier is assistant researcher in Sony Computer then describes the specific extension of this framework, Science Laboratory, Paris, France (Phone: (+33)144080514, Email: which we call incremental CSP, to handle real-time sequence [email protected]). building and asynchronous CSP modification. We notably Francois Pachet is researcher in Sony Computer Science examine the careful communication scheme between the Laboratory, Paris, France (Phone: (+33)144080516, Email: solver and the player components. Section IV then presents [email protected]) several possible ways to interact with the concatenation en- 2 JOURNAL OF NEW MUSIC RESEARCH Controller 1 C1 C2 C3 2 CSP 3 2 1 Controller Player 3 V1 V2 V3 V4 Fig. 2. Overview of the principal components of the system. ... ... ... ... The concatenation engine is composed of a CSP solver and value A value B value A value E a player. The CSP can be modified asynchronously (1) by value D value D value B value F value E value G value C value J several controllers, which are monitoring both the CSP (2) value H value H value N value K and the player (3) value J value K value P value U value M value M value R value W value S value P value S value Z ... ... ... ... gine, and describes a set of controller components that were designed in this purpose, notably controllers capturing in- Fig. 3. A CSP with 4 variables and 3 constraints. formation from an incoming MIDI or audio stream e.g. from a human musician interacting with the system. The final section (V) gives a number of usage examples of the system, with an emphasis on a real-time drumming machine • the cost F (Vi,C) of a given variable Vi with value Xi, able to interact with a human performer. We show that, with respect to a given constraint C, which represents contrary to more traditional mapping-based systems, the ”how badly” Xi satisfies C constraint satisfaction approach offers an effective and ele- • the cost F (Vi) of a given variable Vi with value Xi, which gant way to handle the tradeoff between the reactivity and is the weighted sum of its costs F (Vi,C) with respect the autonomy of the system, which is a core issue when to each constraint holding on Vi. Each constraint has building interesting interactive systems ([12]). a weight, which enables to balance the importance of some constraints over some others. Section III.F will II. CONSTRAINT-BASED CONCATENATIVE illustrate the importance of constraint weighting. SYNTHESIS • the global problem cost F (CSP ), which is the sum of the F (V ) for all V in the problem. A. Constraint Satisfaction i i Assigning a new value to the a variable V0 modifies the We define the selection of audio samples to build a con- costs F (V0,Ci) of all the constraints Ci holding on V0, in catenated audio sequence as a (finite-domain) constraint- turn possibly modifies all the costs F (Vi) of all the variables satisfaction problem (CSP). A sequence of samples is mod- within the scope of one of several constraints Ci, and finally eled as a sequence of M variables V1, V2, ..., VM whose values the global problem cost F (CSP ). can be taken from a finite database of N samples, called th their domain. Each variable Vi represents the i sample in The algorithm works as follows: the sequence. Figure 3 shows a possible CSP with 4 vari- • Start with a random assignment of values to variable ables, each with their finite domain, which can be different (i.e. a random sequence of samples) from one variable to the other. • Compute F (CSP ), the total cost of the sequence. The problem is to assign values to each variable so that • Repeat until F (CSP ) is below a given threshold : the resulting sequence satisfies a set of constraints defined – For each variable V , compute F (V ) by the user. Each constraint may hold on a subset of the i i – Find V the worst variable in the sequence, i.e.

Load more