Strategies for Managing Timbre and Interaction in Automatic Improvisation Systems

William Hsu

William Hsu (musician, educator), San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132, U.S.A. E-mail: <[email protected]>. See <userwww.sfsu.edu/~whsu/improvsurvey.html> for audio documentation related to this article.

Leonardo Music Journal, Vol. 20, pp. 33–39 (2010). ©2010 ISAST.

Abstract

Earlier interactive improvisation systems have mostly worked with note-level musical events such as pitch, loudness and duration. Timbre is an integral component of the musical language of many improvisers; some recent systems use timbral information in a variety of ways to enhance interactivity. This article describes the timbre-aware ARHS improvisation system, designed in collaboration with saxophonist John Butcher, in the context of recent improvisation systems that incorporate timbral information. Common practices in audio feature extraction, performance state characterization and management, response synthesis and control of improvising agents are summarized and compared.

Timbre is an important structural element in free improvisation. When working with a saxophonist or other instrumentalist whose practice incorporates extended techniques and timbre variations, it is important to go beyond note-level MIDI-like events when constructing useful descriptions of a performance. For example, a long tone may be held on a saxophone, with fairly stable pitch and loudness, while the intensity of multiphonics is increased through embouchure control. An experienced human improviser would perceive and respond to this variation.

In this paper, I will focus on systems designed to improvise with human musicians and to work with information beyond note-level events. I am mostly interested in what Rowe calls player paradigm systems [1], that is, systems intended to act as an artificial instrumentalist in performance, sometimes dubbed machine improvisation systems. Blackwell and Young [2] also discuss the similar notion of a Live Algorithm, essentially a machine improviser that is largely autonomous.

Earlier machine improvisers typically use a pitch-to-MIDI converter to extract a stream of MIDI note events, which is then analyzed; the system generates MIDI output in response. The best known of these systems is probably George Lewis's Voyager [3]. With the increase in affordable computing power, the availability of timbral analysis tools such as Jehan's analyzer~ object [4] and IRCAM's Zsa descriptors [5], and related developments in the Music Information Retrieval community, researchers have built systems that work with a variety of real-time audio features.

Since 2002, I have worked on a series of machine improvisation systems, primarily in collaboration with saxophonist John Butcher [6,7]. Butcher is well known for his innovative work with saxophone timbre [8]; all my systems incorporate aspects of timbral analysis and management. Our primary design goals were to build systems that will perform in a free improvisation context and are responsive to timbral and other variations in the saxophone sound. The latest is the Adaptive Real-time Hierarchical Self-monitoring (ARHS) system, which I will focus on in this paper. In addition to Butcher, my systems have performed with saxophonists Evan Parker and James Fei and percussionist Sean Meehan.

Butcher is mostly interested in achieving useful musical results in performance, rather than exploring aspects of machine learning or other technologies. To this end, I have tried to identify simple behavioral mechanics that one might consider desirable in a human improviser and emulate them with some straightforward strategies. One novel aspect of my work is the integral and dynamic part that timbre plays in sensing, analysis, material generation and interaction decision-making.

Collins [9] observed that a machine improviser is often "tied to specific and contrasting musical styles and situations." For example, a system that tracks timbral changes over continuous gestures may be a poor match for an improviser playing a conventional drumkit. We are attracted to the idea of a general machine improvisation system that can perform with any human instrumentalist. This generality is perhaps mostly appropriate as an ideal for the decision logic of the Processing stage, detailed below. Specific performance situations inform our design decisions, especially in the information we choose to capture and how we shape the audio output.

Related Systems

In this survey of machine improvisation systems that work with timbral information, I will identify the goals and general operational environments for each; discussion of technical details will be postponed to subsequent sections. This is not intended as an exhaustive survey; however, the projects overviewed do represent a spectrum of the technologies used in such systems.

An earlier system is Ciufo's Eighth Nerve [10], for guitar and electronics, which combines sensors and audio analysis. Processing and synthesis are controlled by timbral parameters. His later work, Beginner's Mind [11], accommodates a variety of input sound sources and incorporates timbral analysis over entire gestures.

Collins [12] describes an improvisation simulation for a human guitarist and four artificial performers. His emphasis is on extracting event onsets, pitch and other note-level information from the input audio.

Yee-King's system [13] uses genetic algorithms to guide improvisations with a saxophonist and a drummer. The improvising agents use either additive or FM synthesis.

Van Nort's system [14] works with timbral and textural information and performs recognition of sonic gestures in performance. It has been used primarily in a trio with accordion and saxophone.

Young's NNMusic [15] works with pitch and timbral features such as brightness. It has participated in a variety of score-based and improvisatory pieces, with instruments such as flute, oboe and piano.

Casal's Frank [16] combines Casey's Soundspotter MPEG-7 feature recognition software [17] with genetic algorithms; Frank has performed with Casal on piano and Chapman Stick.

Organizational Framework

Rowe's Interactive Music Systems [18] characterizes the processing chain of MIDI-oriented interactive music systems as typically consisting of an input Sensing stage followed by a computational Processing stage and an output Response stage. Blackwell and Young [19] discuss a similar organization.

For the purposes of this paper, I will assign operations to subsystems in the following manner. Starting at the input to a machine improvisation system, we capture real-time audio data and subdivide the data into windows. Per-window measurements and features are computed. These operations will be discussed under the Sensing stage.

At this point, the Processing stage takes data collected by the Sensing stage and organizes it into high-level descriptors of the performance state. There is usually post-processing of extracted features, determination of musical structures and patterns, and the development and preparation of behavioral guidelines for improvising agents.

Finally, the outcomes of the Processing stage are fed to subsystems that directly generate audio output; this is the Synthesis stage. Boundaries between the stages tend to blur in real systems; however, this is a convenient framework for discussing information flow.
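To make this division of labor concrete, here is a minimal Python sketch of how the data handed between the stages might be represented. It is an illustration of the framework only, not code from any of the systems surveyed; every name in it (WindowFeatures, PerformanceState, processing_stage) is hypothetical.

    from dataclasses import dataclass

    @dataclass
    class WindowFeatures:
        """Sensing stage output: low-level measurements for one window."""
        rms: float       # signal energy, a loudness proxy
        centroid: float  # spectral centroid in Hz, a brightness proxy
        onset: bool      # whether an event onset fell in this window

    @dataclass
    class PerformanceState:
        """Processing stage output: a compact description of recent play."""
        loudness: float    # smoothed recent energy
        brightness: float  # smoothed recent centroid
        activity: float    # fraction of recent windows containing onsets

    def processing_stage(recent: list) -> PerformanceState:
        """Fold a run of WindowFeatures into one performance state.
        A real Processing stage would also look for musical structures
        and prepare behavioral guidelines for the improvising agents."""
        n = max(len(recent), 1)
        return PerformanceState(
            loudness=sum(w.rms for w in recent) / n,
            brightness=sum(w.centroid for w in recent) / n,
            activity=sum(w.onset for w in recent) / n,
        )

The Synthesis stage would then consume PerformanceState values to steer its audio output; that hand-off is sketched after the next section.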
ARHS System Overview

The Adaptive Real-time Hierarchical Self-monitoring (ARHS) system (Fig. 1) is the latest of my machine improvisation systems. It extracts in real time perceptually significant features of a saxophonist's timbral and gestural variations and uses this information to coordinate the performance of an ensemble of virtual improvising agents.

[Fig. 1. Block diagram of the ARHS system. Timbral and gestural information is extracted from the real-time audio stream (Sensing stage), analyzed by the Processing stage, and made available to a virtual improvising ensemble (Synthesis stage).]

Timbral and gestural information is extracted from the real-time audio stream (Sensing stage) and made available to a virtual improvising ensemble (Synthesis stage). The Interaction Management Component (IMC) computes high-level descriptions of the performance (Processing stage) and coordinates the high-level behavior of the ensemble.
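As a rough illustration of this coordinating role, built on the hypothetical types sketched earlier (the class and method names are again invented, not the ARHS implementation), the fan-out from an IMC to its agents might be outlined as follows.

    class ImprovisingAgent:
        """One member of the virtual ensemble (Synthesis stage)."""

        def update(self, state: PerformanceState) -> None:
            # Adjust this agent's material generation (density, register,
            # the timbre of its synthesis voice) to suit the current
            # ensemble directive.
            pass

    class InteractionManagementComponent:
        """Computes performance descriptions (Processing stage) and
        coordinates the high-level behavior of the ensemble."""

        def __init__(self, agents, horizon=50):
            self.agents = agents
            self.horizon = horizon  # number of recent windows considered
            self.history = []

        def on_window(self, features: WindowFeatures) -> None:
            """Called once per analysis window by the Sensing stage."""
            self.history.append(features)
            state = processing_stage(self.history[-self.horizon:])
            for agent in self.agents:
                agent.update(state)

The essential shape is the fan-out: one compact performance state, recomputed per window over a bounded history, drives the high-level behavior of every agent.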
tures, determination of musical struc- Casal’s Frank [16] combines Casey’s tures and patterns and the development Soundspotter MPEG7 feature recogni- and preparation of behavioral guidelines SenSing tion software [17] with genetic algo- for improvising agents. The Sensing stage extracts low-level rithms; Frank has performed with Casal Finally, the outcomes of the Processing features from the input audio. These on piano and Chapman Stick. stage are fed to subsystems that directly features usually include MIDI-like note generate audio output; this is the Synthe- events, FFT bins and various timbre mea- sis stage. Boundaries between the stages surements. ORganizatiOnal tend to blur in real systems; however, this FRamewORk is a convenient framework for discussing the sensing stage Rowe’s Interactive Music Systems [18] information flow. in the arHs system characterizes the processing chain of My earlier work [20] took a relatively MIDI-oriented interactive music systems compositional approach in emphasiz- as typically consisting of an input Sens- aRHS SyStem OveRview ing the detection and management of ing stage followed by a computational The Adaptive Real-time Hierarchical specific timbral and gestural events. In Processing stage and an output Response Self-monitoring (ARHS) system (Fig. 1) contrast, the ARHS system works with stage. Blackwell and Young [19] discuss is the latest of my machine improvisation general event types and performance de- a similar organization. systems. It extracts in real-time percep- scriptors. I was interested in constructing For the purposes of this paper, I will tually significant features of a saxophon- compact, perceptually significant charac- assign operations to subsystems in the fol- ist’s timbral and gestural variations and terizations of the performance state from lowing manner. Starting at the input to uses this information to coordinate the a small number of audio features. The a machine improvisation system, we cap- performance of an ensemble of virtual extensive literature in mood detection ture real-time audio data and subdivide improvising agents. and tracking from the Music Information the data into windows. Per-window mea- Timbral and gestural information is ex- Retrieval community mostly applied to surements and features are computed. tracted from the real-time audio stream traditional classical music or pop music These operations will be discussed under (Sensing stage) and made available to a (see, for example, Yang [21] and Lu et the Sensing stage. virtual improvising ensemble (Synthesis al. [22]); it suggests that compact perfor- 34 Hsu, Timbre and Interaction in Automatic Improvisation Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/LMJ_a_00010