Developing Interactive Electronic Systems for Improvised Music

Jason Alder

Advisor: Jos Herfs

ArtEZ hogeschool voor de kunsten

2012

Contents

INTRODUCTION

1. EVOLUTION OF ELECTRONICS IN MUSIC

2. IMPROVISATION

3. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

4. ARCHITECTURE
   A. CLASSIFICATION PARADIGMS
   B. LISTENER
   C. ANALYZER
   D. COMPOSER

5. CONCLUSION

REFERENCES


Introduction

This paper will discuss how one can develop an interactive electronics system for improvisation, looking at how such a system differs from one designed for composed music and what elements are necessary for it to “listen, analyze, and respond” musically. It will examine the nature of improvisation and intelligence, and, through discussions of research in the cognition of musical improvisation and in artificial intelligence, gather insight into how the interactive system must be developed so that it, too, maintains an improvisational nature. Previously developed systems will be examined, analyzing how their design concepts can be used as a platform from which to build, as well as what can be changed or improved, through an analysis of various components of the system I am currently designing, made especially for non-idiomatic improvisation.

The use of electronics with acoustic instruments in music is generally the result of the goal of opening up possibilities and using a new sonic palette. There is a wealth of approaches for how the electronics get implemented, such as a fixed performance like tape-playback pieces, the use of effects to manipulate the acoustic sound like guitar pedals, or pre-recorded/sequenced material being triggered at certain moments. A human is often controlling these electronics, whether that is the performer or another person behind a computer or other medium, but the possibility of the electronics controlling themselves brings some interesting ideas to the improvisation world. With the advances in technology and computer science, it is possible to create an interactive music system that will “interpret a live performance to affect music generated or modified by computers” (Winkler, 1998). Using software such as Max/MSP, the development of a real-time interactive system that “listens” and “analyzes” the playing of an improviser, and “responds” in a musical way, making its own “choices”, is closer to fact than the science-fiction imagery it may impart.


1. Evolution of Electronics in Music

An initial question some may have when considering improvisation with a computer is, “Why?” More specifically, “Why improvise with a computer when you could improvise with other humans?” The use of electronics in music is not an entirely new concept. The Theremin, developed in 1919, is one of the earliest electronic instruments.¹ Utilizing two antennae, one for frequency and the other for amplitude, it produces pitches created with oscillators. The instrument is played by varying the distance of one’s hands from each of the antennae: moving the right hand towards and away from the antenna controlling frequency changes the sounding pitch, while the other hand does the same with respect to the amplitude antenna to change the volume (Rowe, 1993). Throughout the 20th century, more and more instruments utilizing electric current were developed, for example monophonic keyboard instruments like the Sphärophone (1927), Dynaphone (1927-8), and the Ondes Martenot (1928). These first attempts at electronic instruments were often modeled to provide characteristics of acoustic instruments. Polyphonic inventions such as the Givelet (1929) and Hammond Organ (1935) became more commercially successful as replacements for pipe organs, although the distinct characteristic sound of the Hammond also gave rise to those wanting to experiment with its sonic possibilities beyond the traditional manner (Manning, 2004).

¹ For an explanation and demonstration of Theremin playing, see http://www.youtube.com/watch?v=cd4jvtAr8JM

As has been the case throughout the development of music, the change and development of new technology opens doors and minds to previously unexplored musical territory. Chopin and Liszt drew inspiration from “the huge dramatic sound of a new design. The brilliance and loudness of the thicker strings was made possible by the development of the one-piece cast-iron frame around 1825” (Winkler, 1998). In late 1940s Paris, Pierre Schaeffer was making Musique Concrète using the new recording technology available by way of the phonograph and magnetic tape, and “the invention of the guitar pickup in the 1930s was central to the later development of rock and roll. So it makes sense today, as digital technology provides new sounds and performance capabilities, that old instruments are evolving and new instruments are being built to fully realize this new potential” (Winkler, 1998).

Balilla Pratella, an Italian futurist, published his Manifesto of Futurist Musicians in 1910 calling for “the rejection of traditional musical principles and method of teaching and the substitution of free expression, to be inspired by nature in all its manifestations”, and in his Technical Manifesto of Futurist Music (1911) wrote that musicians should “master all expressive technical and dynamic elements of instrumentation and regard the orchestra as a sonorous universe in a state of constant mobility, integrated by an effective fusion of all its constituent parts”, and that their work should reflect “all forces of nature tamed by man through his continued scientific discoveries, […] the musical soul of crowds, of great industrial plants, of trains, of transatlantic liners, of armored warships, of automobiles, of airplanes” (Manning, 2004). In response, Luigi Russolo published his manifesto The Art of Noises:

“Musical sound is too limited in qualitative variety of timbre. The most complicated of orchestras reduce themselves to four or five classes of instruments differing in timbre: instruments played with the bow, plucked instruments, brass-winds, wood-winds and percussion instruments… We must break out of this narrow circle of pure musical sounds and conquer the infinite variety of noise sounds.” (Russolo, 1913)

John Cage’s interest in improvisation and indeterminacy was an influence on the composers of the sixties who first began experimenting with electronic music in a live situation. Gordon Mumma’s Hornpipe (1967), “an interactive live-electronic work for solo hornist, cybersonic console, and a performance space,” used microphones to capture and analyze the performance of the solo horn player, as well as the resonance and acoustic properties of the performance space. The horn player is free to choose pitches, which in turn affect the electronics in the “cybersonic console”. The electronic processing emitted from the speakers then changes the acoustic resonance of the space, which is re-processed by the electronics, thus creating an “interactive loop” (Cope, 1977).

Morton Subotnick worked with electrical engineer Donald Buchla to create the multimedia opera Ascent Into Air (1983), with “interactive computer processing of live instruments and computer-generated music, all under the control of two cellists who are part of a small ensemble of musicians on stage” (Winkler, 1998). Subotnick later worked with Marc Coniglio to create Hungers (1986), a staged piece where electronic music and video were controlled by the musicians.

Winkler comments on the “element of magic” in live interactive music, where the “computer responds ‘invisibly’ to the performer”, and the heightened drama of observing the impact that the actions of the clearly defined roles of computer and performer have on one another. He continues by saying that “since the virtue of the computer is that it can do things human performers cannot do, it is essential to break free from the limitations of traditional models and develop new forms that take advantage of the computer’s capabilities” (Winkler, 1998).

The role of electronics in music is that of innovation. The aural possibilities, and a computer’s ability to perform actions that humans cannot, create a world of options not previously available. Utilizing these options fulfills Russolo’s futurist vision, and using these tools for improvisation expands the potential output of an electronics system. By allowing artificial indeterminism, human constraints are dissipated and doors are opened to otherwise unimaginable results.

2. Improvisation

The question of how one makes a computer capable of improvising is one of the crucial elements in the task of developing an interactive improvisational system. As a computer is not self-aware, how can it make “choices” and respond in a musical manner? To address this issue, I looked to the nature of improvisation. What is actually happening when one improvises? What is the improviser thinking about in order to play the “correct” notes, such that it sounds like music, as opposed to a random collection of pitches or sounds? Some may have a notion that improvisation is just a free-for-all, where the player can do anything they wish, but this is clearly not the case. If one were to listen to an accomplished jazz pianist play a solo, as well as an accomplished classical pianist play a cadenza, they would likely make their respective improvisations sound easy, effortless, and flowing in their style. But if the roles were reversed, and the jazz pianist played a Mozart cadenza and a classical pianist played a solo in a jazz standard, there would likely be a clear difference in how they sound. The music theorist Leonard Meyer defines style as:

“a replication of patterning, whether in human behavior or in the artifacts produced by human behavior, that results from a series of choices made within some set of constraints… [which] he has learned to use but does not himself create… Rather they are learned and adopted as part of the historical/cultural circumstances of individuals or group” (Meyer, 1989).

There are traits and traditions particular to each style that make a piece of music sound the way it does, and be identified as being in that style. Without the proper training and knowledge of rhythmic and harmonic development and particular important traits for each style, a player cannot properly improvise within it, which is why one would hear such a difference between the classical and jazz pianists improvising in the same pieces.

Improvisation takes elements from material and patterns of its associated musical culture. “The improviser’s choices in any given moment may be unlimited, but they are not unconstrained” (Berkowitz, 2010). Mihály Csikszentmihályi, a psychologist specializing in the study of creativity, states:

“Contrary to what one might expect from its spontaneous nature, musical improvisation depends very heavily on an implicit musical tradition, on tacit rules… It is only with reference to a thoroughly internalized body of works performed in a coherent style that improvisation can be performed by the musician and understood by the audience” (Csikszentmihályi and Rich, 1997).

These traditions and rules are the conventions that stand as a basis, a common language, for the performer to communicate to the listeners. They are the referent, defined by psychologist and improviser Jeff Pressing as “an underlying formal scheme or guiding image specific to a given piece, used by the improviser to facilitate the generation and editing of improvised behavior…” (Pressing, 1984). The ethnomusicologist Bruno Nettl calls the referent a “model” for the improviser to “ha[ve] something given to work from - certain things that are at the base of the performance, that he uses as the ground on which he builds” (Nettl, 1974). The referents, or models, are the musical elements such as melodies, chord patterns, bass lines, motifs, etc., used as the basis to build the improvisation. They provide the structural outline and the material, but are part of the larger knowledge base necessary, which is “built into long term memory” (Pressing, 1998).

It is also necessary to have “rapid, real-time thought and action” (Berkowitz, 2010) to successfully incorporate this musical information into a unique, improvised piece of music. Pressing says:

“The improviser must effect real-time sensory and perceptual coding, optimal attention allocation, event interpretation, decision-making, prediction (of the actions of others), memory storage and recall, error correction, and movement control, and further, must integrate these processes into an optimally seamless set of musical statements that reflect both a personal perspective on musical organization and a capacity to affect listeners” (Pressing, 1998).

Through study and practice, the referents become engrained into the playing of the improviser, and the note-to-note level of playing can be recalled automatically, allowing the improviser to focus more on the higher-level musical processes, such as form, continuity, feeling, etc.

Aaron Berkowitz, in his book The Improvising Mind: Cognition and Creativity in the Musical Moment, studies which elements of improvisation are conscious or unconscious decisions. He finds that “some conventions and rules are accessible to consciousness, while others may function without conscious awareness” (Berkowitz, 2010). These elements of memory are related directly to the learning process, as stated by psychologist Arthur Reber:

“There can be no learning without memorial capacity; if there is no memory of past events, each occurrence is, functionally, the first. Equivalently, there can be no memory of information in the absence of acquisition; if nothing has been learned, there is nothing to store” (Reber, 1993).

The learning process can be separated into two forms, implicit and explicit.

Implicit learning is defined as:

“The acquisition of knowledge about the underlying structure of a complex stimulus environment by a process which takes place naturally, simply and without conscious operations… a non-conscious and automatic abstraction of the structural nature of the material arrived at from experience of instances,” whereas explicit learning is: “A more conscious operation where the individual makes and tests hypotheses in a search for structure… [;] the learner searching for information and building then testing hypotheses… [;] or, because we can communicate using language… assimilation of a rule following explicit instructions” (Ellis, 1994).

The important difference between implicit and explicit learning is the conscious effort required of explicit learning and not of implicit. It is also possible to learn implicit information during explicit learning. Berkowitz gives the example of learning a foreign language, and memorizing phrases in the new language by explicitly focusing on features of the words, phrases, sounds, and structures, but at the same time implicitly learning other attributes of language (Berkowitz, 2010).

Similarly, implicit memory is defined as “memory that does not depend on conscious recollection,” and explicit memory as “memory that involves conscious recollection” (Eysenck and Keane, 2005). The relationship between learning and memory is not necessarily direct and can change. Something learned implicitly can be consciously, and thus explicitly, analyzed, and explicit knowledge can become implicit “through practice, exposure, drills, etc.…” (Gass and Selinker, 2008).

In Berkowitz’s interviews with classical pianist Robert Levin, Levin describes his thought processes, or sometimes lack thereof, while he improvises. While being explicitly aware of the overall musical picture as it is happening, he is not thinking on a note-by-note basis of what he is doing, or what he will do. He allows his fingers to move implicitly, the years and years of practice guiding them in the right directions. He says of the process:

“I began to realize you’re just going to have to let go of it and go wherever you go. The way jazz people do: you have this syntactical thing just the way they have their formulas, you’ve got the basics of architecturally how a cadenza works and its sectionalization, which can be abstracted from all of these cadenzas, and then you just have to accept the fact that there’s going to be some disorder… When I play, I am reacting… your fingers play a kind of, how shall I say, a potentially fateful role in all this, because if your fingers get ahead of your brain when you’re improvising, you get nonsense or you get emptiness. I never, and I mean never, say ‘I’m going to modulate to f-sharp major now,’ or ‘I’m going to use a dominant seventh now,’ or ‘I’m going to use a syncopated figure now…’ I do not for one millisecond when I’m improvising think what it is I’m going to be doing. I don’t say, ‘Oh I think it’s about time to end now…’” (Levin, 2007).

Berkowitz focuses on comparing improvising with language production. When speaking in one’s native language, there is not a word-by-word analysis of what is going to be said. The overall direction of the statement is known, but one is not thinking word-by-word, nor about specific grammatical rules. These are implicit elements that manifest during speaking. Children, when learning to speak, are able to do so without any explicitly taught grammar; they just learn to know what sounds “right”. There is also no acute awareness of the physical aspects of speech, such as tongue, lip, and larynx position (Berkowitz, 2010). These just fall into their learned positions in the body’s muscle memory. This lack of direct cognition during spontaneous speech production is the same as in improvising. Once one has learned and internalized the vocabulary and grammatical rules to the point where they are automatically and implicitly recalled, one can “leave nearly everything to the fingers and to chance” (Czerny, 1839).

Achieving this level of competence comes from the development of one’s “toolbox”, or Knowledge Base. Pianist Malcolm Bilson cites collecting ideas for this toolbox, from the internalization of repertoire and exercises, as one of the elements of learning to improvise (Bilson, 2007). Once the material has been stored in the toolbox, it can be drawn upon spontaneously during improvisation, but it is through the practice and refinement of the skill of improvising that one can “link up novel combinations of actions in real-time and chang[e] chosen aspects of them”, giving one “the ability to construct new, meaningful pathways in an abstract cognitive space” (Pressing, 1984). This process of refinement and vocabulary development is largely implicit, in contrast to the explicitly rote learning of chords and harmonic progressions (Berkowitz, 2010).

While Levin acknowledges that his fingers play a “fateful role” in improvising, and that there is a lack of cognition of what exactly they will do, he says also:

“I get to a big fermata, I think, ‘What am I going to do now? Oh, I’ll do that.’ So there’s a bit of that, but not the sense of doing it every two bars” (Levin, 2007).

This creates a dichotomy in the thinking process. On one hand there is no thinking, purely allowing the fingers to move; on the other hand there is an overall sense of direction and of where the fingers need to go, and “get[ting] reasonably lucky most of the time” (Levin, 2007). Psychologist Patricia Nardone describes this “creator-witness dichotomy” (Berkowitz, 2010) as “…ensuring spontaneity while yielding to it…[,] being present and not present to musical processes: a divided consciousness… [,] exploring a musical terrain that is familiar and unfamiliar…” She discusses this further:

“One dialectic process is that while improvising musicians are present to and within the musical process, they are also concomitantly allowing musical possibilities to emerge pre-reflectively, effortlessly, and unprompted. Conversely, while musicians are outside the improvisational process and fully observant of it, they are paradoxically directing and ensuring the process itself. A second dialectical paradox is that in improvisation there is an intention to direct and ensure spontaneous musical variations while allowing the music itself to act as a guide toward a familiar domain. A third dialectical paradox is that while being present to and within the process of musical improvisation, musicians concomitantly allow the music to guide them toward an unfamiliar terrain. Conversely, while being outside the musical process and fully observant of it, musicians paradoxically intend the music toward a terrain that is familiar to them” (Nardone, 1997).

Paul Berliner speaks of the physicality of the improvisation process on the body, “through its motor sensory apparatus, it interprets and responds to sounds and physical impressions, subtly informing or reshaping mental concepts” (Berliner, 1994). This physicality in improvisation can also be likened to that of spontaneous speech. One needs the effortless mechanical skills of, most often, the hands to play one’s instrument just as a speaker needs the mechanical skills of tongue, mouth, and larynx, as well as a proficiency in the syntax of music and language to effectively communicate (Berkowitz, 2010). Czerny also speaks of the creator-witness in reference to a speaker who “does not think through each word and phrase in advance… [but] must… have the presence of mind… to adhere constantly to his plan…” (Czerny, 1836).

Once this dichotomy of creator-witness has occurred, Levin describes his thoughts once he is done improvising, “After I’m finished doing it, I… have no idea what I played” (Levin, 2005). To this Berkowitz poses the questions, “Is not some memory of what is occurring during the improvisation necessary if the performer is to make it from point a to point b? Or can this only prove to be a hindrance?” (Berkowitz, 2010). The answer to this lies in the findings of implicit and explicit memories. The practiced and honed skill of improvising, after time, enters the implicit memory as motoric reactions, even though the actions themselves cannot be explicitly remembered. The improviser may begin with an idea, but is then led by the movements of the fingers, allowing the music to “flow from moment to moment magically manifest[ing], without a need to know or remember where one has been or where one is going. In improvised performance, the boundaries between creator and witness, past and future, and music and musician dissolve into the musical moment” (Berkowitz, 2010).

Willem J.M. Levelt describes the processes for the generation of speech in his book Speaking as:

Conceptualization. In this process, one plans “the communicative intention by selecting the information whose expression may realize the communicative goals.” In other words, one plans the idea(s) behind the intended message in a preverbal fashion.

Formulation. In this process, the conceptualized message is translated into linguistic structure (i.e., grammatical and phonological encoding of the intended message take place). This phrase is converted into a phonetic or articulatory plan, which is a motor program to be executed by the larynx, tongue, lips, etc.

Articulation. This is the process of actual motor execution of the message, that is, overt speech.

Self-monitoring and self-repair. By using the speech comprehension system that is also used to understand the speech of others, the speaker monitors what he or she is saying and how he or she is saying it on all levels from word choice to social context. If errors occur, the speaker must correct them (Levelt, 1989; Berkowitz, 2010).

The application of these ideas to improvisation is logical. The overall improvisation is the concept; the form, structure, and style are the formulation; playing the music is the articulation; and as the music is happening the performer is monitoring the output and making corrections.

Improvisation can also, however, be likened to learning a foreign language rather than a native language. Following Levelt’s processes, one is much more conscious of what the conceptualized statement is, the formulation of the translation and ordering of the words, and the correctly articulated pronunciation. Sometimes, particularly when beginning, the monitoring and repair stage is not even achievable, as one does not even know that there was a mistake. It can be that the foreign-language learner has knowledge and understanding of the rules of sentence construction, but is not able to formulate them in a manner suitable for an effective conversation. Berkowitz analogizes this to Levin’s descriptions of learning to improvise, and the balance between thinking too much about what he was doing and just allowing his fingers to go. The ability to think about the referent and overall structure interfered with the fingers and the note-by-note implicit level of playing. Michel Paradis says that the foreign-language speaker “may either use automatic processes or controlled processes, but not both at the same time… Implicit competence cannot be placed under the conscious control of explicit knowledge” (Paradis, 1994).

Finding a balance between planning and execution in speech and improvisation is thus necessary. Eysenck and Keane estimate that 70 percent of spoken language uses recurrent word combinations, and thus pre-formulation is one tool for finding this balance (Eysenck and Keane, 2005). From a musical perspective, this is akin to combining elements from the “toolbox,” allowing for more attention to be paid to the referent.

Improvisation occurs constantly in everyday life. For example, it could be likened to the decision to drive to the store. There must be a general plan; one must know the way and the best route to take, but what happens in between is unknown. Encountering other cars, traffic lights, road construction, a dog running across the street, etc., can all change the originally intended plan, and the ability to immediately react and adapt to the situation is imperative. Befitting this example, Berkowitz says:

“Improvisation cannot exist without constraints, and that live performance will always require some degree of improvisation as its events unfold. Improvisation needs to operate within a system even when the resultant music transcends that system. Moreover, no performance situation- improvised or otherwise- exists in which all variables can be entirely predetermined” (Berkowitz, 2010).

Similarly, Levin states:

“The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly- as an artist or a person- come out with something that you have never done before in any respect. There will be quite possibly individual elements in a performance that are wildly and pathbreakingly different from anything that you’ve done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?” (Levin, 2007).

The key elements learned about improvisation here are the spontaneous development and recombination of previously learned material and the lack of specific conscious decisions, while maintaining an overall view of the direction the music is going. The musical decisions that come from spontaneous recombination are sourced from the musician’s training and study, and from the patterns that have been learned and have found their way into the implicit memory. This is why classical and jazz pianists will improvise differently to the same music; they have different “toolboxes”. It can then also be said that whatever goes into the toolbox will have an effect on the output. The training that a musician receives will be represented in the music produced. This is important to consider for the development of an electronic music system: the contents of its toolbox will be reflected in its output. Once an understanding of the nature of improvisation has been established, the application of these principles to the computer is the next step.

3. Artificial Intelligence and Machine Learning

The notion of a computer “making choices” in improvisation has been mentioned here. There is an implication that to make a choice, one must be capable of some amount of intelligence, which introduces the question, “What is intelligence?” One might consider the solving of complex equations by a highly gifted mathematician, or the moves performed by a chess master, or the diagnoses of disease by a doctor, as being intelligent. However, the tasks performed by all of these humans can also be accomplished by a computer, which is typically not considered intelligent. As Eduardo Reck Miranda says, “the problem is that once a machine is capable of performing such types of activities, we tend to cease to consider these activities as intelligent. Intelligence will always be that unknown aspect of the human mind that has not yet been understood or simulated” (Miranda, 2000). Defining intelligence may be a contentious task, so we will look instead at its attributes. Widmer points out that

“the ability to learn is undoubtedly one of the central aspects, if not the defining criterion, of intelligence and intelligent behavior. While it is difficult to come up with a general and generally agreed definition of intelligence, it seems quite obvious that we would refuse to call something ‘intelligent’ if it cannot adapt at all to changes in its environment, i.e., if it cannot learn” (Widmer, 2000).

It is quickly recognized that, despite advances in research and technology in the field of artificial intelligence aimed at bringing “musicality to computer music, no model has yet come close to the complex subtleties created by humans” (Winkler, 1998), a sentiment echoed by Widmer’s statement that although computers and software can “extract general, common performance patterns; the fine artistic details are certainly beyond their reach” (Widmer, 2000). Although Miranda claims that “from a pragmatic point of view, the ultimate goal of Music and AI [Artificial Intelligence] research is to make computers behave like skilled musicians” (Miranda, 2000), it is clear that a machine is not human, and any attempts to create an intelligent computer are merely tasks of trying to recreate processes of the brain.

So the focus becomes one of determining what these processes are, accomplished by looking at the desired end result. When creating a model, attention is paid to the original design and the details necessary to copy it. But is the goal really to create a system that is a copy of a human? One of the desirable attributes of a computer is exactly that it is not human, such as its ability to handle and process large amounts of data and perform calculations with a speed and accuracy far greater than that of a human. Dannenberg speaks of the advantages of relying on a computer’s skills and its ability to “compose complex textures that are manipulated according to musical input. For example, a dense cloud of notes might be generated using pitches or harmony implied by an improvising soloist. A dense texture is quite simple to generate by computer, but it is hard to imagine an orchestra producing a carefully sculpted texture while simultaneously listening to and arranging pitch material from a soloist” (Dannenberg, 2000). Rowe points out that human limitation and variability were precisely an element that led to the use of electronics in music (Rowe, 1993), and Bartók comments that the mechanized pianola “took advantage of all the possibilities offered by the absence of restraints that are an outcome of the structure of the human hand” (Bartók, 1937).

Michael Young identifies a resulting attribute of what he calls a “living” computer as being “unimagined music, its unresolved and unknown characteristics offering a genuine reason for machine-human collaboration.” If the computer is to “extend, not parody, human creative behaviour, machine music should not emulate established styles or practices, or be measured according to any associated, alleged aesthetic” (Young, 2008). It is the discovery of new ideas and material through the use of computers in music to “create new musical relationships that may exist only between humans and computers in a digital world” (Winkler, 1998) that drives the continuing research in the development of computers in music.

Looking at these factors it can be seen that a desired system may “behave in a human-like manner in some respects but in a non-human-like manner in other respects [… Exhibiting] appropriate behavior… in a manner which leads to a certain goal” (Marsden, 2000). Referring to Widmer’s earlier quote about intelligence, that goal is the ability to learn.

This then brings the question, “What is learning?” Russell and Norvig define it as “behaving better as a result of experience” (Russell and Norvig, 1995), while Michalski states that it is “constructing or modifying representations of what is being experienced” (Michalski, 1986). These two definitions address different elements of learning: improvement of behavior, as stated by Russell and Norvig, and acquisition of knowledge of the surroundings, as stated by Michalski.

Marsden summarizes by saying that one key feature of an intelligent animal is its ability to learn spontaneously from its experiences and adapt future actions in response, and that a second feature is being able to perform “tolerably well” in unfamiliar environments of which it has no previous knowledge. As such, a goal of Artificial Intelligence is the capacity to learn and apply this learning in unfamiliar situations (Marsden, 2000).

How, then, does a computer accomplish learning in its quest for intelligence? Widmer cites Michalski’s definition, “learning as the extraction of knowledge from observations or data”, as the “dominant paradigm in machine learning research”, with examples of “classification and prediction rules (Clark and Niblett, 1989, Quinlan, 1990), decision trees (Quinlan, 1986, 1993), or logic programs (Lavrac and Dzeroski, 1994)” (Widmer, 2000). Through the use of algorithms, a computer is able to assess data and make comparisons for purposes of classification. For example, from a stream of pitches an algorithm can analyze music to “look for collections of notes which form a series, or… check collections of notes to see if they form a series” (Wiggins and Smaill, 2000).

Learning is thus accomplished through observation of data, allowing the computer to classify notes as being part of a defined series, or to look for the series within the notes. Empirical predictions based on trends and probabilities can be made using generalizations based upon these observations. It is possible to analyze a stream of notes, looking at intervallic relationships, to determine the likelihood of what the next note played will be. For instance, if the software sees the ascending step-wise motion of the incoming pitches F G A, it could reasonably assume that the next note played could be a B. Coupled with some programmed information akin to the knowledge “toolbox” discussed in the previous section about improvisation, the computer could make even more robust analyses on the basis of tonality to predict upcoming notes, thus knowing that B-flat is also a likely possibility. As the computer continues to analyze and find trends and patterns in a piece of music, its Knowledge Base can grow and assign more accurate weights to the probabilities of certain notes. In this respect, the learning occurs in correspondence with “behaving better as a result of experience.”
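A minimal sketch of this kind of prediction is given below, assuming that incoming pitches arrive as MIDI note numbers and that the “toolbox” is reduced to a single assumed scale (F major here, so that both B and B-flat appear as candidates). The function name and the weights are illustrative assumptions, not part of any existing system discussed in this paper.

```python
# Minimal sketch of interval- and tonality-based next-note prediction.
# Pitches are MIDI note numbers; the scale and weights are assumptions.

F_MAJOR = {5, 7, 9, 10, 0, 2, 4}   # pitch classes of an assumed "toolbox" scale

def predict_next(pitches):
    """Rank likely continuations of a monophonic pitch stream."""
    if len(pitches) < 3:
        return []
    steps = [b - a for a, b in zip(pitches, pitches[1:])]
    candidates = {}
    # Trend-based guess: continue the most recent interval.
    candidates[pitches[-1] + steps[-1]] = 0.5
    # Tonality-based guess: any assumed scale tone within a whole step above.
    for offset in (1, 2):
        note = pitches[-1] + offset
        if note % 12 in F_MAJOR:
            candidates[note] = candidates.get(note, 0) + 0.3
    return sorted(candidates.items(), key=lambda kv: -kv[1])

# F4 G4 A4: B4 (71) is suggested by the stepwise trend,
# B-flat4 (70) by the assumed F-major "toolbox".
print(predict_next([65, 67, 69]))
```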

Music theorist Heinrich Schenker says that repetition is “the basis of music as art. It creates musical form, just as the association of ideas from a pattern in nature creates the other forms of art” (Schenker, 1954). For this reason, the ability to recognize patterns is an important one for computers, and a key feature for music systems. Patterns occur in music at all different levels, including “pitch, time, dynamics and timbre dimensions of notes, chords and harmony, contours and motion, tension and so on” (Rolland and Ganascia, 2000). Scale structures, melodic sequences, rhythms, and chord progressions are all based on the repetition of patterns. The cognitive processes of expectation and anticipation derive from the brain’s ability to pick out and identify patterns (Simon and Sumner, 1968). A cadential chord progression of a V resolving to vi, for instance, is called a deceptive cadence. Typically in Western music, the chord pattern should resolve to I, and because the pattern does not go where the listener expects or anticipates that it will, they have been deceived.

Robert Rowe’s software Cypher uses the concept of anticipation to predict the performer’s playing by looking for patterns in real-time. In this sense, Cypher is learning based on Russell and Norvig’s definition, “behaving better as a result of experience”. Once Cypher detects the first half of a recognized pattern, it assumes that it will be continued, and can then respond to this information as appropriate (Rowe, 1993). The recognition and extraction of patterns involves “detecting parts of the source material that have been repeated, or approximately repeated, sufficiently to be considered prominent”. Some questions raised by Rolland and Ganascia are: “How should ‘parts’ be selected?”, “What is ‘approximate repetition’?”, “What is ‘sufficiently’?”, “What algorithms can be designed and implemented?” (Rolland and Ganascia, 2000). The manner in which these questions are answered depends on the nature of the music and how the pattern information is to be used by the software.

Rowe defines two goals in pattern processing as “1) learning to recognize important sequential structures from repeated exposure to musical examples (pattern induction), and 2) matching new input against these learned structures (pattern matching).” Additional information can also be collected from the patterns, such as the frequency and context of occurrence, and the relationships between them. Differences such as transposition or retrograde are two such relationships that can enrich the capabilities of the pattern identifier. Other enrichment can be the ability to recognize differences with the addition or omission of notes, metric and rhythmic displacements, altered phrasing and articulation, and ornamentation (Rolland and Ganascia, 2000).
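These two goals can be illustrated with a toy sketch: interval n-grams that recur often enough are stored (induction), and when the opening of a stored pattern reappears, its continuation is proposed (matching). The window length, repetition threshold, and prefix length below are arbitrary assumptions, far simpler than Cypher’s actual mechanisms.

```python
# Toy sketch of pattern induction and pattern matching over interval n-grams.

from collections import Counter

def to_intervals(pitches):
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))

def induce_patterns(pitches, length=4, min_count=2):
    """Collect interval patterns of a fixed length that repeat (induction)."""
    ivs = to_intervals(pitches)
    grams = Counter(ivs[i:i + length] for i in range(len(ivs) - length + 1))
    return {g for g, n in grams.items() if n >= min_count}

def match_prefix(recent_pitches, patterns, prefix_len=2):
    """If the last intervals open a known pattern, return its remainder (matching)."""
    prefix = to_intervals(recent_pitches)[-prefix_len:]
    return [p[prefix_len:] for p in patterns if p[:prefix_len] == prefix]

history = [60, 62, 64, 65, 67, 60, 62, 64, 65, 67]    # a repeated figure
learned = induce_patterns(history)
print(match_prefix([72, 74, 76], learned))             # expected continuation(s)
```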

There will be an inherent bias from the system developer as to the decision of what constitutes “sufficiently” prominent material to be analyzed. Widmer addresses the fact that bias can occur in the “representation language in which the learning system can represent its hypotheses” and that one must “be very conscious of, and explicit about, any assumptions that guide his/her choice […] of representation language” (Widmer, 2000). Rowe stresses that it is “critical to take care that the parameters of the representation preserve salient aspects of the musical flow” (Rowe, 1993), and Miranda cites, “Designers of AI systems require knowledge representation techniques that provide representational power and modularity. They must capture the knowledge needed for the system and provide a framework to assist the systems designer to easily organize this knowledge” (Bench-Capon, 1990; Luger and Stubblefield, 1989). The point here is to be mindful of how musical information is expressed to the computer.

For example, in a piece of music there could exist two phrases, one a C-major scale, the other an Eb-major scale. If these were represented as note names (Fig. 1), the two phrases would be regarded as not matching. However, if they were represented as intervals (Fig. 2), counted as the number of semitones between notes (note the ‘-’ for the value of note1: because it requires two notes for there to be an interval, analysis cannot begin until the second note is played), then the phrases would be considered matches, and the computer could choose to take an action on the basis of the knowledge that there is scalar activity occurring. Another example could be in regard to rhythm. For instance, there could be a phrase played all in half-notes, and then again all in quarter-notes. If the analysis were looking solely at the lengths of the notes and phrases, the two would not match. However, if the lengths of the notes were represented as ratios compared to the previous note, in this example all would be 1:1, then there would be a match. (A small code sketch of both representations follows Fig. 2.) These are merely two very simple examples of the way the representative language can impact the analysis results. It is also not to say that a phrase analysis should be based solely on one or the other piece of information, nor that the differences should be disregarded. The information that the melodic line has the same intervals but is transposed, and that the rhythmic pattern is the same but at double speed, is also important data that must be expressed and recorded as a separate point of analysis. This illustrates how data can be interpreted by “abandon[ing] the note level and learn[ing] expression rules directly at the level of musical structures” (Widmer, 2000).

Fig. 1 - The two phrases represented as note names:

        Phrase1   Phrase2
note1   C         Eb
note2   D         F
note3   E         G
note4   F         Ab
note5   G         Bb
note6   A         C
note7   B         D
note8   C         Eb

Fig. 2 - The same phrases represented as intervals (semitones from the previous note):

        Phrase1   Phrase2
note1   -         -
note2   2         2
note3   2         2
note4   1         1
note5   2         2
note6   2         2
note7   2         2
note8   1         1
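The code sketch below makes the same point as Fig. 1 and Fig. 2, and as the rhythm example above: the two phrases fail to match as note names but match once re-encoded as intervals, and the half-note and quarter-note phrases match once durations are encoded as ratios to the previous note. The function names and the millisecond values are illustrative assumptions.

```python
# Sketch of two representations of the same material: note names vs. intervals,
# and absolute durations vs. duration ratios.

NOTE_NUMS = {"C": 0, "D": 2, "Eb": 3, "E": 4, "F": 5, "G": 7, "Ab": 8,
             "A": 9, "Bb": 10, "B": 11}

def intervals(names):
    nums = [NOTE_NUMS[n] for n in names]
    return [(b - a) % 12 for a, b in zip(nums, nums[1:])]

def duration_ratios(durations_ms):
    return [b / a for a, b in zip(durations_ms, durations_ms[1:])]

c_major  = ["C", "D", "E", "F", "G", "A", "B", "C"]
eb_major = ["Eb", "F", "G", "Ab", "Bb", "C", "D", "Eb"]
print(c_major == eb_major)                               # False: names differ
print(intervals(c_major) == intervals(eb_major))         # True: same contour

half_notes    = [1000, 1000, 1000, 1000]
quarter_notes = [500, 500, 500, 500]
print(duration_ratios(half_notes) == duration_ratios(quarter_notes))  # True: both 1:1
```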

For ways to describe these musical structures, we will look again to comparisons in language. Crucial to the understanding of a language is the knowledge of the grammar, which must be based on mathematical formalism to correctly assess the function of each element of a sentence (Chomsky, 1957).

Miranda uses an example of the sentence “A musician composes the music.” To put this sentence in mathematical terms, the knowledge will be represented in variables:

S = NS + VS (Sentence = Noun Sentence + Verb Sentence): “A musician” + “composes the music”

NS = A + N (Noun Sentence = Article + Noun): “A” + “musician”

VS = V + NS (Verb Sentence = Verb + Noun Sentence): “composes” + “the music”

Describing the sentence with variables allows for substitutions from a set:

A = {the, a, an}

N = {dog, computer, music, musician, coffee}

V = {composes, makes, hears}

So the formula S = NS + VS could yield the sentence “The dog hears a computer”, but it could also produce “The coffee makes a dog”. These mathematical formalisms help to describe the rules of the language, but don’t prevent these sorts of nonsense errors. For that, a certain amount of semantic rules or context must also be supplied to the system, which can be explored through the use of Artificial Neural Networks (ANN).
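A minimal sketch of the S = NS + VS formalism, using the word sets given above, shows that the rules alone generate nonsense (“The coffee makes a dog”) just as readily as sensible sentences; this is exactly the gap that semantic constraints, or a trained network, would have to fill.

```python
# Minimal sketch of the generative rules S = NS + VS, NS = A + N, VS = V + NS,
# using the word sets from the text. Purely syntactic: no semantics are checked.

import random

ARTICLES = ["the", "a", "an"]
NOUNS    = ["dog", "computer", "music", "musician", "coffee"]
VERBS    = ["composes", "makes", "hears"]

def noun_sentence():                 # NS = A + N
    return f"{random.choice(ARTICLES)} {random.choice(NOUNS)}"

def verb_sentence():                 # VS = V + NS
    return f"{random.choice(VERBS)} {noun_sentence()}"

def sentence():                      # S = NS + VS
    return f"{noun_sentence()} {verb_sentence()}"

for _ in range(3):
    print(sentence())   # may print "the coffee makes a dog" just as readily
```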

ANNs, also called “connectionism” or “parallel distributed processing (PDP)”, are models based on biological neural networks, or broadly speaking, the way the human brain operates. The important elements of an ANN are that the neurons, or nodes, are independent and simultaneously operating; they are interconnected, feeding information between each other; and they are able to learn based on input data and adapt the weights of their interconnections (Toiviainen, 2000). The basic model of an ANN consists of a number of input and output nodes that are connected to each other at different weights. As each input node receives information, it passes it to the others for more processing and outputs a result. The weights of the connections determine how much influence the data has, and these weights adjust themselves as the data is acquired and reviewed. If the processed output corresponds to the expected output from the training, the connection weight is strengthened, and conversely, if it is not the expected output, then the weight is weakened.
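The strengthen/weaken idea can be illustrated with the classic single-node perceptron update, shown below as a hedged simplification: it adjusts a connection weight only when the output is wrong, and it is not the network used in Cypher or in any other system discussed here.

```python
# Single-node perceptron update: weights move toward producing the expected
# output. Inputs and expected outputs are 0/1; the learning rate is arbitrary.

def train(pairs, weight=0.0, bias=0.0, rate=0.1):
    for x, expected in pairs:
        output = 1 if x * weight + bias > 0 else 0
        error = expected - output          # 0 if correct, +/-1 if not
        weight += rate * error * x         # strengthen or weaken the connection
        bias   += rate * error
    return weight, bias

# One pass over a tiny training set: "fire when the input is active".
print(train([(0, 0), (1, 1), (0, 0), (1, 1), (1, 1)]))
```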

ANNs can be trained through data sets to learn what result a certain input should obtain. Using the example of the data set above, an ANN could learn correct semantics by having correct sentences “read” to it. By training on this data, for example, “The dog hears a computer”, “A musician composes the music”, “A computer makes the music”, “A dog hears the coffee”, the network can adjust the weights of the connections between words, learning that certain words are more likely to follow others, while some will never follow others (“The coffee composes a dog”). This principle can be applied similarly in music.

Cypher uses a neural network in chord identification to determine “the central pitch of a local harmonic area” (Rowe, 1993). To broadly summarize its operations, it uses twelve input nodes, each corresponding to one pitch class regardless of octave, which activate when their pitch is played. Each input node then sends a message to the six different chord theories of which it could be a part (based on triad formations). For example, if a C is played, it sends a “+” message to the chord theories of C major, c minor, F major, f minor, Ab major, and a minor. It also sends a “-” message to all the other chord theories. Doing this with every note received, Cypher begins to determine what the harmonic area is based on the most prevalent chords. This information is then fed into another network to determine the key. The key theories most affected are those that could be the tonic, dominant, or subdominant of the arriving chord. So, a C major chord would send a “+” message to the key theories of C major, F major, f minor, and G major, and a “-” message to the rest.
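A rough sketch of this voting scheme follows, reduced to the chord layer only: each incoming pitch class votes “+” for the six triads it belongs to and “-” for the rest. The scoring values and the selection of the top candidates are assumptions; the actual network in Cypher is more elaborate (Rowe, 1993).

```python
# Rough sketch of pitch-class voting over major and minor triad "chord theories".

NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def triad(root, minor=False):
    third = 3 if minor else 4
    return {root % 12, (root + third) % 12, (root + 7) % 12}

CHORD_THEORIES = {}
for root in range(12):
    CHORD_THEORIES[NOTE_NAMES[root] + " major"] = triad(root, minor=False)
    CHORD_THEORIES[NOTE_NAMES[root].lower() + " minor"] = triad(root, minor=True)

def vote(pitch_classes):
    """Each pitch class adds +1 to chords containing it and -1 to the rest."""
    scores = {name: 0 for name in CHORD_THEORIES}
    for pc in pitch_classes:
        for name, members in CHORD_THEORIES.items():
            scores[name] += 1 if pc % 12 in members else -1
    return sorted(scores.items(), key=lambda kv: -kv[1])[:4]

# C, E, G arriving from the Listener: C major should come out on top.
print(vote([0, 4, 7]))
```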

As the computer continues to learn through observations of the musical environment, the data can be stored into a database for retrieval. As new information comes in, the system can analyze and reference it against the database, making decisions based on the previous material. In this way, learning occurs initially through Michalski’s definition, and then by Russell and Norvig’s. The potential of what information the system extracts from its analysis is huge. Anything that can be represented in a language understood by the computer is possible, and the task then lies within the creativity of the system designer. In addition to the note and rhythm examples already given, patterns could be found in dynamics and volume, density of sound, speed, register, timbre, etc.

4. Architecture

Rowe’s Cypher consists of “two main components, the listener and the player. The listener (or analysis section) characterizes performances represented by streams of MIDI data. The player (or composition section) generates and plays music material” (Rowe, 1993). Most importantly, in regard to an improvisation system, Cypher listens and generates music in real-time, without triggering previously recorded or sequenced material, and without following a timeline-based score as a reference.

4a. Classification Paradigms

Rowe makes a distinction in the classification of interactive systems, separating the paradigms into Score-driven and Performance-driven systems.

Score-driven systems:

“Use predetermined event collections, or stored musical fragments, to match against music arriving at the input. They are likely to organize events using the traditional categories of beat, meter, and tempo. Such categories allow the composer to preserve and employ familiar ways of thinking about temporal flow, such as specifying some events to occur on the downbeat of the next measure or at the end of every fourth bar.”

As compared to Performance-driven systems, which:

“Do not anticipate the realization of any particular score. In other words, they do not have a stored representation of the music they expect to find at the input. Further, performance-driven programs tend not to employ traditional metric categories but often use more general parameters, involving perceptual measures such as density and regularity, to describe the temporal behavior of music coming in” (Rowe, 1993).

The importance in making this distinction is in how the software handles the incoming data regarding the live performer, and what techniques must be used to respond. A score-driven system uses just that, a score, or some representation of a score, programmed into the software for it to follow and against which the incoming signal is matched. Just as a conductor follows notes and rhythms as indications of where the players are, a score-based system is programmed to identify certain moments or characteristics to know where the player is, such as pitches, intervals, rhythms, and phrases. A score-driven system can also lead the performance, functioning based on a clock and reacting to certain moments according to the elapsed time since the beginning of the piece (or section, or other defined onset). As these event markers are found, the score-based system is programmed to perform a function associated with certain events. For example, play x chord when the performer arrives at y note, or add delay to this phrase, or harmonize this section, etc.
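A hedged sketch of this cue-and-action idea follows, with a purely illustrative cue list rather than any existing score format: incoming events are matched against the next expected cue, and the associated action is triggered when it arrives.

```python
# Sketch of score-driven cue triggering; cue and action names are hypothetical.

SCORE_CUES = [
    {"cue": ("pitch", 67), "action": "play_chord_x"},
    {"cue": ("measure", 54), "action": "prepare_key_change"},
    {"cue": ("phrase_end", 3), "action": "add_delay"},
]

def handle_event(event, cursor=0):
    """Advance through the stored cue list as events from the performer arrive."""
    if cursor < len(SCORE_CUES) and SCORE_CUES[cursor]["cue"] == event:
        print("trigger:", SCORE_CUES[cursor]["action"])
        return cursor + 1        # move on to the next expected cue
    return cursor                # keep waiting for the current cue

pos = handle_event(("pitch", 67))         # performer reaches cue note y
pos = handle_event(("measure", 54), pos)  # clock position reaches measure 54
```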

In contrast, the performance-driven system does not follow a score or have any information about the specific performance pre-programmed. It does not know, for example, that in measure 54 there will be a cadence leading to a key change. These systems react based on other information they receive, the specifics of which will be discussed later. Because performance-driven systems are not dependent on prior knowledge of the upcoming music, they are clearly better suited to an improvisational setting.

George Lewis, a jazz trombonist, began building and performing with his interactive system, Voyager, in the late seventies. He says of it:

“The computer was regarded as ‘just another musician in the band.’ Hours were spent in the tweaking stage, listening to and adjusting the real-time output of the computer, searching for a range of behavior that was compatible with human musicians. By compatible, I mean that music transmits information about its source. An improviser (anyone, really) takes the presence or absence of certain sonic activities as a guide to what is going on. When I speak of musical ‘interaction’, I mean that the interaction takes place in the manner of two improvisers that have their own ‘personalities.’ The program’s extraction of important features from my activity is not reintroduced directly, but used to condition and guide a separate process of real-time algorithmic composition.

The performer interacts with the audible results of this process, just as the program interacts with the audible results of what I am thinking about musically; neither party to the communication has final authority to force a certain outcome - no one is ‘in charge.’ I communicate with such programs only by means of my own musical behavior” (Lewis, 1994).

This approach is the guideline on which my development of an interactive system is based. The improviser and computer are independent of each other, with their own voice and musical personality. They are not directly controlling, but rather interacting with and influencing, each other, the same way in which a human duo improvisation would occur. This exemplifies another paradigm, that of Instrument vs. Player. In an instrumental system, the effect of the computer is that of adding to and enhancing the input signal with the intention of being an extension of it, much like many guitar effects-pedals. The result is as though the combined elements are one player, and the music would be heard as a solo. In the instrumental paradigm, the performer is controlling the direction of the electronics. A player system could also behave like an instrumental system at times, but the intention is to construct an artificial player with its own musical presence, personality, and behavior. The degree to which it follows the input signal varies, and in an improvisational setting neither performer nor computer is controlling, but rather influencing, the other. In this way, the result is more like a duet (Rowe, 1993; Winkler, 1998). Voyager is an example of the Player paradigm, which is the goal of my own interactive music system.

Rowe identifies three stages of an interactive system’s processing chain: sensing, where the input data is collected; processing, where the computer interprets the information it has sensed and makes decisions based on it; and response, where the system produces its own output (Rowe, 1993). From this point these stages will be referred to respectively as the Listener, Analyzer, and Composer components.

The elements of the interactive music system described here have been designed for a monophonic wind instrument. With that in mind, there are certain characteristics that have developed as a response to the particular needs of this instrument, as well as some that have been neglected, such as addressing the possibilities offered by a polyphonic instrument. There are some basic technical requirements that won’t be discussed in much detail, but they will be stated.

First is a computer with the software Max/MSP from the company Cycling ’74,² with which the patch will be written. A patch is the name for a program written within Max/MSP, which is one of the most-used applications for creating live electronic music. One of its beneficial features is the ability to create modular components. That is, an element designed to perform a certain task or function can be created on its own as a separate patch and incorporated into larger patches as a subpatch. Not only does this ease troubleshooting, by making it possible to verify that individual modules work on their own, but it also encourages sharing within the community of users. It is very common practice for small objects, abstractions, or patches that one has created to be made available for others to use in their own works. This can greatly reduce development time if an object or patch already exists that performs the needed task, without having to program it entirely oneself. Patches are also adaptable, so that if the originally conceived function doesn’t operate in the exact way needed for a different project, small modifications can be made to incorporate it correctly. The modularity also enables one’s own work to be reused in one’s own future projects.

² Max/MSP is commercially available from www.cycling74.com. A free application developed by Miller Puckette, the author of Max/MSP, is Pure Data (PD), available from www.puredata.info. PD functions very similarly to Max/MSP, but not without some differences. Most notable of these is the availability of third-party objects, some of which will be discussed here.

The second requirement is a soundcard capable of accepting two microphone inputs, and the third is two microphones: a standard dynamic or condenser mic and a second contact mic.

Fig. 3 shows an input chain utilizing the two microphones. MIC 1 is the standard microphone for capturing the sound of the instrument and MIC 2 is the contact microphone. A contact microphone is a special piezo that reacts to vibrations rather than sound waves. The contact MIC 2 in Fig. 3 acts as a gate for the signal from MIC 1. A threshold is set for MIC 2, as seen in the subpatch p vca in Fig. 4, whereby any signal below the threshold closes the gate and no signal from MIC 1 will pass. By placing the contact microphone on the instrument, it will open the gate when the vibrations of the instrument exceed the threshold, as when playing, and allow the signal from the standard MIC 1 to pass. Using this method helps to prevent unwanted extraneous room noise from passing through the microphone, and can also be used to more accurately capture data. (A small sketch of this gating logic follows Fig. 4.)

Fig. 3- Input Chain

Fig. 4- p vca subpatch, developed by Jos Zwaanenburg³

³ Jos Zwaanenburg: http://web.mac.com/cmtnwt/iWeb/CMTNWT/Teachers/0D06AA24-D6CF-11DA-9F63-000A95C1C7A6.html
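The gating logic of Fig. 3 and Fig. 4 can be sketched per sample as follows. In the actual patch this is done with signal objects inside the p vca subpatch; the threshold value and sample lists here are assumptions for illustration.

```python
# Per-sample sketch of the contact-mic gate: MIC 1 passes only while MIC 2
# (the contact mic) exceeds the threshold.

def gate(mic1_samples, mic2_samples, threshold=0.05):
    out = []
    for air, contact in zip(mic1_samples, mic2_samples):
        out.append(air if abs(contact) > threshold else 0.0)
    return out

print(gate([0.2, 0.3, 0.25, 0.1], [0.0, 0.2, 0.3, 0.01]))
# -> [0.0, 0.3, 0.25, 0.0]: room noise is blocked until the instrument vibrates
```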

4b. Listener

The Listener is the stage of the system that collects the data from the input signal, and it is here that the decision must be made of what the relevant data to be collected is. Cypher uses the information from pitch, velocity, duration, and onset time, represented in MIDI format. From this it makes other analytical classifications like register, speed (horizontal density), single notes versus chords (vertical density), and loudness. One of the major limitations of Cypher, as it was written in the late eighties/early nineties, is the representation of data only as MIDI. The MIDI protocol strips away other important elements such as timbre, which can supply information about the overtone partials in a pitch and the noisiness and brightness of a sound. MIDI also principally limits the pitches to the tempered scale, although extra Continuous Controller information can be added to introduce pitch bends. Additionally, it doesn’t make use of the live audio signal, and therefore the Composer stage can only create pitch-based music from digital synthesis and not from transformation of the original sound, on which more will be said later.

Technology has advanced since the development of Cypher; computers today are much faster, and hardware is more sophisticated and capable of handling DSP (Digital Signal Processing). DSP allows the analysis of an audio signal so that timbral information can be included, as well as the representation of the true pitch in hertz. Since DSP uses the live audio signal, it is also possible to affect it in the Composer stage, adding transformational effects like delay, transposition and harmonization, ring modulation, distortion, etc.

Using Max/MSP objects such as analyzer~, created by Tristan Jehan4, data can be extracted such as pitch, loudness, brightness, noisiness, the Bark scale, attacks, and the sinusoidal peaks of the partials. Pitch is represented both in hertz and as a decimalized MIDI note, which allows for either tempered or untempered use of the data. For example, MIDI note 60.25 is equal to a C that is 25 cents sharp. Two approaches to the use of the data can be taken, either noting the exact tuning of the pitch or the tempered note regardless of tuning discrepancies, depending on the intended use. The loudness value measures the input signal volume on a scale of decibels. Brightness is a timbral measure of the spectral centroid, or the perceived brightness of the sound, whereas noisiness is a timbral measure of spectral flatness, on a scale of 0-1: 0 is "peaky" like a pure sine wave, whose energy is concentrated in a single spectral peak, whereas 1 is "noisy" like white noise, where all frequencies have the same power and create a flat spectrum. The Bark scale measures the loudness of frequency bands that correspond to the critical bands of hearing (Zwicker and Fastl, 1990). An attack is reported whenever the loudness increases by a specified amount within a specified time, and the sinusoidal peaks of the partials report the frequencies and amplitudes of a specified number of overtone partials in the signal.
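As a concrete illustration of the decimalized MIDI representation, the conversion between hertz and decimal MIDI note numbers, and the split into a tempered note plus a deviation in cents, can be written as follows. This is the general equal-temperament formula rather than code taken from analyzer~, and the function names are mine.

import math

def hz_to_decimal_midi(freq_hz):
    """Convert a frequency in hertz to a decimal MIDI note number
    (A4 = 440 Hz = MIDI 69); e.g. a C that is 25 cents sharp -> 60.25."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def split_pitch(decimal_midi):
    """Split a decimal MIDI value into the tempered note and the
    deviation in cents, so either representation can be stored."""
    tempered = round(decimal_midi)
    cents = (decimal_midi - tempered) * 100.0
    return tempered, cents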

Another object similar to analyzer~ is sigmund~, created by Miller Puckette5. It provides some of the same data, although some of it is formatted or functions differently. Pitch is available as a continuously output decimal MIDI note, but not in hertz; sigmund~ does, however, have a notes parameter which outputs the pitch at the initial attack of a note rather than continuously. This can be useful when dealing with an unstable pitch, such as that of a wind instrument making constant minute fluctuations, when the desired data is the principal pitch. Loudness is reported, but as linear amplitude rather than in decibels. Sinusoidal components are also available, but organized differently.

4 Tristan Jehan: http://web.media.mit.edu/~tristan/maxmsp.html

5 Miller Puckette: http://crca.ucsd.edu/~msp/software.html

Sigmund~ outputs the sinusoids in order of amplitude, whereas analyzer~ does so in order of frequency. This difference can affect which frequencies are reported, depending on how many sinusoids are asked for. For example, if three peaks are requested from each object, analyzer~ will output the lowest three partials, but sigmund~ will output the three partials with the highest amplitude.

The choice of which to use again lies in how the data will be used. Sigmund~ does not provide data for brightness, noisiness, attack, or Bark scale.

In addition to the data inherently available from analyzer~ and sigmund~, the duration of a note can be calculated by measuring the time between the onset of a note and the moment when either the pitch changes or the volume drops to 0. Fig. 5 demonstrates receiving the data from midivelocity and, upon receipt of a non-zero value, starting the timer. Midivelocity sends a zero at the end of every note and is described in more detail in the discussion of the Analyzer component. When the timer receives this zero message, it stops, and the elapsed time between start and stop gives the duration of the note in milliseconds.

Fig. 5- Note Duration
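The same start/stop logic can be sketched outside of Max. The class below is a minimal Python illustration of the timer behaviour in Fig. 5, assuming midivelocity-style messages are delivered to it; it is an approximation, not a transcription of the patch.

import time

class DurationTimer:
    """Measure note duration from midivelocity-style messages:
    a non-zero velocity starts the clock, a zero (note-off) stops it."""
    def __init__(self):
        self.start = None

    def on_velocity(self, velocity):
        if velocity > 0 and self.start is None:
            self.start = time.monotonic()            # note begins
            return None
        if velocity == 0 and self.start is not None:
            duration_ms = (time.monotonic() - self.start) * 1000.0
            self.start = None                        # note ends
            return duration_ms
        return None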

A common problem for computer electronics is real-time pitch detection. It is difficult for the computer to correctly analyze an analog pitch, especially at fast tempi. With MIDI controllers such as keyboards, EWIs (Electronic Wind Instruments), or electronic percussion, the MIDI information can be transferred immediately and note names can be understood based on which key or combination of keys is pressed. With an analog signal, the computer must first try to interpret the pitch to determine what note it hears, which creates latency. In a fast passage it is likely that the computer will miss or misinterpret some notes. In a "live" human duo improvisation, one player will surely not be able to recreate every single note that the other has played, but will understand the overall shape and idea. Young also recognizes the need for a broader analysis as it pertains to freely improvised music (Young, 2008). Since the genre is not reliant on precise harmonic relationships and rhythms, it is sometimes better not to focus on capturing every individual note, but instead to focus on phrases.

Max/MSP allows for recording into a buffer~, a "storage space" for the audio signal. Other objects can call upon the recording in the buffer for playback, and manipulations can be made to the signal. Buffers can be of different lengths, but an initial choice must be made as to what that size will be. When the buffer has been filled, it continues recording back at the beginning, overwriting the previous contents. Making the size too small could mean that previously played and relevant material is no longer accessible, so it is better to err on the large side; there is an upper limit, however, based on factors such as the computer's available memory. Fig. 6 shows a buffer of ten minutes called improv1. When the Record to Buffer toggle is on, the signal is recorded, as shown by the waveform, and the clocker object is started. The time from clocker correlates to the current recording position in the buffer, buffertime, and this data can be used to reference specific points of the recording. If the buffer reaches the end and restarts at the beginning, clocker is reset as well.

Fig. 6- Recording Buffer
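The wrap-around behaviour and the buffertime counter can be modelled in a few lines. Below is a hedged Python sketch of a circular recording buffer, assuming a fixed sample rate and block-based input; it mimics the behaviour described for buffer~ and clocker rather than reproducing them.

import numpy as np

class CircularBuffer:
    """Record into a fixed-length buffer; when full, wrap around and
    overwrite from the start, resetting the position counter
    (analogous to buffertime resetting when the buffer restarts)."""
    def __init__(self, minutes=10, samplerate=44100):
        self.data = np.zeros(int(minutes * 60 * samplerate))
        self.samplerate = samplerate
        self.write_pos = 0              # current position, in samples

    def record(self, block):
        # Per-sample loop kept for clarity rather than speed.
        for sample in block:
            self.data[self.write_pos] = sample
            self.write_pos += 1
            if self.write_pos >= len(self.data):
                self.write_pos = 0      # buffer full: restart at the beginning

    def buffertime_ms(self):
        """Current recording position expressed in milliseconds."""
        return 1000.0 * self.write_pos / self.samplerate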

A global time component can also be used, measuring the overall time from the start of the performance. Fig. 7 demonstrates a simple way of achieving this. The timer receives a bang from inlet1 to start counting. Inlet1 would be connected to the Global Start, which could be the opening of the patch, or another start button used to begin the patch for performance. Inlet2 receives a bang at the beginning of each event, which causes timer to output the current time in milliseconds. This timestamp can be used in the data collection as a way to identify each event.

Fig. 7- Global Time

Rhythm is of course another important element of music that should be discussed. Previous systems have devised methods of interpreting rhythms and tempi; Rowe, Winkler, and Cope each discuss techniques to gather this information in their books, to which I refer the interested reader. In the context of non-idiomatic improvisation, however, this exact information is less important, because the style is free from the constraints of a unifying tempo and meter. More important aspects are the general amount of activity within a period of time (horizontal density), the time elapsed between events (delta time), and the length of events (duration).

4c. Analyzer

From the Listener component the data needs to be sent for interpretation in the Analyzer. In addition to analysis, this component also creates the database for storage and retrieval. There is a multitude of ways to analyze the data, depending on what parameters are needed or desired for the Composer.

Fig. 8 shows a patch that analyzes for pitch, pitch class, interval, register, lowest pitch, highest pitch, number of note occurrences, loudness, note duration, delta time, and horizontal density, as well as the timbral characteristics brightness and noisiness. Data for the beginning and ending of phrases, the globaltime, and buffertime are also recorded. The characteristic descriptors are sent to individual databases, a global (master) database, and a phrase database. As each new phrase is completed, it is compared against the previous phrases to determine which is the closest match.

There are four elements used for organizational purposes: an index, a phrase number, and globaltime and buffertime stamps. The index is the counter in the upper-left corner of Fig. 8, counting every single event as it occurs, received from the object r midinote, which is sending from analyzer~ in another patch. To the right is the phrasemarker subpatch shown in Fig. 9. Globaltime begins counting at the start of the performance, activated here when the Record to Buffer toggle from Fig. 6 is clicked, and does not stop for the entire duration of the performance. Buffertime is similar, but is meant to keep a record of the onset times of events in relation to the current position in the buffer.

The time will be the same as globaltime until the buffer is filled and starts over, also resetting buffertime. The reason for tracking both times is precisely because of this possibility. If, for example, the performance has exceeded the buffer length, causing it to start over, but data from the previous cycle of the buffer needs to be used, it can be referenced using the globaltime, as using buffertime could point to new data in the buffer. However, referencing only globaltime will not be effective if the need is to play back current material from the buffer; in this case the position in the buffer from buffertime is needed.

Fig. 8- Analyzer Component


The designer can independently determine what might constitute a phrase. Rowe uses discontinuities in characteristics as an indication, with different characteristics carrying different weights in the determination of phrase boundaries. He gives the example that discontinuities in timing are weighted more heavily than those in dynamics, meaning that changes of dynamics are less likely to signal a phrase boundary than changes in timing. When the amount of change across the different features exceeds a threshold, a phrase is marked. He also notes that, by the nature of this phrase finding, the discontinuities cannot be found until they have already occurred (Rowe, 1993).
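A minimal sketch of this weighted-discontinuity idea follows, in Python. The feature names, weights, and threshold are illustrative assumptions rather than Cypher's actual values, and the features are assumed to be normalized to comparable ranges.

def phrase_boundary(prev_event, curr_event, threshold=1.0, weights=None):
    """Weighted-discontinuity sketch in the spirit of Rowe's approach:
    timing changes count more heavily than dynamic changes."""
    if weights is None:
        weights = {"delta_time": 0.6, "loudness": 0.2, "pitch": 0.2}
    score = sum(w * abs(curr_event[f] - prev_event[f]) for f, w in weights.items())
    # The boundary can only be detected after the discontinuity has occurred.
    return score > threshold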

Saxophonist and programmer Ben Carey uses silence as an indication of phrase separation in his interactive system _derivations (Carey, 2011). When the audio signal volume drops to 0, or below another determined threshold, for a user-defined length of time, a phrase marker can be introduced. Fig. 9 demonstrates a method of achieving this in Max/MSP. The patch receives the loudness signal named envelope. When the signal level drops to 0, it starts the clocker. If the elapsed time reaches the threshold of 500 milliseconds, a bang is sent. This bang indicates that a phrase has finished, but it is also useful to know when the next phrase begins. To indicate this, the bang is stored in onebang until a non-zero value allows it to output, indicating the beginning of a new phrase. The non-zero value also stops clocker, which then waits for another silence to begin counting again.

Fig. 9- Phrase Marker
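The silence-based marker can likewise be sketched procedurally. The Python class below follows the logic just described (clocker, threshold, onebang), with an illustrative 500 ms silence threshold; it approximates the patch's behaviour rather than translating it.

import time

class PhraseMarker:
    """Silence-based phrase marking (after Carey's approach): when the
    envelope stays at or below a floor for `silence_ms`, the phrase ends;
    the next non-zero level marks the start of a new phrase."""
    def __init__(self, silence_ms=500, floor=0.0):
        self.silence_ms = silence_ms
        self.floor = floor
        self.silent_since = None
        self.phrase_ended = False

    def on_envelope(self, level):
        now = time.monotonic()
        if level <= self.floor:
            if self.silent_since is None:
                self.silent_since = now              # clocker starts
            elif (not self.phrase_ended
                  and (now - self.silent_since) * 1000.0 >= self.silence_ms):
                self.phrase_ended = True
                return "phrase_end"
        else:
            self.silent_since = None                 # clocker stops
            if self.phrase_ended:
                self.phrase_ended = False            # onebang releases
                return "phrase_start"
        return None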

The note-related material is next to the right in Fig. 8, starting with that concerning pitch. The first record is the actual pitch in MIDI note-number format. Note 57, as shown in Fig. 8, corresponds to the pitch A3. The pitch class can then be calculated, resulting in the pitch without regard to octave. It is shown in Fig. 8 as A-2 simply because Max does not have the capability to display a note name without an octave indication, and -2 is the lowest octave. This display is only for the benefit of the user, to easily see the pitch class; the information to be recorded is a numeric value, in this case 9 for the note A (C=0, C#=1, etc.). The interval is calculated by subtracting the previous note from the current one, resulting in the number of semitones between them, and register is calculated by dividing the pitch by 12 and truncating to an integer, giving a whole-number classification of register. The lowest and highest pitch are recorded twice, both globally and on a phrase-by-phrase basis, and a histo keeps a record of the number of times each note is played.
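These derivations are simple arithmetic on the MIDI note number; the short Python sketch below reproduces them, and the example values match the Analyzer figures (e.g. note 57 gives pitch class 9 and register 4).

def describe_pitch(midi_note, previous_note=None):
    """Derive pitch class, register and interval from a MIDI note."""
    pitch_class = midi_note % 12               # 0 = C, 9 = A, etc.
    register = midi_note // 12                 # whole-number register
    interval = None
    if previous_note is not None:
        interval = midi_note - previous_note   # semitones from the previous note
    return pitch_class, register, interval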

Loudness is received from analyzer~ in decibel format, whereas midivelocity is in MIDI format. MIDI keyboards send note-on messages when a key is depressed, but also a note-off message of 0 upon its release. Midivelocity is calculated with a note-off function so that it operates in the same manner. A note-off is sent either when the note changes, when the volume from envelope drops below a threshold (40 in Fig. 10), or when the volume increases by a specified percentage after a specified time. The drop below the threshold is a latency compensation for the fact that the envelope won't drop to 0 immediately after the player stops, and so it more accurately places the note-off time. The percentage threshold measures the envelope level every 50 milliseconds and divides it by the previous value; if the increase is above the set percentage, a note-off is reported. The principle is similar to the attack data sent by analyzer~, except that analyzer~ measures an increase in decibels within a given time. The method described in Fig. 10 was developed with wind instruments in mind, accounts for small spikes during tonguing, and has been found to be more accurate in reporting attacks. It allows for the note-off message not only with staccato, but also with legato tonguing. An appropriate threshold should be personalized for each player and instrument, however.

Fig. 10- Midi Velocity with Note-off
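The note-off conditions can be prototyped as a simple state machine. Below is a hedged Python sketch, assuming a periodic update (e.g. every 50 ms) with the current pitch estimate and a linear envelope level; the floor of 40 and the rise percentage are placeholders to be tuned per player and instrument, as noted above.

class NoteOffDetector:
    """Sketch of the note-off conditions: a pitch change, the envelope
    falling below a floor, or the envelope rising by more than a set
    percentage since the last check (a re-attack, e.g. legato tonguing)."""
    def __init__(self, floor=40.0, rise_percent=30.0):
        self.floor = floor
        self.rise_percent = rise_percent
        self.current_pitch = None
        self.prev_level = None

    def update(self, pitch, level):
        """Call periodically (e.g. every 50 ms); returns True on note-off."""
        note_off = False
        if self.current_pitch is not None and self.prev_level:
            if pitch != self.current_pitch:
                note_off = True                      # the pitch changed
            elif level < self.floor:
                note_off = True                      # the signal died away
            elif (level / self.prev_level - 1.0) * 100.0 > self.rise_percent:
                note_off = True                      # sudden rise: a new attack
        self.current_pitch = pitch
        self.prev_level = level
        return note_off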

The velocity values with the note-off messages help to determine note duration, as discussed earlier with Fig. 5. The delta time between the end of one event and the beginning of the next can be calculated similarly with a timer. The horizontal density is a measure of the number of notes that occur in a space of time. Fig. 11 demonstrates calculating this by counting the number of notes in a phrase and dividing the sum by the length of the phrase in milliseconds. The multiplication by 1000 and rounding off to an integer is merely to achieve a more comparable number to assign to the phrase for classification.

Fig. 11- Horizontal Density
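Expressed as a formula, the density of a phrase is simply notes per millisecond, scaled for readability; a one-function Python version of the Fig. 11 calculation is given below.

def horizontal_density(note_count, phrase_length_ms):
    """Notes per millisecond, scaled by 1000 and rounded to an integer
    so that phrases receive comparable whole-number density values."""
    if phrase_length_ms <= 0:
        return 0
    return round(1000 * note_count / phrase_length_ms)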

The individual databases collect the information from every event for each descriptor separately. They are kept in a coll database stamped with the indexing number and the phrase to which they belong. The data in Fig. 12 shows an example from the pitch database. The first number of each line (10-20 here) indicates the indexing number, the second indicates the phrase number, and the final number is the pitch expressed as a MIDI note. Individual databases are kept for pitch, pitch class, interval, register, loudness, duration, and delta time. Highest and lowest pitch, number of note occurrences, and horizontal density are already statistical data, based on a broader spectrum, so they do not have their own coll.

Brightness and noisiness are likewise not given individual databases: their data flows continuously rather than on a per-event basis, so it is recorded in a different manner that will be described later.

The master coll keeps all the individual data as well as the timestamps from globaltime and buffertime, organized by the index. The data in Fig. 13 reads index, phrase, globaltime, buffertime, pitch, pitch class, interval, register, loudness, note duration, and delta time.

One can see that some of the data doesn't make sense, such as the duration values for index 10. Fig. 13 shows a note duration of 0 and a delta time of 0, yet a difference of 519 between the start times of indices 10 and 11. There are a couple of factors that can contribute to misleading data, one being complications with the Listener component. Further adjustments need to be made in the input chain, tweaking levels and thresholds to capture good data more accurately and filter out mistakes.

A second contributing factor, although it does not appear to be the case in this instance, is timing delay. Although data flows extremely quickly in the computer, the patch still ultimately follows a series of events, which can create slight inconsistencies. As the measurements are recorded in milliseconds, differences that are generally imperceptible, some amount of leeway is acceptable.

A more holistic viewpoint was discussed earlier, in the section about the Listener component, in regard to the nature of improvisation, an imperfect affair anyway. While striving for accurate data is the goal, accepting the imperfections can also bring a more "human" element. The comparison was made to a "live" human duo setting, and the fact that one player will not obtain all the information provided by the other, but will understand a more general idea of the phrase. Rowe expresses that the point is not to "'reverse engineer' human listening but rather to capture enough musicianship." With its phrase analysis, the Analyzer can take this approach to interpreting what it hears as well. By computing the averages of the characteristic descriptors for each phrase, a generalized description can be rendered and assigned to each one.

10, 2 56;
11, 2 55;
12, 2 50;
13, 3 57;
14, 3 61;
15, 4 61;
16, 4 62;
17, 4 56;
18, 4 64;
19, 4 65;
20, 4 63;

Fig. 12- Pitch Coll Database

10, 2 17386 17386 56 8 -2 4 -28.907839 0 0;
11, 2 17905 17905 55 7 -1 4 -19.907631 228 0;
12, 2 18598 18598 50 2 -5 4 -27.446226 464 0;
13, 3 22499 22499 57 9 7 4 -19.360497 3436 3342;
14, 3 22826 22826 61 1 4 5 -24.470776 3436 3342;
15, 4 24033 24033 61 1 0 5 -34.994293 930 884;
16, 4 24359 24359 62 2 1 5 -31.124811 930 884;
17, 4 24729 24729 56 8 -6 4 -27.600847 930 884;
18, 4 25102 25102 64 4 8 5 -28.859121 696 0;
19, 4 25565 25565 65 5 1 5 -32.421593 271 0;
20, 4 25893 25893 63 3 -2 5 -31.064672 420 0;

Fig. 13- Master Coll Database

The phrase coll is the largest database, keeping records of not only all the characteristics held in the master coll, but also of the highest and lowest pitch, horizontal density, brightness, noisiness, the global and buffer end timestamps, and the phrase match and confidence level. For each of the descriptors, apart from the timestamps and the highest and lowest pitch, the means and standard deviations are calculated for the phrase and stored in the phrase coll (Fig. 14), creating what Thomas Ciufo calls a "perceptual identity" (Ciufo, 2005). At the end of each phrase, these values are sent for comparison against the means and standard deviations of all the previous phrases. The phrase with the most matches is reported with a confidence level, the percentage of matches. This data is added to the phrase coll as well as to its own separate matches coll to keep track of which phrases matched on which descriptors for later retrieval.
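The exact matching rule used in the patch is not spelled out here, so the following Python sketch is only one plausible formulation: each descriptor's phrase mean and standard deviation form the profile, a descriptor "matches" when the new mean falls within one standard deviation of a stored mean, and the confidence level is the percentage of matching descriptors. The descriptor names and the tolerance are assumptions.

import statistics

def phrase_profile(events, descriptors=("pitch", "loudness", "duration")):
    """Mean and standard deviation of each descriptor over a phrase --
    a rough 'perceptual identity' for that phrase."""
    profile = {}
    for d in descriptors:
        values = [e[d] for e in events]
        profile[d] = (statistics.mean(values), statistics.pstdev(values))
    return profile

def match_phrase(new_profile, stored_profiles, tolerance=1.0):
    """Compare a new phrase against stored phrase profiles and return the
    best phrase number, its confidence (percentage of matching descriptors)
    and the list of matching descriptors."""
    best = (None, 0.0, [])
    for phrase_no, profile in stored_profiles.items():
        matched = [d for d, (mean, std) in profile.items()
                   if abs(new_profile[d][0] - mean) <= tolerance * max(std, 1e-9)]
        confidence = 100.0 * len(matched) / len(profile)
        if confidence > best[1]:
            best = (phrase_no, confidence, matched)
    return best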

Carey explores the concept of long-term memory with his _derivations. He has incorporated the ability to save databases and load them into the system in the future. This Rehearsal Database includes all the data that _derivations gathered during a previous use of the system, as well as the saved recording from the buffer. Loading previous databases allows the system to make use of what it has learned before “with an already rich vocabulary of phrases and spectral information” (Carey, 2011).


Fig. 14- Phrase Coll Database


Fig. 15- Phrase Matcher

The collection of information into the individual databases helps to create a system that is learning, based on Michalski's definition, "constructing or modifying representations of what is being experienced". The incorporation of the phrase-matching component is the starting point to also bring it in line with Russell and Norvig's definition, "behaving better as a result of experience". The arrival of information into the individual colls is akin to implicit learning, and actively matching this against other memories exhibits explicit learning behavior.

The system has had, and has made notes of, previous experiences, and the phrase-matching allows it to start comparing new experiences to the old ones and make decisions based on what it has learned. For example, in Fig. 15 phrase 34 is best matched to phrase 19 with a confidence level of 25%. The Analyzer could decide to use data from the matching parameters of phrases 34 and 19 (pitch, pitchclass, and brightness) to send to the Composer. Or, it could decide to use the data from the non-matching parameters, or perhaps it decides to just use data from brightness. Phrase matching could also use weighting to allow certain descriptors to play a more dominant role in determining which phrases match.

Using the confidence level enables an additional level of matching: the Analyzer could choose to match data only with phrases whose confidence level is at least as high. The means and standard deviations of the input signal could also be calculated in real time and analyzed in another instance of the phrase matcher, calculating real-time matches to the characteristics of previous phrases. The Analyzer could then determine, for instance, that the performer is currently playing notes with short durations, and decide to accompany by playing a phrase or phrase fragment of predominantly long notes from the buffer. The possibilities are limited only by the creativity and knowledge of the system developer.

The concern of bias from the developer was mentioned earlier, and it is here and with the Composer component that it can be most evident. With the Analyzer, bias can result from the ways the system handles decision-making, whereas with the Composer it can come from the sonic and musical aesthetic of the developer and from the types of compositional techniques that are used. Widmer cautioned that bias can be introduced through the choice of representation language; the relevance of his warning in this case lies in the programming of the decision-making.

It is important not to create solely finite conditional statements (if x occurs, then do y), as this leads to predictable behavior, not befitting an improvisational system. A better condition would be: "if x occurs, then do y or z or q or l or w, or…" etc., where each variable is an appropriate response to the x condition. An example from live improvisation: Player 1 is improvising fast notes, mainly in a lower register, but sometimes plays a long, high note. Player 2 hears this high note as a unique musical idea that he wants to utilize, and decides on possible options for doing so, such as matching the long, high note; playing short, low notes; harmonizing the note; or using it as a starting note on which to base another phrase. These decisions are all implicit responses of Player 2 that manifest themselves naturally during improvisation. An even better condition would be to replace "if x occurs" with "if x occurs a (randomly generated number) of times", to give each then-statement its own variable factors, and to have the entire conditional if-then statement active only at some times.
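As a concrete, if simplified, illustration, the Python sketch below implements such a condition: the rule only fires after a randomly chosen number of occurrences, is only active part of the time, and chooses one of several appropriate responses at random. The response names are placeholders for whatever transformations or generators the Composer offers.

import random

def improvised_response(event_count, responses, trigger_count=None,
                        active_probability=0.5):
    """One 'toolbox' condition: if x has occurred a (randomly generated)
    number of times, do one of several appropriate responses -- and the
    whole rule is only active some of the time."""
    if trigger_count is None:
        trigger_count = random.randint(2, 6)    # how many x's before reacting
    if random.random() > active_probability:
        return None                             # the rule is dormant this time
    if event_count >= trigger_count:
        return random.choice(responses)         # y, z, q, l, w, ...
    return None

# e.g. after several long, high notes have been heard:
action = improvised_response(4, ["match_long_high_note", "play_short_low_notes",
                                 "harmonize_note", "start_phrase_from_note"])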

By making multiple instances of this type of condition available for different actions, a toolbox is built up. The system will respond based on its programmed knowledge, and therefore may react similarly to a previous time, but never in exactly the same way. It will be predictable in that its responses make sense in the moment and will sometimes repeat a decision it has made in some previous instance, but unpredictable in what the output will be. This exemplifies Levin's quote previously stated in regard to improvisation:

"The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly- as an artist or a person- come out with something that you have never done before in any respect. There will be quite possibly individual elements in a performance that are wildly and pathbreakingly different from anything that you've done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?" (Levin, 2007).

The system will have its own personality and sound, the same way that people are able to hear Miles Davis, or John Coltrane, or any number of musicians, and immediately know that it is them playing, even though they are not playing anything exactly as they have ever played it before.

How the Analyzer makes the decisions of which action to take after making an analysis, or of which if-then condition to activate, is tied also to the discussion of improvisation. Discussed earlier was the fact that improvisers are aware of larger, global-scale, explicit elements, but the fine details are just motoric, implicit, responses. An interactive system can reconstruct this condition with the use of constrained randomization.

John Cage experimented with randomness and indeterminacy in the forties and fifties, using algorithmic and random procedures as compositional tools to select options or set musical parameters (Winkler, 1998). This is related to improvisation in that the outcome is unknown until it happens. Algorithms are not cognitive and thus cannot make creative decisions; they can, however, "produce non-arbitrary changes in state… manifest[ed] as a 'decision' when it modifies the audio environment… [I]t has the affect of intention" (Young, 2008).

Young continues by saying that the unpredictable output of both performer and computer should not be achieved through "simple sonification of rules or sheer randomness. There should be a critical engagement between intended behaviours, an appraisal of potential behaviours and response to actual sonic realisations and their unfolding history." A certain amount of randomization occurs during improvisation, but it is still within a context. The constraint is what makes it still sound like music, as opposed to pure chaotic randomness. It is very easy to generate completely random output within Max/MSP, but it is also possible to use parameters to frame the randomization, as illustrated in the several types of procedures in Fig. 16. Fig. 16c-i are part of a collection from Karlheinz Essl6. They provide useful expansions on randomization procedures.

Fig. 16a) generates a random integer between 0 and 9.

Fig. 16b) generates a random integer between 0 and 9, within 3 integers of the previous generation.

Fig. 16c) generates an integer between 0 and 9 where adjacent outputs are adjacent numbers.

Fig. 16d) generates an integer between 0 and 9, ensuring no immediate repetitions.

Fig. 16e) generates an integer between 0 and 9 with a 30% chance of repetition.

Fig. 16f) generates an integer between 0 and 9 without repeats until all numbers have been generated.

Fig. 16g) generates a floating-point decimal number between -10 and 9.99999.

Fig. 16h) uses the drunk object and will generate a float number of up to 5 decimal places between -10 and 9.99999, using a Brownian linear scale.

Fig. 16i) generates an integer between 0 and 5 using a Markov chain, a table of transitional probability.

6 Karlheinz Essl: http://www.essl.at/

Fig. 16- Random procedures

Some of the useful applications in music can already be seen, particularly with Fig. 16c, which can generate stepwise motion, and Fig. 16f, which can generate a twelve-tone row. All of the parameter settings, or arguments, given in the descriptions of the figures represent those illustrated, but can all be changed.

The random generators are not limited to producing only numbers between 0 and 9. The arguments for each of these objects can be linked to the data collected by the Analyzer to create randomizations that have a reference to the musical performance. For example, the lowest pitch and highest pitch could be fed to the between object in Fig. 16g to generate pitches within the same range.
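Two of these constrained generators are easy to sketch outside Max, and they show how Analyzer data can set their ranges. The Python below approximates the behaviour of Fig. 16b (a bounded random walk) and Fig. 16f (a no-repeat series); the functions and the example range values are illustrative, not Essl's implementations.

import random

def random_walk(previous, low, high, max_step=3):
    """Like Fig. 16b: a value within `max_step` of the previous output,
    clipped to the range [low, high]."""
    return max(low, min(high, previous + random.randint(-max_step, max_step)))

def no_repeat_series(low, high):
    """Like Fig. 16f: every value in the range once, in random order,
    before anything repeats (low=0, high=11 would yield a twelve-tone row)."""
    values = list(range(low, high + 1))
    random.shuffle(values)
    return values

# The range arguments can come from the Analyzer, e.g. the phrase's
# lowest and highest recorded pitch (illustrative values):
lowest_pitch, highest_pitch = 50, 65
next_note = random_walk(57, lowest_pitch, highest_pitch)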

Rowe uses another instance of an Analyzer in Cypher that listens to the output of the Composer. He calls this the Critic. The decisions the Composer has made about what music it will produce are sent to the Critic for analysis before being sent to the sound generators, fitting Levelt's fourth process of speech processing, self-monitoring and self-repair. This allows the system to make modifications before actually creating the music. Rowe acknowledges that "evaluating musical output can look like an arbitrary attempt to codify taste," that the capacity for the system to have "aesthetic decision making" skills is "arbitrary", and that it needs "a set of rules [that] controls which changes will be made to a block of music material exhibiting certain combinations of attributes" (Rowe, 1993). This is again a viable source of bias. It could be argued that including various rules helps to maintain a musicality that a computer cannot inherently have, but the counter-argument can easily be made as to how this definition of musicality is written.

It is again important that the reactions of the Critic aren't represented by strict rules; the use of probability weights can help maintain a learning paradigm. For example, if in one phrase the live performer played loudly and the computer responded by playing quietly, the Critic could increase the probability weight that the next time the performer plays quietly, the computer will play loudly, as in a solo/comping exchange situation. Representing this musical possibility as a strict rule would not be conducive to improvisation, but incorporating it as a possibility in the toolbox, with parameters to find the probability that this action is appropriate, is.
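A minimal way to realize such probability weights is a weighted-choice table that the Critic nudges up or down. The Python sketch below illustrates that idea rather than Cypher's actual Critic; the response names and the reinforcement amount are placeholders.

import random

class Critic:
    """Probability weights instead of strict rules: each candidate response
    keeps a weight, the Critic nudges weights based on what seemed to work,
    and responses are drawn in proportion to their weights."""
    def __init__(self, responses):
        self.weights = {r: 1.0 for r in responses}

    def reinforce(self, response, amount=0.2):
        """Raise (or, with a negative amount, lower) a response's weight."""
        self.weights[response] = max(0.05, self.weights[response] + amount)

    def choose(self):
        responses = list(self.weights)
        return random.choices(responses,
                              weights=[self.weights[r] for r in responses])[0]

# e.g. after a loud phrase answered quietly seemed effective:
critic = Critic(["play_loudly", "play_quietly", "lay_out"])
critic.reinforce("play_loudly")   # more likely next time the performer is quiet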

Another possible way to incorporate a critic is to analyze the output of the Composer alongside the response from the performer. In a duo improvisation, each player is responding to the other, taking in what the other has played and making musical comments, described by Hodson as "a self-altering process: the musical materials improvised by each musician re-enter the system, potentially serving as input to which the other performers may respond" (Hodson, 2007). By analyzing how the live performer reacts to the computer, the system can learn about its own composing as well, and about what "works" or not. Decisions can be made based on whether the performer is cooperating or trying to take the music in a different direction. In this way, the critique is based on the performance and interaction of the moment, rather than on codified rules.

4d. Composer

“Improvisation defies clear definition. Even though most musicians have difficulty explaining what it is, many can tell you the basic way that they approach it. Unlike jazz, which often deals with the improvisatory rules in a kind of gamelike exchange of modes and melodies, electronic music often lacks the qualities of rhythm, harmony, and melody that many jazz musicians rely on. Instead, electronic music improvisation is sound: the shape of the envelope; timbre; rhythm; layers or filtering; effects (echo, delay, ring modulation, etc.); amplitude; and duration. A seasoned improviser learns how to listen to many layers of sound activity as part of a performance” (Holmes, 2002).

Thom Holmes' quote gives important insight for the approach to developing the Composer component of an electronic improvising system. Not only is it applicable to electronic improvisation, but also to the genre of free improvisation as a whole. Previous systems like Robert Rowe's Cypher or George Lewis' Voyager created MIDI-based improvisations, which are focused on the note and rhythm paradigm. With the DSP capabilities of today, the musical realm for electronics is vastly expanded. While pitch and rhythm are certainly still appropriate musical considerations, the world of sound design, with the ability to sculpt, manipulate, and synthesize, has become an equally viable option.

There are three types of compositional methods available to a computer: sequencing, transformation, and generation (Rowe, 1993). Sequenced music is predetermined in some way, traditionally as a MIDI sequence, but can also be prerecorded audio that is triggered to play back. Algorithms that produce a fixed response, such as those that do not use indeterminate variables, are also considered sequenced. Transformation takes the original material and changes it in some way to produce variations. This can range from obvious transformations, like adding a trill to a note or passing the signal through effects like a ring modulator, to more intricate variations like creating a retrograde inversion or playing the signal backwards, to a complex re-synthesis of the entire sound spectrum. Generative composition uses algorithms with very little source material to produce music on its own. It could make use of information like a scale set from which to choose pitches, but the lines produced are unique choices from within the scale. Sound design techniques like additive or vector synthesis are also generative composition. Within the context of improvisation, transformative and generative composition are the most useful techniques and will be the ones addressed here.

The options for the capabilities of the Composer are limitless. It is in the development of this component, the building of the toolbox, that the designer's creativity can be unleashed. Some of the transformational techniques that Cypher is capable of include:

Accelerator- shortens the durations between events.

Accenter- puts dynamic accents on some of the events in the event block.

Arpeggiator- unpacks chord events into collections of single-note events, where each of the new events contains one note from the original chord.

Backward- takes all the events in the incoming block and reverses their order.

Basser- plays the root of the leading chord identification, providing a simple bass line against the music being analyzed.

Chorder- makes a four-note chord from every event in the input block.

Decelerator- lengthens the duration between events.

Flattener- flattens out the rhythmic presentation of the input events, setting all offsets to 250ms and all durations to 200ms.

Glisser- adds short glissandi to the beginning of each event in the input block.

Gracer- appends a series of quick notes leading up to each event in the input block; every event that comes in will have 3 new notes added before it.

Harmonizer- modifies the pitch content of the incoming event block to be consonant with the harmonic activity currently in the input.

Inverter- takes the events in the input block and moves them to pitches that are equidistant from some point of symmetry, on the opposite side of that point from where they started; all input events are inverted around the point of symmetry.

Looper- repeats the events in the input block, taken as a whole.

Louder- adds crescendo to the events in the input block.

Obbligato- adds an obbligato line high in the pitch range to accompany harmonically whatever activity is happening below it.

Ornamenter- adds small, rapid figures encircling each event in the input block.

Phrase- temporally separates groups of events in the input block.

Quieter- adds decrescendo to the events in the input block.

Sawer- adds four pitches to each input event, in a kind of sawtooth pattern.

Solo- the first step in the development of a fourth kind of algorithmic style, lying between the transformative and purely generative techniques.

Stretcher- affects the duration of events in the input block, stretching them beyond their original length.

Swinger- modifies the offset time of events in the input block; the state variable swing is multiplied with the offset of every other event, and a value of swing equaling two will produce a 2:1 swing feel in originally equally spaced events.

Thinner- reduces the density of events in the input block.

TightenUp- aligns events in the input block with the beat boundary.

Transposer- changes the pitch level of all the events in the input block by some constant amount.

Tremolizer- adds three new events to each event in the input block; new events have a constant offset of 100ms, surrounding the pitch with either two new above and one below, or two new below and one above.

Triller- adds four new events to each event in the input block as a trill either above or below the original pitch. (Rowe, 1993)

These transformations are rather easy to accomplish within the MIDI domain, but many can also be applied in DSP. Of Rowe's transformational techniques, the ones that are easily accomplished in direct relation to a phrase can be put into three categories: time-domain, pitch-domain, and volume-domain. The time domain includes accelerator, decelerator, looper, phrase, and stretcher; the pitch domain includes chorder, harmonizer, inverter, and transposer; and the volume domain includes louder and quieter. Backward is also an easy time transformation, but it functions differently from Rowe's. Rather than a retrograde as he describes, it is possible to play the audio backwards, like spinning a vinyl LP record in reverse. A retrograde is also possible, but is a more complicated task that will be discussed later.

Time-stretching is possible using objects such as the supervp~ (Super Phase Vocoder) collection7 and grainstretch~8, allowing audio in the buffer to be sped up or slowed down without changing the pitch. These objects, as well as native objects like groove~, can also be used for looping, phrase-making, and backwards playback. Supervp~ and grainstretch~ are also capable of pitch-shifting for harmonizing and transposition. Other Fast Fourier Transform (FFT) objects like gizmo~ also perform pitch-shifting, and can be used for inversions.

This can be accomplished using the same process as creating a MIDI inversion, shown in Fig. 17. The patch functions just as Rowe describes, inverting around middle C, or MIDI note 60. In this example a G (MIDI note 79) is played, nineteen semitones above middle C, which is then inverted to an F (MIDI note 41), nineteen semitones below. The pitches are converted to their frequencies in hertz, and the inverted pitch is divided by the original to find the transposition factor. This value is sent to gizmo~ (inside the pfft~ patcher) to transpose the incoming signal from the performer, producing an inverted accompaniment. The crescendo and decrescendo volume transformations are as simple as increasing or decreasing the amplitude over the length of the phrase playback.

7 SuperVP is available from IRCAM: http://anasynth.ircam.fr/home/english/software/supervp

8 Grainstretch~ was written by Timo Rozendal: http://www.timorozendal.nl/?p=456

Fig. 17- FFT Inversion
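The arithmetic behind Fig. 17 can be checked in a few lines of Python: invert the MIDI note around 60, convert both notes to hertz, and take the ratio that would be sent to the pitch shifter. The helper names here are mine, not objects from the patch.

def midi_to_hz(note):
    """Equal-tempered conversion, A4 = 440 Hz = MIDI 69."""
    return 440.0 * 2 ** ((note - 69) / 12)

def inversion_factor(played_note, axis=60):
    """Invert the played note around the axis (middle C by default) and
    return the frequency ratio a pitch shifter would need."""
    inverted = 2 * axis - played_note          # e.g. 79 (G) -> 41 (F)
    return midi_to_hz(inverted) / midi_to_hz(played_note)

print(inversion_factor(79))   # ~0.111: transposes the incoming G down to the inverted F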

The other transformations Rowe uses, such as the retrograde, require adjustments to individual events within a phrase. The transformations can be applied similarly, but either data from the individual colls needs to be accessed to determine where the events occur within the buffer, or other techniques need to be used to manipulate the individual notes.

The objects named above in the time and pitch domains can also be used in much more creative ways using DSP. The supervp~ objects have many options for cross-synthesizing one signal with another for vocoding and filtering applications, and grainstretch~'s granular transformations can create a wealth of possibilities. The sinusoidal data from sigmund~ can also be used in a transformational manner with a generative aspect as well. Fig. 18 demonstrates a simple synthesizer that uses oscillators to generate sine waves using the frequencies and amplitudes of the overtones from the input signal. Each frequency can also be transposed individually, or on a global level, and the amplitudes can be swapped to different frequencies. The drunksposition subpatch uses a random generator that can give a vibrato effect, with varying degrees of speed and width, using a transposition function. This synthesizer could be used as an effect on the input signal or on a phrase from the buffer.

Other typical effects, like delay, distortion, ring modulation, chorus, flanger, and envelope filters, are also transformational options for the Composer and can all easily be added to the signal chain.

Generative composition uses the completion of processes and algorithms to create music. Pre-existing material is not necessary, but the generation can be based on set parameters. Fig. 16f is an example of a generative algorithm that, when the maximum is set to 12, would produce the numbers for a twelve-tone serial row. Using these as MIDI pitch classes, octave displacements could be made and the notes sent to sound generators for further realization. The pitches could easily be played as MIDI output, or converted to frequencies and sent to other generators, like one of the oscillators of Fig. 18.
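As a small generative example in the same spirit, the Python sketch below produces a serial row of the twelve pitch classes, applies random octave displacements, and converts the result to frequencies that could feed either a MIDI soundbank or an oscillator. The octave range is an illustrative assumption.

import random

def twelve_tone_melody(octave_range=(3, 6)):
    """Generate a serial row of the 12 pitch classes, give each a random
    octave displacement, and return both MIDI notes and frequencies."""
    row = random.sample(range(12), 12)                      # each pitch class once
    notes = [pc + 12 * (random.randint(*octave_range) + 1) for pc in row]
    freqs = [440.0 * 2 ** ((n - 69) / 12) for n in notes]
    return notes, freqs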

Fig. 18- Overtone Synth

Similar formalisms can be used for timing. Using the Brownian motion of Fig. 16h, Essl also created a patch to generate rhythms. In Fig. 19, a sound is produced every 51-1000 milliseconds (entry delays, ED). The ED-value of 12 indicates that there are twelve permutations available (the row index), each assigned to a value between 51 and 1000. The Brown factor determines how close the output is to the previous generation, 0 creating a constant and 1 creating pure randomness. Fig. 20 combines these components to generate notes with a rhythm and articulation. The rhythm generator is enhanced with durations, so that it creates notes that occur within a certain span of time from each other but also last differing amounts of time. The pitches and durations are sent to a MIDI soundbank, an oscillator synthesizer, or both simultaneously. Arguments for these randomization modules can be taken from data from the Analyzer to make the output more relevant to the input signal. Further, the expansion of the toolbox can continue to enhance the generation from the Composer, such as by including data regarding scales and modes. From this, the melody generator could have a more limited set from which to compose, and formulas for rhythmic composition could create a more metered pulse.

Fig. 19 - Essl Brownian Rhythm Generator


Fig. 20- Essl Brownian Pitch-Rhythm-Articulation Generator
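The Brownian entry-delay idea can be approximated in a few lines. The Python below is a rough sketch of the behaviour described for Fig. 19 (each delay wanders from the previous one by an amount scaled by the Brown factor), not a reimplementation of Essl's patch, and the default values are illustrative.

import random

def brownian_rhythm(steps=20, ed_min=51, ed_max=1000, brown=0.3):
    """Generate a list of entry delays (ms) that wander within a range:
    brown=0 would hold the delay constant, brown=1 approaches pure
    randomness across the full range."""
    delays = []
    current = random.uniform(ed_min, ed_max)
    span = ed_max - ed_min
    for _ in range(steps):
        step = random.uniform(-1.0, 1.0) * brown * span
        current = min(ed_max, max(ed_min, current + step))
        delays.append(round(current))
    return delays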

Besides note-based synthesis, Max/MSP is also capable of soundscape creation. One simple example is Fig. 21 from Alessandro Cipriani and Maurizio Giri's book Electronic Music and Sound Design, demonstrating a white noise generator with a frequency filter. Adjusting the parameters of the filter creates a wide spectrum of sonic variety. Other synthesis can be produced through combining and manipulating oscillators of different waveform shapes (sine, sawtooth, square, triangle), used in conjunction with envelope filters. Combining, layering, and using the output from one compositional element to affect and influence another are all methods to further create interesting results. The output from these soundscape generations can also be used for cross-synthesis transformation with the input signal or the buffer. The possibilities of sound design within Max/MSP are huge, and discussing them all is beyond the scope of this paper. For further study, I refer the interested reader to Cipriani and Giri's book.

Fig. 21- Cipriani/Giri- Noise Filtering

This section has discussed the design structure and architectural requirements for an improvisational system. Differences between score-driven and performance-driven paradigms, as well as instrumental and player paradigms, were described as models for the interactive system. The architecture was defined in three components: the Listener, Analyzer, and Composer. The Listener accepts and collects the input, the Analyzer processes, makes decisions about, and stores the data, and the Composer produces music either sequentially, transformationally, or generatively. The incorporation of constrained indeterminacy helps to maintain an improvisational yet musically relevant nature.

5. Conclusion

The focus of this paper has been on the development of an interactive electronics system for improvised music. It has considered how the use of electronics has evolved over time and its role in music. There was discussion about the nature of improvisation and brain processes relating to cognition while playing, and it was learned that improvising is an automatic response based on learned elements in one’s musical “toolbox”. The concept of learning as a basis for intelligence was then discussed, along with ways that this can be achieved artificially with a computer. After these theoretical constructs were gathered, the development of the software system itself was examined.

Taking performance-driven, player paradigms as the best approaches for interactive improvisation, Robert Rowe's Cypher was used as a model and point of discussion. The components of the Listener, Analyzer, and Composer of my own interactive system were analyzed with reference to what was discovered about improvisation and learning. By creating a database and referencing new knowledge against it, the computer is able to learn and make informed choices. By building a "toolbox" of musical knowledge, coupled with constrained indeterminacy, the system is able to make music in the same theoretical manner as improvising musicians.

Further developments in my own system need to include expanding the Composer and building more compositional tools for it to use. This can become daunting, as the options and possibilities are so numerous. It is important to have a diverse toolbox for the system to work from, to keep the music fresh and to keep it from becoming predictable, but it is also very easy to become trapped in trying to incorporate every little thing possible, using all sorts of different generational and transformational techniques. On the one hand, the larger the toolbox, the less prone to repetition of sonic character the system will be. On the other hand, using human improvisers as a model shows that such repetition is the reality of improvisation: although a plethora of recombinations from the toolbox is possible, the fact remains that there is virtually nothing an improviser will play that he hasn't played in some way before. So a balance has to be struck in the system development to account for this. Once more compositional elements have been built, I need to focus again on the Analyzer and determine the best ways for it to communicate with the Composer. I still need to develop the decision-making tools for how it will use the learned data to respond in a musical manner. Further development of the analysis itself can still be done as well; I would like to look more into the use of probability equations and neural networks as learning tools to integrate into the system. Refinements can also be made to the input chain, finding the best settings for correct data collection and responsiveness.

I am also interested in exploring non-auditory communication within improvisation. Eye contact and other visual cues can also be important aspects of musical communication, and might be included in the system via Jitter, the visual component of Max/MSP. There are tools capable of shape and color tracking using just the built-in web camera of a laptop with Jitter, so the possibility of integrating visual cues is certainly there. Further research would need to be done as to the best way to do this within the framework of improvisation. I imagine the research would concern what visual cues different improvisers notice from their fellow musicians, and how they interpret them. I can also see this line of development becoming extremely complex, as subtle visual cues can be very subjective and vary between people, so the focus of how this information would be used in an interactive improvisation would need to be defined.

My goal in developing this system is initially for my own use as a solo tool, but I would also like to expand it for use in my electro-acoustic improvisation duo with a saxophonist, and then possibly for an even larger ensemble. One way to do this would simply be to use two instances of the patch, but this is more likely to result in three separate duos performing at once: clarinet/electronics 1, saxophone/electronics 2, and clarinet/saxophone. The two electronics systems would not be communicating directly with each other, nor with the other player. For more coherency, it would be best for all the information to be fed to a central point somewhere in the chain, with the final result being either a full trio or quartet ensemble. The difference would be whether the electronics are designed as two separate systems, each interacting with a live performer as well as with each other to create a quartet, or as one electronic system responding to the live performers equally and creating a trio.

I anticipate it would take about another year to fully develop the patch in the direction I'm currently taking with it, and perhaps a little more time to really test and tweak it. Expanding it for multiple players might take another few months of developmental work, and the inclusion of video, with all the possibilities it introduces and the research needed to find the best ways to include it, could easily add another year. Once the system is done I would allow it to be distributed to other electro-acoustic improvisers to use, pending any licensing restrictions with any third-party objects or abstractions that are used.

However, I also hope that this paper has been informative enough to help guide people in building their own systems, for those so inclined. As mentioned in the paper, there will be an inherent bias imposed by the developer influencing the output, so the more people that build their own systems, the broader the repertoire on the whole becomes.

References

Bartók, Bela. 1976. "Mechanical Music" in Bela Bartók Essays, ed. Benjamin Suchoff. London: Faber & Faber

Bench-Capon, T.J.M. 1990. Knowledge Representation: An Approach to Artificial Intelligence. London: Academic Press

Berkowitz, Aaron L. 2010. The Improvising Mind: Cognition and Creativity in the Musical Moment. New York: Oxford University Press

Berliner, Paul. 1994. Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press

Bilson, Malcolm. 2007. Interview by Aaron L. Berkowitz, Ithaca, NY, August 12

Carey, Ben. 2011. Email discussions throughout 2011-2012. Ben Carey Website. Retrieved March 7, 2012. http://www.bencarey.net/#25f/custom_plain

Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton

Cipriani, Alessandro and Maurizio Giri. 2009. Electronic Music and Sound Design: Theory and Practice with Max/MSP, volume 1, trans. by David Stutz, 2010. Rome: ConTempoNet s.a.s.

Ciufo, Thomas. 2005. "Beginners Mind: An Environment for Sonic Improvisation" in International Computer Music Conference Proceedings

Cope, David. 1977. New Music Composition. New York: Schirmer Books

Csikszentmihályi, Mihály and Grant Jewell Rich. 1997. "Musical Improvisation: A Systems Approach," in Creativity in Performance, ed. Keith Sawyer. Greenwich: Ablex Publishing

Czerny, Carl. 1836. A Systematic Introduction to Improvisation on the Pianoforte, Op. 200, Vienna, trans. and ed. Alice L. Mitchell, 1983. New York: Longman

Czerny, Carl. 1839. Letters to a Young Lady on the Art of Playing the Pianoforte, from the Earliest Rudiments to the Highest Stage of Cultivation, Vienna, trans. J.A. Hamilton, 1851. New York: Firth, Pond and Co.

Dannenberg, Roger. 2000. "Dynamic Programming for Interactive Systems" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Ellis, Nick. 1994. "Implicit and Explicit Language Learning- An Overview," in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press

Eysenck, Michael W. and Keane, Mark T. 2005. Cognitive Psychology: A Student's Handbook, 5th edn. East Sussex: Psychology Press

Gass, Susan M. and Larry Selinker. 2008. Second Language Acquisition: An Introductory Course, 3rd edn. New York: Routledge

Hodson, Robert. 2007. Interaction, Improvisation, and Interplay in Jazz. New York: Routledge

Holmes, Thom. 2002. Electronic and Experimental Music, second edition. New York: Routledge

Levelt, Willem J.T. 1989. Speaking. Cambridge: MIT Press

Levin, Robert. 2005. "Lecture 8," Harvard University Course "Literature and Arts B-52: Mozart's Piano Concertos," Sanders Theater, Harvard University, Cambridge, MA, October 14

Levin, Robert. 2007. Interview by Aaron L. Berkowitz, Cambridge, MA, September 10

Luger, G.F. and W.A. Stubblefield. 1989. Artificial Intelligence and the Design of Expert Systems. Redwood City: Benjamin/Cummings

Manning, Peter. 2004. Electronic and Computer Music. New York: Oxford University Press

Marsden, Alan. 2000. "Music, Intelligence and Artificiality" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Meyer, Leonard. 1989. Style and Music: Theory, History, and Ideology. Philadelphia: University of Pennsylvania Press

Michalski, R.S. 1986. "Understanding the Nature of Learning: Issues and Research Directions" in Machine Learning: An Artificial Intelligence Approach, vol. II, eds. R.S. Michalski, T. Mitchell, and J. Carbonell. Los Altos, CA: Morgan Kaufmann

Miranda, Eduardo Reck. 2000. "Regarding Music, Machines, Intelligence and the Brain: An Introduction to Music and AI" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Nardone, Patricia L. 1997. “The Experience of Improvisation in Music: A Phenomenological Psychological Analysis,” PhD diss., Saybrook Institute

Nettl, Bruno. 1974. "Thoughts on Improvisation: A Comparative Approach," The Musical Quarterly 60

Paradis, Michael. 1994. "Neurolinguistic Aspects of Implicit and Explicit Memory: Implications for Bilingualism and SLA," in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press

Pratella, Balilla. 1910. "Manifesto of Futurist Musicians". Milan: Open statement

Pratella, Balilla. 1911. "Technical Manifesto of Futurist Music". Milan: Open statement

Pressing, Jeff. 1984. "Cognitive Processes in Improvisation," in Cognitive Processes in the Perception of Art, eds. W. Ray Crozier and Anthony J. Chapman. Amsterdam: Elsevier

Pressing, Jeff. 1998. "Psychological Constraints on Improvisational Expertise and Communication," in In the Course of Performance: Studies in the World of Musical Improvisation, eds. Bruno Nettl and Melinda Russell. Chicago: University of Chicago Press

Reber, Arthur. 1993. Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious. New York: Oxford University Press

Rolland, Pierre-Yves and Jean-Gabriel Ganascia. 2000. “Musical Pattern Extraction and Similarity Assessment” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Rowe, Robert. 1993. Interactive Music Systems. Cambridge: MIT Press

Russell, S.J. and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall

Russolo, Luigi. 1913. "The Art of Noises". Milan: Open statement to Balilla Pratella

Schenker, Heinrich. 1954. Harmony, trans. Elisabeth Mann Borgese. Chicago: University of Chicago Press

Simon, H. and R.K. Sumner. 1968. "Patterns in Music" in Formal Representations of Human Judgement. New York: John Wiley & Sons

Toiviainen, Petri. 2000. "Symbolic AI versus Connectionism in Music Research" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Widmer, Gerhard. 2000. "On the Potential of Machine Learning for Music Research" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Wiggins, Geraint and Alan Smaill. 2000. "Musical Knowledge: What can AI bring to the musician?" in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers

Winkler, Todd. 1998. Composing Interactive Music: Techniques and Ideas Using Max. Cambridge: MIT Press

Young, Michael. 2008. "NN Music: Improvising with a 'Living' Computer" in CMMR 2007, LNCS 4969, eds. R. Kronland-Martinet, S. Ystad, and K. Jensen. Berlin Heidelberg: Springer-Verlag

Zwicker, E. and H. Fastl. 1990. Psychoacoustics, Facts and Models. Berlin: Springer Verlag