Quick viewing(Text Mode)

Techniques for Automated and Interactive Note Sequence Morphing of Mainstream Electronic Music

Techniques for Automated and Interactive Note Sequence Morphing of Mainstream Electronic Music

Techniques for automated and interactive note sequence morphing of mainstream

By

René Wooller

Bachelor of Music at Queensland University of Technology, 2001

Master of Music at Queensland University of Technology, 2003

Supervisor: Associate Professor Andrew Brown, Music and Sound.

Co-supervisor: Doctor Frederic Maire, Software Engineering and Data Communications.

Submitted to:

Music and Sound Discipline, Creative Industries Faculty, Queensland University of Technology

In partial fulfillment of the requirements of the degree of:

Doctor of Philosophy

2007

© 2003-2007 René Wooller. All rights reserved. Key words

Morph, morphing, interpolation, morphology, mutation, computer music, algorithmic composition, algorithmic music, interactive music, adaptive music, adaptive audio, game music, live electronic music, sound installation, compositional, key modulation, temporal modulation, metric modulation, modulation, topology, note-level, note sequence, MIDI, medley, transition, mash-up, mix, , DJ, evolutionary art, evolutionary computing, Markov, conditional probability, generative music, transformational, jMusic, Java, Midishare, realtime, reacTIVision, morph table.

ii

Abstract

Note sequence morphing is the combination of two note sequences to create a ‘hybrid transition’, or ‘morph’. The morph is a ‘hybrid’ in the sense that it exhibits properties of both sequences. The morph is also a ‘transition’, in that it can segue between them. An automated and interactive approach allows manipulation in realtime by users who may control the relative influence of source or target and the transition length. The techniques that were developed through this research were designed particularly for popular genres of predominantly instrumental electronic music which I will refer to collectively as Mainstream Electronic Music (MEM). The research has potential for application within contexts such as computer games, multimedia, live electronic music, interactive installations and accessible music or “music therapy”. Musical themes in computer games and multimedia can morph adaptively in response to parameters in realtime. Morphing can be used by electronic music producers as an alternative to mixing in live performance. Interactive installations and accessible music devices can utilise morphing algorithms to enable expressive control over the music through simple interface components.

I have developed a software application called LEMorpheus which consists of software infrastructure for morphing and three alternative note sequence morphing algorithms: parametric morphing, probabilistic morphing and evolutionary morphing. Parametric morphing involves converting the source and target into continuous envelopes, interpolation, and converting the interpolated envelopes back into note sequences. Probabilistic morphing involves converting the source and target into probability matrices and seeding them on recent output to generate the next note. Evolutionary morphing involves iteratively mutating the source into multiple possible candidates and selecting those which are judged as more similar to the target, until the target is reached.

I formally evaluated the probabilistic morphing algorithm by extracting qualitative from participants in a live electronic music situation, benchmarked against a live, professional DJ. The probabilistic algorithm was competitive, being favoured particularly for long morphs. The evolutionary morphing algorithm was formally evaluated using an online questionnaire, benchmarked against a human composer/producer. For particular samples, the morphing algorithm was competitive and occasionally seen as innovative; however, the morphs created by the human composer typically received more positive feedback, due to coherent, large scale structural changes, as opposed to the forced continuity of the morphing software.

iii Table of Contents

1 Introduction 1 1.1 Background motivations ...... 1 1.1.1 Live MEM ...... 1 1.1.2 Computer games and multimedia ...... 2 1.1.3 Interactive installations ...... 3 1.1.4 Additional motivating contexts ...... 3 1.2 What is note sequence morphing? ...... 4 1.3 Compositional and philosophical issues related to morphing ...... 5 1.4 Details of the research goal ...... 7 1.5 Personal background ...... 8 1.6 Design of research ...... 9 1.6.1 Research design context ...... 9 1.6.2 Research through iterative software development ...... 10 1.6.3 Evaluation techniques ...... 11 1.7 Knowledge outcomes ...... 14 1.7.1 Algorithms ...... 15 1.7.2 Data gathering techniques ...... 16 1.7.3 Contextual Review ...... 16 1.7.4 Software ...... 16 1.8 Thesis structure ...... 18 2 Music and morphing ...... 21 2.1 MEM: context of choice ...... 22 2.1.1 Musicology of MEM ...... 23 2.1.2 Morphing in mainstream electronic music ...... 45 2.2 Morphing in other musical contexts ...... 50 2.2.1 Transitions ...... 51 2.2.2 Morphs ...... 53 2.2.3 Hybrids ...... 61 2.3 Summary of music and morphing ...... 64 3 Algorithmic music and morphing ...... 66 3.1 A framework for discussing algorithmic music ...... 67 3.1.1 Previous approaches ...... 67 3.1.2 Another approach ...... 70 3.1.3 Compositional approaches of algorithmic music systems ...... 71 3.1.4 Attributes of simple musical algorithms ...... 75 3.1.5 Summary of the framework ...... 78 3.2 Review of algorithmic composition ...... 78 3.2.1 Composer agents ...... 79 3.2.2 Computer Assisted Algorithmic Composition tools and toolkits ...... 84 3.2.3 Sonifications ...... 86 3.2.4 DJ agents ...... 87 3.2.5 Summary of algorithmic composition review ...... 87 3.3 Review of interactive music ...... 89 3.3.1 Meta-instruments ...... 90

iv 3.3.2 Jamming agents ...... 91 3.3.3 Adaptive music ...... 95 3.3.4 Interactive installations ...... 97 3.3.5 Summary of interactive music ...... 98 3.4 Review of note-level morphing ...... 98 3.4.1 GRIN ...... 99 3.4.2 HMSL ...... 100 3.4.3 Horner and Goldberg ...... 101 3.4.4 DMorph ...... 101 3.4.5 The Musifier ...... 102 3.4.6 Others ...... 103 3.4.7 Summary of opportunities for note-level morphing research ...... 105 4 LEMorpheus software infrastructure ...... 110 4.1 LEMorpheus overview ...... 111 4.2 High level morphing controls ...... 112 4.3 Loop editor ...... 113 4.4 Morphing parameters ...... 114 4.4.1 Parameters affecting the morph as a whole ...... 115 4.4.2 Parameters affecting each layer individually ...... 115 4.5 Note sequence morphing algorithm infrastructure ...... 118 4.5.1 Tonal Representation ...... 119 4.5.2 Rhythmic representation ...... 122 4.5.3 Extensible design ...... 122 4.6 Rendering MIDI Output ...... 123 4.7 Summary or morphing infrastructure ...... 124 5 Parametric morphing algorithm ...... 126 5.1 Overview ...... 126 5.2 Description ...... 127 5.2.1 Envelope representation ...... 127 5.2.2 Envelope combination ...... 132 5.2.3 Envelope playback ...... 132 5.3 Informal Evaluation ...... 137 5.3.1 Phase offset ...... 137 5.3.2 Inverse melodic contours ...... 138 5.3.3 Pitch and duration interpolation ...... 138 5.3.4 Music of different styles ...... 139 5.4 Extensions ...... 141 5.4.1 Self-synchronising inter-onset function ...... 141 5.4.2 Higher-level musical constructs ...... 142 5.4.3 Phase offset detection ...... 143 5.5 Summary of parametric morphing algorithm ...... 147 6 Probabilistic morphing algorithm ...... 149 6.1 Description ...... 150 6.1.1 Weighted Selection ...... 152 6.1.2 Markov Morph overview ...... 152 6.1.3 Markov Morph details ...... 155 6.2 Informal analysis ...... 164

v 6.2.1 Weighted selection ...... 164 6.2.2 Demonstrating Markov Morph through variation ...... 165 6.2.3 When Johnny comes morphing home ...... 175 6.3 Formal evaluation ...... 177 6.3.1 Focus group ...... 177 6.3.2 Focus concert ...... 182 6.4 Improvements to probabilistic morphing ...... 189 6.5 Summary of the probabilistic morphing algorithm ...... 191 7 Evolutionary morphing algorithm ...... 193 7.1 Overview ...... 194 7.2 Transforming and selecting ...... 196 7.2.1 Selection through dissimilarity measures ...... 198 7.2.2 Speeding through the scenic route...... 199 7.2.3 Putting it together ...... 202 7.2.4 Summary of transforming and selecting ...... 204 7.3 Specific compositional transformations: their process, parameters and dissimilarity measures ...... 205 7.3.1 Divide/merge ...... 205 7.3.2 Rate ...... 208 7.3.3 Phase ...... 209 7.3.4 Harmonise ...... 210 7.3.5 Scale pitch ...... 210 7.3.6 Inversion ...... 213 7.3.7 Octave ...... 214 7.3.8 Add/remove ...... 214 7.3.9 Key/scale morph ...... 215 7.4 Informal evaluation: listening and analysis ...... 216 7.4.1 Tuning TraSe parameters ...... 217 7.4.2 Morphing Backwards and Forwards ...... 225 7.4.3 Key/Scale morphing with TraSe ...... 228 7.4.4 When Johnny comes morphing home ...... 233 7.5 Automated evaluation ...... 234 7.5.1 Computation time for a single frame ...... 234 7.5.2 Number of notes and number of frames generated ...... 236 7.5.3 Future improvements to automatic testing ...... 240 7.6 Formal qualitative evaluation: web questionnaire ...... 241 7.6.1 Method: music creation ...... 242 7.6.2 Method: questionnaire ...... 244 7.6.3 Results and analysis ...... 247 7.6.4 Discussion: musical controversy ...... 255 7.6.5 Conclusion of formal qualitative evaluation ...... 256 7.7 Extensions ...... 258 7.7.1 Note thinning ...... 259 7.7.2 Note clustering ...... 259 7.7.3 Transformation chain ...... 260 7.7.4 Structural morphing ...... 261 7.7.5 Automatic adjustment of parameters ...... 261

vi 7.7.6 Optimisation ...... 262 7.8 Summary of evolutionary morphing ...... 268 8 Conclusion ...... 271 8.1 Summary ...... 271 8.2 Demonstrations of potential applications ...... 276 8.2.1 Concerts ...... 276 8.2.2 Interactive Table Installation ...... 276 8.2.3 Computer Game ...... 278 8.3 Future research ...... 280 8.4 Concluding remarks ...... 283 Appendices A Glossary of terms ...... 285 B Pseudocode of methods for combining envelopes and generating notes ...... 289 C Printed output from the Markov Morph algorithm ...... 291 D TraSe algorithm ...... 296 E Results from the online morphing questionnaire ...... 303 Bibliography ...... 307

vii List of figures

Chapter Two

Figure 1 Five different layers of loops in the Fruity Loops sequencer ...... 24 Figure 2 Diagram showing the positions of the downbeat, backbeat, upbeat, onbeat and offbeat in the 4/4 bar...... 32 Figure 3 The portions of the bar that can fulfill the various beat roles for the kick drum: primary (P), primary-complementary (PC), primary-leading (PL), secondary (S), secondary- complementary (SC) and secondary-leading (SL). An example of one possible configuration is shown by the red squares...... 36 Figure 4 Screenshot of a macroscopic track layout from the Fruity Loops sequencer...... 44 Figure 5 The CF combined with the CC and linear pitch space ...... 58

Chapter Three

Figure 1 Self-synchronising function for inter-onsets (reprinted with permission)...... 99 Figure 2 Summary of research opportunities and the four major note-level morphing systems. “Y” indicates that the opportunity was investigated to a significant extent by the system, while “P” indicates that the opportunity was investigated partially. Blank indicates that no investigation has occurred. H&G stands for “Horner and Goldberg”...... 108

Chapter Four

Figure 1 Overview of the infrastructure for morphing in LEMorpheus ...... 111 Figure 2 High-level user interface for LEMorpheus...... 112 Figure 3 Layout of the loop editor interface...... 113 Figure 4 Parts that appear on the same vertical slot in source and target will be morphed together into the same layer. They may have different MIDI channels and instruments...... 114 Figure 5 Volume cross-fade functions. Solid lines for source MIDI channel volume, dashed for target MIDI channel volume. Blue shows the linear cross-fade, red shows a change in gradient, green shows a change in offset...... 116 Figure 6 Example of the morph index being quantized into four discrete values. The dashed line is the original morph index, while the solid line is the quantized morph index...... 117 Figure 7 Example interpolation curves applied in parallel for different values of meta-data parameters: cross-over point of 1 and gradient 1 (orange), crossover point 0.5 and gradient 2 (blue), crossover point 0.5 and gradient 1 (green), exponential curve for note morph index (red)...... 118 Figure 8 Representing MIDI pitch 65 (E#) as the 3rd of C# Major...... 119 Figure 9 DePa enables accurate representation of passing notes. Counting from one, scale degrees that are odd are in the scale, while those that are even are passing note pitches. Starting with a passing note in the major scale, between the Major 2nd and the Major 3rd, it will be represented as the 4th degree of a diatonic DePa scale. In a minor scale, there is no passing note between the Major 2nd and Minor 3rd, leaving four options available to interpretation: keep the note as a passing note and raise or lower it to the closest existing

viii passing note; or turn the passing note into a scale degree which is either higher or lower (in this case, “higher” has the equivalent MIDI pitch)...... 121 Figure 10 Overview of system for rendering MIDI data...... 123

Chapter Five

Figure 1 Example envelope representation ...... 127 Figure 2 Example of an inter-onset envelope representing a crotchet (quarter-notes) followed by four quavers (eighth-notes) spanning a three beat loop. The nodes representing the latter three quavers are redundant, as the zero gradient line continues regardless...... 128 Figure 3 Pseudocode for the algorithm that converts notes into envelopes ...... 131 Figure 4 An example of weighted combination of a source (round dots) and target (solid line) envelope, to produce a morph envelope (dash-dot). The morph index is , which is why the morph envelope is always exactly half way between the source and target...... 132 Figure 5 Top: an inter-onset envelope (dotted line) with the notes (red) that were used to create it. Bottom: Generating a note from the inter-onset envelope – on the fourth play cycle, the area under the inter-onset envelope since the last note was generated (shown by the grey) will be equal to the distance from the previous note to the current position squared, thereby generating a note (shown by the red arrow). That is, combining Equation 4 and 5, ...... 135

Figure 6 Phase shift interpolation, made obvious by morphing from a pattern starting on the first beat to the same pattern starting on the second beat...... 137 Figure 7 No phase shift detected - pitch interpolation is used...... 138 Figure 8 Interpolating between inverse melodic contours creates a mostly flat ‘unmelodic’ contour...... 138 Figure 9 Pitch and duration interpolation. In the source pattern (bar one), the notes start with a long duration and become shorter, while in the target (bar 6), the notes start short and get longer. The pitches of the target are the same as the source, but transposed up a perfect fifth...... 139 Figure 10 Constraining the pitch to values that occur in source and target, favouring the more commonly occurring values...... 139 Figure 11 Morphing between a two bar loop from Freŕe Jacque to a two bar loop from Chameleon (Herbie Hancock)...... 140 Figure 12 Interpolation between different styles is harder when they are also in different keys, in this case, C Major and F# Pentatonic Minor. Source and target are two bars long, as in the previous example...... 140 Figure 13 Mathews and Rosler’s “self-synchronising function” (reprinted with permission) ...... 142 Figure 14 Diagram showing one possible example of the line that goes through the exact centre of the inter-onset function . The loop length, , is the period of the function...... 145 Figure 15 Example of note onsets generated from a note onset envelope (white squares) compared to the note onsets generated by the same envelope that has been transformed into a flat line with no variation (grey squares)...... 146 Figure 16 With the objective of phase shifting the modified note onsets in such a way that results in the original note onsets, this diagram shows all of the potential points at which a new phase offset envelope node should be considered for the example note onsets given...... 146

ix Figure 17 The final phase offset envelope should be calculated to have the least variation, as with this example...... 147

Chapter Six

Figure 1 An example of how similarity measurements between segments of the selected source or target sequence and the seed are created. In this simplified case, only monophonic pitches are being compared and the order (length of seed) is only two. It is clear that the first two note pitches are the same as the seed, and , so the similarity measure relating to the tail of the segment (index ) is . None of the other two-note segments in match the seed and so the other similarity ratings are less than . Note that the sequence wraps around – the segment to , and , is fairly close to , and , which is apparent in the relatively high similarity rating for index of ...... 154 Figure 3 Morphing, through weighted selection, between two different drum rhythms of the same length. The audible notes are in red, while the semi-transparent black and white notes underneath are provided as a reference to the source and target respectively. The first bar is the source and the last bar is the target...... 164 Figure 4 Swing can be dealt with easily, but different dynamic levels make the random nature of the weighted selection more obvious...... 165 Figure 5 Example of a variation of a two bar bassline of cheesy nineties dance (first two bars). The top piano roll contains bars one through sixteen and the bottom piano roll continues with bars seventeen to thirty-two. The contrast is decreased from bars three to seven and it remains low until it is increased again over bars twenty-four to twenty-eight...... 166 Figure 6 Transcripts of two different variations on a simple major scale run, up and down. “A” uses only the CC for the similarity measure and “B” uses only the CF. The depth for both is . For “B”, the contrast was increased so that the two could have a comparable rate of variation, as the pitches of diatonic scales are more closely related in the CF than the pitches of chromatic scales...... 168 Figure 7 Variation of a four bar loop from Take On Me by a-ha, from bar five and onwards. Due to using only modulus eight beat space as the similarity measure, the notes from bars three and four of the original loop are measured as a perfect match to the notes in bars one and two. The notes highlighted in green are the actual notes played. For reference, the notes in black are the notes of the second two bars and the notes in white are the notes of the first two bars of the loop...... 169 Figure 8 Top is a numbered transcript of the Take On Me loop that can be used as a reference for the occurrence of the original notes in the generated material. In the two panels beneath this is a variation generated from the Take On Me loop of the previous example, using modulus three and four beat space for onset similarity. The actual notes played are highlighted in green. For reference, the notes from the first two bars of the original are overlaid in black while the notes from the second two bars of the original are overlaid in white. New notes that do not appear in either are red. Numbers are written in above or below notes refer to the position in the note-list (counting from zero) of the original sequence. “L” indicates a note generated through stream loss. The blue band at the top can be used as a reference to modulus three beat space...... 170 Figure 9 Example of variation of Take On Me by a-ha where linear pitch similarity is weighted maximally, Markov order is one and contrast is maximum. To assist analysis, the number of the original note, from which the generated note is copied, is notated in red above each note. If a note is generated through weighted selection due to stream loss, an “L” is used instead. For reference, the numbers of each original note is included above...... 172 Figure 10 The Take On Me variation using pitch similarity at ; onset similarity at ; Markov order ; moderate contrast; inter-onset space of modulus , , and ; and CF pitch space.

x The cycle of is included in blue numbering along the top for reference, as is the original pattern (red notes, top). The numbers indicate the original note. The L indicates stream loss...... 174 Figure 11 Transcript of a Markov Morph between The British Grenadiers and When Johnny Comes Marching Home. The source and target occupy the first four and the last four bars respectively...... 176

Chapter Seven

Figure 1 Pseudocode overview of TraSe...... 196 Figure 2 Each of the seven note sequences that would be in a selection pool for rate (j=1), given the original (top-left)...... 197 Figure 3 Pseudo-code description of the Transform-Select function that is used in the TraSe algorithm...... 202 Figure 4 Pseudocode of the Merge-Forward algorithm ...... 205 Figure 5 A bi-directional NN comparison. The red box holds two notes (red 1 and 2) from the input sequence. The blue box holds two notes (blue 1 and 2) from the target sequence. The blue arrows show the NNs from the backward computation and the red arrows show the NNs from the forward computation. The distances between NNs are adjacent to the arrows...... 206 Figure 6 Pseudocode of Nearest Neighbour dissimilarity measure ...... 207 Figure 7 Two pitch envelopes (red and blue) and the difference in area between them (in pink). The difference in area between the onset and pitch envelopes are used as the dissimilarity measure for rate...... 208 Figure 8 Pseudocode for the algorithm to find the tonic of the central octave...... 210 Figure 9 Pseudocode for the scale pitch compositional transformation...... 211 Figure 10 Pseudocode of the inversion transformation...... 213 Figure 11 Close up of Source (A) and Target (B) ...... 217 Figure 12 TraSe morph with all weights equal. Add/remove has two cycles, the transform speed of add/remove is and the threshold of similarity to target is ...... 217 Figure 13 TraSe morph that demonstrates the use of weights that favourably bias the compositional transformation of “merging” notes over that of “splitting” notes, thus creating a less chaotic morph compared to the previous example. Add/remove transform speed is and dissimilarity cut-off is ...... 219 Figure 14 Example of a slower transform-speed yielding faster convergence. Settings are the same as previous example, except for the transform speed on add/remove which was reduced to ...... 221 Figure 15 The number of transformations per frame has been limited to two. Merge is biased over split. Add/remove transform speed is ...... 223 Figure 16 Morphing from target to source. The add/remove transform speed is set to and merge is bias over divide. The dissimilarity cut-off is zero...... 225 Figure 17 The same morph of the previous example, played in reverse order of frames...... 226 Figure 18 Comparing the pitches in C Major with the pitches in F# Major ...... 228

xi Figure 19 Key/scale morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using CC, key-root distance and scale dissimilarity...... 228 Figure 20 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using CF, key-root distance and scale dissimilarity...... 228 Figure 21 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using key-scale dissimilarity as the only measure...... 229 Figure 22 The pitch-classes (rows) present in each key/scale frame (columns) during a morph from C Ionian to F# Ionian, unexpectedly finishing on D# Aeolian. The “X” marks where there is a difference between the previous scale and the scale with the X, in terms of the pitch- class that the X is on...... 229 Figure 23 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using absolute pitch-class similarity combined with a small (0.01) weighting of key-root distance...... 230 Figure 24 The pitch-classes (rows) present in each key/scale during a morph from C Ionian to F# Ionian using absolute pitch-class similarity combined with a small (0.01) weighting of key-root distance. The “X” marks where there is a difference between the previous scale and the scale with the X, in terms of the pitch-class that the X is on...... 230 Figure 25 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , track VS consistency and transform speed ...... 231 Figure 26 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed ...... 231 Figure 27 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed ...... 232 Figure 28 Modulating from C Ionian to Bb Ionian. Key-scale similarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed ...... 232 Figure 29 The computation time (y) for a single cycle of un-biased Add/remove, applied to various numbers of notes (x), which were randomly generated and evenly distributed between source and target...... 234 Figure 30 The computation time (y) for a single cycle of the un-biased transform-chain, applied to various numbers of notes (x), which were randomly generated and evenly distributed between source and target...... 235 Figure 31 Number of frames generated with two cycles of add/remove only, monophonic, quantised to , source constrained to C4 octave and target constrained to C5. For each number of notes there were samples...... 236 Figure 32 Number of frames generated using the entire transform-chain, versus the total number of notes in the source and target patterns. Both source and target are monophonic and quantised to 0.25. Source is constrained to C4 octave and target constrained to C5. MAD is ‘Median Absolute Deviation’. AR is ‘Add/Remove’...... 238 Figure 33 With source and target being generated on separate octaves, comparing the minimum number of frames produced at each number of notes when using “add/remove only” VS “all transformations”. The yellow highlighted columns refer to occurrences where the minimum “all transformations” was smaller than the minimum of “add/remove only”...... 239 Figure 34 List of stimuli for subjective response...... 244

xii Figure 35 List of musical elements to assist musicological analysis...... 245 Figure 36 Summary of participants’ backgrounds...... 247 Figure 37 Summary of perceptions of the source and target material...... 248 Figure 38 Summary of overall approaches used by participants...... 249 Figure 39 Judgements of whether the morphs are applicable to the real-world contexts of computer games or dance music. Responses were originally in natural language, but have been condensed for display within this table. They are in the form: computer game/dance and can be y-yes, n-no or m-maybe...... 253 Figure 40 A table for NN distance between , which has notes, and , which has . The iteration for the forward (left) and backward (right) calculations are shown by the red arrows. Example NNs are highlighted in yellow...... 261 Figure 41 Finding the NN distance between and with a note added from (the note in this example) requires one operation for forward (left), and for backward (right). A_2 is an abbreviation for “ with the note from added”. Grey squares are distances that are subtracted, while squares with a red outline are distances that are added. The number of squares with red arrows going through them indicates the number of operations required...... 262 Figure 42 Recalculating the NN distance for with a note removed (the note in this example), forward (left) and backward(right). R_2 is an abbreviation for ‘ with the note removed’. The row that is crossed out is the row in that has been removed to make . The number of squares with red arrows through them indicate the number of operations needed. Yellow squares are examples of new NNs, and the grey squares are the previous NNs (continuing the example from Figure 40) that must be subtracted...... 264 Figure 43 Pseudocode for the backward NN distance, optimised for a remove operation...... 265 Figure 44 Reducing the search-space when finding the NN of the blue note amongst the red notes. In this example, the blue note is the first note in the input, , and the red notes are from the target, . This shows the two steps (labelled 1 and 2) needed to find the NN. The note with a grey outline is the note currently being considered. Notes that are shaded out have been pruned because we know that the distance of their onsets alone will be larger than the Euclidean distance ( ) to the current NN (highlighted), because is sorted. The dashed circle shows the area, within which notes will be closer than the currently considered note...... 266 Figure 45 Worst-case scenarios: A. If sorting by start-time, there would be no pruning in ‘vertical’ music. B. If sorting by pitch is also considered, the worst case scenario is a sequence of diagonal notes...... 267

Chapter Eight

Figure 1 Pictures from the Morphing table installation ...... 277

xiii List of Equations

Chapter Five

Equation 1 Phase shift ...... 129 Equation 2 The length of the loop for the morphed envelopes, derived through weighted combination of source and target loop lengths ...... 133 Equation 3 The current time, phase shifted and bounded within the loop length...... 133 Equation 4 Condition for playing a note during any given play cycle. is the average value in the inter-onset envelope since the most recent note played. is the actual interval between the most recent note played and the current position...... 134 Equation 5 Determining , the average value in the inter-onset envelope since the last note. is a function that returns the value in the inter-onset envelope at particular points in time. is the current time, bounded within the loop length. is the time interval between the current position and the most recent note that has been played. This is only for when ...... 134 Equation 6 Determining , as per the previous equation, but in situations where . is the length of the loop...... 135 Equation 7 The line that goes through the exact centre of the inter-onset envelope ...... 144 Equation 8 A function that reduces or expands the variation in the inter-onset envelope. recovers . , ...... 145

Chapter Six

Equation 1 Function to calculate the similarity matrix . is the head of the segment of length in that ends on , which is to be compared to . The is to allow for wrap around. is a normalisation constant...... 155 Equation 2 The similarity between notes as the combined similarity of pitch, duration and onset, weighted by the user-defined weights...... 155 Equation 3 The contrast function. As increases, the difference between low and high similarity ratings are compounded...... 156 Equation 4 The similarity between two note pitches, and , as the combination of similarity in linear, CF and CC pitch spaces, each weighted by user-defined weights...... 157 Equation 5 Generalized similarity in linear space is the inverse of the difference, normalised by the range of possible values, , and taken to the power, in order to exaggerate the smaller differences. For MIDI pitch space, . I settled on the magnification after experimentation...... 157 Equation 6 Similarity measure as shortest distance between pitches in the CC...... 159 Equation 7 Similarity in CF space. The function measures the shortest distance between the inputs in chromatic circle space and is explained in the previous equation...... 159 Equation 8 Function that finds the relative distance between two durations, without any upper limit imposed. It is centralized on ...... 161

xiv Equation 9 A measure of the common factor difference between two durations...... 161 Equation 10 Duration similarity is the inverse combination two separate distance measures: relative distance and factor distance (described in previous two equations). The latter is weighted less due to it being a less important measure of similarity...... 161 Equation 11 Circle distance function. and are two values being compared, and is the length of the circle space they are being compared within...... 162 Equation 12 The onset similarity function is the inverse of a weighted combination of distance in five loop spaces of different lengths...... 162 Equation 13 The similarity measure for two polyphonic note-groups and . is the similarity measure for individual notes within the note groups...... 163

Chapter Seven

Equation 1 Defining the target value of dissimilarity . is the user-defined ‘speed’ ( is slowest). is the lowest dissimilarity rating. is the dissimilarity rating of the unmodified input, that is, the previous frame...... 200 Equation 2 The index to the selected note sequence, , is determined by minimising the difference of the dissimilarity of that note sequence with the target dissimilarity (first term above), while minimising the dissimilarity between that note sequence and the source (second term). The user can control the influence of each of these terms using - the ‘tracking VS consistency’ variable...... 201 Equation 3 The average distance between each note in and its NNs in ...... 206 Equation 4 The NN dissimilarity measure...... 206 Equation 5 The average interval from the central tonic. is the sequence of note pitches, is the number of note pitches in (the cardinal). is an index to and is the central tonic. .... 211 Equation 6 Forward term simplified for addition of a single note, , from ...... 262 Equation 7 Backward term simplified for addition of a single note from ...... 263 Equation 8 Forward term simplified for removal of from ...... 264

xv List of supplementary media

Chapter one

1.1 EX2F_slowLove.wav 1.2 EX2M_loveFunk.wav 1.3 EX2T_FunkYou.wav

Chapter two

2.1 GrooveRider_05_2-34.wav 2.2 MassiveAttack_Mezzanine_03 Teardrop_0-37.wav 2.3 CD2 - Ministry Of Sound Ibiza Annual 2005_36-03.wav 2.4 Ministry Of Sound - Chillout Session - Groove Armada - At The River _0-38.wav 2.5 Portishead_05 Over_1-18.wav 2.6 Ministry Of Sound - Afterhours 3 CD2_1-02-27.wav 2.7 Anon_Minimal_House_Mix_0-33.wav 2.8 Bjork_Homogenic_5 Years_4-09.wav 2.9 Ministry Of Sound CD2 Ibiza Annual 2005_6-26.wav 2.10 -Live @ Space Miami(26-03-2006)WMC_11-05.wav 2.11 GU_Halloween Chriss Scott live at club korona_2006-part05_12-43.wav 2.12 Postishead Portishead 02 All Mine_0-0.wav 2.13 Grooverider CD2 Track No08_0-42.wav 2.14 Grooverider CD2 Track No15_0-21.wav 2.15 MOS - Clubbers Guide Summer 2006 [CD1]_Love Sensation_5-30.wav 2.16 Ministry Of Sound Ibiza Annual 2005 CD2_9-02.wav 2.18 Tiefschwarz_Ghosttrack black strobe remix_4-21.wav 2.19 Carl Cox_Thats the bass (tim_deluxe_remix)_6-40.wav 2.20 Nick warren_Live @ Space Miami(26-03-2006)WMC_56-51.wav 2.21 Paul Oakenfold_Global underground New York CD1_9-06.wav 2.22 LTJ Bukem - Logical Progression At Ministry Of Sound (2000)_2-00.wav 2.23 Board of Canada 12 - Aquarius_0-0.wav 2.24 Nick warren-Live @ Space Miami(26-03-2006)WMC_17-11.wav 2.25 Nick warren-Live @ Space Miami(26-03-2006)WMC_58-13.wav 2.26 Paul van Dyke_Ministry of Sound Radio_37-47.wav 2.27 Tiefschwarz_Ghosttrack (Black Strobe Remix)_1-24.wav 2.28 Global Underground_Halloween Chriss Scott_1-02.wav 2.29 Boards of Canada_ 15 - Smokes Quantity_2-19.wav

xvi 2.30 Boards of Canada_ 15 - Smokes Quantity_2-30.wav 2.31 Ministry Of Sound_Chillout Session_Groove Armada_At The River_1-23.wav 2.32 Global Underground_Halloween Chriss Scott_6-47.wav 2.33 Paul Oakenfold_Global underground New York CD1_3-33.wav 2.34 Global Underground_Afterhours 3_CD1_21-20.wav 2.35 Global Underground_Afterhours 3_CD1_28-00.wav 2.36 Global Underground_Afterhours 3_CD2_21-16.wav 2.37 Nick warren-Live @ Space Miami(26-03-2006)WMC_6-00.wav 2.38 Nick warren-Live @ Space Miami(26-03-2006)WMC_7-58.wav 2.39 Nick warren-Live @ Space Miami(26-03-2006)WMC_11-10.wav 2.40 Global Underground_Halloween Chriss Scott_10-11.wav 2.41 Global Underground_Halloween Chriss Scott_10-31.wav 2.42 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_2-50.wav 2.42.1 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_2-43.wav 2.42.2 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_3-26.wav 2.42.3 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_3-41.wav 2.43 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_4-54.wav 2.44.1 Paul van Dyke-Ministry of Sound_11-06.wav 2.44.2 Paul van Dyke-Ministry of Sound_11-26.wav 2.45 Global Underground_Halloween Chriss Scott_50-12.wav 2.46 Paul van Dyke-Ministry of Sound_44-44.wav 2.47 DJ Shadow_Endtroducing_Track No02_2-56.wav 2.48 Global Underground_Halloween Chriss Scott_4-51.wav 2.49 Grooverider Live Radio.wav 2.50 Mancini - Mix_57-15.wav 2.51 Carl Cox_ Global Sessions_(radio_fg)-01-23-2006-nyd_4-30.wav 2.52 Nick warren-Live @ Space Miami(26-03-2006)WMC_57-28.wav 2.53 Mancini - Mix_1-01-20.wav 2.54 Dissidenten-Instinctive Traveler_2-50.wav 2.55 Deep dish album Ministry of Sound - Chill Out Session_2-05.wav 2.56 Mancini - Mix_1-01-08 .wav 2.57 Calypso Rose & Shurwayne Winchester_Tempo (soca gold 2006)_0-32.wav 2.58 Soca Gold 2006_14 Outta Hand_0-08.wav 2.59 Amen Break.wav 2.60 Funky Drummer.wav 2.61 The Fugees_Family Business_0-04.wav 2.62 Kino Oko_Koz Ajm Bed_6-13.wav

xvii 2.63 Drum N Bass Arena_track 2_0-05.wav 2.64 Grooverider - Pure Drum&Bass CD 2 - 12 (Q Project) Champion _4-10 .wav 2.65 Stakka and Skynet_ Track No01 Decoy.wav 2.66 Groove Armada - Superstylin_1-24.wav 2.67 Drum N Bass Arena _Track 07_0-35.wav 2.68 Soca Gold_06 Take You with Me_0-09.wav 2.69 Salmonella Dub -10- Tui Dub_1-30.wav 2.70 Massive Attack_ 06 Unfinished Sympathy_0-08.wav 2.71 Rene Wooller_ LEMu1_GenDrums demo.wav 2.72 Roni Size - Formulate w DJ Krust_0-36.wav 2.73 Grooverider_track02_0-18.wav 2.74 Grooverider_track04_1-39.wav 2.75 Grooverider_track05_1-04.wav 2.76 Grooverider live_maximum overload_33-52.wav 2.77 Grooverider_track15_1-42.wav 2.78 Scratch Yer Hed - Square Pusher Mix_0-17.wav 2.79 Grooverider_Harder they come_Track 6 __.wav 2.80 Blu Mar Ten vs. Erykah Badu - You Got Me (Remix)_0-20.wav 2.81 Salmonella Dub_Tha Bromley East Roller (DJ Digital & Spirit Remix)_2-10.wav 2.82 Drum n bass Assasins_.wav 2.82 Nick warren-Live @ Space Miami(26-03-2006)WMC_56-29.wav 2.83 Grooverider_Live_36-27.wav 2.84 Grooverider live_maximum overload_13-10.wav 2.85 Boards of Canada 11 - Rue The Whirl_2-30.wav 2.86 Aim_Downstate_0-34.wav 2.87 Nick warren-Live @ Space Miami(26-03-2006)WMC_42-22.wav 2.88 Mancini - Mix Electro house minimal techno_1-04-56.wav 2.89 Mancini - Mix Electro house minimal techno_27-48.wav 2.90 Boards of Canada_10 - Roygbiv_0-0.wav 2.91 Ministry Of Sound - Clubbers Guide Summer 2006_CD1_20-11.wav 2.92 Ministry Of Sound - Clubbers Guide Summer 2006_CD1_16-18.wav 2.93 Ministry Of Sound - Clubbers Guide Summer 2006_CD1_41-52.wav 2.94 Boards of Canada_07 - Turquoise Hexagon Sun_2-28.wav 2.95 Portishead_Portishead_09 Only You_3-58.wav 2.96 DJ Sasha & - Ibiza CD 1 - Global Underground_21-27.wav 2.97 DJ Sasha & John Digweed - Ibiza CD 1 - Global Underground_28-45.wav 2.99 Ministry Of Sound_Ibiza Annual 2005_CD2_1-12-54.wav

xviii 2.100 DJ Shadow_ Endtroducing_Track 11_Organ Donor_1-29.wav 2.101 Mancini - Mix Electro house minimal .wav 2.102 _DJ Bene_Minimal_House_Mix02_19-12.wav 2.103 - Miami - 027 - CD 2_0-40.wav 2.104 Gotan Project_Santa Maria_0-36.wav 2.105 ATB_Ministry of Sound_13-55.wav 2.106 ATB_Paul Van Dyke_Ministry of Sound_47_31.wav 2.107 Jeff Mills @ Sonar 2005 - Barcelona_7-11.wav 2.108 DJ Bene_Minimal_House_Mix02_36-00.wav 2.109 Jeff Mills @ Sonar 2005 - Barcelona - 18-06-05.wav 2.110 DJ Bene_Minimal_House_Mix02_52-14.wav 2.111 psychologic - dj miky_5_27.wav 2.112 Boards of Canada_07 - Turquoise Hexagon Sun_0-10.wav 2.113 Danny Howells - Miami - Global Underground 027 - CD 2_1-10-05.wav 2.114 06-Blue Asia-Abyssinean Dub_2-38.wav 2.115 DJ Shadow_Endtroducing Track No04_4-30.wav 2.116 DJ Bene_Minimal_House_Mix02_36-00.wav 2.117 Boards of Canada 12 - Aquarius_1-14.wav 2.118 01-carl_cox_-_thats_the_bass_(tim_deluxe_remix)_4-08.wav 2.119 Blu Mar Ten vs. Erykah Badu - You Got Me (Remix)_0-0.wav 2.120 Stakka and Skynet Track No01 Decoy_0-0.wav 2.121 Stakka and Skynet Track No01 Decoy_6-16.wav 2.122 Blu Mar Ten vs. Erykah Badu - You Got Me (Remix)_7-18.wav 2.123 Jamiroquai Virtual Insanity to Blur Song 2_simple mix by Wooller.wav 2.124 Steve Lawler_Global Underground_Lights Out_so twisted_7-40.wav 2.125 DJ Bene_Minimal_House_Mix01_14-57.wav 2.126 Global Underground_Chriss Scott-live at club korona_part05_10-36.wav 2.127 Global Underground_Chriss Scott-live at club korona_part05_3-25.wav 2.128 DJ Sasha & John Digweed - Ibiza CD 1 - Global Underground_25-14.wav 2.129 Global Underground_Chriss Scott-live at club korona_part05_22-59.wav 2.130 Nick warren-Live @ Space Miami(26-03-2006)WMC.wav 2.131 Ministry Of Sound - Clubbers Guide Summer 2006 [CD1]_14-43.wav 2.132 DJ Shadow and _Brainfreeze_01_4-10.wav 2.133 DJ Tiesto - Adagio For Strings _2-03.wav 2.134 Groove Armada_super stylin remix_5-03.wav 2.135 Grandmaster Flash and the Furious Five_The Message_0-19.wav 2.136 Plan B feat Rolling Stones_Paint It Blacker_0-34.wav

xix 2.137 Weird Al_07 Polka Power_0-48.wav 2.138 Weird Al_Polka Power!.mp3 2.139 Weird Al_07 Polka Power_Wannabe Verse to Chorus_0-16.wav 2.140 Weird Al_07 Polka Power_Wannabe to Flagpole_0-32.wav 2.141 Weird Al_07 Polka Power_Ghetto to Backstreets_1-09.wav 2.142 Bagpiper_Michael Lancaster_Birthday to Scotland_Brave_0-04.wav 2.143 Drum N Bass Arena_Track No11_0-10.wav 2.144 Drum N Bass Assassins_pitchedsnare.wav 2.144 Grey Album_06 Dirt Off Your Shoulder_complete.mp3 2.145 Stakka and Skynet_ Clockwork_Track No05_4-26.wav 2.146 Aim_Cold Water Music_03 The Force_0-04.wav 2.147 Dangermouse_Jay_z_Beatles_Dust off yr shoulder_ Grey Album_0-0.wav 2.148 Wishart_redBird_0_01.wav

Chapter three

3.1 mmjohnny.mp3 3.2 harm_morph_1r_to_Cdom7.mp3 3.3 b2T-Melody-mono.wav 3.4 b2T-Long-mono.wav 3.5 b2T-Rhythm-short.wav 3.6 LightAndLonely.mp3 3.7 Mighty.mp3 3.8 LightGoesMighty.mp3

Chapter four lemu2run.bat LEMu2p0.jar msDrivers.exe /LEMu2Manual/index.htm /src /workingDirectory /externalClasses /APIs /bin

xx Chapter five

5.1 Interpolation_phaseWorkingDemo.mid 5.2 Interpolation_phase_notwork.mid 5.3 Interpolation_invertCancel.mid 5.4 Interpolation_pitchDur_notLocked.mid 5.5 Interpolation_pitchDur.mid 5.6 Interpolation_DifferentStyles.mid 5.7 Interpolation_DiffStyles_DiffKeys.mid 5.8 Interpolation_BritToJohn.mid

Chapter six

6.1 Weighted_drums.mid 6.2 Weighted_drums_swing.mid 6.3 Weighted_drums_triplet.mid 6.4 Weighted_johnny.mid 6.5 Markov_variation_amnt.mid 6.6 Markov_CFvsCC_Maj_CC_fuzzadj.mid 6.7 Markov_CFvsCC_Maj_CF_fuzzadj.mid 6.8 Markov_var_mod8_real.mid 6.9 Markov_var_mod_3_4_ovrlay_real.mid 6.10 Mrkv_vr_md_pi_de1_lin100_dis100.mid 6.11 MrkvVrPi100St66De1M234Cof100.mid 6.12 Markov_johnny_trips.mid 6.13 Swingy_main_section.mp3 6.14 Trekno_main_section.mp3 6.15 DarkDNB_main_section.mp3 6.16 Swingy.mp3 6.17 Trekno.mp3 6.18 DarkDNB.mp3 6.19 SwingyToTrekno.mp3 6.20 TreknoToDarkDNB.mp3 6.21 DarkSwingDarkShort.mp3 6.22 SwingyToDarkDNBLong.mp3 6.23 TreknoToSwingy.mp3 6.24 FocusConcert.wmv

xxi Chapter seven

7.1 TraSe_Var_source.mid 7.2 TSE_mor_cal2hse_Target.mid 7.3 TSE_mor_cplt_cal2hse_nobias_cmixo_2b_postFix3.mid 7.4 TSE_mor_cplt_cal2hse_cmixo_2b_biasMerge.mid 7.5 TSE_mor_cplt_cal2hse_cmixo_2b_biasMerge_4fr.mid 7.6 TSE_mor_cplt_cal2hse_cmixo_2b_biasMerge_5fr8frWMuLim2.mid 7.7 TSE_mor_cplt_hse2cal_cmixo_2b_biasMerge_6fr.mid 7.8 TSE_mor_cplt_hse2cal_cmixo_2b_biasMerge_6fr_bkwds.mid 7.9 TSE_mor_Johnny.mid 7.10 EX1F_DarkTech.wav 7.11 EX1T_Safari.wav 7.12 EX1M_DarkSafari.wav 7.13 EX1M2_composer.wav 7.14 EX2F_slowLove.wav 7.15 EX2T_FunkYou.wav 7.16 EX2M_loveFunk.wav 7.17 EX2M2_composer.wav 7.18 EX3F_BritPopBitSweet.wav 7.19 EX3T_DarkBeats.wav 7.20 EX3M_DarkPeat.wav 7.21 EX3M2_composer.wav 7.22 EX4F_Afrika.wav 7.23 EX4T_Ethno.wav 7.24 EX4M_Afrono.wav 7.25 EX4M2_composer.wav 7.26 questionnaire.swf questionnaire15post.fla savexml.php /responses /sounds

xxii Chapter eight

8.1 QUT_Lunchtime_Concert_Audio.wav 8.2 QUT_Lunchtime_Concert_movie.mov 8.3 Morph_Table_Documentary.avi 8.4 Beige_ComputerGame_morph.wmv

xxiii List of abbreviations

Note: for definitions, see the Glossary in Appendix A

AMS: Algorithmic Music System HMSL: Hierarchical Music Specification ANN: Artificial Neural Network Language

BB: Break-Beat IBM: International Business Machines BCBT: Binary Copy Buffer Transform IGA: Interactive Genetic Algorithm BPM: Beats Per Minute IPS: Independent Pitch Streams ILLIAC: ILLinois Automated Computer CAAC: Computer Assisted Algorithmic Composition JMSL: Java Music Specification Language CAC: Chick-A-Chick jMusic: Java Music CC: Circle of Chroma CD: Compact Disc LED: Light Emitting Diode CF: Circle of Fifths LEMu: Live Electronic Music CONCERT: CONnectionist Composer of LEMorpheus: Live Electronic Morpheus ERudite Tunes LFO: Low Oscillator CPU: Central Processing Unit CT: Central Tonic MAX/MSP: MAX (Mathews) Signal Processing DePa: Degree and Passing notes MEM: Mainstream Electronic Music DJ: MIDI: Musical Instrument Digital Interface DSP: Digital Signal Processing MUSICOMP: MUSIc Composition language DMP: Direct Music Producer NN: Nearest Neighbour EDM: EMI: Experiments in Musical Intelligence OO: Object Oriented OS: Operating System FF: Four on the Floor OSC: Open Sound Control FIFO: First In First Out PNR: Pitch to Noise Ratio GenJam: Genetic Jammer GRIN: GRaphical INput QUT: Queensland University of Technology GUI: Graphical User Interface

xxiv SARA: Simple Analytic Recombinant TS: Tonal Stability Algorithm SMA: Simple Musical Algorithm UML: Unified Modelling Language SPEAC: Statement, Preparation, Extension, Antecedent, Consequent

TC: Tonal Change TraSe: Transform-Select

xxv The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

______

René Wooller

/ / 2007

xxvi Acknowledgements

Thank you to the universe, in particular,

To my loving family: Cerae Mitchell, Jan Buhmann, Roger Wooller, Ben & Chihiro Wooller, Marsha and Mishayla Buhmann, Nick & Jan Wooller, Dora Wooller, Joy Buhmann, Keith Mitchell and Maryanne Donnerly, Aran Mitchell, Jezaya Mitchell, Liz & Mellissa Wooller and extended family.

To my amazing and magical friends who I will hopefully see a lot more of now: Mat Petoe, Toby Gifford, Jason Laucher, Patrick King, Richie Allen, Amber Hansen, Caleb Trott, David Shaw, Natalie Jones, Dan Huzzer, Alicia King, Amanda Cuyler, Amy Batalibasi, Imogen Shields, Kate Thomas, Jacqui Vial & Alex Fitzgibbons, Alex Dixon, Svenja Kratz, Mellissa Bone, Matt Buckley, Sifu Gordon Shellshear and the Original Wing Chun family, the JEDAI crew, the Bachelors of Music Production ’01, ’02 and ‘03 crew, the State High ’98 crew, the Stick Ball Crew.

To my wise and wonderful supervisors, colleagues and sources of inspiration: Andrew Brown, Frederic Maire, Andrew Sorensen, Tim Opie, Ross Bencina, Steven Livingstone, Anna Gerber, Greg Hooper, Steve Dillon, Greg Jenkins, Jody Kingston, Roland Adeny, Larry Polansky, Danny Oppenheim, Jonas Edlund, Peter McIllwain, questionnaire participants, ACMA and ICMA.

To the animals and inanimate objects with whom I am strongly attached: Nellie the dog. Billi the cat. Anhinga the boat. Boxy the car.

Harmony, love, peace and happiness to all.

xxvii 1 Introduction 1.1 Background motivations

The delivery of electronic music rarely operates in the same spontaneous way as acoustic music and, in contexts where some form of adaptivity exists, the aesthetics are restricted and shaped by technological limitations. It is the goal of this research to investigate techniques that both enable greater adaptivity and provide a new set of aesthetic possibilities for mainstream electronic music delivery. “Enabling adaptivity” can be defined and evaluated as a technical problem, while “providing a new set of aesthetic possibilities” is an artistic problem and thus more exploratory in nature. Consequently, my research method combines technical development and reflective exploration with both informal and formal evaluations. In particular, the goal was pursued through the investigation of note sequence morphing, where a hybrid transition is generated between source and target note sequences. The musical genre within which the research operates is Mainstream Electronic Music (MEM), which I define as the popular and largely instrumental form of electronic music constructed from loops of layers such as drums, bass, lead, chords and sound effects. While the techniques of note level morphing I developed could be applied to virtually any MEM context, particular niche areas I have identified that could reap substantial benefit include: live MEM, computer game music and interactive installations.

1.1.1 Live MEM The first context is the live delivery of MEM music through a DJ or by using live music software. The skilled DJ/producer gauges the mood and taste of the audience and selects appropriate tracks to mix. However, despite the range of aesthetic possibilities within pre-produced music, the technology for mixing is usually limited to sonic effects such as , cross-fade, , time/pitch stretching and others. Unless the music has been carefully selected to ensure a degree of compatibility, the transitions can sound extremely awkward. It is particularly difficult to create an intelligible extended mix that might be perceived as ‘new’ hybrid music by the audience.

A legacy of technological limitations and affordances have forced aesthetic limitations on the live delivery of MEM that have come to be accepted by the audience as part of the musical form within which skilled practitioners operate. This is most obvious in Electronic Dance Music (EDM) where live mixing of tracks is common. Specific aesthetic features are that tracks are compatible; time-scales are long; sonic interest is more important than pitch and melody; tempo is consistent during mixes; and rhythmic repetition is more appropriate than varied phrasing. All of these and other aesthetic limitations can be correlated, at least partially, to limitations of

1 technology – tracks must be compatible because the current tools do not afford on-the-fly compositional changes and the pre-production of customised transition material is uneconomic and unfeasible if track selection is to occur in a truly adaptive fashion. Time scales are long because the change in mood between tracks proceeds in small increments. Change in is important because of the availability of realtime-manipulable (and thus adaptive when used by a sensitive human) sonic effects in audio production and mixing. Tempo is consistent during mixes because it is extremely difficult to beat-match and increase the tempo on two turntables simultaneously.

Considering the connection between the technology and aesthetics in adaptive electronic dance music delivery, new software for hybridising and transitioning between electronic dance music tracks could allow a range of new aesthetics to bloom. The ability of morphing to hybridise even very divergent tracks could enable new approaches to track selection, with tracks that would otherwise not be heard back-to-back becoming compatible. With smoother morphs, dramatic changes may become feasible over shorter periods of time. Due to the use of note sequences rather than audio tracks, transitions based on melodic and harmonic techniques could be explored, rather than simply equalisation, cross-fading and effects.

1.1.2 Computer games and multimedia

A second application for note level morphing is in computer games and multimedia, where the small amount of music, relative to the time spent playing the game or navigating the media, usually ensures that the music will become repetitive and boring at some stage (Sanger 2004). More importantly, although musical changes occur in response to narrative cues, the material must be cleverly composed so that each piece is compatible with every other that borders it (Apple 2006), constraining the musical possibilities substantially. The connection of divergent musical themes is very awkward without customised transitional material, which is uneconomical. Consider the following: a game with only 20 different states, but that requires a transition from each state to every other, would necessitate 190 transitions – too much material to produce economically and increasing exponentially with larger projects. Therefore, techniques that enable the automatic creation of such transitions would make composition processes more efficient and currently intractable game music projects possible. While some computer game music tools, such as Direct Music Producer (Fay et al, 2003), enable more efficient organisation of transitional material, a tool that exploits sophisticated algorithmic composition techniques is yet to become widely available. As well as transitions, hybridisation techniques could be used to create a huge number of ‘mutant’ pieces, thus increasing the amount of music that is able to be applied within the game overall and reducing repetition.

2 1.1.3 Interactive installations A third context for note sequence morphing is music for interactive installations that are displayed in locations such as art exhibitions, conferences, museums, theme parks and festivals. Social installation interfaces such as tabletop marker tracking systems are particularly relevant to morphing, which can be easily projected onto a topology. With only a single parameter to define the position between the source and target, note sequence morphing is simple to use for the general public. As well as this, if the source and target music are divergent, a large range of musical possibilities are nonetheless available. The potential number of musical states can be increased significantly when layers of the music – such as bass, drums and lead – are morphed independently. 1.1.4 Additional motivating contexts

Other adaptive and electronic music contexts such as accessible music therapy (Cost-287 2007) and computer assisted composition could benefit in ways similar to live MEM, computer games and interactive installations. The selection of MEM as the primary musical genre is partially motivated by the potential economic benefits of working in a popular genre, as well as a personal interest in the music. Primary motivations for extending the current knowledge of note level morphing are summarised thus:

o Increased adaptivity in the delivery of MEM. o Increasing the spectrum of aesthetic possibilities for musical transitions. o Increased efficiency in MEM composition, both adaptive and otherwise. o Increasing the possibilities and affordances for musical hybridisation.

Note level morphing is highly relevant and applicable to the research goals of enabling greater adaptivity and providing a new set of aesthetic possibilities for MEM delivery.

3 1.2 What is note sequence morphing?

Note sequence morphing is the task of integrating two separate note sequences to create a hybrid that may work as a transition between the originals. The ‘note sequence’ aspect indicates that the algorithm operates on musical notes, with attributes such as pitch and onset, rather than sound waveforms or graphics. Larry Polansky’s morphing definition (1992) can be simplified and adapted to the specific context of note level morphing as follows. A morphing algorithm, denoted by , is limited to two musical inputs. One of these is the source music, denoted by , and the other is the target music, denoted by . For example, might sound like sound example (~1.1), and might sound like (~1.3). The output of the morphing algorithm is the morph music, and is denoted by . is a kind of hybrid transition between and , which, continuing the examples, could sound like (~1.2)1. The influence of or in is controlled by the morph index, or , which is typically a user-specified weighting. is normalised such that when , and when , . We have . The three note level morphing algorithms I created all fit this general definition. The various approaches differ substantially, as explained in chapters five, six and seven, each of which deal with a different morphing algorithm.

Two artistic purposes for the use of morphing in music can be distinguished. One is to create a transition whereby , and can be positioned in sequence to create a smooth and/or coherent whole. This has potential in multimedia, computer games, live DJ mixing or any context where automatic transitioning between music is required. Another artistic purpose can be to create a mutant or hybrid, whereby has aesthetic interest in the way it shares properties of and , but may or may not function as a transition. This is more applicable to computer assisted composition and musical experimentation. Despite the distinction, these two purposes often overlap considerably, especially when considering a lengthy transition that can operate as an independent piece of music.

1 The morph example was produced by the LEMorpheus software I developed through this research.

4 1.3 Compositional and philosophical issues related to morphing

Various compositional and philosophical issues related to morphing require some introductory treatment, including: the difference between interpolation and morphing, n-source morphing, similarity measures, continuous versus discrete changes during transitions and the abstract description provided by the morph-index.

From discussions with various peers (Collins 2006; McCormack 2006), I have found it important to distinguish between interpolation and morphing. Interpolation is a technique for estimating unknown values between known points. Morphing, however, is more concerned with aesthetic integration of separate patterns. The technique of interpolation may feature to different degrees, or even not at all.

While the style of morphing discussed within this research is purely between source and target, morphing can also occur between multiple sources, which is called n-source morphing (Oppenheim 1995). There are some musical implications to this. In particular, using n-sources can shift the focus to be less on the transition and more hybridity and exploration of the much larger “morph space”. Simple source-target style morphing is easily applied to a musical work by mapping the morph-index to time. In contrast, n-source morphing is more complex, because the number of dimensions describing the morph space is increased, while the limitation of the “time” dimension in the musical work remains. That is, it is inevitable that one must consider how to navigate the morph-space if one is to generate a musical work, which in turn makes exploration more important. Because the generation of an N-source transition is thus more complex than with source-target morphing, and because multiple sources increase the likelihood that unique and possibly interesting combinations will result, N-source transitions also afford the generation of new, hybrid musical states, as opposed to the transitions between states.

Similarity measurement is pertinent, as it could be considered a kind of inverse function to morphing. That is, while morphing generates output at a similarity level specified by the morph- index, similarity measurements analyse two inputs to determine a level of similarity between them. Similarity measurements can be usefully applied to morphing algorithms to track the progress of the morph (Horner and Goldberg 1991), or they can adapted for use in reverse (Polansky 1992). If the similarity measurements are metric, that is, upholding the property of triangle inequality, they can be used to define a space, and morphing can proceed simply through interpolation through the space (Mathews and Rosler 1969).

5 There are a number of techniques for determining similarity between note sequences, the most prominent of which are briefly mentioned here. One approach is to convert the note sequence into continuous envelopes of note parameters (Mathews and Rosler 1969) and take the difference between them. The problem with this is that the similarity is based on metric distance between the parameters, and this often contravenes musical notions of similarity. Note- sequences, in being non-continuous, can be perceived as strings of characters, of which there has been significant research into similarity measurement techniques (Orpen and Huron 1992). The standard approach with strings is to allow a set of operations – for example, add, delete, shift – and count how many operations it takes to convert the source into the target (Damerau 1964). The number of operations will be proportional to the level of dissimilarity. For note sequences, the set of operations can be easily adapted to a musical context (Orpen and Huron 1992; Cope 1997).

If the source and target are considerably different, it is rare for composers to attempt a continuous morph – it is much more common for changes to be introduced abruptly and transitions proceed through a series of discrete segments, for example, A to B, through C. This is not to overlook the use of foreshadowing which can indicate some quality of an upcoming change. However, when the source and target are similar, abrupt changes do not appear necessary (see the final experiment in chapter seven). In addition, variation, for example in the episodic material of a fugue (Smith 1996), is practiced widely as a way to add interest, or to generate additional “filler” material. This suggests the possibility of a similarity threshold, below which it is more natural to utilise discrete changes and above which it is easier to utilise more continuous variation. The approach taken for this research is different, in that morphing is attempted through continuous variation regardless of the similarity of source and target. The benefit of this is that new, hybrid transitions can be created where none may have been tried previously.

Finally, some philosophical consideration must be directed to the kind of abstraction that the morph index or similarity measure provides. For instance, what does “half” really mean, when the morph index is an input to a highly complex recipe which rests on analytical reductions and musical assumptions? In a practical sense, the morph index is no more than a tool to control the music. However there is an added implication that when the morph index is “half”, the result will sound equally reminiscent of both the source and target, or in the worst case, equally dissimilar to source and target. Therefore the validity of the morph-index might be cross-examined by utilizing similarity measurements. If these similarity measurements were somehow shown to produce results that were equivalent to human judgments of similarity, the intuitive understanding of the morph-index could possibly be validated. Despite this, due to the qualitative

6 nature of music perception and interpretation, there would always remain a huge variety of different ways to interpret a morph index on “half”.

1.4 Details of the research goal

Within the broad goal of this doctorate to “research techniques that provide greater flexibility and new aesthetic possibilities for adaptive delivery of mainstream electronic music”, there are two elements of equal significance: function and style. Providing adaptivity and greater flexibility in delivery, while maintaining some degree of musical consistency, is a gain in functionality. Developing new and interesting aesthetic possibilities builds a stylistic palette, the various effects of which can be understood through user-testing.

The two artistic purposes of note level morphing can also be seen in these terms. Transitioning, although an artistic problem, is utilitarian or functional in nature: how to progress smoothly from source to target. Techniques for automatic transitioning have the practical outcome of enabling greater and more flexible adaptivity, in the sense that the system delivering the music is able to adapt to changes by shifting easily through repertoire as deemed appropriate, either by a user or another algorithm that analyses some kind of world state.

In contrast, hybridisation is more open-ended and related to stylistic exploration: how to integrate source and target in a way that is somehow reminiscent of both. While there are arguably as many approaches to this as there are composers, the use of algorithmic techniques provides new opportunities that are difficult to execute without a computer.

In summary, the broad goal of the research is to improve the function of adaptive MEM delivery while expanding style. More specifically, this is to be explored through note sequence morphing which simultaneously experiments with both the functional and stylistic demands of the research via the generation of hybrid transitions. With the goal thus clarified, the methods used to achieve this can now be related.

7 1.5 Personal background

I will now outline pertinent aspects of my personal background, so as to provide some additional context for the research. This includes musical influences, education and interests; previous studies of mine; as well as other relevant projects I have been and/or are involved in.

My musical interests began with the listening habits of my parents, which included 60s, 70s and 80s rock and mainstream electronic music. I played the trumpet briefly when I was seven and the piano and keyboard intermittently through to when I was 13. At this stage, I switched to learning classical , for which I developed a mild proficiency over the next five years. At 16 I became more interested in blues, jazz and improvisatory aspects of music.

After I discovered jamming, I began to feel that memorising scores and performing them for large audiences overlooked the fun of the musical experience, both for the audience and the performer. I felt that the experience of creating music was more fulfilling than observing or performing, regardless of the skill levels involved. This view pervaded the next decade of my practice.

Towards the end of high school I began to experiment with audio editing and MIDI sequencing, teaching myself through trial and error and creating a small folio of works. Along with my guitar skills, I used these early pieces to gain entry to the music production course at QUT, for which I majored in new media.

Over the next three years of the degree, I came to adopt the computer as my primary ‘instrument’ rather than the guitar. I produced more than an album’s worth of material although my primary interest was in the spontaneous and interactive practices of computer music. I began to develop software that could respond to changes in realtime so as to control the music in a way that felt ‘live’. This was the Live Electronic Music (LEMu) software, which I rebuilt for my Masters project. This software was mainly designed for EDM, which had become a major interest of mine over the course of the degree, due to the influence of peers and the desire to select a relevant contemporary style of electronic music to study.

The research for this doctorate is motivated fundamentally by similar interests while being directed at techniques that are applicable to a wider range of contexts than only live electronic music. Morphing became a musical interest through the early stages of the research, when it was found to be a widely applicable and under-examined area of study.

8 1.6 Design of research 1.6.1 Research design context

This research seeks to model existing approaches to music and simultaneously expand on them, using technology to enable new musical possibilities. This approach is based more on engineering than analysis and as such is somewhat isolated from much of musicology, being more related to computer music practice, which is informed by artificial intelligence.

A useful analogy is comparing the study of bird flight and plane flight – although the plane achieves flight in a totally artificial way, we can see that building jets and observing their behaviour has taught us a lot about aerodynamics, perhaps more so, than analysing bird flight. The hope with computer music methods is that building a machine with new musical capabilities and then observing its effect will provide some new knowledge of music that is applicable to any chosen musical context.

This section will explore the research designs that have been applied to similar algorithmic music objectives in the past. This is necessary for clarifying the position of this project in relation to other research and other fields. A contextual understanding is required for any informed interpretation of the research design of this project.

There is a history in computer music of under-testing, that is, building the musical machine, but not formally observing its effects, a point also noted by Pearce and Wiggins (2001). Even major works are under-tested (Hiller and Isaacson 1958; Bel and Kippen 1992; Oppenheim 1995; Cope 1996; Bencina 2004; Biles 2004). This phenomenon can be attributed to the attitude of musical autonomy (Beard and Gloag 2005) that came with some traditions of classical and art music composition. With this approach, the music is considered self explanatory and in no need of interpretation. While this is acceptable in the context of composition, academic studies require more analytical rigour in order to produce clear knowledge that is able to be used and validated.

While some exceptions to the overall trend exist (Hild, Feulner et al. 1992; Mozer 1994; Phon- Amnuaisuk, Tuson et al. 1999), the issue of under-evaluation has only been explicitly examined in recent years (Pearce and Wiggins 2001; Pearce, Meredith et al. 2002). My research furthers the movement towards evaluation by including rigorous scrutiny of the musical outcomes, drawing from a combination of critical listening (Pratt 1998), survey (Mozer 1994) and automated batch testing (Pearce and Wiggins 2001). Informal techniques include listening and subjective criticism, concert performances as well as installations and proof-of-concept demonstrations.

9

The research design is only distantly associated with computer-assisted musicology. Schüler (2002, p 125-126) has distinguished a number of branches within this field, most of which are in some way relevant but, overall, the musicological tradition is primarily concerned with analysis, rather than automatic generation, of music. This research has involved some amount of direct musicological analysis, to define the musical context and interpret the musical results. However, much of the knowledge behind the software that I developed, which itself is considered as a rigid music formalisation, came from years of subjective experience creating and listening to music, explained above (1.4).

Overall, the method of algorithmic music research tends to follow the engineering pathway of iterative software development as discussed below.

1.6.2 Research through iterative software development

Although the design of this research is more closely related to engineering and computer music practice than musicology, it is nonetheless concerned with the development of knowledge concerning musical processes and their effects. This knowledge is built by the development and testing of software formalisations of music. Iterative software development as research into musical phenomena is practiced by many, but has been discussed in detail as a methodology by only a few (Brown 2007; Desain and Honing 1992).

Iterative software development can be described as repeated application of phases of: investigation, development, evaluation, and presentation. Various other descriptions of iterative software development exist and differ superficially (McConnell 1998). The important factor is not the specific title of the phases, but the fact that they are iterated many times over the course of the research. This feature sets it apart from the other major software development method, the “Waterfall Model”, criticised by Royce (1970), who invented the term. In the waterfall model, the entire project is scoped out, planned and executed in linear succession, backtracking and remodifying the plan when the results are unexpected.

As it was anticipated that the results of this research were to be unexpected, iterative development was therefore a much more appropriate model to use as a basis for research design. The phases of investigation, development, evaluation and presentation were cycled many times throughout the project and there was a significant degree of overlap, with activities often occurring in parallel. Through subsequent iterations, the amount of time dedicated to each phase shifted over time, with a focus on investigation in the beginning, moving through

10 development, evaluation and presentation at the end. The specific details of this research process become evident in more detail as the thesis unfolds.

While iterative development was clearly a valid methodology for the research, the techniques for evaluating the musical algorithms, in terms of both music and software will now be examined in more detail. Their validity is important for the knowledge claims to be accepted.

1.6.3 Evaluation techniques

Despite the aforementioned sparsity of evaluation methodologies to follow in computer music, there are some notable exceptions (Mozer 1994; Pearce and Wiggins 2001) which have inspired the use of evaluation here. As well as this, there are a number of techniques that have been adapted from the related fields of AI, musicology, psychology and music practice which are discussed below.

Evaluating algorithmic music Pearce and Wiggins (2001) presented a framework for evaluation drawing from AI and empirical musicology. As with this research, they view an algorithmic music system as an agent, designed to create music of particular style. This view lends itself to comparison with human compositions through empirical methods. However, my own research differs in that it is not interested in a musical ‘Turing test’, the unstated objective of which is to replace the human composer. In contrast, my own emphasis is on creating tools that empower composers and producers, rather than attempting to replace them. While it is important that the music composed through algorithmic means is acceptable or ‘realistic’ music, my primary interest is the subjective musical qualities that differentiate the composer-agent and the human composer. Qualitative information seems to me to be more pertinent, both to the goal of improving musical algorithms and for understanding the new aesthetics of computational composition. Pearce and Wiggins allude to this when discussing the possibility of “an experiment asking for an aesthetic evaluation of a set of patterns containing machine and human composed music” (2001: p9). My research utilises a combined approach to evaluation of the music that covers critical listening, empirical surveys, concert performance, automatic evaluation and personal subjective evaluation.

Critical listening involves conscious awareness of the subjective listening experience and examination of the music. Subjective effects are noted and there is some attempt to understand the musical causes. Critical listening is commonly employed by composers and producers to improve their works. By playing music to peers, feedback is received as to whether the musical techniques being employed are achieving an appropriate effect, and possible improvements become evident. Pratt (1998) discusses the listening experience in terms of “effects” and

11 “effectors”. Effects are the subjective responses of the individual, and effectors are the elements of the musical surface that contribute to the subjective effects. For example, the effector “high pace” may correspond to the effect “heart racing”. Many other researchers also discuss the subjective process of listening in terms of cause and effect (Kerman and Tomlinson 2003; Beard and Gloag 2005). Huron uses qualitative responses to evaluate the qualia, that is, subjective aesthetic responses, associated with particular scale degrees (Huron 2006, p 146). Pratt’s “effectors”, or elements of music, were employed in the final questionnaire, which I describe in chapter seven. For this questionnaire, the process of critical listening was formalised and data was gathered from a range of participants.

Concerts allow computer generated music to be tested within a realistic setting, however, it is difficult to extract useful data from the audience. This is due to the fact that strangers providing feedback to the composer of the music have a tendency towards politeness, rather than criticism. As well as this, the attention that is being payed to the music may vary significantly between individuals. Nonetheless, successfully delivering the music in a realistic setting is an indicator the music is at least functional – people will usually walk out or complain if the music exceeds the boundaries of acceptability. Because of this, performance with music software has been employed by various computer music researchers as a way to add validity to their work. For example, Biles (2004) regularly performs with GenJam and the computer composed music of the “Illiac Suite” (Hiller and Isaacson 1958) has been played by human musicians at concerts. Although I employed my LEMorpheus software during a number of live events, the research does not rely solely on concerts for music credibility. I also developed a “focus-concert” format, which situates the directed questioning of the “focus group” within the context of a concert – this was applied to the second note level morphing algorithm that I created, as detailed in chapter six.

Automatic evaluation techniques, as advocated by Pearce (2001), can be applied in situations where a set of attributes can be clearly identified as leading to a desirable musical outcome. These attributes function as criteria for judging the effectiveness of the musical algorithm and because they are easily extracted, automatic evaluation of the music can occur. The advantage of automatic evaluation is that large amounts of data can be tested and so various statistics and trends can be inferred. The disadvantage is that, beyond the most basic rules, it is difficult to establish a set of explicit criteria that map to subjective musical effects. I used automatic evaluation in chapter seven by generating a number of morphs with random source and target material.

12 The questionnaires used within this research focused on obtaining subjective responses and musicological feedback from a group of individuals with musical backgrounds. This is essentially an empirical form of critical music analysis (Beard and Gloag 2005). The primary benefit of this qualitative approach is that a range of complex compositional techniques are able to be described and justified by the various respondents. This has the practical advantage that the data generated is able to be fed directly into further development of the software.

The alternative option would have been to develop a set of criteria for a ‘successful’ morph and ask the questionnaire participants to rate the morph examples quantitatively according to the criteria. I consider such an approach to be premature for the current stage of development in the field of note level morphing. This is because there are so many note level morphing techniques that remain to be conceived and explored, that any accurate quantitative evaluation for the few techniques that have been implemented will rapidly become redundant. The criteria used might also be required to shift, with new subjective responses and values associated with new techniques. This is not to suggest that the technique of quantitative assessment is flawed, however, it would be more useful in a more competitive situation. When a diverse range of techniques and systems are already implemented, quantitative evaluation of morphs according to various criteria would be practical in allowing people to select between different approaches for their particular note level morphing application.

The qualitative approach I took appears to be fairly unique within empirical musicology, which is predominantly quantitative, with qualitative techniques typically applied to social aspects (Clarke 2004, p 92). Despite this, common issues were considered, such as the chain of interpretation, the background of the participants, the statistical significance of the data, benchmarks, realism and controlling factors that could influence the outcome of the questionnaire (Clarke and Cook 2004). I chose participants with strong musical backgrounds, which qualified them to interpret their own subjective response. This was useful in linking between musical cause and subjective effect. I provided an additional level of interpretation by summarising and analysing their comments. Because the participant responses included detailed justifications, statistically significant sample sizes were less important than if the quantitative approach had been taken. Benchmark comparisons were used in all tests, from simple cross-fades to live mixes by a professional DJ, to pre-produced morphs by a human composer/producer. A real world context (Windsor 2004, 197) was attempted for one questionnaire, while the subsequent questionnaire was conducted online to allow greater control and publicity.

Composers often utilise an informal process of listening to their own work and improving it rapidly, based only on subjective observation. This is also often applied to the development of

13 algorithmic music software. The advantage with this approach is the ease with which personal judgements are formed, compared to time consuming formal evaluation techniques. The disadvantage is that no substantial claims can be made as to the wider applicability of the music and subtle subjective biases will tend to exert a strong influence over the work, due to the limits of the programmer’s experience. Typically, the process of informal development is applied where musical problems are particularly obvious, with improvements accumulating rapidly to a prototype. Formal empirical methods are then applied to the prototype to detect more subtle problems. The first algorithm I developed for this research – the parametric morphing algorithm which is detailed in chapter five – relied only on my own personal evaluation. This is because I did not consider the algorithm to have reached a standard that was suitable for larger scale formal evaluation. Nevertheless, various musical examples and my own personal observations have been recorded, and these help to understand the affordances of the algorithm.

It should be noted that no formal comparison was made between the three algorithms developed through this project and other, historical algorithms. While this would have produced some interesting results, it is difficult to compare historical algorithms from differing musical contexts, and the continuous iterative development methodology did not call for comparisons between the three algorithms generated at each different stage of development.

Evaluating software The software developed through this research is evaluated primarily in terms of whether or not it adequately demonstrates the musical algorithms I designed – this is because music is the primary focus, rather than software. Despite this, because the topic is interactive morphing, another important consideration is the time efficiency of the system, as expressed by ‘Big O’ notation. Usability of interface design is relevant insofar as I am able to demonstrate and perform using the software and thus no Human Computer Interface (HCI) evaluations were necessary. Extensibility of the software system and documentation is relevant on a personal level as this enables potential improvements in the future; however, it does not serve as a criterion for the success of the research.

1.7 Knowledge outcomes

The knowledge outcomes of this research are novel and significant. Note level morphing of MEM has been explored very little in the past, and there are a number of original aspects that have been investigated by this research, including new algorithmic music techniques, software system design and data gathering techniques. This is made clear towards the end of chapter three

14 which provides a comprehensive review of the field of note level morphing. In terms of significance, there are a range of potential applications for note level morphing, as explained in 1.1 above and throughout the reviews in chapter two and three, including electronic music delivery, computer game music, accessible electronic music tools and computer assisted composition. This suggests that extended knowledge of such algorithms would be significant. As well as this, I have prepared some proof-of-concept demonstrations within some of these application contexts, as documented in chapter eight, the conclusion.

With this in mind, the four most significant and novel contributions made by this research will now be outlined. They are: a comprehensive review of note level morphing, a new software system for note level morphing algorithms, three new morphing algorithms and new data gathering techniques for examination of note level morphing algorithms.

1.7.1 Algorithms

I have developed three algorithms that fit the definition of source-target note level morphing: interpolation morph, Markov morph and TraSe morph.

In developing the interpolation morph, I took a parametric approach to note level morphing. It is similar to the algorithm described by Mathews and Rosler (1969), however it is more adaptive to changes that occur in realtime. Mathews and Rosler’s algorithm is discussed in more detail in chapter three, while the interpolation morph that I developed is the topic of chapter five.

The Markov morph is a probabilistic approach to note level morphing. It provides realtime flexibility, is able to continuously generate new musical patterns and includes user-definable parameters that provide some degree of influence over the musicality of the output. It is distinguished from previous approaches to probabilistic note level morphing in that it utilises depths above one, that is, the conditional probability of preceding events, in conjunction with note similarity measures. The Markov morph is detailed in chapter six.

The TraSe (Transform-Select) morph utilises an evolutionary approach to note level morphing. It is successfully able to generate morphs that, for the most part, are discussed favourably in terms of smoothness and coherence. As well as this, TraSe circumvents the problem of requiring a data-driven approach – which would make it something other than “morphing” – by incorporating a range of compositional transformations to provide musical style. It is also very flexible: a number of different compositional transformation parameter values can be weighted so as to control the style of the resulting morph, the number of intervening states in the morph

15 can be influenced and there are many other parameters, explained in more detail later on. The TraSe morph is particularly apt at performing a range of automatic key-modulations.

1.7.2 Data gathering techniques

Through this research I developed techniques for the formal examination of morphing algorithms as part of a focus concert and a web questionnaire. The most powerful technique was to ask the participants to record their subjective responses to important changes in the morph and to derive a reason for the response from the musical surface. This provided rich qualitative information into the various ways the music was being interpreted, highlighting potential improvements to the algorithm under examination as well as the subjective evaluation.

Importantly, the morphs generated by the morphing algorithm were benchmarked so as to enable the results to be viewed in a relevant context. For example, in the focus concert, the benchmark was a professional DJ who attempted to mix the source and target live. In the web questionnaire, a professional electronic music producer composed the morphs manually, using the same synthesis engine and MIDI files as used by LEMorpheus.

1.7.3 Contextual Review

The review of note level morphing systems presented in the third chapter and also the basis of a paper published at the peer-reviewed ‘Australasian Computer Music Conference 2006’ (Wooller 2006) provides a useful repository of diverse ideas and approaches to the problem. The review serves as a reference point for ascertaining the state and direction of note level morphing and for identifying topics in need of future development. It gathers together a range of formerly unconnected projects and defines note level morphing as a new area of research.

1.7.4 Software

LEMorpheus, the software application I developed for this project, provides a mechanism for the investigation of note level morphing algorithms. It has a modular design which allows the user to select between different morphing algorithms for experimentation and includes music representations and a system for rendering the morphs in realtime.

At the highest level, a simple interface provides control over which patterns and morphs are playing and the most important “morph index” parameter. The morph index can be controlled externally via MIDI. “Table mode” enables the morph index for each part to be controlled individually with reacTIVision fiducials (Jordà, Kaltenbrunner et al. 2005) through Open Sound Control (OSC, an alternative to MIDI).

16

The music representation affords control over key, scale and scale degree without losing the ability to represent passing notes. It is also extensible and may be applied to non-standard tuning systems. The supporting libraries provide a range of analytic, transformational and generative tools as well as algorithms that convert between various music representations.

17 1.8 Thesis structure

Having introduced the research motivations, goals, design and outcomes, more detailed explanations of various aspects of the research can be accessed in the subsequent chapters. An overview of the chapters will now be provided and may be used as a reference throughout the reading.

Chapter one introduces the goal, motivations, topic, research design, knowledge outcomes and thesis structure. The goal is to investigate techniques that both enable greater adaptivity and provide a new set of aesthetic possibilities for mainstream electronic music delivery. The motivation is in providing new capabilities in adaptive mainstream electronic music delivery contexts such as live electronic dance music, computer games, music for people with disabilities and others. The topic is note level morphing – the automatic generation of hybrid transitions between a source and target music. The research design is based on iterative software development and qualitative evaluation techniques. Particularly significant knowledge outcomes were the contextual review of morphing, the software system design, three new note level morphing algorithms and qualitative evaluation techniques specific to note level morphing.

Chapter two situates the thesis within the wider musical context. Firstly this involves a musicological definition of MEM and a review of current MEM practice that uses morphing or is morph-like. This is limited to composition or music production that is performed mostly through manual, rather than automated, processes. Following the discussion of MEM and morphing, the search for instances of manual note level morphing and similar practices is extended to genres of music outside of MEM.

Chapter three is also a contextual review, but with particular focus on algorithmic music systems rather than non-algorithmic music. The first two sections provide a terminological framework that is useful for comprehending and describing algorithmic music systems at the high and low- levels of encapsulation respectively. The framework is then applied to a series of three increasingly focused reviews, beginning with algorithmic composition, then to interactive music and finally note level morphing. The unique position of my own research within the field is made clear from this review.

Chapter four explains relevant aspects of the LEMorpheus software infrastructure that I developed to support experimentation with note level morphing algorithms. This includes explanations of: high level controls over morphing; the loop editor; parameters that affect the morph as a whole as well as each individual layer; music representations and extensible designs

18 that support the note sequence morphing algorithms; and the system for rendering MIDI output in realtime. Some future developments for the software infrastructure are also summarised.

Chapter five details the parametric morphing algorithm, the first note level morphing algorithm I developed. It converts note data into separate continuous parameter envelopes which are then combined and weighted on the morph index during the morph and converted back to note data for playback. An overview of this process is given, followed by a more detailed, fully implementable description. Some informal evaluation and the audible results of the parametric morphing algorithm are then provided. The chapter finishes by detailing some extensions to the parametric morphing algorithm that could be implemented in the future.

Chapter six is concerned with the probabilistic morphing algorithm which I developed subsequently. This algorithm compares the most recent history with either source or target to create a matrix of similarity values, which is then used to predict the next note. Within chapter six an overview of this process provides some overall clarity. Following this, it is described to the level of detail necessary for implementation. The probabilistic morphing algorithm was evaluated both informally and formally though a qualitative “focus concert” study and the methods, results and analysis of these evaluations are also presented, along with audible examples. Problems and possible improvements to the current probabilistic morphing algorithm are then suggested.

Chapter seven explains the final, evolutionary morphing algorithm, TraSe. With TraSe, the source music is put through an iterative process of mutation and selection until the target music is arrived at. At each iteration, an array of mutants are created by a set of compositional transformations, one of which is selected based on a measure of similarity to the target. After the overview and detailed discussion of TraSe, the informal and formal evaluation of it is presented, along with audible examples. The formal evaluation was conducted through a qualitative online survey and the methods, results and analysis of which are included. Following this, possible improvements to TraSe are suggested and an optimisation technique that would allow greater realtime interactivity is scoped out in detail.

Chapter eight concludes the thesis, beginning with some demonstration examples of potential applications of note level morphing that I have prototyped, namely, computer games and live collaborative music making. Some future research possibilities that would either extend and/or complement the current study are then discussed. Finally, some concluding remarks are expressed, relating the contributions that have been demonstrated through the thesis back to the primary research objectives.

19 With the structure of the thesis summarised for easy reference, the research can be explained in detail. As the research is fundamentally an exploration of musical processes, the logical starting point for this explanation is the surrounding musical context, the topic of the following chapter.

20 2 Music and morphing

This chapter examines background musical contexts and their relationship to morphing. It involves musicological description and analysis of a range of contextually relevant music and morph-like composition practices. It is accompanied by audible musical examples and transcriptions. The focus here is on music that is directed, produced or composed manually - automated composition techniques are reserved for the following chapter.

The first section (2.1) explains Mainstream Electronic Music (MEM), the musical genre within which the research is situated. This is fundamental for comprehending the musical intentions, relating to the musical outcomes of the research and understanding various decisions that informed the software outcomes. Occurrence of morphing and morph-like practices within MEM is also examined.

Investigating music outside of MEM (2.2) was also important, but not so much for contextualising the research as for providing developmental inspiration – ideas from other genres that might be combined, modified or recontextualised into MEM compositional morphing algorithms. As well as this, the study of morphing in a broader context established a baseline against which the novelty and significance of new techniques and approaches could be judged.

Overall, the most diverse examples of morphing were found in the avant-garde, where musical boundaries are pushed aside. In contrast, morphing in other genres, both inside and outside of MEM, was less varied, due to stylistic and technological norms. Technology was found to be a major limiting factor, particularly in interactive situations and electronic music contexts, due to the onus on computation and the requirement that the processes are responsive in realtime; an observation that further highlights the demand for this research. That said, the situation is complex, as style and technology both motivate and influence the development of the other. What is seen by one person as a technological limitation might be seen by another as a cherished “feature” of the genre.

21 2.1 MEM: context of choice

MEM was the musical context of the research. The music generated by the morphing algorithms fell within this category, as did the source and target music that was fed into them. Occasionally, for the sake of experimentation, other styles were applied, but all of the algorithm designs were tailored to MEM. Because of the centrality of this genre to the research, it is important to clarify exactly what is meant by MEM and summarise key musicological features, as is done in 2.1.1 below. With the knowledge of this musical context, the various decisions regarding system and algorithm design, as explained in later chapters, can be shown to be consistent with the music and therefore justified.

Briefly, MEM includes any form of metric electronic music based on loops and layers of different synthesised or sampled instruments and sounds. Styles include electronica, downbeat, instrumental hip-hop, break-beat, and many other genres. Electronic Dance Music (EDM) genres that feature strongly in the research include drum and bass, house and trance. Despite the ‘underground’ origins and status of much EDM and other genres, it is clearly more mainstream than the classic forms of ‘electronic music’ such as those of the electro-acoustic and acousmatic traditions, which, with the typical absence of pulse, loops and instrumental layers, are not included within the scope of this thesis but are mentioned in passing in the following section (2.2) that deals with genres outside of MEM.

The term ‘mainstream’ was chosen over ‘popular’, due to possible confusion with the much- discussed corpus of ‘pop music’. While pop music is not necessarily excluded from MEM, a number of the styles mentioned would clearly be outside of pop music. Musically, the focus of pop music is more on vocals and lyrics, while MEM is defined here as being more concerned with rhythm and instrumental elements. Following the tradition of EDM (Shapiro 2000: p 77-78), a piece of music will be referred to as a ‘track’ rather than a ‘song’ due, by and large, to the lack of traditional, lyrical style ‘singing’.

There exist a plethora of styles and, as MEM is relatively new and constantly evolving, the terminology of sub-genres is problematic. In practice, tracks usually possess features of multiple genres and so categories are often used as adjectives to describe a particular piece of music, for example: ‘dark-trancey-two-step-raga-hardcore’. Rather than attempting to work with these loosely defined terms, or attempt to define them, a brief musicology of MEM has been carried out, to define the general features in musical terms that can usefully inform the development of musical algorithms and representations.

22 Following the musicological discussion, a review of morphing in MEM is presented, with examples of both transitioning and hybridity. As multiple-source hybridity (syncretism) is intrinsic to many compositional practices, an exhaustive review would be intractable; therefore only a few key examples that are indicative of more widespread occurrences are shown. Transitioning from one known state to another is less common than hybridisation, but nonetheless widespread, as there are many situations that require such shifts and as with hybridisation, only a few key examples will be used. Morphing appears less common still: however, a complete search is impossible.

This review is important because it provides some idea of the existing, informal morphing techniques, which allows those formalised ones that I developed to be positioned and assessed musically within a larger field. As well as this, the act of surveying provided general inspiration for algorithmic developments. Overall, this serviced the twin goals of the research: to formalise existing approaches and explore new ones. It was found that, despite their abundance, few MEM morphing techniques have been formalised to the point where algorithms could be developed and for realtime situations, the range of aesthetic possibilities are much smaller due to this. Those few formalised morphing techniques that exist are dealt with towards the end of the next chapter (3.4).

2.1.1 Musicology of MEM

Having introduced MEM as the context within which the musical outcomes of the research are located, a more detailed examination of the general musical traits of MEM will now be given. The analysis of MEM in musical terms is pivotal to an understanding of the research as a whole, as every developmental decision was informed by my intuitive understanding of the genre. My knowledge of MEM is made explicit during this section so as to familiarise the reader and also to define the genre more clearly.

Although sound manipulation and vocals are prevalent within MEM, the discussion below pertains mostly to instrumental and note based aspects of the music. This is because the research is on algorithmic music rather than sound synthesis. Due to timbre being pervasive within MEM but only of secondary relevance to the study, it is discussed briefly where relevant, rather than separately and in depth.

An overarching influence on the composition techniques of MEM is technology, which means that an appreciation of the music is assisted through knowledge of the tools that are used. Throughout the following discussion, fundamental aspects of music technology are raised and it is assumed that the reader is already familiar with these. For clarification, the glossary can be

23 consulted. Despite the role of technology, music remains the primary focus of this section and the discussion shifts through key topics that define MEM: Loops and layers, rhythm, tonality and structure.

MEM: Loops and layers

MEM is constructed from repeated loops (ostinatos) of different layers (instruments), with cues to mark the addition or removal of the layers and other changes. This can be represented conveniently through the layout of any sequencing or sound-editing tool.

Figure 1 Five different layers of loops in the Fruity Loops sequencer

All sequencer tools have multiple horizontal layers, each showing a visualisation of the music that plays on that layer, be it a block representing a loop, a pattern of notes or an audio signal. There is usually the ability to loop or copy aliases of the musical patterns, thus affording the use of repetition, which is widespread in MEM. Usually, the loop lengths are powers of two above two, although other lengths based on ratios of three and (much more rarely) other numbers are used. Technically, a producer may implement any number of different layers for ease of manipulation; however, often there are only a few that are easily distinguished by the listener, who is oblivious to the visual layout preferred by the tracks’ creator.

Typically, each layer will perform a different role in the music and be allocated a different instrument. This is in a similar way to much ‘homophonic’ music, which can be defined by the two roles of melody and accompaniment. Common roles for MEM are: percussion, bass, accompaniment, lead, sound effects and cues. Each of these functions can encourage a vast range of effects (Pratt 1998) in the listener, including: energy/intensity (Wundt 1896); predictability or chaos (Meyer 1956; Huron 2006); mood (Huron 2006) and atmosphere (Butler 2006). However, particular roles afford different effects, often indirectly or through combination with other roles.

24 Percussion

Percussive, or ‘rhythmic’ (Butler 2006: p 180), layers are comprised of short events with fast attacks that vary chiefly in dynamic and, to a lesser extent, duration – rather than pitch, timbre and other dimensions. Because of this, the percussion is apt to express the overall level of energy in the track and, through highlighting the rate/tempo and metre, can set up expectations that other roles work around. In doing this, the percussion can also partially indicate the function of the music, that is, whether it is to be danced to (~2.1) or listened to (~2.2). Having said this, the music of styles such as ‘’ (IDM) , ‘braindance’ or ‘intelligent techno’ can blur such boundaries, and people will listen or dance to whatever they please.

The length of the percussion loops are typically quite small, for example, two beats (~2.3, listening to percussion only), four (~2.4) or, less commonly and usually less obviously, eight (~2.5) or more. If they are long, this is usually counterbalanced by a substantial amount of repetition, or looping of fundamental percussive elements. For example, the kick and high hats might follow a repetitive pattern within the scope of one beat, but the snare and clap might vary towards the end of an eight or sixteen beat cycle (~2.6). This kind of variation fulfils the role of a hypermetric cue (discussed below).

The percussive layer can itself be perceived as containing a number of sub-layers, or segregated streams. Typically, there is a sub-pulse metered out in the high frequency spectrum, for example high-hats, shakers, bells, ride cymbals and/or equivalent; a backbeat pulse kept through the mid to high frequency sounds such as the snare, clap, conga, cowbell and/or equivalent; and a downbeat and onbeat pulse in the low frequency, for example kick, toms, a descending chirp and/or equivalent. This multi streamed view is enhanced by polyrhythmic and otherwise contrasting patterns and diminished by unified patterns and audio compression. Cymbals and fill patterns are usually on a separate, much longer loop and, despite often being from the same drum kit, perform a different function – that of the cue, which is dealt with further below.

Percussion is typically expressed using various short sounds with fast attack, often sampled from drum kits and manipulated, or synthesised through an array of techniques, classically for high-hat, snare and clap and descending sine wave for kick and toms. Non-standard sounds which do not replicate the drum kit, but nonetheless bear some resemblance to the musical functions of the various items in the kit, are used in more abstract or experimental styles, for example minimal techno (~2.7) and alternative electronica (~2.8).

25 Bass

The bass, as a lower register, monophonic tonal part, is particularly suited to provide the root pitches of the chord sequence (~2.9) or drone (~2.10), but also often works with the beat to help define the metre (~2.11) or fatten up the kick drum (~2.12, ~2.13); or against the beat to generate more complex patterns (~2.14).

The length of the bass loop is generally longer than the drum loop, particularly when the bass defines a chord progression (as opposed to a drone). Often, particularly in styles where the metre is reinforced by the bass, the rhythmic pattern is repeated after as little as one beat, typically on the offbeat so as to hocket with the kick drum (~2.15), while the pitch pattern can change to define the root of the chord, usually over four (~2.9), but often also eight (~2.16) or, less commonly, sixteen or more bars. As with all parts there will often be variations to the pattern towards the end of a long cycle, which is effectively a cue.

When the bass acts as a drone, the pitch and rhythmic pattern are both repeated. In strict definition, a drone occupies only a single pitch, but often a kind of drone can be set by short looped oscillation between two pitches, usually with the implication of one being dominant (~2.18). Often, bass loops are small and do not indicate a chord progression so much as a drone-riff. This is clear in (~2.19), where the bass riff is repeated throughout the entire track.

Bass tones can be sampled from instruments or synthesised, classically using combinations of saw, square and sine waves in subtractive synthesis. The synthesis techniques used in other tonal parts, such as the lead and accompaniment, are similar to those used for the bass, but in different registers.

Accompaniment

The accompaniment generally aims to complement the other parts through background harmony. This provides the listener with chord-type and, over time, key/scale information which can symbolise different moods. It typically takes the form of a polyphonic tonal instrument, mid- range, playing chords (~2.20). This includes “synth pads”, which are sustained harmonic and textural sounds (~2.21), and chord stabs (~2.24). Pads often feature in arrhythmic and textural sections, such as breakdowns and intros (~2.23). Chord stabs, with the fast attack and decay, have rhythmic significance and therefore can be used to reinforce or contrast the metre, or, if applied sparsely, can serve as punctuation cues for the structure. The loop length of the accompaniment is often is the same as the bass, as they both define the tonality and chord progression. Despite this, they can often differ in length, for example when the bass is a short- looped drone, the accompaniment might still vary over a longer period (~2.25). 26 Lead

The lead is typically tonal, melodic and monophonic, in a mid to high pitch range. It is mostly used to express musical gestures, melodies, tunes or riffs through sequences of notes. It can work with or against the metre (with, ~2.26, ~2.29; against, ~2.30 [ritardando], ~2.28 [a layer of the lead is 3 against metre 2]), harmony (with, ~2.26; against, ~2.27) and patterns in other parts. Due to the range of musical possibilities that the lead may cover, it is well suited to manipulating the listener’s sense of musical expectation, which is fundamental to musical enjoyment (Meyer 1956; Huron 2006).

As a ‘free-ranging’ part, the lead loop is typically no shorter, and often longer, than the length of the bass and accompaniment (~2.31), although sometimes involves heavy repetition. It is the electronic equivalent of the lead vocal or guitar that is prevalent in acoustic genres. This is not to say that lead vocals or guitar do not occur frequently in MEM, but when they do, they will fulfil the same role of the synthesised lead (hence it is unnecessary to include a description of vocals as a role in itself).

The various used for lead are not mutually exclusive and, as mentioned earlier, there may be more than one lead, each with more or less lead-like qualities.

Sound effects

Sounds effects are usually arrhythmic, textural and often created from samples of found sounds (~2.143), voice (~2.33), and any combination of Digital Signal Processing (DSP) and synthesis techniques (~2.32). It is equivalent to what Butler (2006: p 180) calls ‘atmospheric’. They often add extra-musical meaning and/or character to the track through literal (~2.34) or abstract (~2.36) means.

Sound effects on a short loop tend, over time, to integrate musically into the metre and pitch space of the track (~2.41). Sound effects on a medium length loop, around eight to sixteen beats, may take on some qualities of the accompaniment or lead, while sound effects on a longer loop tend to perform the function of cues, which are explained directly below.

Cues

Cues mark significant points within the music, often allude to some form of change and always occur with a length of loop that is longer than most of the other roles. They usually occur at the end and the beginning (~2.37) of a section, often marking an increase or decrease in intensity through the addition (~2.39) or removal (~2.38) of a layer. Like any other role, cues can play with

27 the listener’s expectations. For example, within drum and bass, the backbeat snare is often used as a cue to foreshadow an upcoming section (~2.83). Sometimes they foreshadow a change that does not occur, defying the expectation. In this case, the cue nonetheless adds significance to the point in the cycle at which it strikes and punctuates the passage of time. Sometimes it is difficult to distinguish between a cue and a new layer (~2.86). Cues can take the form of any sound, for example, cymbals, reverse cymbals, reverse drums and sound effects. In most parts, drums, bass, lead and accompaniment, some form of variation will occur towards the end of the cue-loop and this itself acts as a cue.

Ambiguous roles

The roles of bass, accompaniment, lead, sound effects, cues and rhythm are sometimes difficult to distinguish, especially when they are purposefully blurred and inconsistent, being swapped back and forth between instruments. For example, note how this lead synthesiser (~2.42.1) becomes delayed and reduced to the background so as to temporarily become tonal accompaniment, first in a subtle way (~2.42.2) and then stronger (~2.42.3). Another example is the “blip” in (~2.43) which could be interpreted as fulfilling the roles of both lead and percussion. Lead arpeggios, while technically containing no chords, often fulfil the role of background harmonic accompaniment. This is clear in the following example when the chords end and the lead arpeggio takes over (~2.44.1). Later, an additional lead is added over the arpeggio (~2.44.2). The bass can temporarily become a lead if it is shifted up to higher registers (~2.45) or brightened with a “squelchy” filter and varied melodically (~2.145). The snare drum can be pitch- shifted in a melodic way (~2.144). As well as this, not all of the roles will be present all of the time and often there will be multiples, for example rhythm and auxiliary rhythm, lead and second lead, multiple bass lines. Many examples abound that highlight the difficulty in generally defining the aforementioned roles; after all, music that is difficult to define is an inevitable result of the drive towards uniqueness and innovation. However, such examples usually do not deviate drastically – tracks with many attributes that differ markedly from the norm are classed as “experimental” rather than “mainstream”.

Rhythm in MEM

As outlined above, MEM is a style built from looped layers of differing roles. In most cases, the repetitive timing and tempo of the loops in MEM is almost flawlessly consistent, although the material itself is not necessarily precise and mechanistically quantised. The overarching regularity reinforces underlying rhythmic constructs such as beat, metre, hypermeter and accents of special meaning including the downbeat, backbeat and others. It is often observed that rhythm in MEM is of greater significance than in many other styles of music (Shapiro 2000;

28 Neill 2002; Butler 2006: p 4-5), however it is nonetheless possible for the nature and ambiguity of such idealised rhythmic constructs to change radically from piece to piece, over the course of a single track and from listener to listener; an observation that is supported by Huron’s discussion of mental representations of expectation (Huron 2006: p 231).

Within a particular layer of a particular passage, the patterns of event inter-onset, event duration, articulation and other, non-temporal, qualities, can suggest phrase or grouping boundaries of various strengths at particular points in time (Lerdahl and Jackendoff 1983; Krumhansl 2000; London 2007). The density of events correlates to levels of intensity (~2.46) or pace (~2.47). Both surface level rhythmic patterns and the metres that appear to be abstracted from them are conceived on a continuum from even through to irregular. Following Butler (2006) and the colloquial language of MEM, I represent the continuum of regularity through the two contrasting styles of “four on the floor” (~2.48) and “” (~2.146).

That metre is built from observations of rhythmic events is strongly supported by a range of psychological studies, summarised by Krumhansl (2000: p 163) and Huron (2006) and continuums of rhythmic irregularity have been explored in psychological investigations (Krumhansl 2000: p 164). While even rhythms are fairly narrow in definition, irregular rhythms have been described and conceived in multiple ways, without any apparent unified approach (Thaut 2005: p 11-13; Butler 2006: p 81; London 2007). While the listener generally has the ability to attend specific streams or compound them, particular cues and patterns within and across layers can direct attention, thus further influencing the perception of rhythm.

Beat

The beat or “tactus” is fundamental to rhythmic perception in MEM and is defined here as an imaginary periodic event, the frequency of which resonates with the frequency of audible events in the music, at a rate that is comfortable to tap or “beat” along to. This view is supported by some music psychologists (Krumhansl 2000: p 160; Huron 2006: p 176), however, variations have been proposed by others, for example, Thaut notes that some authors perceive the beat as the “audible (rather than imaginary) pulse markings” (Thaut 2005: p 8). Butler goes further in stating that beats, in an EDM context, are “heard, felt and enacted” (Butler 2006: p 91). While this is clearly a popular understanding of the term, it is taken here to be a colloquial definition, secondary to the formal idea of an imaginary beat as defined above, due to it being less broadly applicable to MEM and styles that aren’t “four-on-the-floor” or dance-oriented. Like the beat, “pulse” is taken to be an abstract periodic event, the frequency of which is related to the frequency of perceived events, but is generalised so as to be applicable at any cognisable rate, not only the rate which is most easily tappable (Parncutt 1994).

29 The beat can be divided into various sub-pulses, typically at rates that are a half, quarter or a third of the beat, however there are no doubt many deviations from this general trend, for example, when they are overlaid together (~2.61). Complicating the perception of sub divisions are the various forms of quantisation – groove-templates and swing/shuffle. While standard quantisation will align all events to the closest beat or subdivision specified by the producer, swing will shift every second note later in time (after/) a certain degree ranging from on the beat/pulse to almost on the next (hard swing, ~2.51). Often, the amount of swing is set to imply a triplet subdivision (~2.50). Groove templates have an “offset” value for each quantisation point, enabling the grid to be distorted in any way, but often inducing values from human performances, as is the case with DNA Groove templates (Chokalis 1999) and Desain and Honing’s induction of expressive timing variation (1989; 1992). Obviously swing is encompassed by the groove template style representation. The technique is typically applied over multiple beats, encompassing one or more bars and is thus also relevant to the concept of metre.

Metre

The contemporary concept of metre is an imagined pattern of emphasis within the bar, a hierarchy of different accent strengths at various points in the bar. Computational models of metre that automatically derive the emphasis at each position of the bar from adding all of the note occurrences and their strengths (Desain and Honing 1992; Huron 2006) have tended to match analytical expectations (Lerdahl and Jackendoff 1983), giving credibility to this notion of metre induction. However, in practice, metre is obtained not only from the surface, but also from schemas – as musicians will know, the underlying metre known by the performer can easily be hidden from the audience. Butler (2006) examined a kind of metrical obfuscation that is common to EDM, “turning the beat around”, whereby the first layers that are introduced imply a metre that becomes untenable when the primary rhythmic layers enter (~2.52). Temperley (2001: p 217) and Huron (Huron 2006: p 279-281) describe a similar technique in classical music which they call “the garden path” phenomenon. Clearly, the metre inferred by a listener is dynamically revaluated as the music progresses as well as being informed by previously recognised templates (Desain and Honing 1992; Hannon, Snyder et al. 2004: p 957). If we take metre to be a kind of idealised periodic expectation of rhythmic emphasis, the following observations of expectation are illuminating (Huron 2006: p 231):

“Schematic expectations represent broadly enculturated patterns of events. Different schemas may exist for different styles or genres, as well as for common patterns that cross stylistic boundaries …

30 Dynamic expectations represent short term patterns that are updated in real time especially during exposure to a novel auditory experience such as hearing a musical work for the first time.”

From this, a continuum ranging from schematic to dynamic metres can be conceived. Underlying these expectations is a seemingly inbuilt ability to conceive simple rhythmic patterns based on ratios of 1:1 and 1:2 – that is, the ability to replicate rhythms based on other ratios such as 1:3 and 1:4 appears to be learnt (Krumhansl 2000: p 162).

Traditional music theory has a set of metric schemas, such as 2/4, 3/4, 4/4, 6/8 and so on that apply fairly well to a corpus of pre 20th century classical music. Theory surrounding MEM has not reached the stage where an appropriate set of metric templates can be formalised and widely agreed upon and it is obvious that the traditional models are somewhat inadequate, when arguably the entire genre of EDM, in all its rhythmic complexity, is most accurately described simply as 4/4 (Butler 2006: p 76). In MEM more generally, odd time signatures confusing syncopation (~2.54), skipped beats (~2.53) and macroperiodic polyrhythms (~2.55; ~2.56) do occur, but clearly remain in the minority. Although the metric schemas of MEM are far from canonisation, it is clear that particular patterns are widely used within particular styles of MEM and thus likely to be compressed into a metre by listeners that are highly exposed to the music.

Metre in Four on the Floor

Perhaps the most common metre is indeed 4/4, which is typical of ‘Four on the Floor’ (FF) style dance rhythms. Musicians commonly refer to different pulse cycles within the 4/4 metre (Wikipedia 2006; Kernfield 2007; London 2007): downbeat, backbeat, onbeat, offbeat and upbeat. The downbeat is the first beat in the bar. The backbeat is the pulse on beats two and four. An onbeat is literally any event on the pulse of the beat (four beats per bar). The offbeat is any subdivided pulse that is not on the onbeat, including the eighth or sixteenth note pulses in- between the beats. The upbeat is any event leading directly into the next downbeat (see diagram).

Downbeat Backbeat Upbeat

1 2 3 4 1.5 2.5 3.5 4.5

Onbeat Offbeat 31 Figure 2 Diagram showing the positions of the downbeat, backbeat, upbeat, onbeat and offbeat in the 4/4 bar.

Some variations to the typical pattern of downbeat-offbeat-backbeat-offbeat are notable because they are widespread and thus stylistically recognisable. Ragga, Soca and many other electronic music styles found from the African Diaspora will have a FF kick drum but emphasise the offbeats and subdivisions much more than the standard 4/4, often through the snare (~2.57, ~2.58).

Intense styles of dance will have a ‘galloping’ offbeat pattern (~2.62) which highlights a 16th note pulse. In psi-trance and gabber, the beat is often subdivided into triplets while maintaining four per bar. The extent to which these patterns are perceived as schematic or dynamic ultimately depends upon the listener’s familiarity with the music. It should be emphasised that these are general observations of trends rather than rules and that many tracks are designed to break the standard patterns of their sub-genre in some way.

Metre in Breakbeat

Breakbeat (BB) patterns would be formally classified as 4/4 but cover a plethora of distinctive grooves that contrast with the typical 4/4 pattern more markedly than FF styles. The biggest difference is the change in role of the kick and snare. In FF, the kick keeps the onbeat pulse and the snare keeps the backbeat pulse and the two collide regularly on each backbeat. With BB patterns, the kick and snare rarely occur simultaneously and in many cases could be heard as two different sounds on the one stream. The perception of the high-hats, kick and snare as being a single voice is further enhanced by audio compression of drum loops, so that at any point in time only one of these layers is pushed to the foreground. In some styles of BB the rhythmic patterns can change so frequently that this voice takes on lead-like qualities, playing against the underlying metre to create metrical dissonance. Despite this, there are many repetitive styles of BB, as well as particularly widespread samples such as “funky drummer” (~2.60) and “amen break” (~2.59) that, I suggest, have contributed to some kind of ambiguous metric template of BB. However, I do not consider these to be veridical (Huron 2006: p 275), as most BB listeners have not heard the original tunes from which they are derived. I distinguish between five interpretations of BB – polyrhythms, additive rhythms, syncopation, beat-roles and transformations, each of which are implied to greater or lesser degrees by particular patterns.

Polyrhythms

The definition of polyrhythms used here follows Arom (1991), as the “ordered and coherent superposition of different rhythmic events”. This is favoured over the term cross-rhythm which

32 appears to be controversial (Chapman 2006; London 2007) and hemiola appears to be specifically related to the case of three over two or four polyrhythms in 3/4 or 6/8 time signatures (London 2007), and thus does not apply to BB which is in a 4/4 signature. Importantly, polyrhythms may be truncated, mid-macroperiod (Arom 1991: p 231), to fit with underlying metric and hypermetric cycles and still be considered polyrhythmic1. For BB rhythms, polyrhythms usually only apply to four or two over three type patterns – for example, the two over three kick drum in the following phrase (~2.63):

1---2---3---4--- k k k k k k

Similarly, the eight beat snare pattern in (~2.67) contains two, two over three type patterns:

1---2---3---4---5---6---7---8--- s s s s s s s s s s

Soca and raga rhythms, although closer to FF than BB, also often exhibit a polyrhythm four (snare) over thee (kick) (~2.68).

Such polyrhythms often have the property of maximal evenness and individuation and may thus also be considered “diatonic” (Butler 2006: p 84-85), however this property appears to be more of an interesting side-effect than a rule that can be schematically applied to MEM.

Additive rhythms

An additive rhythm is viewed as a string of atomic rhythmic elements, each consisting of a certain number of beats. The notion of rhythm as being either additive or divisive was originally coined by Curt Sachs (Sachs 1953) who related divisive rhythms to bodily feelings of movement and additive rhythms to speech, a notion that fits well with the idea of the BB drums taking on lead-like qualities. That is, if the kick and snare are able to be seen as two different kinds of accents of speech-like voice, additive rhythms can often seem like a more appropriate interpretation. For example, the beat from the BB classic Unfinished Sympathy by (~2.70) can be interpreted as 2+3+1+2. This is obtained by compounding the primary

1 Although not limited to BB, it is worth noting that extended, full macroperiod polyrhythms are often introduced into the music through delay effects set to two thirds and other divisions of the beat or bar. This is a technique endemic to dub and other reggae influenced electronic music genres (~2.69), but used throughout MEM.

33 kick and snare attacks into the one rhythm, using the eighth-note sub-pulse implied by the primary high-hats as a counter. Some analysts might collapse the 1+2 at the end into a 3 on order to fit theoretical notions of diatonic rhythms, however, considering the strong emphasis of both the kick and snare I feel that 1+2 is more appropriate. As 2+3+1+2 the pattern reflects the widespread tendency to increase the density or rate of events towards the end of a cycle, which is, in this case, a single bar.

Syncopation

Syncopation has slightly different definitions throughout the literature, however it is taken here to be accentuation of weak points in the metre, following Butler (2006: p 85). With this definition, the notion of a syncopated metre may seem somewhat nonsensical – how can a metre, as a pattern of emphasis, be defined by accentuation of weak points? However, if we consider that a rhythmic pattern can be syncopated, and that after repeated exposure to the minds of listeners, this pattern becomes a style eventually attaining the status of schema, it would be plausible to say that a new metre has been spawned that is syncopated in comparison to the original metre. Classic examples are the funky drummer pattern (~2.60) and the amen break (~2.59), which have become almost ubiquitous to BB styles. Butler makes the case for the syncopation interpretation of BB by noting how the:

“second and third (snare) hit seems to dance about beat three without actually landing on it … Through these behaviours, however, both attacks call attention to where they should be, in so doing invoking the presence of the unarticulated beat” (Butler 2006).

I call this a “Chick-A-Chick” (CAC) (Wooller 2003).

The CAC over the third beat is a widespread characteristic of BB and is perhaps becoming part of an underlying schema for BB metre. This speculation is fuelled by the new forms of stronger CAC emphasis that have begun to emerge over the years. From early drum and bass, an example of the CAC being heavily accented by bass squelch parts is evident (~2.64). Stakka and Skynet occasionally reinforce the CAC with a kick instead of a snare (~2.65), without losing the BB feel. This was done to even more popular acclaim by Groove Armada recently in 2006 (~2.66).

Beat roles

I have previously proposed the notion of “beat-roles” (Wooller 2003) and formalised this into a “generative grammar of breakbeat”, initially with the idea of being able to create all the possible patterns of BB kick and snare, and only these, from a small set of rules . These rules constitute a formalised schema for BB and thus must be considered when discussing notions of BB metre. 34 Accepting the limitations of sixteenth-note pulses, the results are quite variable, being able to produce a huge range of different patterns. At the same time, the patterns sound very much like BB and have been applied successfully to a number of live performances. It is important to note that this particular piece of software (LEMu) has already been assessed for my Masters and is therefore not part of this research. However, it is still relevant to mention the concept of BB “beat-roles”.

Any event occurring in the first two beats of the bar is considered “primary” (p) and any event in the second two beats is considered “secondary” (s). An event can either be p or s, “complement” (c) the p or s, or “lead” (l) into the subsequent beat. Each role of p, pc, pl, s, sc and sl has a chance of appearing and a chance of occurring on two or three different sixteenth note slots, depending on the role and the rhythmic layer (kick or snare).

35 Beats in the bar 1 2 3 4

Beat roles for the kick P PC P L S SC S L An example kick sequence

Figure 3 The portions of the bar that can fulfill the various beat roles for the kick drum: primary (P), primary-complementary (PC), primary-leading (PL), secondary (S), secondary-complementary (SC) and secondary-leading (SL). An example of one possible configuration is shown by the red squares.

The precise probabilities for this grammar were determined through music analysis and transcription of a variety of club drum and bass. Another approach could be to train the system on real examples. The simple rules defined by this grammar can represent and generate many fundamental BB patterns. Interestingly, a key feature of BB appears to be the downbeat and the first backbeat. As long as these two features are preserved, a huge variety of other rhythms, particularly with the kick and snare appear possible, while remaining within the BB style. For styles as rhythmically diverse as BB but nonetheless recognisable, templates such as these may be suitable for representing the metre, as periodic rhythmic expectations. An example of the output of the “beat roles” generative grammar for BB is available (~2.71).

Transformations

Finally, it seems obvious that much of the new patterns that have emerged are the result of certain transformations, afforded by sonic and symbolic (MIDI) editing techniques available on the computer. These include phase-shifts, repetitions, rate changes, reversals and others. Many of the patterns in BB are obviously the result of such processes (~2.72; ~2.78), and so it is worth considering the implications of these to BB metre. As for formal schemas, Nick Collins has created software that automatically splices BB samples in the style of studio drum and bass (Collins 2001), while my own LEMu software includes functionality to transform rate, phase and to repeat particular sections. Despite this, I suggest that transformations have actually had a small impact on the metre itself. Particular transformations may have been used so often as to become well-recognised, but still not a predictable part of the one or two bar periodic cycle that would constitute the metre of that sub-genre. However, when it comes to the hypermeter of BB, and also more generally of MEM, such transformations are endemic and come in the form of drum fills at the end of long cycles.

36 Hypermeter

Hypermeter is the pattern of emphasis over periods beyond the metric cycle (Lerdahl and Jackendoff 1983; Huron 2006) and has been touched on by the previous discussion on cue layers. Within MEM, the form of hypermetrical structure is fairly consistent across most sub- genres – the period leading up to the end of a cycle deviates in some way from the patterns that are repeated throughout most of the cycle. I refer to this period of deviation generally as a cue, however, the term fill is often used to refer to a rhythmic cue, which is the most common form of cue in MEM. The fill is usually comprised of either a removal (~2.81); addition (rhythmic layer ~2.82); sonic transformation, including reversal (~2.74); rhythmic transformation (~2.79) including phase and rate changes (~2.84); rhythmic variation (~2.73), repetition (~2.75), substitution (~2.85) or any combination of these. Rhythmically oriented fills create metrical tension which is generally resolved in some way as they lead into the next cycle. Sonic and tonal fills serve to pique curiosity or foreshadow a particular mood.

After the fill is the standard point at which structural changes in the music, such as breakdown, occur. While this pattern is a general trend, there are a number of exceptions. For example, occasionally the fill is on the first bar of the next cycle (~2.87) or start point of cycles in different parts are offset so that a fill in one layer will occur halfway through the cycle of another layer. Hypermeter is also relevant to musical form, however further discussion of this topic is reserved for the section on “Structure” below.

Tonality of MEM

Tonality is defined here as the way a sense of tonic pitch is or is not suggested in the music and the way other pitches, if present, are organised with respect to it and each other to achieve particular musical effects. This is a slight broadening of the term originally coined by Fétis (in Reti 1958: p 7) due to the surprisingly diverse nature of tonality in MEM which does not always conform to the narrower definitions typically applied to classical music that deal primarily with chords and scales (Schenker 1935; Reti 1958) (Huron 2006: p 175). While it would be possible for examples of MEM to be described purely in these terms, in a significant number of cases, if this were the only focus, there would be very little to say, despite the obvious widespread appreciation of the music. In a similar way to the low information classification of EDM as simply “4/4”, it would be a vast oversight to declare that “minimal techno is drone-based” and leave it at that.

Despite the overarching influence of a western harmonic heritage, MEM producers, simply through using new technology and ignoring conservatoire knowledge, are often pushing the

37 boundaries of traditional music practice. In such examples it is more revealing instead to explore atonicity (the apparent lack of tonic), tonal ambiguity, the subtle introduction of tonality through intuitively non-tonal voices, or the harmonic properties of overtones and their manipulation within a single note. At the other extreme, one might imagine the unnecessarily complex tonal analysis may be prompted during passages where a chord synth is played as though it were a lead (~2.22) – it would be simpler to treat the part as it appears to have been produced and perceived.

This would obviously involve a subjective judgement call, however the aim of this section is to describe enough of the tonality of MEM so as to convey an idea of the musical genre that is the providence of this thesis. It should not be construed as a theoretical attempt to authoritatively encapsulate all that is possible and denote precise generative likelihoods.

The current musicological literature on EDM (Keller 2003; Butler 2006) tends to revolve primarily around the rhythm, due to its importance, as discussed above. Literature on pop musicology does not often deal with “musical analysis” (Beard and Gloag 2005: p 11) so much as cultural theory (Hawkins, Scott et al. 2007) and when analysis does occur, tonality does not appear to be discussed in detail (Tagg 1982). As well as this, the musical interest in pop music is primarily in the vocals and this does not adequately relate to the more instrumental (non-vocal), repetitive and drone oriented styles of MEM. The analytical musicology of acousmatic and electro-acoustic music also encounters problems of sonic analysis but to a much larger extent than MEM. While various approaches (Windsor 1995; Battier 2003; Hirst 2003) are somewhat relevant, there currently does not appear to be a framework for tonality that is suitable for MEM and so the following descriptive continuums were conceived, drawing from a variety of other music theories: rate of tonal change over time, the amount of recognisably pitched sounds, the level of harmonic coherence within the audible pitch set and the degree of polyphony.

I will explain these attributes and use them to define MEM. The objective here is to express the musical paradigm of the study and should not be construed as an attempt at defining music for any other purpose. Throughout the explanation of terms, I will present key examples from MEM and apply the descriptive tonal attributes to them in order to build evidence for the definition of MEM and express more of the genre through audible examples.

Rate of Tonal Change (Horizontal)

The rate of Tonal Change (TC) attribute relates to the level of activity within tonal parts – at one extreme, the entire track consists of a constant drone of tonic and/or pitch-set without changing over time (~2.88). A level above this we might observe drones that shift pitch only once in a whole track or at the end of a lengthy cycle (~2.89). A higher level of TC might involve typical 38 chord progressions in the bassline such as the very common four-chord (~2.92) or two-chord (~2.93) varieties. Such progressions tend to gravitate to an underlying tonic (Bukofzer 1947 in Thomson 1999). At a higher rate still, the bassline could form a riff that dances around an implied fundamental bass (Grant 1977) or “Urlinie” (Schenker 1935/1979), an imagined bassline that can be reduced from notes over a span of time (~2.90; ~2.91). Above this level, we might consider lead riffs which are changing in such a way and at such a rate as to contribute to ambiguity of the underlying tonic. This is typified by the “solo” (~2.95). It should be noted that TC is derived from the sum of activity in the various pitched parts. For example in (~2.94) three voices can be heard: the bass that doubles the kick, the mid-high register synth fulfilling the role of bass, and the higher-register lead vibes, all different but adding up roughly to a mid level of TC overall – that is, the tonic and related pitches are not constant, but also are not so wildly variable as to confuse the tonality. Over the entire track the TC does not change dramatically. Having defined TC and provided an example of how it might be roughly gauged, it is now possible to examine how MEM can be described in terms of TC.

Overall, MEM is skewed more towards the “drone” end of the spectrum than the “solo”, with most tracks consisting of two, three or four primary chords in a progression and many, particularly in EDM, consisting of a drone. The solo is, on the whole, a rare occurrence in MEM, although it occurs more commonly in sub-genres of MEM that are similar to pop music in terms of structure and emphasis on the lead-part for interest, for example, the Portishead example above (~2.95). These observations apply to whole pieces of music, whereas if the time span is narrowed onto a particular section, the level of TC may deviate drastically. For example, during a fill section, there is generally an increase in TC either through transformation of a pitched part (~2.99) or addition of a pitched cue (~2.96), while during a breakdown the opposite is often true, due to the introduction of sustained pads (~2.97). In other instances ambiguity in the breakdown is partially conveyed through higher levels of TC in a kind of solo (~2.100). The tendency for MEM to have low to mid levels of TC, to be more “drone” oriented than “solo” oriented, can be contrasted with classical music which, with continual variation and key modulation, has a relatively high level of TC. Some might argue that EDM in particular should be listened to at the macroscopic level of the DJ’s set and that at this timescale significant TC would occur. However, if one considers an orchestral work of the same length, it seems natural that the differences in TC between the two genres would remain. The broad genre of Pop music sits mostly in the middle, with complex lead elements and clichéd chromatic key shifts representing the upper boundary of TC and the more popular elements of MEM representing the lower boundary.

39 Tonal Stability (Vertical)

Tonal Stability (TS) is an estimate of how strong the sense of tonality (as tonicity) is, with primary reference to the tonic, but also to the idealised pitch schemas that the listener carries with them, for example, the minor and major scales. MEM has a mid-level of TS, but varies quite substantially. At the least stable end of the TC continuum, we could envisage pitches that do not suggest a particular tonic and do not relate to any of the scale intervals ever experienced by the listener. Above this, there may be recognisable intervals, but still no strong sense of tonic, as is often the case with whole tone scales. At the mid level, a tonic would be identifiable, but many of the other pitches may be accidentals or extraneous scale degrees that are less fundamental or less “similar” to the tonic. Above this, the tonic may be forcefully emphasised, featuring fundamental intervals such as the fifth, fourth and octave more strongly. The extreme of TS would feature only the octave.

One might ask: why are the intervals of fifth, fourth and octave given such a fundamental role in establishing tonality? Empirical qualitative research supports the claim that they are judged as “stable” and “strong” in musical terms (Huron 2006: p 145). The special fundamental role of these intervals is also apparent in the musicology of most other civilisations (Thomson 1999). These intervals are readily perceived as being similar on neurological (Weinberger 1999) and cognitive (Krumhansl 1979) levels. This also extends to the chords I, IV and V (Krumhansl 1983). On a physical level, the ratio of (the fifth), produces a shorter macro-cycle ( ) between the two than any other ratio below thus could be considered as something of a “best fit” on a physical level – while perfect ratios only exist in metaphysical realms, small inconsistencies in tuning, for example, are typically overlooked during tonal perception, an argument made earlier by Theodore Lipps in 1900 (Thomson 1999: p 89). Due to all of the musical, neurological, cognitive and physical reasons listed above, I consider music which features the octave, fifth and fourth intervals to have a higher TS over other intervals.

In opposition to this, one might argue “why is plainchant no longer popular if the perfect intervals are so important?”. Firstly, TS is not synonymous with musical popularity, in fact, low TS is often used effectively to create interest, uniqueness or uncomfortable moods in MEM. Secondly, most popular music has an underlying harmonic movement that actually is based on perfect intervals, from the classic I-IV-V, I-V-I, the ii-V-I; indeed, even chromatically descending bass lines are often arranged as progressions of ii-V, ii-V, ii-V (a fourth). Even within the most remote tribal desert music that is apparently based on linear rather than logarithmic frequency intervals (Will and Ellis 1996), in Will and Ellis’s Figure 2B, which displays a cumulative view of frequencies in the desert song, I have spotted ratios of a fifth between the most common pitch and third most common pitch and ratios of a fourth between the first and second most common pitch.

40 In addition to the perfect intervals, pitch-class-sets which are familiar to listeners will appear to have higher TS than those that are not. There is a lack of evidence for any particular scale with non-perfect intervals being more or less intrinsically viable from a musical perspective – that is, scales appear to be learnt (Thomson 1999), which is not to deny the evidence of certain affordances of the human mind to guide this learning, for example our propensity for five to nine discrete categories (Baddeley 1994). Because of this, from extended exposure to unfamiliar pitch-sets it is reasonable to assume an increase in TS for that pitch-set over time.

Within scales themselves, pitches have particular functions and can add to the TS by reinforcing a familiar pitch schema. Correlating surface pitches with key profiles is one way to assess the TS of a musical sequence and this formal approach is explained by Temperley (2007: p 53). Accidentals and pitches that are outside the dominant tonal schematic will also reduce the TS if they occur more often than is typical. This is supported by empirical music psychology studies which found that people have a notion that certain pitches fit a tonal context much better than others (Huron 2006: p 148), the individual judgements being averaged into a key profile.

As mentioned, MEM is considered as mostly having mid-level TS. This is justified as, in the vast majority, there is a clear tonic, regular scales are used, most commonly pentatonic minor, followed by minor and Mixolydian (major with flattened seventh). Tonal movement in chord progressions is often between I and V if binary (~2.104), or progressions that include V or IV if ternary or quaternary (~2.105). Sequences with lower TS have less “perfect” intervals in their basslines or a copiosity of unfamiliar accidentals (~2.102). Tracks with higher TS are pure monotonic drones which span multiple octaves (~2.106). As with the other attributes mentioned, TS is dynamic, often changing during fills and breakdowns. As argued by Huron (2006: p 160, 161) and others, the sequence of pitches also contributes to stability, however the details would be a distraction to this current discussion. Nonetheless, the principal is exemplified here (~2.103) where a random-walk and arpeggio are played together, outlining a pitch-set, but not assisting in the definition of tonic.

In comparison, classical music can be considered to have mid-level TS for similar reasons, but deviating towards less TS rather than more, particularly when considering the more recent periods of tonal complexity. In contrast, pop music has mid-high TS, due to the prevalence of standard scales, chords, I-IV-V and fifth based progressions.

Pitch/Noise Ratio

The clarity of pitches has a direct effect on the ability of the listener to develop a sense of tonality – for example, in the case of total noise where there are no discernible tones, it is impossible to conceive of the TC (rate of tonal change) and TS (tonal stability). As a result, the Pitch/Noise 41 Ratio (PNR) is considered here to be a relevant attribute of tonality, particularly for electronic music, which has always involved a significant amount of sonic expression. The continuum can be envisaged with purely untuned and/or distorted percussive sounds and noises at the lower end (~2.107). The highest PNR is music made from pure tones.

MEM overall has a mid range of PNR, but varying substantially between sub-genres and individual tracks. In particularly minimal instances, the percussive sounds are usually tuned in some way so as to suggest a basic tonality, or there is a very subtle application of tones, for example, the high-hat and kick (~2.108). In other cases, sound effects such as ring modulation are used to introduce tones (~2.109). In contrast, down-tempo artists such as Board of Canada are known for their rich tones (~2.112), although in the main sections these tones are usually accompanied by unpitched drums. Boards of Canada often “detune” their , which provides a distinctive sound and does not obstruct the identification of tones. However, some other forms of pitch shifting can disturb pitch clarity and thus would have to be considered as having lower PNR (~2.111; ~2.110). Despite this, it should be noted that foreign and abnormal tuning systems are sometimes used, and these are not considered as having any less PNR due to the tones being quite perceivable (~2.113; ~2.114). A temporary decrease in PNR is often observed during fills, breakdowns and transitions, the dissolution of tonality being associated with increased tension or intensity. For example, DJ Shadow reduces the PNR through a record slow-down (~2.115).

The mid PNR of MEM can be contrasted with the high PNR of classical music; mid-to-high level of PNR in pop music; and the low level of PNR in acousmatic and electro-acoustic music. This is justified as most orchestral voices have a distinct pitch, including some of the percussive parts such as timpanis and triangles. In pop music, there is a heavy emphasis on tonality and pitch clarity and more conventional use of sounds than in MEM, mainly due to more conventional instrumentation and less emphasis on the electronic medium to assist expression. The sound- objects used to compose acousmatic and electro-acoustic music are often not easily recognisable as clear pitches and so have a low PNR. The PNR describes the clarity of tones for a given piece of music, while the TS and TC how these tones are organised to effect the tonality.

Number of Independent Pitched Streams (IPS)

The number of Independent Pitched Streams (IPS) relates to the number of pitched voices being perceived as operating independently and simultaneously. At the lowest end of the continuum is a single pitched voice/part, at the highest end is a dense texture built from numerous voices and in the centre is the typical three to five part tonal voicing of MEM and pop music. Usually there is

42 one bass, one or two leads, and one or two accompaniments. Classical orchestral music can be distinguished by a high number of IPS.

While mid-level IPS is typical in MEM, there is often deviation from this, sometimes with extended periods of none (~2.107), one (~2.116), two (~2.118) and three (~2.117) or more voices.

A subjective judgement call is sometimes needed to determine whether a part contains multiple streams or not. As shown by Bregman (1990), a single sequence of tones, if played with alternating pitches that are related beyond a certain interval, it will be more likely to be perceived as two separate streams. Alternatively, a chord synth that always consists of the same chordal intervals in parallel might easily be classified as a single stream of an interesting ‘chord-like’ timbre (~2.22).

Structure and Form of MEM

Structural analysis of music is defined here as detecting patterns and trends on a macroscopic scale, while form deals with segmentation. Previous literature on musical structure has distinguished between three approaches: neutral, poietic and aesthetic (Nattiez 1990). The neutral approach is objective, the poietic is “emic” or oriented to the perspective of the producer, while the aesthetic is “etic” or perceptually and cognitively oriented. The approach taken here is primarily the latter, while being usefully informed by a poietic technological framework; that is, analysing structure through the tools used to create it, for example MIDI sequencers. The aesthetic focus is appropriate because the primary aim of this section is to explain enough about MEM so as to clarify the musical genre of interest. The topic of musical structure itself is of less relevance to the note-level algorithmic concerns of the thesis than the other aspects of MEM that have been described, and so this discussion is accordingly less detailed.

The most useful approach to the analysis of MEM I have observed has been to represent each layer (kick, hats, snare, bass, etc) in the music on horizontal tracks that are stacked on each other with the presence of a loop in the layer provided in colour-coded (Hill 2005) or texture coded (Butler 2006) rectangular blocks. This macroscopic track layout visualisation is available to the producers of MEM through zooming out with sequencing tools.

43

Figure 4 Screenshot of a macroscopic track layout from the Fruity Loops sequencer.

Other examiners of electronic dance music, such as Keller (2003), borrow from the structural analysis of classical music (Green 1979), drawing curves to represent the overall intensity and marking boundaries with thematic groupings.

MEM uses a wide range of structures depending on the context of the music. Butler (2006) makes an observation that is relevant to MEM:

“Form inheres within a number of different realms … On one end of the spectrum, there is the form of a single track, on the other, that of a complete set. Considerable variety exists within each of these categories: tracks can be experienced in their original versions as well as transformed and combined with other records, and sets can arise in live performance contexts or in the studio.”

I consider a similar structural continuum, from continuous through to discrete forms. Continuous structures are typified by the extended mix sessions of hardcore minimal techno where structural alphabetic segmentation (Whittall 2007) is particularly difficult and potentially fruitless. The music is continually shifting – as one layer is removed another layer is added. The music could be considered as a continuous bridge or mix with fairly consistent intensity. I previously likened this

44 effect of sustained perceptual intensity to that of Shepard’s tones (Shepard 1964) – attention is directed towards new layers, while the older layers are subtly removed, suggesting to the listener a continually increasing intensity (Wooller 2003). A DJ set is typically one or two hours, however the experience of music in a club is usually eight to twelve hours long (Butler 2006). Raves and ‘doofs’ (an Australian-style rave in the bush, typically featuring psychedelic ) often extend for longer periods, sometimes covering whole days.

Less hardcore dance music sets will include breakdowns that enable dancers to gather their breath. Although the music is continuous, there exists a clearly recognisable cycle of build-up, main sequence or further build-up followed by a breakdown. When I analysed The Drum and Bass Arena (Wooller 2003), I found there to be an average of 1.8 breakdowns per track.

More discrete forms in EDM become apparent when viewing individual tracks rather than complete sets. They are usually designed to be mixed and so the intro and outro are typically quite long (intro ~2.119, ~2.120; outro ~2.122, ~2.121 respectively). At the most discrete, where MEM overlaps with pop music, the music is built from contrasting themes that could be easily interpreted as fulfilling verse/ type roles. In this case the intro, outro and breakdowns are typically shorter.

2.1.2 Morphing in mainstream electronic music

Having provided a musicological overview of MEM in the previous section, I will now examine the current techniques and approaches to morphing in MEM. This review of existing methods will serve primarily as a point of reference for comparing the developments that emerged from the research but also as an inspiration to them and is thus directly relevant to the goals of the study as a whole. The review process involved searching online for acts of MEM morphery via peer-to- peer, internet radio, music websites, discography databases, reviews, forums and communications with DJs and avid MEM fans.

A range of examples were found, from pure transitions, to morphs, to hybrids; however the compositional techniques used were extremely limited, particularly in the case of live electronic music. Mixes between pop oriented tracks, particularly those with distinct stylistic differences were perceived as “transitions”. Mixes between tracks with compatible styles featured somewhat longer transitions that enabled a little hybridisation to occur. Extended mixes, more typical of EDM, were perhaps the most morph-like examples of existing MEM music compositions. The studio mixes that were found tended to follow the style of live mixes. Music generated by morph algorithms, rather than composer/producers, were also discovered, but are discussed in detail within the following chapter. The most prevalent practice related to morphing that I found in MEM

45 was the remix, which clearly occupies the “hybrid” end of the spectrum. The techniques used in these examples tended to operate at the level of sampled loops, with most exceptions to this being in the remix. As a result, no clear compositional techniques for note-level morphing were found, although some trends regarding how loops may be combined became clear.

Music of the transitional type occurs in contexts where a variety of songs that don’t necessarily fit well together must be strung together. Examples are pub or function/wedding DJs and, in some cases, computer game audio engines playing chart hits and/or certain pre-specified tracks. The transitions are usually very fast, so that the uncomfortable section in between the source and target is minimised (~2.123).

Live DJ mixing

Mixing tracks that are carefully selected and ordered is at the basis of EDM and Hip-Hop, which allows a greater sense of hybridity to enter the transition than would otherwise be coherent. The essential technique is cross-fading, which is typically an equal-power (logarithmic) fade-out from source while simultaneously fading-in to target.

Controlling where the tracks fade in and out allows the DJ to operate in temporal blocks – for example, foreshadowing chunks of the target track before the transition occurs (~2.124, where the warbling lead and “so twisted” vocals are from the upcoming track), substituting a fill from one record with the other and, obviously, enacting the transition itself. In conjunction with cross- fading, the source and target can be controlled temporally, rewinding or fast forwarding to particular points.

In addition to fading in and out, DJs have the ability to control volume at the bass, mid and treble for each track. This enables three layers from each track to be ‘spliced’. For example, the bass of the source layer might be cut, along with the mid and treble of the target layer – the resulting blend would be constructed from the bass of the target and the mid and highs of the source. While some DJs use external sound effects, computers and drum and bass synthesizers, the ‘two turntables and a two-channel mixer’ is typical. DJs may also ‘scratch’, though scratching is applied more as a percussive transformation rather than a way to integrate the tracks.

Harmonic tonal relationships in track selection

There is a trend within MEM for the transitions to exploit harmonic tonal relationships. EDM and Hip-Hop DJs will often select a target track with a tonic that is related to the source tonic by a fifth or fourth (EDM ~2.125, ~2.128; Hip-hop ~2.132), is a prominent pitch-class from the source and/or results in an interesting and coherent change in the harmonic function of various layers. 46 Intervals other than the fifth or fourth are also often used during mixes. In this example from DJ Chris Scot (~2.126), there is a subtle layer that fulfils the intervallic function of 3-7-3. After the mix, which involves an upward tonic shift of a third, the function becomes 1-5-1. In (~2.127), the bell is first tuned to some higher octave of the tonic. During the transition the bass cuts, which is then replaced by new kick tuned to a fundamental that is higher than the previous bass by one whole tone. The bell remains tuned to its original frequency, but changes tonal function to a 7, due to the kick taking on the role of tonic. Scot goes on to shift down a major third (~2.129) in the same set.

Signalling the transition

As well as exploiting tonal relationships, DJs often also utilise other aspects of the source and target music to assist the coherence of the mix between them, such as cues and other structural features. A typical example is the breakdown, during which a degree of chaos is expected (~2.130). Cues enhance the coherence of the transition, typically through cymbals but also through interesting noises and even vocals. For example the spoken phrase “and I am out the door” acts as a cue during this transition (~2.131) providing an extra level of purpose or intention to the transition, as in, “yes I really am transitioning now – I am out the door and into the new track!”.

Studio mixes of EDM

Mixing in the studio offers a greater degree of flexibility than live, however it does not, on the whole, appear to have been capitalised upon, especially within EDM. As remarked by Butler most studio mixes are treated as a “home-listening analogue to the live dance music experience” (2006: p 21) or as promotional tools for DJs and as a result the technical limitations of live mixing are often carried over as stylistic limitations of studio mixes. There are of course exceptions, particularly when producers, accustomed to the range of studio techniques, create mixes involving source material (for example, MIDI files and synthesiser patches), typically from their own music. However, for EDM as a whole, the technological limitations of live delivery seem to direct the aesthetics of the music – a point which adds importance to the interactive (live) nature of this study.

Hip-hop mixes compared to EDM mixes

Hip-hop mixes appear less restricted than EDM mixes by stylistic criteria such as danceability, and as a result there is the potential for more complex forms of hybridity to appear within the music. With relevance to morphing, beat juggling is a technique where two records are inter-

47 spliced live, along with scratching and other techniques. The “old-skool” Hip-hop philosophy places value on the live performance and physical virtuosity of (Toop in Shapiro 2000: p 96) shunning studio trickery, while the relatively newer Hip-hop movements take advantage of audio effects, MIDI and non-linear editing, more so than the typical EDM mix tape. However with the absence of generalised audio to MIDI conversion tools, application of sophisticated note-level techniques for integration of source and target in sample based music is rare.

Hip-hop producers tend towards audio appropriation while EDM producers tend towards synthesis, so an interesting conundrum exists for the case of EDM producers who mix their own music – Hip-hop producers are less able to create note-level transitions (for example, extended key modulation) between their own, sample-based music, while EDM producers, despite often being eminently capable, often adhere to the style of live, audio-based mixing. If the new techniques for live, note-level morphing developed through this research were widely available, this situation might change.

Computer game music

In a similar way to live and studio mixing, computer game music must also deal with transitions between tracks, however in cases of in-house composition, the task is made somewhat easier due to the fact that music can be composed specifically to be compatible during transitions. Devices already mentioned such as transposition by fifths, half-time and double-time, and layer splicing are used (Electronic-Arts 1998; Apple 2006). Other techniques include changing timbre and note-sequence independently (Rare 1998), changing chord progression and scale, as well as composing bridges when necessary (Fay, Selfon et al. 2003: p 390; Sanger 2004).

Remixes

At the hybrid end of the spectrum are , which range from using only a crucial snippet from the remix subject and composing everything else around it, to simply modifying the speed and adding a complementary layer. Remixes are considered as creative works in themselves and often more of a variation of one track rather than a merger of two. For these reasons they cannot be considered as morphing in the truest sense. The range of possible techniques is as extensive as music production itself, and there are no particular remix techniques which stand out as being particularly relevant to morphing.

Aphex Twin is well-known as a remix artist, with releases such as 26 Mixes for Cash (James 2003) while many prominent EDM producers have been invited to remix the pop singer

48 Madonna. Some artists release their master tracks for free or in competitions to encourage exposure, for example Fat Boy Slim’s Star 69. Other artists are remixed without invitation, for example The Rolling Stones in Paint it Blacker by Plan B (~2.136).

While remixing typically “frames”, enhances or pays tribute to the original, the more subtle use of sampling and recontextualisation appears more generally throughout MEM and this can also be thought of as hybridisation. For example, the Fugees superimposition of a 6/8 classical guitar loop from Requierdos De L’Alhambra over a breakbeat to create the distinctive mood of Family Business (~2.61), is considered more of an act of sampling than a “remix”. uses the term to describe the juxtapositioning of different layers from different recordings (Michie 2003).

Mashups

The underground (not for profit) remixing and bootlegging of ‘mashups’, ‘blends’ and ‘bastard pop’ music is a seemingly huge unmapped area of MEM (Anonymous 2007). Typically these involve the A Capella vocals of one source over the backing from another. The Grey Album, mashed by Dangermouse, was a popularly acclaimed (Gitlin 2004) and controversial (Synthtopia 2004) mashup restricted to only two sources of musical input – The White Album by The Beatles and The Black Album by Jay-Z. Because of this restriction, it seems at first to be particularly relevant to source-target style hybridising that is the topic of this research, although it is more a case of sample recombination from the whole of The White Album being applied to create a backing for each track of the Jay-Z A Capella rapping, a layer which remains unchanged (~2.147). Dangermouse’s detailed approach to sample recombination allows fresh compositional ideas to be constructed from the original Beatles material. Overall however, the mashup genre seems to rely on humorous or otherwise interesting juxtapositioning rather than musical ingenuity and the techniques used, such as layering, are, from a musical perspective, fairly standard. This is not to belie the ingenuity which occurs on the level of audio manipulation - isolating vocals, cutting, pasting, recombination and DSP.

Stylistic combination

At the level of patterns and styles, hybridity, or “syncretism” has some basis in many, if not all, acts of music composition that attempt to fit with a known style, and particularly with the creation of new styles and new compositions from old. There are too many examples to list, but in a MEM context there is the influence of early European electro in early Hip-hop (~2.135), the mixture of dub, soul and hardcore dance in jungle and drum and bass (~2.134), and the influence of classical harmony in tonal EDM (~2.133).

49 Summary of mainstream electronic music and morphing

MEM, as described above, is the musical paradigm within which this research is based. Defining features of the style were described, such as the construction of music from loops of various lengths and layers with various roles, and the emphasis on repetition and rhythm. In particular, BB, with emphasis on the backbeat and irregular rhythms; and FF, with emphasis on the regular, steady downbeat; were explained through various theoretical lenses. The tonality of MEM was analysed through a framework of attributes that consisted of rate of tonal change, pitch/noise ratio and number of independent pitch streams. The structures of MEM were also briefly explained.

Following the musicological investigation of MEM, attention was turned in particular to occurrences of morphing, with live mixing appearing to be the most widespread and notable of these. Although such approaches are widely practiced, it was observed that various technological limitations and aesthetic habits afford only certain forms of hybridisation to occur when transitioning. It is the goal of this research to create technology that enables new, much less limited musical morphing to occur easily.

Motivated by this, the next chapter is an investigation into musical contexts outside of MEM, in order to glean applicable compositional techniques as well as providing a backdrop against which the outcomes of the research can be framed.

2.2 Morphing in other musical contexts

The previous section described MEM and how it related to morphing, which was important for clarification of the musical context under research. This section will range further afield and present particular aspects of other musical contexts where morph-like situations have occurred. All of the examples discussed relate to semi or un-formalised compositional approaches, while formalisations that are explicit enough to be algorithms are discussed in the following chapter.

Analysis of the wider musical context in relation to morphing was important to the research, primarily serving as inspirational material from which relevant ideas could be combined, modified and re-contextualised into MEM. As well as this, it provided a backdrop from which the new techniques could be compared.

As mentioned previously, morphing is conceived as a combination of transitioning and hybridisation and the examples within this section are organised accordingly, from transitions (2.2.1), morphs (2.2.2) to hybrids (2.2.3). The discussion of transitions deals with medleys that switch directly from one song to the next, as well as the arranging techniques that are used to 50 ensure a smooth transition. The section on morphing, although necessarily limited, covers a range of influential works and theories that are directly relevant to morphing. Acknowledging that hybridisation holds a fundamental role in the creative process, discussion of this aspect is mostly limited to techniques which draw from specific musical-surface sources such as centonization, quodlibet (and the plethora of related terms), as well as newer examples that extend outside these fairly historical categories. However, there is also a discussion, necessarily brief, on syncretism, as it relates to the hybridisation of musical style.

A number of interesting musical techniques have emerged from this review of morphing outside of MEM, while the historical and contemporary coverage of “hand-composed” morphing is comprehensive enough to meaningfully position this research within the wider musical context.

2.2.1 Transitions

Outside of MEM, transitions between pieces of music, movements, and themes are a common occurrence. Transitions are defined as when the interval between the two segments is so short and structured in such a way that no convincing sense of ‘hybrid’ can emerge. Compositional techniques for enabling a smooth transition include matching the source and target, signifying the transitional event as well as reinforcing commonalities and differences when appropriate. Medley is the technique of stringing together pieces into a continuous sequence, dating back at least to the sixteenth century (Grove 2007). Many such medleys involve a sequence of simple transitions. Any forms of music with contrasting themes, such as sonata, must also transition between each of the themes. Saslaw (2007) discusses “direct transitions” in key modulation, where the key simply changes, as well as “sequential” modulation, where a phrase is restated in the new key. Bridging techniques and more sophisticated key-modulation techniques invoke notions of hybridity and so are dealt with in the section on morphs below.

A simple and effective approach to transitioning is to match the source and target as closely as possible, thereby reducing the disorienting impact of the change. This involves matching virtually every dimension of music, including tonality, rhythm, pace, timbre, dynamics and so on. For example, in “polka power!” (~2.138) by Weird Al Yankovic, we can observe consistency in instrumentation/timbre, dynamics, vocal, chord, accompaniment and rhythmic styles (“oom-pa”), as well as small distances in the key changes. The first transition is typical of pop music – within2 Wannabe by the Spice Girls, the song switches from minor to major in the transition from verse to chorus and the tonal centre shifts to the dominant (~2.139). As well as this, the lead vocals change from unpitched to pitched. Changing from minor to major in verse and chorus is common

2 Within this song, not between it and any other. 51 enough in pop music for it to be perceived as a coherent shift, rather than an unexpected clash. The tonic and dominant are closely related, as discussed previously, and so shifting from one to the other also contributes to the smoothness. The transition following this, to Flagpole Sitta by Harvey Danger, incorporates foreign material (polka clichés) as a bridge, which, in my opinion, serves to partially erase the echoic memory of Wannabe (~2.140). The tonic shifts down by a whole tone. The transition from Flagpole Sitta to Ghetto Superstar by Pras Michel (~2.137) shifts the key down another tone, while the subsequent transition to Backstreet’s Back involves another disrupting polka cliché bridge before shifting down by a minor third to the relative minor (~2.141). The medley adheres more or less to this formula the whole way through. Similar principals are involved in other medleys, such as in bag-pipe music (~2.142), with consistent tonality and instrumentation providing unity despite the abrupt transitions. Medleys that feature greater hybrid integration of source and target are considered in the following section on morphs.

While matching the source and target obviously reduces the distance between them, a substantial degree of ‘transitional shock’ can be acceptable if it is framed appropriately. This involves punctuation, earmarking3 (Cope 2005) and/or cues – signs that invoke an awareness of musical structure, alert the listener to upcoming changes or reinforce the sense of purposefulness surrounding the new change. As discussed previously, cues include variations or transformation of patterns as well as particular sounds such as cymbals. Within classical music, the distinct slow-down at the end of a phrase communicates the upcoming boundary and, at such a juncture, changing to a new and contrasting segment of music is quite acceptable and to some extent expected (Huron 2006). As well as this, cadences can be used both to mark the end of a passage (Huron 2006) and reinforce the tonality of a new one (Schoenberg 1978; Schoenberg 2006). A cadence that includes I-IV-V has the useful side-effect of highlighting all the pitches in the scale, while a I-V-I includes all but the sixth.

To summarise the discussion of transitions, the most important musical decisions are made in the selection and of the source and target so as to be similar. Additional techniques are bridge sections, cues and cadences. Bridges can utilise pivot chords and notes to emphasise similarity or incorporate unexpected material in order to ‘wipe the slate clean’ and prepare the listener for the target. Cues indicate change, while cadences reinforce the changes in tonality.

3 The ‘ear’ in earmark has no particular musical meaning – it is a form of identification, such as goats that are identified by marking their ears. A motif can be earmarked by a cymbal crash.

52 2.2.2 Morphs

While transitions achieve a degree of coherence through matching the source and target and providing contextual cues, a more powerful, smooth, or at least interesting transition is often sought by the composer, through some kind of bridge that reinforces commonalities and/or has properties reminiscent of both source and target. This means that some kind of hybrid combination inspired from source and target is occurring during the transition and thus it is considered to be a kind of morphing. Musical examples and theories of this type abound and so the discussion is necessarily limited to some of the most influential ones. This includes theories of key modulation, temporal modulation, transitions over musical topologies, the practice of folk music medleys and some key examples of sound morphing. While there are no doubt many more examples worthy in some way of comment, this coverage is detailed enough to provide a range of techniques and a broad, multi-perspective backdrop to the research which, after all, is primarily focused on automated techniques and MEM in particular.

Key Modulation

The modern understanding of modulation, as a clear change from one key to another, arose in the 18th century, while it was through the widespread chromaticism of the 19th century that theorists and composers began to rigorously investigate key modulation as a theory and technique (Saslaw 2007) and as a result there is now a substantial body of literature.

Schoenberg (in Muzzulini 1995) conceived key modulation as having three sections: A, B and C. A is in the original key of the source, but “neutral triads” (triads with the thirds tuned to be half way between major and minor) are played (presumably on a violin or similarly flexible instrument), so as to weaken the tonality of the source key. During B, the key changes to the target but “pivot root progressions” are used to mark the turning point. At C, a new cadence is used to establish the new key. Schoenberg (2006) also referred to Anchluss-Technik, the “joining technique”, likening key modulation to the fixing of wooden boards together at cross- grains giving the wood “adaptive forms” and/or using nails, screws, or roughening the ends and gluing. He also sees “condensation” as a useful modulation technique, bringing elements closer together in terms of harmony, rhythm, melody and dynamics.

In general, pivot chords are constructed from the bisection of the source and target pitch class sets. The key distance, as indicated by the number of common pitch classes shared by the source and target keys, therefore governs what pivot chords are possible (Saslaw 2007). In 1774, Johann Kirnberger published the text Die Kunst des reinen Satzes in der Musik (the art of strict composition in music), which included some suggestions for “quick modulations”, involving

53 three steps from the tonic to five different keys – D minor, E minor, F major, G major and A minor (Ferris 2000). The general principal is to shift to a pivot root pitch that is related by a fourth up or fifth down to the new tonic. Obviously, the aforementioned keys are all quite close to C in terms of key-distance, which makes the task fairly easy.

Within Arabic music, modulation between modes is of central importance to the music (Marcus 1992) and the “gradual” types are particularly relevant to morphing (Marcus 1992: p 178). Arabic music theory distinguishes between modulations that involve a shift of the tonic, and modulations that involve a change of mode, without a shift in tonic. In modulations that shift tonic, it is common to shift at an interval of a fifth, fourth – that is, “the note that starts the original mode’s upper tetrachord” (Marcus 1992: p 177). It is also possible to shift to the third and sixth degrees. A technique to effect subtle tonal change is to shift to the upper tetrachord of the target mode mid-phrase, thus holding off the more unambiguous tonality of the lower tetrachord until the melody descends again (Marcus 1992: p 178).

In his discussion on Irish national music, Travis (1938) pointed out a number of modulation scenarios in ancient Irish tunes that were similar to this, as well as others that were not, such as modulating between major and parallel minor and modulating to major and shifting tonic by a major second. Modulating by a fourth has been noted in Japanese Shamizen music (Tokita 1996). Changes in “non-nuclear” tones (not of the “nuclear” fourth tones) also occur (Tokita 1996), similar in concept to parallel major and minor modulation. As well as this, the source and target tetrachord can be layered into a composite scale (Tokita 1996).

The role of the fourth and fifth interval in modulation is also apparent within Jazz, where modulation is often achieved through cycles of ii-V-I. Iterating this pattern a number of times enables any key to be reached from any other key. For example, changing the tonic chord from major to parallel minor changes the function from I to ii without shifting the root pitch. While the function ii is not immediately apparent, the shift from ii-V-I, which does involve a change of root pitch, then serves to cement the new tonality. Russo (1968) touches on this technique when discussing modulations through the V7 of the new key. He also mentions use of the common (pivot) chord, “direct assumption” of the new key and “modulation through scale movement”, where a scale from the source or target is juxtaposed against the key.

Western theorists that followed Kirnberger examined how to achieve more obscure and difficult key changes, for example Bernard Ziehn in 1888 published techniques for changing to any key through application of nine types of modified seventh chords (Seargent 1933; Saslaw 2007). The advantage of seventh chords is that inversions can be achieved fairly smoothly, providing a close connection between four possible roots. With chords that have the property of symmetry, such as the diminished seventh, inversions can be made without changing the function. Max 54 Reger (1903) published a collection of examples that used pivot chords to modulate from C major to 41 different keys (including double flats, double sharps). Travis (1938) points out that while ancient Irish harpers are likely to have used complex tonalities compromising dissonant chords with three, four or five pitches, the music that survived imperialism achieves modulation through melodic variation rather than chordal complexity. Other techniques for modulation to difficult keys include chromatic alteration in the middle of a phrase as well as emphasising only one or two ‘pivot notes’, notes with pitches that are common to both tonalities (Saslaw 2007). This technique is also employed in Arabic music (Marcus 1992: p 189).

More recently Muzzulini (1995) developed an algorithm to generate pivot root progressions with the properties of inner symmetry, however works with this degree of algorithmic formalism are reserved for discussion in the following chapter.

Temporal modulation

While key-modulation occupies a central role in western music theory, the concept of modulating from one metre, tempo or rhythmic pattern to another is examined less often, and usually only in the avant-garde. Two distinct issues seem to occupy theorists: ‘tempo’ or ‘metric’ modulation; and ‘beat-class’ modulation or ‘rhythmic interpolation’. The former concerns the techniques to effect a smooth transition from one tempo or rate of play to another, while the latter deals with the problems of interpolating particular rhythmic patterns.

Tempo modulation (or metric modulation), is defined as a transition from one tempo to another through a common sub-pulse. No comprehensive theory of tempo or metric modulation exists, although it has recently come under closer scrutiny (Benadon 2004). More than a century and a half prior, in 1832, Fétis speculated that:

“Someday, someone would do for rhythm what had been done for harmony and melody: find the essential transitional element that would admit rhythmic modulations into music”

(Arlin 2000)

Twenty years on, both Fétis and other theorists such as Hauptman, were publishing material with the assumption that “… philosophic principals underlying metric structure are the same as those underlying the harmonic structure of tonality” (Lewin 1981) and applying this assumption to analysis of the musical examples of the time.

However, while there is some justified speculation that, for example, composers such as Brahms were applying tempo modulation to composition in the latter quarter of the nineteenth century 55 through hemiola (Lewin 1981), clear examples of ‘tempo modulation’ only came much later in the work of Elliott Carter, notably in Variations for Orchestra (1954-1955). Carter himself is unconcerned with the task of formalising his compositional process (Carter 1960). Fernando Benadon recently (2004) formalised some aspects of tempo modulation. He pointed out the need to somehow limit the set of possible pivot ratios but offered no particular criteria other than notational complexity and performance difficulty:

56 “for example, a modulation such as ‘dotted eight-note quintuplet equals sixty fourth note’ is conceivable but probably impractical”

(Benadon 2004).

More importantly, Benadon presented a useful formula for calculating the number of possible tempos given the number of pivot ratios and the number of steps (discrete changes in tempo) that are required for the modulation. In his example, the pivot set is arbitrarily limited to the six ratios between each of three, four and five (triplets, quarter notes and quintuplets).

While tempo modulation deals with bridging different pulse rates that share some common ratio, it is a different problem again to consider how to modulate smoothly from one rhythmic pattern to another, regardless of the pulse. Essentially the problem comes down to how the discrete rhythmic values can be mapped into a continuous space. As will be discussed in more detail in the following chapter, Max Mathews (1969) interpolated rhythmic patterns by first converting the inter-onsets into a continuous function, while Daniel Oppenheim (1997) paired notes together in various ways and interpolated the onsets of the individual notes. The problem is explored to some extent in the phasing of minimal music, where polyrhythmic beats cycle in and out of phase. If we consider “in phase” to be the source and “out of phase” to be the target, the intermediate rhythmic patterns (sampled at any regular interval) can be considered as a kind of morph between them. The music of Steve Reich, notably “Piano Phase”, is iconic of this technique. Analysts of Reich’s music, beginning with Warburton (1988) and then Cohn (1992) used the notion of beat classes to describe the rhythmic pattern that eventuates at the point where it is maximally out of phase. Callender (2004: p 39-41) identified a variable that can be calculated from the beat class set and sampling interval over time that, he argues, relates directly to the ability of the listener to predict the rhythmic patterns of the next sample. In analysing more recent and less metrically confined works of Reich’s, Roeder (2003) extended the notion of beat classes to deal with the effects of accent and pitch on the perception of beat classes.

Topological transitions

There have been many notable attempts to conceive of musical spaces which, once formed, could be used to compose music through navigation of the topology. This notion is relevant to morphing, which would be equivalent to moving along some trajectory between two points on the topology.

Polansky (1987) explored the notion of metric distance between patterns in a number of performance contexts. Of particular relevance to morphing was a piece called Drawing 57 Unnecessary Conclusions, where each performer would pick a pattern and draw it on their screen. They would each then pass that pattern to the person on their left and make incremental modifications of their own, until it matched the pattern they had been passed. This is as clear an example of ‘hand-composed’ morphing as could be found, however it is difficult to garner information as to the specific techniques used, due to the improvised nature of the performance.

Roger Shepard (1982) developed a model of tonal pitch space which relates well to perceived pitch relationships, by combining musical pitch dimensions: Circle of Chroma (CC), the Circle of Fifths (CF) and linear pitch space, resulting in a double-helix torus (see Figure 5).

Figure 5 The CF combined with the CC and linear pitch space4

Shepard’s representation was adapted for use in the LEMorpheus software (which I developed for this study), with weightings governing the influence of each dimension in the space.

Guerino Mazzola, has created a substantial body of work in German relating to topologies of music, covering harmony, melody and rhythm, and it has been compiled into an English publication (2002). As with Muzzulini (1995) and Noll (2001), it is computationally explicit and thus relevant to algorithmic music. Callender (2004) uses formal descriptions of trajectories through spaces of particular distance metrics to analyse music that contains continuous transformations. Tenney (1979) explored notions of distance between temporal gestalts, while Rosenboom (1982) used topological concepts to inspire his compositions.

4 Reprinted from (Deutch 1982: p 364) with permission from Elsevier Limited (copyright holder). 58 A number of examples of ‘morphing’ from the western chamber music repertoire also can be perceived as continuous interpolations along various loosely conceived dimensions, as noted by Oppenheim:

“the opening to Beethoven's ‘IX Symphony’ is a transition from chaos into order. The second movement of Berlioz's ‘Harold in Italy’ gradually morphs from the ‘Harold’ theme into the ‘procession’ theme, as a procession is portrayed moving towards, and then away from, the listener. Ravel, in ‘La Valse’ gradually morphs from chaos into a Viennese waltz, and then back to chaos.”

(Oppenheim 1997)

Cage’s Metamorphoses (1938) might be added to this list, as well as Hindemith’s 1943 adaptation of Weber themes (1989), and Philip Glass’s Glass Cage variations of Cage’s piano pieces (2000).

Without doubt, a huge number of other composers occupy this territory, however, for the purposes of outlining some important and relevant aspects of the musical and theoretic background, the overview provided above is sufficient.

Extended medley and similar notions

As mentioned previously, medley is the technique of stringing together themes into a continuous sequence and while many medleys involve rapid transitions, many others utilise more sophisticated bridges and are thus more relevant to the notion of morphing. The practice is particularly widespread in folk music. “Medley overtures” relate specifically to the technique being applied to opera and operettas, while borrowing themes from a range of sources. Medley is employed as a means to achieve continuity and danceability in live performance, for convenience, humorous effect, to consolidate the emotional impact of disparate themes, demonstrating musical cleverness and skill, and possibly other reasons.

Overall, medleys that are popular seem also to have the source and target music well-selected and well-arranged as discussed in the section on transitions above (2.2.1). There does not appear to be a comprehensive catalogue of the more complex compositional techniques, although some have briefly commented on approaches:

59 “Examples in the ‘Fitzwilliam Virginal Book’ regularly repeat each tune in a varied form, and one of the vocal medleys surviving from the 16th century is built on an ostinato bass”

(Grove 2007)

Jigs, reels and marches of Celtic and Gaelic music are commonly strung into medleys to effect continuity. The musical form of individual pieces themselves are also often a sequence of unique themes, for example ABCDEFG (Travis 1938). As with the Fitzwilliam Virginal Book, variation (Travis 1938) and alternating the varied themes from source and target is the prevalent morph- like technique for bridging the transitions. For example, a phrase from the source verse might be varied by changing to the new mode, before switching back to the chorus and then fully into the new verse. This is effectively a kind of “tonal foreshadowing”.

Operetta and musical medley (Grove 2007) such as The Phantom of the Opera Medley by Andrew Lloyd-Weber and The Medley from Les Miserables by Claude-Michel Schönberg, and the blending of Disneyland themes into medleys (Sides 1996) has the effect of binding separate emotive experiences together into a more powerful whole, which, particularly in the case of the latter, also serves to reinforce brand power.

Alongside medley appears a raft of other similar historical practices. The more motley potpourri dates back to 1711 (Wikipedia 2007), which literally means “rotten pot”. The techniques used in potpourri appear less sophisticated, designed primarily for humour. The quodlibet (literally “what you please”) is the classical version of mashup or xenochrony and is sometimes also used to refer to medley. The term has been used since 1544 (Maniates, Branscombe et al. 2007) and seems to provoke a similar sense of low-art distain as potpourri.

Other, even more obscure, musical formats similar to medley, potpourri and quodlibet in historical music throughout Europe include:

“fricassée (France), misticanza or messanza (Italy), ensalada (Spain) … farrago, rôtibouilli, salatade, fantasia, capriccio, pasticcio, and miscellany”

(Maniates, Branscombe et al. 2007)

However, rather than exploring the minutiae of cultural deviations, it is sufficient for the purposes of this section to observe that morphing has a substantial number of precedents in historical music.

60 Sonic morphing

While this research is primarily concerned with the note-level, morphing one sound into another is of tangential relevance, due to similarities in the overall form of the compositional objective. Explicit sonic morphing can be heard in modern orchestral music, early tape music and is now a widespread practice in digitally produced music.

Oppenheim notes the role of sound morphing in orchestral music:

“In modern music the concept of morphing was broadened and applied to entire sonic environments; the first movement of Pendercki's ‘second string quartet’ is a morph from non-pitched, short, noise-like percussive material into sustained notes with a definite pitch.”

(Oppenheim 1997)

Other examples of this include Ligeti’s Athmospheres and Apparitions, as well as Xenakis’ Metastasis.

Typically, however, more flexible and unusual sound morphing can be achieved through mediums that afford direct representation and playback of sound. Trevor Wishart, in the sound art classic Red Bird (1973-1977), explored sound morphing using studio tape editing techniques (~2.148). As with DJ mixing, selection of source, target and intervening material was important, as evidenced by the effort with which the large bank of recordings were carefully catalogued and labelled with a framework of morphological terms (Wishart 1996), so that Wishart could “find the correct bit of tape on the correct reel when (he) needed it” (Wishart 2000). Cutting and pasting (with razor and glue), mixing, filtering, pitch/time stretching, fading and recording techniques were also used.

Many other sound-artists such as Francois Bayle, Guy Reibel (Oppenheim 1997) and Alejandro Viñao (1996) have employed sound morphing for artistic effect.

Sound morphing can nowadays be conducted through DSP rather than manually, and some of the techniques that are used for this are mentioned in the following chapter on algorithmic music and morphing (chapter three).

2.2.3 Hybrids

Hybridity in music occurs on a number of levels, from the musical surface through to more abstract notions of musical style. Optimally, I was aiming to find instances of hybridity that 61 directly utilised the musical data from two specific works, in the style of source and target, as this would involve similar techniques to those required by compositional morphing algorithms. However I was unable to find examples in which this was the case. Despite this, many composers are renowned for appropriating and recombining the work of others, composition students learn a range of techniques that have been developed by others and the drive for musical innovation leading towards new styles through a blend of existing ones is fundamental. The blending of musical style from two or more different cultures is sometimes referred to as musical ‘syncretism’, a term adapted from religious notions of syncretism, originally by Waterman (1948) and later applied to ethnomusicology by Meriam (1964). Hybridity in music is arguably as old as music itself however for the purposes of this discussion the scope is necessarily limited to a few examples.

Centonization is a technique of hybridisation through recombinant composition applied to Gregorian plainchant (Chew and McKinnon 2007). It involves the arrangement of a number of segments of chant, according to various rules. Within the boundaries of the plainchant style, the centonized compositions are hybrids of the various plainchant songs. As opposed to other methods of musical patchwork such as the quodlibet, the juxtapositioning is not intended for humour and it seems to have garnered a little more respect from some (Chew and McKinnon 2007: p 85).

David Fanshawe’s acclaimed African Sanctus is also inspired from church music but hybridised with recorded samples of traditional African music. Choral orchestral and drumming elements that are played live are harmonised with the pre-recorded African music (Fanshawe 2007). Other examples of syncretism between African and European music and the techniques used are documented by Jim Chapman (2006). A more contemporary example African-European hybrid is Lambarena (Courson and Akendengue 1996) which weaves the rhythms, textures and timbres of Bach with traditional African music.

Hungarian composer Bartok was renowned as a synthesiser of styles and freely admitted to appropriation from an eclectic range of Hungarian, Transylvanian, Romanian and Slovak folk music that was the subject of his ethno-musicological work. Clear elements of appropriation include scales, rhythms, melodies and harmonies (Bartók 1950; Gillies 2007). Bartok also noted that other major composers such as Stravinsky must either engage in similar practices or somehow be gifted in the creation of folk music by themselves, due to the evidence of peasant music styles in their music (Bartók 1950: p 22). Later on, Schnittke coined the term “polystylism” to describe similar tendencies in his own practice and that of other composers of the time (Schnittke 1971).

62 These examples, while blending musical style, frame the cross cultural music within predominantly western contexts such as seated chamber performance and audio recordings. It is plausible that the reverse might occur – for example, playing western style chamber music outdoors, continuously over a number of days, with community participation and dancing – however, this is rare. While the nature of hybridity as it exists within the post-colonial paradigm might seem fairly one sided, more recent inventions of EDM culture, speculatively, are at the forefront of a reversal (Brewster and Broughton 1999).

Overall, techniques for hybridising music vary considerably and remain far from formalisation. Despite this a continuum can be envisaged, at one end being direct quotation and appropriation, the other end involving the use of higher-level musical constructs such as musical form,metre, scales, harmony, instrumentation and voice roles. This is related to Chapman’s categories of appropriation (Chapman 2005), which range from “borrowing” through “assimilation” and “syncretism” to “abstract conceptual appropriation”. Although music theories in general seek to explain why and how certain patterns might be seen as “coherent” or “interesting” within a particular style, there appears to be very little music theoretic work on what it takes for a particular instance of appropriation of any type to fit well with a particular context. Composers of hybrid music determine what works through emic means such as experimentation and/or musical intuition, while theorists appear unable to derive precise etic explanations. Doing so would require more general theories of music that are able to resolve differences within each of the elements within the hybrid and theories of such generality are more related to music psychology than music theory and thus more difficult to apply to music composition practice.

63 2.3 Summary of music and morphing

MEM is the primary genre addressed by this research and it can be defined as a popular electronic instrumental practice involving the layering of loops. Typical roles of the layers include percussion, bass, accompaniment, lead and . Cues, which can occur in any layer, are used as structural signifiers. Often the roles are ambiguous and may change over time.

Rhythm in MEM is particularly important and a number of techniques are used by composers to play with the sense of beat. Metre is understood as an abstract rhythmic pattern that is used as a reference point when listening and is constructed from patterns that are both ingrained and learned. Within MEM, two fundamental metric patterns are evident – FF and BB. FF explicitly emphasises the beat, while BB does not and can be interpreted in a number of ways, including polyrhythms, additive rhythms, syncopation, beat roles and transformations. Hypermeter, as the pattern of emphasis over longer, macrostructural time-spans is suggested by the placements of cues.

Tonality is the sense of importance attributed to various tones. Despite the importance of rhythm in MEM, the approach to tonality is sometimes surprisingly complex and requires a new set of terms. I divide the tonality of MEM into four dimensions: TC, TS, PNR and IPS. TC is the rate at which the tonality changes, from drone through to unpredictable solo. TS is the degree to which the music emphasises fundamental tone and tone set, from perfect intervals through to intervals that obfuscate the tonic. PNR is the degree to which tones themselves can be distinguished, from total noise to pure tones. IPS is the number of independent streams of pitch that can be distinguished by the listener.

Structure and form is an important part of MEM, defined through the build-up and breakdown of tension over the entire length of the track. A pragmatic view of structure and form is obtained through the production tools used to create and arrange the music. Structure and form are of little relevance to this research however, which deals with segments of MEM rather than entire pieces.

Morphing is apparent throughout a number of MEM compositional practices including live DJ mixing, studio mixes, computer game music, remixes and mashups. DJs working within a variety of genres tend to select source and target music so as to exploit harmonic relationships when mixing. Producers and composer make similar decisions that involve not only the selection, but also the arrangement and manipulation of source and target music. Transitions are enhanced by appropriate cues. Style combination is another related aspect of MEM composition, however as it is abstract rather than explicit, it is more difficult to formalise and thus apply to this research.

64 Musical contexts outside of MEM also have relevance to morphing. Simple transitions between source and target are often witnessed in medleys and through direct modulation of keys. More complex techniques for integrating the source and target that are more relevant to morphing include indirect key modulation techniques, temporal modulation, topological transitions, extended medleys and sonic morphing. Hybridisation of music occurs through a continuum from direct appropriation through to synthesis of musical style and practice.

Collectively these examinations of morphing in various musical fields showed that while much material existed for inspiration, there were many opportunities for novel and applicable research to formalise and recontextualise current compositional techniques, extend musical possibilities and allow new forms of adaptivity through compositional morphing. This review covered a fairly broad swathe of music and music theory, and theoretic formalisations were mostly at the level of natural language. Complementarily, the following chapter reviews relevant theories and systems that are so explicitly formalised as to be executable on a computer.

65 3 Algorithmic music and morphing

This chapter continues to review the surrounding context, but shifts the focus to explicitly formalised algorithmic music systems. As the goal of the research was to develop new musical algorithms, a review of the field of algorithmic music was necessary so that research opportunities could be defined and influential works highlighted. As with the review of music and morphing in the previous chapter, this acts as a baseline from which the originality and quality of the research can be assessed, as well as providing inspirational ideas. However, in this case, because the objects of investigation are algorithmic, the concerns are technical as well as musical.

Before a detailed comparative study of the various algorithmic music systems can be presented, some terminology and a clear conceptual framework is required. This is provided in section 3.1, which examines previous approaches, both implicit and explicit, before explaining the framework, which was created as part of this research. The new framework operates on two levels – compositional approach and function. The examination of previous frameworks provides some contextual background within which the new framework can be positioned.

Following this, sections 3.2, 3.3, and 3.4 are a series of increasingly narrow reviews where the terms of the framework are employed. Section 3.2 is a broad examination of algorithmic composition, the practice of composing music through a series of formal instructions. While many algorithmic composition projects exist, only the most seminal and pertinent ones are described in detail, while many others are mentioned briefly in passing. This broad review of algorithmic music was relevant to the research insofar as the ideas and techniques for writing musical algorithms could be adapted to more specialised applications such as morphing, as well as providing a baseline from which my own research could be distinguished. As well as this, the range of works cited demonstrate a comprehensive examination of the field and provide a sense of the wider context within which the research operated.

Following this broad review, section 3.3 narrows the scope to interactive music. This is important because of its more specialised relevance to the project. Interactive music, being a younger field, has less established works compared to algorithmic composition but a great deal of recent activity. From the review of interactive music in 3.3 it is clear that many applications could benefit from new techniques for note sequence morphing.

As the novelty and usefulness of note-level morphing is clarified, the question ‘what systems for note-level morphing exist?’ becomes more pertinent. This is answered in the subsequent review

66 of note-level morphing in section 3.4, providing a historical context which highlights a number of research opportunities.

3.1 A framework for discussing algorithmic music

This section defines a number of terms and concepts that assist in descriptions and comparisons of algorithmic music. The framework presented here essentially combines two perspectives, that of the composer and that of the programmer, discussed in more detail in 3.1.2. The half of the framework that deals with high-level algorithmic music, viewing the music algorithm as a composer-agent, is presented in 3.1.3. Various compositional approaches are discussed. Following this, 3.1.4 presents the other half of the framework which is concerned with low-level algorithmic music. This involves analysing simple algorithms, which may or may not operate within a larger system, in terms of how they relate to musical data.

The framework was an important part of the research in that it enabled the field to be carefully reviewed, which in turn resulted in the discovery of new research opportunities, inspiration for further developments and a baseline from which the novelty of the research could be compared. Before the framework can be fully explained however, it is useful to provide some contextual background about other conceptual frameworks in algorithmic music.

3.1.1 Previous approaches

In this subsection, some of the literature will be reviewed to clarify various existing schools of thought regarding algorithmic music frameworks. A clear conceptual framework does not exist, however there are many publications that discuss algorithmic music to greater or lesser extent and all of them assume some kind of conceptual basis.

Barry Truax (1976) proposed a conceptual framework for music theorists based on careful observations of musical practices. The predisposition of realtime computer music to this form of research was made clear. Truax felt the procedural focus was primarily about relationships between system components (including the human) and could be termed “communicational”.

The Generative Theory of Tonal Music (Lerdahl and Jackendoff 1983) examined the psychological basis of hierarchical structures conceived in homophonic tonal music and has significantly influenced the formalisation of music generally. Lerdahl and Jackendoff are careful to present the theory as an aid to music analysis, rather than an “algorithm that composes a piece of music” (Lerdahl and Jackendoff 1983, p 5). The connection to the work of Chomsky is in the “combination of psychological concerns and the formal nature of the theory” (Lerdahl and Jackendoff 1983, p 5), rather than any particular aspect.

67 Before any further reference to linguistics, I will briefly explain some relevant terms, initially presented by Chomsky (1956), where the focus was on structural forms. For Chomsky, a transformation “rearranges the elements … to which it applies, and it requires considerable information on the constituent structure” (Chomsky 1956, p 121). Generative was conceived as the property of a finite set of rules that could potentially create, via recursion, a huge or infinite output with a characteristic structure. For example, he examined different forms of rules, seeking to find one that could “provide simple and revealing grammars that generate all of the sentences of English and only these” (Chomsky 1956, p.113). In music compositional terms, generative grammars capture the essential logic of particular “top-down” approaches.

In reviewing a number of sound and music programming languages, Loy and Abbott applied programming language and linguistic concepts (1985). The potential of Object Oriented (OO) concepts applied to computer music was identified (Loy and Abbott 1985, p 262-263). Algorithmic music endeavours were distinguished by the level at which the composer influences the code (Loy and Abbott 1985, p 238), for example tinkering with programs, writing libraries or creating languages. This approach highlighted the amount of compositional control but not the different relationships between musicality and procedural forms (which was not the intent of the authors). Pope (1991) also reviewed systems that utilise OO technology, thus focusing on various programming tools and their idiosyncrasies, rather than musical affordances. In contrast, the framework developed for this research addresses the relationship between music and process.

Roads (1985) compiled articles to represent the views of composers working directly with computers. The range of techniques and aesthetic considerations were organised into various topics. Of these, “procedural composition” is the most relevant to this research, where the composer/programmer is less concerned with audible outcomes as they are with the algorithms behind them. However, to Roads at the time, the musically innovative aspects of composing with computers dealt with sound structure rather than traditional concepts such as note organisation (Roads 1985, p xvii). The framework presented here is primarily concerned with note-level algorithmic music due to the potential for innovation within a mainstream musical context.

Desain and Honing (1992) adopted a perspective combining music theory, music psychology and Artificial Intelligence (AI) primarily based on music cognition. The insight that “music is based on, plays on, and makes use of the architecture of our perceptual system” (Desain and Honing 1992, p 6) led this and subsequent studies (Desain, Honing et al. 2005) into the development of generalised musical representations and algorithmic musical processes, especially concerned with analysis. Without being prescriptive, Desain and Honing are careful to limit the scope of their search, necessarily ignoring many aspects of music. However, their view

68 facilitates deep examination of musical processes by liberating them from idiosyncrasies and enabling comparisons in a general conceptual space, grounded in the architecture of the mind. While also adopting aspects of AI, for example, viewing algorithmic systems as agents, the perspective behind this framework is musicological rather than music psychological.

Roads explains algorithmic composition systems and representations in The Computer Music Tutorial (Roads 1996, p 821-909). Systems are discussed in terms of history, presentation to the user, interactivity and level of composer responsibility. A special distinction is made between deterministic and stochastic algorithms. Roads also states that, apart from simple cases, there is no perceivable difference between the two (Roads 1996, p 836). Dodge and Jerse (1997, p 341) also examine some “simple cases” and draw the same distinction.

Contrary to this, I argue that some musical mappings are easier to implement than others, for example one-to-one is relatively trivial, which is why particular processes afford, but are not limited to, particular musical outcomes. More importantly, when a system is interactive, normally hidden processes become much more transparent to the audience/user. For example, if a user controls parameters that affect Markov depth or morph index, they will become conscious of their influence and thus more aware of the algorithm underlying the music creation. Thus, for adaptive and interactive systems, addressing the effect of process in music is no longer merely a matter of “compositional philosophy and taste” as Roads stated (Roads 1996, p 836), but an aesthetic and usability imperative.

Laurson introduced the laudable but unreachable aim of musical neutrality for a music software environment in his discussion of PatchWork (1996, p 18), as a state where there is no difference between compositional goals and outcomes. I broaden this term to also mean an absence of unintended musical side effects in any implementation of a musical process.

Central to the model of interactive music systems presented by Winkler (Winkler 1998, p 6), are domains of algorithmic music, namely: music analysis, interpretation, composition. Winkler presents his ideas relating to algorithmic composition through the Max (Puckette and Zicarelli 1990) visual programming environment, which serves as a specific, idiosyncratic framework for discussing musical process.

Schwanauer and Levitt (1993, p 1-6) conceive algorithmic musical processes as machine models of music and useful tools for scientific examination of theories. Unlike Desain and Honing, emphasis is placed squarely on AI as an increasing source of inspiration for procedural music, with little acknowledgement given to the influence of music theory, practice and psychology. AI, as a computer science perspective, facilitates discussion of efficiency but neglects the musical purposes of algorithms.

69 In reviewing AI techniques applied to music, Papadopoulos and Wiggins (Papadopoulos and Wiggins 1999, p 1) admit to difficulty in distinguishing works according to the techniques used within them.

“The categorisation is not straightforward since many of the AI methods can be considered as equivalent: for example Markov chains are similar to type-3 grammars. Furthermore some of the systems have more than one prominent feature, for example ‘Experiments in Musical Intelligence (EMI)’ was categorised as a grammar, but it can also be seen as a knowledge-based system or even a system which learns.”

This was the problem that led to the development of the framework presented below, to investigate more appropriate ways of conceiving algorithmic music.

Overall, discussions of algorithmic music are problematic due to the essentially multidisciplinary nature of the field. Algorithmic music sits at the crossroads of musicology, music practice, psychology, artificial intelligence, engineering and potentially other areas. Scholars have not explicitly addressed the problem of a unified framework adequately and so discussions of algorithmic music tend to be idiosyncratic, confused or skewed towards one particular discipline.

3.1.2 Another approach

As shown above, algorithmic music is discussed from a variety of perspectives, often combined, such as linguistic, musicological, sociological, user-centric, artist and programmer-centric (to name a few). The new framework I have developed and presented here is tailored exclusively for the last two, that is, composer/programmers of algorithmic music. Composer/programmers are necessarily occupied both by the musicality of a system and the executably explicit formalisations of musical process. The former can be understood through the compositional approach behind the system and the musical styles exhibited by it, while the latter can be discussed according to how particular algorithms deal with musical data. This is essentially two different levels of encapsulation – Algorithmic Music Systems (AMSs) and Simple Musical Algorithms (SMAs).

An SMA, in the OO sense, is an algorithm at the lowest level of encapsulation, that deals with musical data of some form in either input, output or both, for example, an algorithm that transposes all input pitches by a third (a more detailed definition is given in section 3.1.4). Contrastingly, an AMS exists at the highest level of encapsulation, typically built from a network of SMAs, for example, EMI (Cope 2005) or any other major work. Each of these two levels require a different set of descriptive terms. SMAs can be understood according to their direct relationship to musical data, while much larger AMSs deal with different forms of musical data to

70 such a great extent and variety that it is more profitable to compare them in terms of the overall compositional approach that they model.

The framework for compositional approaches that are built into AMSs is presented first in section 3.1.3, providing a bridge from the discussion of music composition in the previous chapter. The three major compositional approaches that were identified are abstraction, heuristics and trial. Following this, section 3.1.4 explains the terms needed for the more detailed analysis involving SMAs, in particular, the function and scope of the algorithms.

3.1.3 Compositional approaches of algorithmic music systems

This subsection provides the terms and concepts necessary for a comparative dissection of AMSs in the field and for distinguishing this research from others. Viewing AMSs as “composer- agents” enables meaningful comparisons between these often complex and difficult to categorise entities. While some systems can be differentiated according to their music representations (Wiggins, Miranda et al. 1993), complex works will often use different representations for different parts of the system. As well as this, comparing AMSs in terms of techniques and/or representations tells us little about the nature of musical creativity that exists within the system.

However, if we view AMSs as composer-agents, more musically meaningful questions are afforded such as ‘what is its style?’ and ‘what is its compositional approach?’. These questions are eminently applicable in differentiating systems, as AMSs tend to be designed by people that are inspired by particular theories of creativity and their own personalised approach to composition. While style can be addressed musicologically as per the previous chapter, the notion of compositional approach requires some further explanation. Three different compositional approaches have become apparent – abstraction, heuristics and trial.

Abstraction involves parameterisation and arithmetic, viewing music as parallel continuous parameters upon which arithmetic functions can operate. The parameterisation scheme controls how the music is represented while the arithmetic determines how the data is manipulated. Modelling heuristic composition typically utilises a complex network of rules for recombination which are given probabilities extracted from learnt musical experiences. Trial involves many iterations and a similar process to evolutionary computing or ‘generate-and-test’. During each iteration, some source material may be mutated to generate many possible candidates, from which only a few are accepted.

71 Even with these terms there is some degree of overlap, but more so when considering algorithmic composition toolkits that encourage multiple compositional perspectives. The toolkits, despite having a broad range of capabilities are less agent-like, being not as focused on higher level decisions as the more specialised algorithmic composition applications which are of more relevance to this research. Frameworks for discussing toolkits are covered by others (Miranda 2001; Ariza 2005).

Overall, the compositional approaches explained below present some clear concepts and terms with which complex AMSs can be compared and contrasted with musically meaningful results, however it should be noted that both composers and composer-agents often combine the various approaches in some way.

Trial: generate and test

Perhaps the most obvious approach to music composition is to engage in an iterative process of generation and selection, trialling patterns and searching for a result that fits a set of criteria. Due to the use of an explicit criteria, it can be viewed as more of a “goal-oriented” form of creativity than the other approaches. This is commonly seen as the basis for many forms of creativity, an attitude epitomised by Thomas Edison’s popular quote: “genius is one percent inspiration and ninety-nine percent perspiration”. As noted by Gartland-Jones (2002) the most prominent figure from musical history characterised by this approach is perhaps Beethoven, “who left evidence of his constant exploration and reworking in his many note books” (2002: p 14.2). In computer music composition, Hiller and Isaacson (1958) were probably the first to incorporate this approach in the “generate and test” component of MUSICOMP.

This approach is particularly well modelled by algorithms that utilise evolutionary techniques. That is, the musical data may be transformed and/or combined (as is the case with genetic algorithms) in a variety of ways, producing a pool of possibilities that may or may not be selected. Each of the candidates are rated according to some kind of fitness function and only those that fulfil the criteria are chosen for either the next iteration or for the final result. There are a number of examples of this that are discussed in the review later, most notably, GenJam (Biles 2004).

Within the research I conducted, the final and most successful morphing algorithm was modelled on the ‘trial’ approach to composition – the Transform-Select (TraSe) algorithm, which is detailed in chapter seven.

72 Heuristic: estimation and rules

Despite the straightforward simplicity of trial and error, some of the most notable composers appear to operate through intuitive heuristics rather than exhaustive variation. The heuristic approach emphasises fast, implicit thoughts and actions that fulfil the intention of the composer – a subtle difference to achieving the goal, which is the more explicit attitude of the trial approach (explained above). The historical composer most emblematic of composition through a heuristic approach is Mozart, who was able to conceive major works on the instant (Gartland- Jones 2002, p 14.2). Interestingly, one of the early examples of this approach being formalised in algorithmic music is Mozart’s Musikalisches Würfelspiel in 1787 (Roads 1996), while the most well documented and possibly earliest example is Kirnberger’s ‘Method for Tossing Off Sonatas’ (Newman 1961).

While both Gartland-Jones (2002) and Jacob (1996) view this form of creativity as being difficult or almost impossible to comprehend, I contend, along with David Cope (2001) that composer- agents that operate through recombination and complex sets of rules learnt from numerous observations are already on this path. A clear example is EMI (Cope 1996), which essentially extracts note-by-note probabilities from a huge database, enabling it to compose new works in the style of existing ones. While experts can sometimes detect stylistic inconsistencies, I have found the examples to be surprisingly convincing to most people. While there have been a great number of informal demonstrations, there still has not been any formal “Turing test” of Cope’s work. It is clear that Mozart had an exceptional memory for music, being able to replicate entire works from only one hearing, and therefore is likely to have had some kind of internal “database” of his own (Slodoba 1985). Composing from probability involves “note-by-note prediction”, along with complex rules that can take into account the musical context on many different levels as it unfolds, so as to ensure that the result has some form of coherence. While the database analysis may take some time, the generation is virtually instantaneous, with one operation per musical unit (note or beat). As a result, there is no evolutionary component and no “perspiration” – only implicit heuristics. This model of musical creativity is further supported by the evidence that enjoyment of musical listening results primarily from the interplay between prediction and perception, as purported by computational musical psychologists such as Huron (2006). Examples of algorithmic music that could be said to model the heuristic style of composition, including EMI, are discussed within the reviews (3.2, 3.3 and 3.4) below.

The second algorithm that was developed as part of this research, the Markov Morph, was based on the heuristic approach and is detailed in chapter six.

73 Abstract: parameters and arithmetic

Abstraction in musical composition involves defining relationships, implicit or explicit, between different forms of data, both musical and non-musical. The emphasis is more on exploration rather than goals or intentions. The approach has been employed occasionally by composers throughout history, however it became much more widespread since the adoption of the computer as a compositional tool in the latter half of the 20th century.

The mapping can work both ways, for example, one might map syllables to pitches, an approach used by Guido d’Azzero around 1026 (Kirchmeyer 1968); or one might take musical elements and manipulate them as numbers, as was practiced by the “total serialists” such as Boulez and Webern. For example, pitch, dynamics, duration and onset were all able to be serialised and manipulated through arithmetic. The computerisation of music since the 1950s has enabled a virtual explosion in this approach to composition, for example, the adoption of MIDI in 1983 provided a widespread standard for the parameterisation of note-based music. When dealing with MIDI, one cannot help but be concerned with issues of mapping, whether on the level of sound-to-gesture or, less commonly, algorithm-to-music.

“Top-down” composition (Miranda 2001) is another form of mapping. With this approach, the composer plots the intended levels of intensity through the duration of the piece and demarcates contrasting themes – high-level parameters. From these parameters will be crafted sections of music that correlate more or less to the intensity, according to the composer’s judgement. Music has been conceived in this hierarchical fashion by many composers, for example, Burke, Polanky and Rosenboom’s Hierarchical Music Specification Language (HMSL) (2005) allowed for two dimensional Cartesian “shapes” to be applied to different levels of the musical hierarchy. Other approaches that involve mapping as part of the compositional approach are discussed in the reviews in sections 3.2, 3.3 and 3.4.

Within this research, the algorithm most related to the compositional approach of mapping is the “parametric morphing algorithm”, which is explained in chapter five.

Summary of compositional approaches of AMSs

AMSs, when viewed as composer-agents, can be compared and contrasted according to the compositional approach that they model. I discern three different approaches to musical creativity – trial, heuristic and abstract. Trialling involves transformation and filtration of the music; heuristics involves estimation, rules, recombination and probabilities; abstraction involves arithmetic and parameterisation.

74 It might be argued that this framework is unnecessary – ‘trial’ could just as well be understood as ‘evolutionary’; ‘heuristic’ as ‘stochastic’ and ‘abstract’ as ‘deterministic’. However, these labels position the musical algorithms as computational, rather than musically creative processes. As well as this, they are partially inaccurate – abstraction can involve parameterisation of probabilistic data, while heuristic approaches often include deterministic rules, particularly in clearly defined musical styles such as polyphonic harmony, as explored by Hiller and Isaacson (1958) and Koenig (in Roads 1996). Nor does the trial and error process necessarily involve biological evolutionary concepts such as reproduction, mutation and recombination, particularly at the most primal level of “generate and test”.

All of the AMSs that are reviewed in sections 3.2 to 3.4 can be described and differentiated according to these three approaches. However, while the terms are useful for descriptions of AMSs at a high level, discussion of the low-level SMA components requires some more detailed terminology which is provided below.

3.1.4 Attributes of simple musical algorithms

The previous section presented some terms for discussion of compositional approach, however, it did little to aid discussion of low-level music algorithms. To fulfil this need, suitable descriptive terminology for Simple Music Algorithms (SMAs) will now be explained. In particular, two continuums are introduced: function and contextual breadth. The function continuum describes what the algorithm does to the music and ranges from analytic, through transformational to generative. The SMAs are defined as algorithms that occupy a single point on this continuum. This is in contrast to the more complex AMSs that are able to occupy the whole range of the function continuum and were the topic of the previous section. The contextual breadth continuum relates to how much contextual information is at the disposal of the algorithm, ranging from narrow to broad.

In order to provide a tight definition of the functional continuum, it is first necessary to examine the various ways music is represented and, in particular, how the representations can be changed to be more or less predisposed to the conveyance of “musical” information.

Musical predisposition of representations

At a conceptual level, musical predisposition is a measure of how well a musical representation can portray the musical ideas and facilitate particular compositional processes that are required. It is an important concept, as the position of an SMA on the functional continuum is defined by the differences in musical predisposition of the input and output. If one were so inclined,

75 predisposition could be derived formally by ascertaining the lack of complexity in applying the representation to specific compositional processes. Gauging algorithmic complexity itself is a well defined area of computer science (Russell and Norvig 2004) which need not be reiterated here.

A representation may become more predisposed to a certain musical process by adding relevant parameters or structuring the representation more appropriately. For example, in a note representation that only consists of duration and pitch, adding a vibrato parameter would increase the predisposition towards musical processes that incorporate expression. Representational structure is important also; consider the difference between a sequence (ordered) and a set (unordered) of note events. Without some additional computation, it is impossible for the set to represent the musical structure of a sequence containing the same events. Therefore, sets of individual notes have less predisposition to musical tasks such as “create an eight bar phrase” than do sequences which may already be part way organised as a phrase. To reiterate this claim on a macroscopic level, a large database (set) of licks is not yet a piece of music but a sequence of licks can be. Many other sonic, performative and structural elements affect the musical predisposition of representations and have implications for the nature and potential of the algorithms using them.

It is acknowledged that musical predisposition, if implemented as an absolute measure, fails to represent the specialised stylistic aspects of disparate musical implementations. Instead, this framework only ever uses predisposition as a relational indicator. That is, a representation can only have “more or less” musical predisposition. A systematic method for obtaining a more general musical predisposition of a representation could be to define the musical predisposition of a large group of AMSs and compare the single representation to the average; however, for the purposes of this research, such definitions are not needed.

Analytic Algorithms: reduction

Analytic algorithms tend to reduce the potential data size and the general musical predisposition of the representation by extracting specific features. For example, taking a set of sequences and outputting a set of notes can be considered analytic. The set of notes could be a scale that has been distilled from a database of riffs. Another example could be Schenkerian analytical processes such as deriving a single ‘Urlinie’ or ‘fundamental bass’ line (Lerdahl and Jackendoff 1983; Cope 1996), or even just a chord (Thomson 1999), from a complete work.

76 Transformational Algorithms

Transformational algorithms tend not to have a significant impact on the musical predisposition of the data representations but can alter the information. For example, an algorithm that transposes individual notes retains the parameters and structural relations of note collection as representation, but alters the pitch value of each note. A retrograde algorithm that reverses the order of a phrase retains the sequential note representation while transforming the structural pattern, but the music representation of the transformed phrase is unaltered and thus the musical predisposition is unaffected.

Generative Algorithms

Generative in the context of this framework means ‘musically generative’ – when the resulting data representation has more musical predisposition than the input. For example, a chaos music algorithm that takes a single number as a seed and returns a sequence of notes that can be played is generative.

Contextual breadth

Context is the surrounding information that influences the computation of an algorithm and therefore an algorithmic music system. In OO programming terms, the context is the world-state data and arguments that the algorithm has access to. Context has a breadth or size which can extend along a continuum from localised processes that are context-free to processes that are highly context-dependent. This definition of context-dependency encompasses Chomsky’s (1957), but is applied more generally to continuous as well as discrete symbolic data.

An intuitive way of visualising contextual breadth is to imagine a note positioned on a musical score with a circle drawn around the note that defines the scope of its context. The circle would include previous and future notes in the same part and concurrent notes in adjacent parts. The size of the circle would influence the breadth of context. In practice, the contextual breadth has more than these two (temporal and textural) dimensions. For example, an algorithm where notes are influenced by four parameters has a broader context than a similar algorithm that considers only one parameter.

Summary of attributes of SMAs

SMAs exist at the lowest level of encapsulation and can be conceived along two different continuums, function and context. Function ranges from analytic, through transformation to

77 generative; while context ranges from narrow to broad. The terminology that was defined here has been put to use throughout the reviews in sections 3.2 to 3.4 and later chapters. It is particularly applicable to algorithms that occupy a single point on the function continuum, rather than the larger scale AMSs which often span whole sections, however it can also be used when dissecting AMSs and discussing their lower-level constituents.

3.1.5 Summary of the framework

Most of the previous frameworks for conceiving algorithmic music have been expressed only implicitly, or skewed towards a particular discipline such as psychology or artificial intelligence. The framework for this research relates to the two primary areas of interest – music composition approach and algorithmic function, that is, for dealing with both high-level AMSs and low-level SMAs respectively. Three compositional approaches of AMSs that I identified are trial, which is characterised by mass generation and selection; heuristic, which is based on estimation and rules; and abstract, which involves exploration of parameter mappings. Two primary attributes of SMAs are the algorithmic music function, which ranges on a continuum from generative through transformational to analytic; and contextual breadth, which is an indication of the amount of musical data that the algorithm has access to.

This framework was crucial to the research, as conceptualising the set of attributes described above was an important preliminary step towards achieving meaningful comparisons between various algorithms which in turn allowed for a comprehensive review of existing algorithms. This clarified the state of the art and revealed the most opportune research directions, thus enabling the informed development of musical algorithms to enhance adaptivity in the delivery of mainstream electronic music, which was the ultimate intention. The first, most wide-ranging review begins below in Section 3.2.

3.2 Review of algorithmic composition

Having explained preliminary concepts (3.1), this section presents a review of algorithmic composition. As the research was concerned with the development of algorithms that generate music, an examination of techniques that other practitioners used for algorithmically executing compositional decisions was crucial. This allowed trends and niches to be identified which guided the direction of research and inspired developments.

Algorithmic composition can be defined as the creation of music from a formal sequence of instructions. This is a broad category that technically includes all music generation via software. However, the use of the term ‘composition’ rather than ‘music’, implies a contemplative approach

78 where the process generally occurs in compose-time or through batch-processing rather than ‘realtime’. Realtime systems are reviewed in the following section 3.3 which examines interactive music in particular.

Composition also implies operation on the note-level, through the historical association of western music composition with notation. While not generative on the note-level, DJ agents can be considered as ‘composition algorithms’ on the level of musical form and are included particularly due to the relevance of mixing to morphing.

Some other application contexts for algorithmic composition include composer agents, computer assisted composition tools and sonifications. Because the application determines the music composition goals, using these categories enables an efficient comparison of divergent techniques and compositional approaches that are applied to similar goals.

Composer agents are concerned with mimicking the creative compositional act. Composition tools are designed to extend the capacity and ease with which composers create music. Performance rendering is about breathing life into otherwise mechanical music and focusing particularly on articulation, short term tempo fluctuation and dynamics. Sonifications take any form of abstract information and produce a musical or sonic result that in some way is a sonic reflection of the original source. DJ agents create playlists from music audio databases and aim to mix between the tracks seamlessly.

3.2.1 Composer agents

Composer agents simulate the music composition aspect of human intelligence. Different applications vary in the level of human interaction and the source of musicality. Four works which are significant for various reasons are examined below: MUSICOMP, EMI, CONCERT and Madplayer. The first three generate classical music and Madplayer deals with electronic music genres relevant to this research. While almost any algorithmic composition could be framed as a composer agent, the examples discussed within this subsection explicitly pursue the goal of human-like composition, while those discussed in other subsections are based on other goals.

MUSICOMP

Likely the first example of algorithmic composition with computers was a system designed by Lejaren Hiller and Leonard Isaacson (Roads 1996, p 830) which produced The ILLIAC Suite for String Quartet in 1957. There were four sets of experiments conducted throughout the project, each exploring different aspects of the problem of applying the computer to music (Hiller and

79 Isaacson 1958). MUSICOMP was generative and transformational, without any analytical component. The compositional approach modelled by MUSICOMP was partly heuristic and partly trial. The general procedure in the first two experiments was to modify and constrain “random white-note music” using musical rules that were mostly inspired from counterpoint. Some rules use pre-composed material, for example the insertion of a C chord at the beginning and end, or cadence completion. Other rules were “No parallel unisons, octaves, fifths” or “no more than one successive repeat”. The third experiment applied the same generate-modify- select (Alpern 1995) structure to a contemporary musical aesthetic. The fourth experiment used various Markov processes and in some cases a system for intervallic tonality. By controlling the influence of various probability distributions, musical features such as consonance and dissonance could be influenced. The tonality system provided an additional increase in contextual breadth by marking pitches for tonal reference.

David Cope (Experiments in Musical Intelligence)

David Cope has arguably produced the most intensively developed body of research into music composition algorithms through a number of projects, the most well-known being EMI (Cope 1991; Cope 1996). In 2003, EMI was ‘killed’ by Cope, apparently to instil the body of work that had been generated up to that point with a greater degree of uniqueness that could potentially enhance its musical value. Other projects include SARA, Sorcerer, Serendipity, Apprentice and the most recent, Emily Howell. EMI replicates historical music styles through sophisticated recombination; SARA is a scaled down version of EMI; Sorcerer analyses music for allusions to other pieces; Serendipity creates a database of music by trawling the web for certain forms of MIDI files; Apprentice can extract and generate musical form; and Emily Howell is an experiment in the creation of new styles, rather than replication of existing styles (Cope 2005). Throughout these projects and publications, Cope has contributed a range of effective techniques to algorithmic composition.

From 1980, EMI was used to generate a huge number of classical music scores of differing styles (Cope, Bach, Mozart, Chopin, Brahms, Joplin, Gershwin, Bartók and Prokofiev), many of which can be heard over the internet (Cope 2004). The details of the process behind this music have been explained to some extent already by Cope (1991; 1996; 2005), and so only significant or pertinent aspects will be highlighted here. EMI incorporated analytical, transformational and generative components and a broad context. The compositional approach modelled by EMI was heuristic – the database analysis forms a huge array of rules with probabilities which models the kind of subconscious repository of musical heuristics that composers may use. There was no element of trial and error or abstraction within the system itself, although, the human selection

80 of material and parameter/controller tuning no doubt had some additional influence on the compositional approach.

EMI required a large database of music, preferably in the style of a particular composer. Initially, the scores must be hand-treated to clarify ornamentation, time signature, key signature and texture. Schenkerian analysis is applied to extract the fundamental harmonic movement (urlinie) and categorise each beat (division of the bar) into a harmonic function using the SPEAC scheme: statement (s), preparation (p), extension (e), antecedent (a) or consequent (c). Each of the categories is defined by the user with pitches that are expected to be heard within that category at various levels of acuity from background (more fundamental) to foreground. The categorisation is then a process of counting the pitches on each beat that match with the pitches defined by the user for each category, weighting the occurrence of background/fundamental pitches higher (Cope 1996). The dynamic and duration of a note also contributes to the impact it has on SPEAC categorisation for a particular beat. The algorithms used in the SPEAC program itself and possibly within other programs such as Apprentice, also use attributes such as timbre, harmony, melody, dynamics, spacing, texture and others in the determination of SPEAC identifiers.

As well as a Schenkerian-type analysis, pattern matching was used to find “earmarks” and signatures – patterns, which are almost gestalt in nature, that recur at different locations throughout the database. The signature pattern is considered as a particular stylistic trait, while the earmark is a structural indicator (for example a cymbal crash). In EMI, a pattern matching process scanned through the database, finding similarities in dimensions, such as pitch, interval, harmony, rhythm and many others that are more complicated and involve removal and generation of notes. The degree of leniency that is allowed in the similarity measurements for each dimension was user-defined.

The music was generated through beat-by-beat prediction, where the probabilities of one group following another are derived through a statistical analysis of the music within the database that has been analysed as above. This is essentially an augmented Markov chain, the primary difference being that additional attributes specifying the relationship of each note to certain important notes that appear at structurally significant points in the future are included, thus imbuing the music with a degree of structural coherence.

While global musical structures were manually incorporated into EMI, they are automatically generated in the program Apprentice (Cope 2005). Apprentice generates structure for a piece through a generative grammar that uses the set of SPEAC identifiers as non-terminal variables and particular rules defined by Cope as the set of productions.

81 CONCERT

Mozer (1994) published experiments into note-by-note prediction using the CONCERT system. The test was to see if a complex connectionist system, without explicit structural representations, could learn structural trends and utilise them in coherent compositions. Note-by-note prediction, like Cope’s beat-by-beat prediction, is a heuristic compositional process. CONCERT utilises analytic and generative, rather than transformational, algorithms.

The system is initially trained to predict the next note in a sequence or training set. To compose music the first few notes of a known sequence are then given and the strongest predictions are fed back recursively into the input, thus predicting an entire composition. The Artificial Neural Network (ANN) consists of an input layer which is provided by the pitch, duration and chordal data, a layer containing a distributed or combined representation of the input; a layer containing pitch, duration and chordal information separately, and a layer containing output probabilities for predicting the next note. In an attempt to provide CONCERT with a sense of musical structure, an additional layer contained a distributed reduction of the musical history or ‘context’.

The representation scheme was interesting in that it attempted to reflect the psychoacoustic similarities between certain values of parameters. For example, pitch was collectively represented through pitch height, a Circle of Chroma (CC), and a Circle of Fifths (CF). Using non-chromatic representations such as the CF meant that in some cases the distance between, for example, C and G, would be recorded as closer than the distance between C and C#. This concept was extended in the harmonic and rhythmic domains, using the tone series and a “circles of beats” respectively. The circle of beats seemed equivalent to using a swing/shuffle dimension.

Tests with listeners found that, while CONCERT’s compositions were preferred over that of a third-order Markov chain, they lacked global coherence. Although Mozer concluded by doubting the compositional ability of note-by-note prediction, some suggestions were offered for future research. While work in ANN composition has been carried out since (Chen and Miikkulainen 2001; Eck and Schmidhuber 2002), Mozer’s recommendations have not been fully investigated and global structure remains the most significant obstacle to this form of composition (Chen and Miikkulainen 2001). Musical examples that illuminate the problem of global coherence can be accessed online (Eck 2003).

Madplayer

More recently, the Madplayer (MadWaves 2004), a hand-held music device with a composition algorithm and synthesiser, demonstrated the use of algorithmic processes to create mediocre

82 music in a range of electronic music genres. Some aspects of the Madplayer have been discussed in the patent application by Georges and Flohr (Georges and Flohr 2002), however this does not provide explicit algorithmic detail. The patent mentions “compositional algorithms applied to musical data” (2002). The compositional approach is likely to be heuristic – it appears that tunes are generated instantaneously and from musical material, rather than through intensive trial or from abstract, non musical data.

The rhythmic patterns of the drum, bass, riff and lead are always two bars or less in length. The harmonic progression can be longer than this depending on the style (two bars long in Drum and Bass and eight bars long in Ambient). The harmonic shifts influence the tonal parts, making the perceived pattern length longer through transposition. Sonic changes such as dynamic filter cut- offs also operate beyond two bars, which also affects the perceived cycle length of the pattern. Generally as the piece or section progresses, layers are added. Some layers seem beyond the control of the user. Fills can occur during transitions between sections or within a section itself depending on the length. There are a few different fill types that are applied during these sections depending on the style: part muting, snare rolls, cymbal crashes and transient musical events such as chaotic lead patterns.

For melodic composition a probabilistic grammar, as opposed to a single probability set, seemed likely. In the “trance” style there were a few different forms the bass line could take, with perhaps two or three different types of stochastic algorithms for bass generation within this style. Clear three over four rhythmic passages within phrases were often present. Random compositional transformations, such as changing the rate of the pattern or quantising it to three over four or two over three, may have been used to achieve this. This is supported by the similar re-assemblage of audio drum patterns. If transformation techniques have been used on drum samples, then it is quite likely that similar techniques would have been applied to MIDI patterns. The first clue that points to this is the fact that although the instrument/timbre used by the melodic part is subject to change, it is not possible to change the instrument for the drums. The drum part must be completely recomposed in order to obtain a change in the drum sound (which of course will change the pattern as well). This indicates that drum samples have been used. If the drums consisted of MIDI patterns then it would be easy and probably desirable to allow the user to change the instrument. Despite this, some sounds are recognisable in other patterns. When the drums are recomposed, various parameters are reshuffled to create a new drum loop that sounds unique: spatialisation (reverb level and panning), delay, pitch, pitch bend contour, volume, volume contour and reversing of samples. Especially in styles such as Drum and Bass, it sometimes seems that different forms of cut and paste type compositional changes have been applied to drum parts. This is due to the strange shifts of dynamic present in some of the drum sounds.

83 Overall, it is difficult to understand exactly how the Madplayer works, however, the techniques are convincing within the bounds of mediocre music composition and it is clearly a substantial development.

3.2.2 Computer Assisted Algorithmic Composition tools and toolkits

Computer Assisted Algorithmic Composition (CAAC) tools are designed to enhance the creative musical abilities of composers and music producers. The existing software has varying degrees of algorithmic assistance, programmability and usability – values which define this form of algorithmic music. A range of CAAC software is available (IBM-Computer-Music-Center 1999; Farbood 2000; Amitani and Hori 2002) and is reviewed in most computer music books (Roads 1996; Dodge and Jerse 1997; Miranda 2001; Rowe 2001).

The first commercial sequencer, aptly titled Sequencer (Spiegel 1987), developed in 1985 by Dave Oppenheim of OpCode Systems, provided basic manipulations – add and subtract – over note parameters such as pitch and duration. M (Zicarelli, Offenhartz et al. 1986) is one of the earliest examples of a commercial composition tool with a range of extended algorithmic features that is still being developed. Music production applications such as Cakewalk and Logic are semi-programmable, utilising Cakewalk Application Language (CAL) scripts (Cakewalk 2004) and the Environment page (Emagic 2004) respectively. The TS-Editor (Hirata and Aoyagi 2003) is a unique application that partially embodies the Generative Theory of Tonal Music (Lerdahl and Jackendoff 1983).

Open source programs such as Musical MIDI Accompaniment (MMA) (Poel 2005) and Pymprovisator (now abandoned) (Álvarez 2005) as well as commercial programs such as the original Band-in-a-Box (BIAB) (Amitani and Hori 2002; PGMusic 2004) and its more recent competitor Jammer (SoundTrek 2005), are programs designed to facilitate the rapid creation of standard music. The user defines the chordal structure of a piece and the various parts that play it. As well as representing a MIDI channel and instrument, the parts contain a large bank of short musical phrases which are combined and harmonically adjusted to match the chord progression. These programs are fundamentally recombinant, like Mozart’s Musikalisches Wurfelspiel (Roads 1996) although a basic kind of supervised probabilistic grammar is often employed to enhance the process. For example, in Jammer the non-terminal parts or musicians contain the non- terminal musical fragments or riffs which are labelled either as grooves or transitions, and which contain the terminal notes. Transitions can be further defined using three sliding scales that indicate whether it is the kind of transition that leads well into riffs, notes, or chords which is

84 interpreted by the recombinant algorithm. Within grooves, the probability of various riffs occurring can be defined by the composer.

Music Sketcher is a technology preview released from IBM in 1998 (IBM-Computer-Music- Center 1999) and abandoned since. It provides substantial assistance with tonal and harmonic operations. The user inserts and arranges pre-composed blocks of musical data called riffs which are modified parametrically – there is no access to the notes themselves. This could be thought of as a supervised recombinant and transformational system. “Modifications” to standard musical parameters such as velocity, duration, rhythm offset and pitch are applied to the blocks via contour graphs. The different operators (plus, minus, multiply and divide) used to affect these parameters seem to have been chosen for mathematic simplicity rather than musical relevance. In a similar way to BIAB and Jammer, the harmony track describes the chords and tonality of the sequence. The Smart Harmony system (Abrams, Oppenheim et al. 1999) performs an analysis on the original riff, and re-maps the pitches to suit the tonality and chord progression (IBM- Computer-Music-Center 1999). Many musical examples have been created with Music Sketcher and can be viewed within the program itself, which is available on the CD accompanying the book Composing Music with Computers (Miranda 2001).

Lower-level tools and libraries such as Nyquist (Dannenberg 2002), CSound (Vercoe 2000), Common Music (Taube 2004), AC Toolbox (Berg 1992) and jMusic (Sorenson and Brown 2000; Sorenson and Brown 2004) are the most programmable and algorithmically sophisticated, although, they typically lack usability in that they require the composer to have knowledge of programming. Extensions to lower-level tools such as CSound’s Cecilia (Burton, Piché et al. 2004), my own sequencer for jMusic in LEMu (Live Electronic Music) (Wooller 2004) and the shape editors in AC Toolbox (Berg 1992) are attempts to overcome this problem by incorporating a Graphical User Interface (GUI).

Programming languages where the GUI reflects the data flow are called Visual Programming Languages (VPLs). VPLs related to music and sound include: Algorithmic Composer (Fraietta 2001), Max/MSP (Cycling'74 2004), Pure Data (PD) (Puckette 2005), jMax (Dechelle, Schwarz et al. 2004), Keykit (Thompson 2007), Autogam (Bachmann 2001), Patchwork (Laurson 1996), OpenMusic (Truchet, Assayag et al. 2001), Plogue Bidule (Plogue 2004), Reaktor (Native- Instruments 2007) and Karma (Klippel 2005). Both Max/MSP and PD are well supported and contain a constantly growing list of objects (Chamagne and Ninh 2005; Puckette 2005). These programs become particularly powerful when using externals which are objects programmed in C according to a specification that makes it usable from within the VPL. Some objects in Max that are relevant to the techniques I have used in this research are the implementations of Markov Matrices: Algo1, markov, markov-harmony, markov-rhythm, markov~, mchain as well as

85 the genetic algorithm toolkit, gak. Objects related to morphing are either related interpolation of presets, for example, l.preset and plugmorph or audio, for example, lp.tim~, lp.vim and morphine (Chamagne and Ninh 2005). OpenMusic has been developed primarily as a visual tool for experimenting with musical constraints. Bidule includes basic stochastic note generation capabilities and the more complex “particle arpeggiator” that models physical particle properties to create a stream of MIDI notes. It is effectively a chaos note generator with controls over speed, density and pitch tendencies. Many of the other systems mentioned include similar algorithmic processes.

CAAC tools that deal specifically with the application of adaptive music such as DirectMusic Producer are dealt with in the section on adaptive music and audio (3.3.3).

3.2.3 Sonifications

Sonifications are attempts to derive music or auditory data for scientifically motivated listening from non-musical processes such as mathematical formulas or brain wave signals. The musical success of a sonification depends on the level of compositional involvement, the design of the mapping from raw data to musical data and, to a lesser extent, the design of the abstract processes themselves. Most sonification algorithms are an abstract rather than heuristic approach, but some works incorporate evolutionary procedures and thus could be seen as trial based, in the sense of the compositional approaches described above (3.1.3)

Pioneering work in sonification was conducted by Iannis Xenakis from the 1960s (1971) who applied stochastic processes to waveform and note data. A sense of musicality was expressed through compositional and the parametric control of constraints such as frequency range and amplitude. Within the development environments of the VPLs mentioned above, a plethora of idiosyncratic mathematical sonifications have been created. More specialised applications for sonification such as Tangent, Texture, MusiNum, A Music Generator, FractMus, and Vox Populi have been reviewed by Miranda (2001), including his own CAMUS (Cellular Automata MUSic). These and others such as LMUSe (Sharp 2006) and MetaSynth (UI-Software 2005) utilise processes ranging from number theory, fractals, cellular automata and genetic algorithms. Other systems sonify and visualise complex evolutionary processes. For example Gakki-mon Planet (Berry, Rungarityotin et al. 2001) generates music from non-musical world state parameters. Those systems that apply evolutionary procedures such as Vox Populi and Gakki-mon Planet could also be viewed as embodying the trial approach to composition, in addition to the abstract approach.

86 3.2.4 DJ agents

DJ agents aim to mimic some or all of the musical tasks utilised by DJs, including track selection and sequencing, beat matching, cross-fading and mixing. Automatic DJ algorithms can be seen as part of “algorithmic composition” insofar as DJing is recognised as a compositional activity, which, without diverging overly, I assume is at least partially the case.

Cliff (Cliff 2000) created a dance music DJ agent which, given a small set of indicative tracks or some qualitative suggestions, could select and sequence an entire DJ set from a database and seamlessly mix between them. What is described by Cliff appears to be a sophisticated DJ agent, although demonstration software could not be found. Through the connectionist analytical techniques, this system predominantly models the heuristic compositional approach. To a lesser extent, the mapping of qualitative data could be considered part of an abstract approach. Fujio and Shiizuka’s (2003) DJ agent employed an interactive evolutionary computing approach in an attempt to adjust the cross-over point to suit the personal aesthetic human sensibilities of the user, which is a trial approach.

The other systems that were examined (Fraser 2002; Andersen 2005; Jehan 2005) tended to focus on technical aspects of DJing and were thus unable to be adequately discussed in terms of compositional approach. As low-level music algorithms, they are mostly transformational and analytic, rather than generative. Jehan (2005), as part of the Skeleton software, developed an automatic DJ with tempo and downbeat extraction, alignment, pitch-independent time stretching and cross-fading. An algorithm was employed to find rhythmically similar sections within the two pieces that would be the most suitable for a transition – a fairly broad context that is available to the algorithm. Jehan introduced the option of mid-mix tempo adjustment, a technique which results in an aesthetic which is fairly alien to electronic dance music. This is due to the limitations of traditional turntable technology, as I discovered through the qualitative “focus concert” (see chapter six). Fraser (2002) created an automatic DJ, with algorithms for dealing with tempo extraction, beat alignment and cross-fading. The cross-fade volume envelopes used during the transition could be drawn by the user, in contrast to typical DJ cross-fades which are limited to “equal-energy”, that is, logarithmic or inverse-exponential curves.

3.2.5 Summary of algorithmic composition review

An overview of algorithmic composition has been presented, highlighting a number of application contexts and broadly examining the field. Most of the significant composer agents analysed in 3.2.1 appeared to model the heuristic approach to music composition, implying a niche for

87 composer agents that model other compositional approaches. The works all involved generative, transformational and, to a lesser extent, analytic algorithms.

The range of CAAC tools and toolkits that were examined in 3.2.2 highlighted the level of diverse activity within the area and the kinds of basic algorithmic techniques that are used. The compositional approaches and music algorithm attributes mentioned in the framework (3.1) were evident, however, the level of sophistication was fairly low, due to the focus on enabling the user to create their own, more complex and specialised algorithms. It is clear that CAAC, as a diverse amalgamation of tools for composition and musical experimentation, can be a niche for any kind of algorithmic music technique, including note-level morphing.

The sonification systems that were examined in 3.2.3 mostly embodied the abstract approach to composition; however, some could also be seen partially as trial, based on evolutionary processes in the world-state being abstracted. Most sonifications tended to utilise generative and, to a lesser extent, transformational algorithms. The musicality of the sonifications I studied were mostly related to the mappings that were chosen. Sonifications are experimental in nature, and therefore have no economic need for specific techniques; nonetheless, novel techniques add to the experimental capabilities of sonifications. Note-level morphing could be applied to sonification as a method of combining or progressing through the musical results of different mappings.

Most of the DJ agents studied in 3.2.4 modelled only the low-level technical skills of DJs, with predominantly transformational and in some cases analytic algorithms. In the cases of higher- level, more artistic skills such as track ordering and determination of cross-over point, some works utilised the heuristic or trial approach to composition. Note-level morphing in DJ agent research is not apparent – despite the obvious correlation between morphing and mixing. This is understandable, as without the source MIDI sequence and synthesisers used to create particular tracks that are being mixed, it is currently very difficult to apply note-level techniques to the waveform-level practice of mixing. Nonetheless, the examination of DJ agents provided some insights into mixing techniques that might be applied to morphing, for example, automatic determination of the cross-over point and logarithmic crossfades.

In summary, a review of the field of algorithmic composition has been presented, comparing a number of algorithmic composition works in various application contexts. The process of reviewing the field has enabled particular trends and research opportunities within each context to become visible. This information was used throughout the research for inspiration and to gauge the novelty of the techniques that were subsequently developed.

88 However, algorithmic composition is a broad field and a narrower review that covers the newer field of interactive music in more detail is also necessary due to its particular relevance to the interactive nature of the research.

3.3 Review of interactive music

Having covered algorithmic composition, the review focus will now shift to interactive musical algorithms. Compared to the review of the previous section, it comprises a collection of relatively newer works with narrower application focus. This is particularly pertinent to the interactive nature of the project. Discussion of interactive musical algorithms that are specifically related to note sequence morphing is reserved for the review of note-level morphing in the following section (3.4).

Interactive music contexts involve transfer of information between some external source and the algorithm which can influence how the music is generated while it is being heard. Rowe (1993) developed a taxonomy of Interactive Music Systems, involving three attributes: score driven vs performance driven (level of sponteneity); transformative, generative or sequenced (compositional techniques); instrument vs player (the level of autonomy).

While Rowe’s taxonomy is useful for description and analysis of interactive music systems in general, I have instead arranged this review in terms of contexts to which note-level morphing could potentially be applied. Such application contexts include users guiding meta-instruments (3.3.1); musicians jamming with jamming agents (3.3.2); computer game players influencing the adaptive music of games (3.3.3); or participants experimenting with interactive installations (3.3.4).

A meta-instrument is similar to a musical instrument in that it is controlled directly by a musician with the intention of producing a particular musical effect. However the meta-instrument provides high-level parameters that are used to guide the overall music rather than requiring the musician to specify each individual note. Jamming agents take on the role of an improviser – listening to the music of the other musicians and formulating a response in realtime. Adaptive music systems or ‘stage pit agents’ observe a scene that is unfolding and formulate an appropriate response in realtime. The term adaptive music or adaptive audio is used in computer game music literature. Interactive installations are artworks involving physical interaction with visual and/or sonic processes.

By reviewing each context, numerous opportunities for applying note-level morphing became evident and some approaches to interaction that were found in the various works were inspirational to developments within this project.

89 3.3.1 Meta-instruments

Meta-instruments are designed for the live delivery of algorithmic music, taking instructional input from a player and rendering it into musical output. For systems that achieve this through a direct, narrow context mapping of gesture to sound, interactive function is more similar to traditional instruments. Meta-instruments are generally more sophisticated, broad context systems that take advantage of the algorithmic possibilities of software. In situations where the instruments, “meta” or otherwise, are combined with a suitable physical interface, the system is cybernetic (Pressing 1992).

The KARMA (Kay 2004) keyboard includes generative and transformational algorithms that the user tweaks in realtime to conduct a performance. Jam2Jam (Brown, Sorenson et al. 2004) is a meta-instrument based on stochastic generative techniques. Users control compositional parameters such as note density which affect weightings within each part. A basic system of harmony has also been implemented. In generating notes rapidly from musical rules and probabilities, both of these reflect the heuristic approach to composition.

LEMu (Wooller 2004) uses transformational and generative algorithms for the live compositional manipulation of electronic dance music. It is based primarily on deterministic transformational algorithms, but also incorporates a probabilistic grammar for generation of breakbeat style rhythmic patterns (Wooller 2003), a heuristic approach to composition. The influence of the transformational algorithms on the music is directly controlled by the user. Because of this, whichever approach the user adopts is reflected, be it heuristic, abstract or trial. A similar situation exists for live music production environments such as Live! (Ableton 2004), which are also less “meta”, having more direct forms of interactivity.

The Metasurface feature of Audiomulch enables any number of parameter snap-shots to be taken and positioned on a two-dimensional plane. The user can then morph between them through natural neighbour interpolation (Bencina 2005). Although the parameters relate to audio manipulation rather than music composition, some aspects are borderline, for example a Low Frequency Oscillator (LFO) mapped to amplitude will effectively control the rate of occurrence of a sound event in a rhythmic way. Despite being on the audio-level, the Audiomulch example is worth mentioning as it refects an abstract compositional approach – in the mapping of Cartesian co-ordinates to parameter snap-shot interpolations, the focus is less on matching particular musical criteria (as per the trial approach) or being guided by heuristics, but on exploring the emergent musical properties of the mapped surface.

Overall, the occurrence of note-level morphing in meta-instruments appears to be rare and those meta instruments that do (Momeni and Wessel 2003) are discussed in the review of note-level

90 morphing below in section 3.4. The techniques of note-level morphing support meta-level control of the music through the morph index and could therefore be easily applied to a meta- instrument. This is demonstrated by the live concert performance of LEMorpheus, the software I developed through this research, as discussed further in the conclusion.

3.3.2 Jamming agents

Jamming agents fulfil the role of a music ‘player’ (Rowe 1993) involved in an improvisational situation. They take musical input, analyse it, and respond with musical output, necessitating both analytic and generative algorithms. Transformational algorithms are not strictly necessary, but often used.

George Lewis (Lewis 1999) developed jamming agent software from 1985 to 1987 to respond to his solo trombone playing in the CD Voyager (Lewis 2007) and has continued developing the software to present day. Robert Rowe, in the seminal Interactive Music Systems (1993) and later work (Rowe 2001) discussed techniques and approaches related to his Cypher software.

Winkler (1998) conducted similar explorations using his Followplay software. The HybriD patch (Adam 2002) is designed to respond and resynthesise the performance (MIDI) and acoustic (audio) data gathered from a large ensemble. The fiddle object (Puckette and Apel 1998) in Pure Data is used to track pitch in jamming agents. None of these jamming agents have been submitted to any kind of testing and, from listening to the music generated (Lewis 2007; Rowe 2007; Winkler 2007), they appear likely to fail “Turing test” style conditions1. This is not unexpected, as these works are intentionally experimental, being conducted in the exploratory spirit of the abstract compositional approach.

Score-following is the analysis of live input to gauge the appropriate tempo at which to play a backing score and has received significant attention over the years (Dannenburg 1984; Vercoe 1984; Raphael 2003). While tracking techniques are relevant, the lack of generative algorithms excludes score-following from classification as a true jamming agent, being more of an ‘accompanist agent’.

1 The test would determine if the output of the jamming agent is indistinguishable from that of a human musician.

91 BoB (Band out of the Box) is a jamming agent that learns to emulate the characteristics of the musical style of the human player (Thom 2000). GenJam (Biles 2004) is a jamming agent originally based on Interactive Genetic Algorithms (IGAs). The Continuator (Pachet 2002) ‘continues’ half finished patterns that are played and learns musical styles from the musician and/or a database. Due to the high level of musical inventiveness, these last two systems will be analysed in more depth.

The Continuator

Recently developed by François Pachet, The Continuator is a virtual jammer that continues musical phrases (Pachet 2002; Pachet 2004). Promising informal tests have been conducted, finding that the human player is indistinguishable from the virtual in most cases. This must partially be due to the fact that the same timbre is used as the human and also that the generated response continues instantly from where the human left off. However, as well as this, global musical coherence is not an important consideration when the system is only required to produce short continuations (Pachet 2005). In this interactive context, the use of note-by-note predictive composition proves quite applicable.

The Continuator learns a Markov model of possible note sequences from a database of music, which can be built in realtime. A key feature is the use of a realtime fitness function to influence probabilities. The weightings of nodes can be adjusted in favour of notes with similar properties to the input stream provided by the interacting musician. The degree to which this occurs is determined by a user controlled variable.

Pachet has approached the problems related to the application of sequential models to polyphonic music by clustering notes within the same temporal region. When generating phrases the user can select from four different algorithms to dictate the rhythmic structure:

o Natural rhythm: negates the clustering, so the rhythm is exactly as it occurred in the database.

o Linear rhythm: streams of eight quantised notes of a fixed duration with no rests.

o Input rhythm: the realtime input is mapped onto the rhythmic structure of the output.

o Fixed metrical structure: clusters notes into beat long segments, regardless of their rhythmic structure

Each of these modes has strengths and weaknesses that are accentuated when applied to different musical styles. Natural rhythm mode can sometimes produce ‘unnatural’ rhythms when

92 notes occur outside of their original context. Linear rhythm is useful for certain styles of music where rapid sequences of short notes are common, such as be-bop. It is curious that one should dictate the rhythmic aspects of this style directly, rather than allowing the learning algorithm to extract them from a be-bop database. Input rhythm mode seems especially useful for the realtime jamming situation, providing a greater sense of interaction through rhythmic imitation.

In summary, The Continuator uses analytic, transformational and generative techniques. Analysis of musical sequences creates sets of note sequences and probabilities; rhythmic re- mappings transform the music and continuations are generated from probability sets. Because The Continuator is influenced by a musical history, musical input and a large number of instructional parameters, the contextual breadth is high. The approach to composition is clearly heuristic – there is a definite musical intention that is implicit in the Markov model. There is no evidence of explicit, goal-oriented trial and error, nor exploration of abstract mappings.

The system has been used in live improvisations by Gÿorgy Kurtag Jr., Alan Silva and Bernard Lubat at music festivals such as Fetive d’Uzeste, Sons d’hiver, Festwochen and Budapest festival (Pachet 2002). Musical examples can be downloaded from the website (Pachet 2004).

GenJam

GenJam (Genetic Jammer) (Al Biles 2004), is a virtual improviser and accompanist of “straight- up jazz”, with the capacity to perform solos, trade fours and provide harmony to a human player. A repository containing structural and motific data for the entire standard jazz repertoire provides the algorithm with a predetermined global structure around which to operate.

Note-level structure is derived from recombination and selection, in the tradition of Genetic Algorithms (GA). Depending on the mode of operation, material for recombination is either derived from a general lick database, a database of licks specific to the current song, or human input. The pitch intervals are applied to a chord progression and scale to determine the real pitch value. The original version was an Interactive Genetic Algorithm (IGA), where the user determined the fitness of the lick combinations. Depending on the mode and level of operation, the newer version of GenJam can randomly combine phrases or uses a function to ascertain the most rhythmically appropriate crossover point. A number of other heuristic approaches have been used during combination to ensure a certain musicality in realtime. Biles claims that this avoids the use of a fitness function, however it is not clear to what extent the population pool is reduced and, if it is not reduced to a single member, how the final lick is selected.

An interesting feature of GenJam is the set of mutations used in the “trading fours” mode:

93 “GenJam measure mutations include playing the measure backward, playing it upside down, playing it backward and upside down, transposing it up or down a random amount, and sorting the new-note events to create an ascending or descending melodic line. Phrase mutations include playing the measures in reverse order, rotating the measures, and repeating a measure.”

(Biles 2002)

Bile’s use of these techniques show how well chosen transformational processes can impart a sense of musicality as well as providing musical flexibility. When I developed the TraSe morphing algorithm, which is also based on transformations, the previous success of GenJam inspired some confidence that the approach would be feasible. GenJam also highlights how data-driven recombination techniques can be used to great effect when the stylistic boundaries are narrowed to a single genre such as “straight-up jazz”.

In summary, transformational algorithms are central to music creation in GenJam, while generative and analytical processes are important in relation to various musical representations, the generation of lick sequences from lick sets and specific tasks such as determining the cross-over point. The compositional approach is heuristic and, to a lesser degree, trial, although it is admittedly difficult to assess on the information that is available – the approach seems to have changed through different versions of the software. The augmentation of the fitness function with a set of musical rules clearly brings the approach towards heuristic, without trial and error. In the original version of GenJam, the use of supervised selection to iteratively filter a population is clearly a trial approach.

Biles and GenJam gig regularly and some demonstrations – which, in my own opinion, are at least moderately convincing – are available from the website (Biles 2004). In these recordings it is interesting to note that a more straightforward recombinant algorithm is used as accompaniment, in the style of Band In A Box (Gannon 2004).

To summarise discussion of jamming agents, despite many significant developments, note-level morphing is not at all apparent in this field, suggesting a possible niche application. Techniques for note-level morphing can be used to create new musical material for the jammer to output, whether morphing the realtime human input or a database of music. In this way, the hybrid aspect of morphing is more pertinent than the transition, although transitioning could also be applicable in situations where an “accompanist agent” is required to perform medleys.

94 3.3.3 Adaptive music

Adaptive music occurs when the interactive algorithm generates music in response to data that concerns a world-state, rather than musical or instructional input. It is perhaps the most commercially relevant field of algorithmic music practice, being applied primarily to the computer game industry (Neil 2005; Clark 2007), but is paradoxically one of the least developed in terms of algorithmic sophistication (Sanger 2003). That is, most adaptive music systems recombine sections of pre-produced electronic music, rather than utilising formalised musical knowledge to generate the musical backing.

The game Need for Speed 3 (Electronic-Arts 1998) uses car speed to select the electronic loops with the appropriate intensity. Games such as Frequency (Sushi 2002) and Amplitude (Perry 2003) are especially interesting in that music is the central aspect of gameplay, however, even when this is the case, they do not go beyond the simple one-to-one mapping of button strike to playback that is typical in music games. Game music composers need to expend considerable effort in carefully crafting segments and transitions to suit the recombinant algorithms (Sanger 2003; Whitmore 2003; Apple 2006).

It is evident that individual game companies have tended to develop idiosyncratic low-level audio solutions in parallel for the platforms relevant to them (Sanger 2003), for example the Miles SDK (RAD 2004), MusyX (Factor-5 2000), Audiality (Olofson 2005) and many others that are reviewed by Brandon (2006). Despite this, there has been a slow movement towards more unified standards over the past few years (MMA 2003; Sanger 2003) and open discussion of adaptive music techniques is becoming more common (Paul 2003; Whitmore 2003; Apple 2006). Some historical context for game audio is given by Brandon (2004).

Currently, the most significant adaptive music tool is DirectMusic Producer which is detailed below.

DirectMusic Producer

DirectMusic Producer (DMP) is currently the most well developed, usable and freely available tool directed at composers who write adaptive music for computer games (Microsoft 2004). While many other engines exist, systems that aim to facilitate an interactive or continuous generative delivery are few (Factor-5 2000; Beatnik 2002; SSEYO 2004) and have less functionality. None of the algorithmic music techniques used by DMP are new; in fact they are quite rudimentary. The advantage of DMP is that the techniques are employed within a user-

95 friendly application which is integrated into a much larger game development Application Programming Interface (API).

While more comprehensive explanations are provided in DirectX 9: Audio Exposed (Fay, Selfon et al. 2003) as well as help menus from within the program itself, a short summary of the aspects most pertinent to adaptive music is provided here – specifiable random deviations in audio playback to overcome repetition; and a basic grammar for labelling and interpreting note sequence segments that includes the estimated aesthetic impact of patterns, the harmonic nature of a musical pattern and the musical role of the pattern.

DMP provides some functions that aim to alleviate repetition in computer game music – pitch, volume and duration variability of sound samples and pattern variations. Random variation of sonic parameters such as this is done much more extensively by KOAN (SSEYO 2004) and has existed within any programmable synthesiser for decades, not to mention the works of pioneers such as Xenakis (1992). Recombination of pattern variations is even older, the canonical example being Mozart’s Dice Game (Roads 1996).

DMP enhances the effect of recombination through the labelling of “variations” with a musical grammar. Variations for patterns have a number from one to 32, and in this way can be ordered or shuffled on playback. They can also be flagged as being harmonically uncompliant with certain scales and chord-types, meaning that they have less (or no) chance of being selected when the specified harmonic configuration is playing. The level of detail available is quite extensive, incorporating many relevant grammatical terms from western music theory. It is also possible to specify what is harmonically required of the next (destination) chord in the sequence for the variation to be compliant.

“Patterns” (a DMP term, meaning a bundle of variations) can also be labelled to assist the recombination algorithm. The “groove” range is used to help determine which pattern should be playing when they overlap – when the groove level (an expression of emotional intensity) is outside the groove range assigned to a pattern it will not be played. As well as this, discrete musical roles such as “intro”, “fill”, “break”, “end” and custom labels can be assigned to the pattern. However, apart from “intro” and “end”, functionality that meaningfully interprets this labelling system is undefined by DirectX itself.

Overall, DMP is designed as a tool rather than a musical agent. The elements that involve stochastic generative algorithms could be lightly regarded as the heuristic compositional approach, however most of the musicality stems directly from the composer. The fact that DMP is currently the most highly developed tool available signals a niche that may be filled by techniques such as note-level morphing – an observation that is supported by Jonas Edlund’s

96 recent efforts to apply note-level morphing in The Musifier computer game music engine (Edlund 2004). The goal here is for the game music composer to create passages that reflect the various moods in the game and for The Musifier software to create music at any particular point in time that expresses the precise combination of moods required by the world-state.

3.3.4 Interactive installations

Interactive installation covers a diverse range of approaches, all of which involve some form of physical and spatial interaction with a visual and/or sonic process. This field is in some ways more closely associated with new media visual art and design than algorithmic composition and is thus of only peripheral significance to the project. However, due to the potential for application of morphing to interactive installations, it has been investigated to a small extent. An overview of the field is given by Sommerer and Mignonneau (1998) while Krueger (1991) is of especially historical significance. John McCormack has applied sophisticated musical processes to interactive installations, involving evolutionary systems (McCormack 2003) in a mixture of the abstract and trial compositional approaches. Garth Paine is noted for development of installations based on physical gesture (Paine 2007) and establishing that it can sometimes be more appropriate to view the installation as a musical instrument rather than adaptive phenomena (Paine 2004). Most installations are fairly experimental and include interactive musical algorithms that relate to the abstract compositional approach.

Tabletop interfaces are particularly pertinent due to the topological affordance of morphing. The most comprehensive repository of information regarding current tabletop interfaces for electronic music has been compiled by Martin Kaltenbrunner (2006). This includes the reacTable* (Jordà, Kaltenbrunner et al. 2005) which uses video tracking to locate special fiducial markers on a semi-transparent table that is also projected upon. The markers control various audio effects or generators within a patch. The reacTIVision software libraries (Bencina, Kaltenbrunner et al. 2006) that drive the camera tracking system were also used within this research to demonstrate the possibility of applying the morphing software to the interactive installation paradigm. The alternative to reacTIVision was the Augmented Reality Toolkit (Kato 2006) which was discussed with Rodney Berry who has used it within a number of projects (Berry, Makino et al. 2003; Berry, Makino et al. 2006). It was not investigated directly due to the increased time that would be required to develop customised C code.

Overall, interactive installations are particularly experimental, and most techniques with a degree of novelty are applicable to the field. This includes the note sequence morphing algorithms I have developed, as demonstrated by the Morph Table installation (see the conclusion).

97 3.3.5 Summary of interactive music

I reviewed a number of significant interactive music works within a range of application contexts – meta-instruments, jamming agents, adaptive music and interactive installations. Works that are particularly notable have been analysed in technical detail and this process was useful in providing inspiration for later developments of my own. For example, The Continuator showed how Markov techniques can be applied in realtime interactive contexts, GenJam highlighted the creative nature of genetic algorithms and DirectMusic Producer demonstrated that a surprising amount of adaptivity can be achieved with only basic techniques.

A number of opportunities for the application of note-level morphing have become apparent within each interactive music context, highlighting the potential usefulness for morphing to the whole field. This includes control of the morph index as a meta-instrument, hybridisation of source and target to generate material for jamming agents, seamless transitioning between musical states of computer games in adaptive music, or table-top interfaces for morphing in interactive installations. Having established the potential for morphing, the following section will review pervious attempts at note-level morphing.

3.4 Review of note-level morphing

A review will now be presented that examines instances of automated note-level morphing – the process of generating a hybrid transition between source and target note-level music. This last review positions my research within the narrowest possible context – viewing only the projects that share, at some level, the core focus of note-level morphing.

Five particularly significant works of note-level morphing were by Mathews and Rosler in 1967, Larry Polansky in the 1980s and into the 1990s, Horner and Goldberg in 1992, Danny Oppenheim in the mid 1990s and Jonas Edlund more recently from 2004. After reviewing the works of these investigators in some detail, other works, that either do not have note-level morphing as their primary subject or are less significant, are also mentioned so as to provide a greater sense of context.

The five primary works are compared in relation to the various significant research activities that they have engaged in. LEMorpheus, the system I developed, is contrasted to the other projects. The presence or absence of each research activity in each system is tabulated, clearly showing how LEMorpheus attends to numerous nascent research opportunities. The comparison was useful as an overview and guide to the research. It should not be construed as a criticism of the

98 earlier systems, as each was limited by the technological and aesthetic boundaries of its time. This is particularly apparent in the first system, GRIN (GRaphical INput).

3.4.1 GRIN

Max Mathews appears to have created the first musical morphing algorithm in 1966 on the MUSIC IV platform (Mathews and Rosler 1969). This work was developed as a demonstration of the algorithmic possibilities of Mathews’ and Rosler’s GRIN94 program. In this system, a monophonic melody was represented with separate functions, or envelopes, for each dimension of amplitude, frequency, duration and glissando. The frequency functions were made of flat (gradient 0) segments for the tone of each note, while the amplitude function was used to accent the first beat in each measure. Glissando was not used in the morphing example. The discrete note durations, or inter-onset times, required conversion to a continuous function in order to become algebraically manipulable.

Mapping discrete start times to a continuous domain is a problem that can be approached in many ways; Mathews’ and Rosler’s technique is particularly ingenious. To generate a melody, a note would be created and the duration function would be sampled at that point. The sampled value would specify the inter-onset distance to the next note and sample point. The “self- synchronising” form of the duration function required that each segment had a gradient of -1, so that if the sampling is ahead or behind a certain amount, the next note and sample point would be in time.

Figure 1 Self-synchronising function for inter-onsets (reprinted with permission).

Halfway between each note was chosen arbitrarily as the start and end points for each of these segments. Combining self-synchronising functions with others results only in other self- synchronising functions and so the coherent quantisation of durations to known values is

99 inherent in the style of representation. Having dealt with the problem of continuous representation, morphing becomes simply a matter of combining the functions in each pattern, using the morph index to weight each one.

The compositional approach employed by Mathews and Rosler appears to be abstract. It is approached as an exploration into what might occur when the music is parameterised and combined, as opposed to designing the algorithms strongly with preconceived notions of how it should sound. At the lower level, analytic, transformational and generative algorithms are all used. The conversion of note sequence to envelopes is analytic, the combination of envelopes is transformational and the rendering of the combined envelopes into a note sequence is generative. All of these operate within a narrow context, as no external or randomly accessed data is available to the algorithm.

For its time, a somewhat convincing result was recorded and produced on the vinyl disc accompanying the book (Mathews and Rosler 1969). It morphs from The British Grenadiers to When Johnny Comes Marching Home and back (~3.1).

3.4.2 HMSL

The Hierarchical Music Specification Language (HMSL) was developed by Phil Burke, Larry Polansky and David Rosenboom from 1980 and is implemented in FORTH (Burke, Polansky et al. 2005). It is partially inspired by the musical theories of Jim Tenney (Polansky 2006), including the notion that general patterns should be easily mappable to various levels of a music hierarchy. Polansky developed code in HMSL to aid the experimental morphing of music within some of his compositions, including Distance Music (Polansky 1987), Bedhaya Guthrie/Bedhaya Sadra (Polansky 1996), 51 Melodies (Polansky 1991), Two Children’s Songs (Polansky 1992) and Road to Chimachum. MIDI renderings of these last three can be heard online (Polansky 2006).

This music was based on a theoretical framework developed by Polansky that explores and extends the application of mathematical set theory and similarity theory to experimental music. These ideas have been presented at conferences (Polansky and McKinney 1991; Polansky 1992) and covered more comprehensively in journal publications (Polansky 1996). To summarise the primary aspects, source and target are conceived as ordered sets. Given this representation, various analytical metrics can be applied to obtain some notion of distance between the patterns and, conversely, mutation algorithms can generate music at a specific distance (the morph index) between two sets. The various approaches are classified according to their foci and techniques: interval magnitude (difference between one item in the set and the next) or direction (up or down), linear (processing the set from start to finish) or combinatorial

100 (utilising intervals from each item in the set to every other item), unordered (non-structural statistics) or ordered (utilising the sequential order of the pattern). HMSL is unsupported by modern operating systems, although much of it has been ported to Java as the Java Music Specification Language (see below).

Overall, the compositional approach taken by Polansky is mostly abstract, as evidenced by the obviously experimental nature of his mappings and the music. Some works also showed the algorithmic modelling of musical stylistic intentions through heuristics, for example, Bedhaya Guthrie/Bedhaya Sadra. The piece Drawing Unnecessary Conclusions (Polansky 1987) utilised morphological metrics in analytic algorithms, and human performers as the generative component. Because there was an explicit criteria that the generated shapes be a certain metrical distance from each other, the compositional approach was somewhat related to trial, however there was no large population of candidates which are matched against the criteria. In this case, the approach to composition is ultimately carried out by the individual performer. An example of a harmonic morph created by Polansky using HSML is included (~3.2).

3.4.3 Horner and Goldberg

Horner and Goldberg (Horner and Goldberg 1991), published the first known application of GAs to morphing – probably also the first application of evolutionary computation to music (Biles 2008). Horner and Goldberg refer to the process as “thematic bridging”, where the source note sequence is transformed into the target via a sequence of operations, concatenating the result of each operation into a complete bridge. A fitness function appears to have been used at some stage to gauge the similarity of each bridge segment to the target, however it is not clear precisely where and how it was applied. Some aspects of fitness or selection appear to rely on user input, although the exact workings of the process are a little unclear as the results were only published in a short conference article. The music representation consisted of pitch and amplitude information. Horner and Goldberg’s software was used to compose a piece “Epistasis”, which I have unfortunately been unable to find.

3.4.4 DMorph

Daniel Oppenheim first published a short paper on morphing, presenting the DMorph software (Oppenheim 1995), which was implemented as a computer assisted composition tool within DMix (Oppenheim 1993) and discussed in detail as a patent (Oppenheim 1997). The algorithm was realtime, interactive and deals with n-source morphs, which extended the original definition of the morphing function to include any number (n) of input patterns. DMorph only morphed between single parts, however, as it existed within DMix, a multi-part environment, multi-part

101 morphs were possible. Despite the option of multi-part morphing in Oppenheim’s system, there was no inter-part communication, for example, various parts following the same tonality. In the implementation of DMorph, four sources were used, each relating to a corner of a Cartesian plane called the Mutant Ninja Tennis Court. The morph index was a point on this two- dimensional plane. The emphasis was less on automatic transitioning (from source to target) and more on the creation of a musical hybrid through interactive topological navigation of the Mutant Ninja Tennis Court.

Oppenheim’s procedure is to group notes from each source together based on a mapping that is selected by the user. Each group, which includes a note from source and target, relates to a note that will be generated during the morph. Once the notes from source and target are grouped in this way, the note properties of all notes in the group are interpolated and this value is used to create a new note. There are two different generic implementations of this procedure - time warped grouping creates groups based on the order of the notes, while time synchronous grouping creates groups based on the distance of note onset. The decision to lock, interpolate, or apply weighted selection for individual parameters such as pitch or onset is made by the user.

The compositional approach used in DMorph is abstract, based on interpolation of musical parameters and exploration of the topology, rather than striving towards any explicit or implicit musical goals. The grouping process is analytic, the weighted combination of parameters is transformational and the “note-groups to note-sequence” process is generative. The fact that the note grouping algorithm has access to all of the notes from source and target means that there is a moderately wide breadth of context.

Musical demos are no longer available from IBM’s website, however, Oppenheim was gracious enough to send them by email (Oppenheim 2006) and some examples are included (~3.3, ~3.4, ~3.5).

3.4.5 The Musifier

Jonas Edlund has developed an adaptive music system, The Musifier (Edlund 2004), which utilises note-level morphing as a key component. Edlund presents The Musifier as an ‘orchestra- pit’ agent – fulfilling the role of a theatre’s orchestra pit, but in the context of computer games. The Musifier performs n-source morphing on different themes provided by a human composer. The themes are linked to particular dynamic parameters within the game state, for example the health-level of the player character, the position on the map, the proximity of non-player characters or enemies. The human composer designs the musical theme to reflect the emotional significance of each game state parameter and The Musifier’s morphing algorithm hybridises

102 them into the most appropriate music, in realtime, based on the value of the game-state parameters.

The details of the morphing techniques that Edlund uses are a trade secret, however musical demonstrations are available for download. More recently a web application has been made which allows a user to specify the weights of different themes (Edlund 2006). A particularly useful technical advance is apparent simply through listening to the examples – the problem of morphing between parts of different timbre has been adequately handled in MIDI by cross-fading the volume of parts on two different channels and sending identical note events to both channels. Through personal communications with Edlund, my initial speculation of an abstract harmonic representation to provide unified movement to harmonic parts has been confirmed. Rhythmic segments appear to be treated as indivisible gestalt units. As well as this, melodic patterns sometimes also appear indivisible, switching to the new theme at an appropriate cross- over point.

Edlund uses three criteria for adaptive music and morphing – responsiveness, continuity and complexity. Responsiveness is how well the system responds to change. Continuity is concerned with matching the contour of the changes and changing smoothly. Complexity is how well the algorithm can convert the many dimensions of game state data into an equal number of dimensions of musical data and then into suitable music. For most examples, The Musifier appears to do quite well according to these criteria, although without rigorous testing it is difficult to comprehensively ascertain the various musical successes and aesthetic traits of Edlund’s work.

The Musifier most likely reflects the heuristic approach to composition, however, it is difficult to classify the compositional approach without viewing the algorithms. The musical intentions are fairly clear, as evidenced by the adherence to popular, tonal music styles and the criteria for adaptive music and morphing that were motivated by the strong commercial imperative. Because of this, the algorithms are likely to be based on the heuristic or trial approaches, rather than abstract. The algorithm does not appear to require significant computation in order to create the morph, which implies a fast heuristic over the often more computationally intensive trial. An example is included with source (~3.6), target (~3.7) and morph (~3.8).

3.4.6 Others

A number of algorithmic music systems, other than those examined in detail above have applied morphing or morph-like techniques. Momeni and Wessel developed the Beat-Space software using MAX/MSP which morphs between parameter states on a 2D surface. Gaussian kernels are used to control the prominence of each parameter state on the surface (Momeni and Wessel

103 2003). While the software was primarily concerned with morphing sonic parameters, the Beat- Space component dealt with musical material, morphing between parameters that control probabilities of beat generation within eighth-note slots. The source and target are deterministically represented such that the probability for any slot can be only zero or one, while the morph is generated from the non-deterministic interpolations.

KOAN (SSEYO 2004) can also perform note-level morphing, as boasted in Presswire (M2- Communications 1997):

“Examples of just some of what the IKMC (Interactive KOAN Music Control) can accommodate dynamically in real-time include: addition or alteration of melodic or rhythmic patterns, smooth morphing between two KOAN pieces, changing of patches, application of filtering effects to change the sound palette, modification of auto-chording, alteration of the generative rules underpinning the piece, deletion or addition of KOAN player 'voices' and rules, and semi-interactive MIDI file playback.”

The approach is similar to Momeni and Wessel’s, but deals with pitch as well as rhythm. KOAN was designed primarily for websites and hand-held music making devices with limited storage space and the requirement for constantly changing music over long periods.

Nick Didkovsky and Phil Burke (2001) have extended the capabilities of JMSL beyond the original HMSL. Particular aspects of JMSL which are relevant to morphing are the Binary Copy Buffer Transform (BCBT) (Didkovsky and Burke 2004) and an applet called the Schubert Impromptu Morpher (Didkovsky 1997). The BCBT is a function that is part of the score editing window in JMSL (Didkovsky and Burke 2004), where the user can copy segments of music into two different buffers. The BCBT function then uses a morphing algorithm to combine the two buffers and paste the result onto the score. In this way, buffer one is source, buffer two is target, and the pasted result is the morphed hybrid. The Zipper Interleave Transform is a morphing algorithm that comes with JMSL which iterates through source and target, alternately placing an element from one or the other into the morph. Through the extensible code design there is great potential for users of JMSL to create custom BCBT plug-ins for the score editor, however this did not appear to be a particularly active area of development. Didkovsky’s Schubert Impromtu Morpher applet stochastically generates music from statistics obtained by analysing a Shubert performance, as source. The target is specified from user-defined statistical values. The user controls the morph index, and can disable the interpolation of individual statistical parameters (Didkovsky 1997).

As well as being a DJ agent (see section 3.2.4), Tristan Jehan’s Skeleton software can automatically create hybrid tracks, from source and target audio (Jehan 2005). The algorithm

104 extracts a music structure (note-level, metre) description from the target, and “fits” audio snippets from the source into it, as per concatenative synthesis (Schwarz 2004; Sturm 2004). Jehan describes the process as being cross-synthesis on the level of musical structure.

In further relation to sound morphing, Ircam’s software Diphone (Rodet and Lefevre 1997) also uses an analytical and concatenative approach to morphing. Polansky and Erbe (1996) applied morphological metrics to spectral mutation in the Sound-hack software. Paul Lansky designed custom sound morphing algorithms for the composition Quakerbridge (Oppenheim 1997) and Cook has morphed with vocal synthesis algorithms (Cook 1998).

Robert Winter has developed a system in Pure Data that deals with interpolation of musical expression and structure, with a primary focus on emotion (Winter 2005). Canazza (2001) also developed a system for morphing musical expression. Algorithms that deal with note-level representations do not seem apparent. Stephen Ingham (Ingham 1996) has developed software to interpolate values in Standard MIDI Files in the Patchwork environment. Christian Renz has developed an application in Perl called MIDI Morph (Renz 2005), designed for note-level morphing. It is currently alpha and not in active development. Berger (1995) submitted an abstract with plans to develop a note-level morphing system.

There are no doubt other small scale algorithmic music systems that relate to morphing, indeed, any parametrically well-defined problem space could be the subject of morphing through interpolation. However, enough has been presented to sufficiently express a sense of the surrounding context.

3.4.7 Summary of opportunities for note-level morphing research

Through a comprehensive review of note-level morphing systems, only four works with a significant degree of relevance and capability have been found, highlighting a potential for growth in the field of note-level morphing. Examining these works in detail, I have identified a number of research niches that have not, until now, been thoroughly explored in the context of note-level morphing. These opportunities, which will be explained below, include: modelling the trial and heuristic approaches to composition, rigorous testing, musical coherence within a MEM context and contemporary computing platforms.

A large niche for research in note-level morphing is in modelling the trial approach to composition, through evolutionary algorithms. While the processes used in The Musifier are a trade secret, they appear from my observations to be based on the heuristic approach. The

105 other existing systems – GRIN, HMSL and DMorph – are all based on musical abstraction. Horner and Goldberg modelled a trial approach, however, while this work was highly novel at the time, the lack of detail, scope and application allows considerable room for additional developments.

Another opportunity for new research in morphing appears to be formal testing, which is entirely absent from all of the projects that were examined. Some researchers/projects utilised informal feedback, for example, Edlund released beta versions of The Musifier with the intention of gaining feedback and Polansky had access to reviews and personal opinions regarding how his pieces were received. However, none of the projects utilised systematic musical evaluation from a group of users or listeners. As a result there was no collection and interpretation of qualitative data, which could then lead to further refinements of the algorithms. If such methodologies had been implemented, there would be impetus towards music that was considered by survey participants – and hopefully most others – to be of a high standard.

Listening to the musical output of the various systems also presents an opportunity – the possibility of morphing algorithms which appeal to the popular sensibilities of MEM. It is apparent from listening to the When Johnny Comes Marching Home to The British Grenadiers morph created on GRIN, that it is not passable as marching music. As well as this, the work itself predates MEM as it is known today. Polansky’s morphing compositions using HMSL were appropriate for the intended context of avant-garde music, however the algorithms are unlikely, without significant extraneous production work to be successfully applied to MEM. Either way, application to MEM, as a research opportunity, was not investigated by Polansky. DMorph was tested on musical styles such as Latin and Classical. None of the demos provided by Oppenheim were in the MEM genre and, without access to the software, it is difficult to claim with any certainty for or against the applicability of DMorph to MEM. Nevertheless, it is clear that the application of note-level morphing within a MEM context has not been thoroughly investigated by Oppenheim. The Musifier is designed to cater to computer game music and therefore is partially situated within a MEM context, as with LEMorpheus. It is difficult to objectively gauge how The Musifier’s music might be received without tests; however, from the musical demos, it appears to afford music with extended chord progressions and simple rhythms. This is contrasted to LEMorpheus which is applicable to short chord progressions and complex rhythms.

The ability for multiple parts to work together, sharing data during the morph, is another aspect of note-level morphing that is in need of further research. Only LEMorpheus and The Musifier address this. GRIN was monophonic, HMSL dealt with multiple voices in ways that were specific to the individual compositions and there are no morph examples from DMorph between a source

106 and target with more that one part in each. Multiple parts could be used through DMix, which contained DMorph, however, the morphing process is parallel for each of those parts; that is, no shared inter-part information. Horner and Goldberg computed the layers separately.

In his patent, Oppenheim (1997, p 42) recognised how useful it could be to assign each part a separate musical function – for example, bass, lead, drums – and use this information to inform the morphing process. The Musifier explicitly includes functional parts: the abstract chord and scale sequence part, bass, melody, chords and percussion. I also addressed part function, but in a more flexible manner - various settings of LEMorpheus parameters afford the morphing of different part functions and I have saved particular settings of parameters and applied these to particular part functions as necessary. Oppenheim (1997, p 42) also pointed out the need for the user to be able to morph each layer independently of others. It is notable that LEMorpheus, in the Morph Table mode, is the only system where this is currently possible.

Exploring new ways for morphing algorithms to be interactive is another aspect where research opportunities become apparent. GRIN and HMSL are not realtime. DMorph and The Musifier both allow for realtime modification of morph index for each of their sources. DMorph is perhaps the most advanced in terms of interactivity, being able to independently morph musical aspects such as rhythm and pitch on the Mutant Ninja Tennis Court. However, no algorithmic designs for note-level morphing that suit interactive interfaces beyond the mouse and screen have been explored. LEMorpheus can operate in Morph Table mode where parts are morphed independently on a physical tabletop interface.

In situations where the source and target have contrasting timbres, morphing on the note-level initially presents an obstacle to integration of timbre – program changes are inadequate due to slow speed and the occurrence of stuck notes on some synthesisers. The inadequacy of timbre integration in MIDI morphing was mentioned by Oppenheim (1997). Both Oppenheim and Edlund overcame this by utilising two MIDI channels, one for source and one for target. Inspired by this, I developed a similar scheme in LEMorpheus. In personal communications, Oppenheim also highlighted the use of system exclusive MIDI messages (2007) as another potential method to control the timbres.

Finally, only LEMorpheus and The Musifier are executable on contemporary computing platforms (post Windows 98 and Mac OS 9). While this is not particularly relevant to algorithmic music techniques it is worth noting, as it hinders other researchers from replicating the results.

The opportunities are summarised in the following table:

107

Research Opportunity GRIN HMSL H&G DMorph Musifier LEMorpheus Musical trial approach Y Y Musical abstraction approach Y Y Y Y Y Musical heuristic approach Y Y Formal Testing Y Mainstream Electronic Music P Y Independent part morphing Y Inter-part communication Y Y Functional parts P Y Realtime morphing Y Y Y Interactive graphical interface Y Y Y Y Y Interactive physical interface Y Y Morph between MIDI instruments Y Y Y Contemporary operating system Y Y

Figure 2 Summary of research opportunities and the four major note-level morphing systems. “Y” indicates that the opportunity was investigated to a significant extent by the system, while “P” indicates that the opportunity was investigated partially. Blank indicates that no investigation has occurred. H&G stands for “Horner and Goldberg”.

As evident from the table, my study addresses a range of research niches that have not been thoroughly explored in the past. These are now summarised. LEMorpheus includes three different algorithms that partially explore the compositional approaches of abstraction, heuristics and trial. The latter two were the subject of formal tests, firstly a focus-concert and then an online survey. The MEM music created by LEMorpheus was found to be applicable to mainstream music contexts such as computer games and live electronic music performance. The LEMorpheus system allows for any number of layers, with a separate morphing algorithm and settings specific to the musical function of that layer. When applied to the physical interface of the Morph Table, each of the parts have a separate morph index. LEMorpheus allows any part to follow the tonality of any other part. LEMorpheus operates in realtime and is able to handle morphing between two different MIDI instruments. Lastly, LEMorpheus, being written in Java, is able to be tested with relative ease.

108 Having contextualised my own research within the fields of algorithmic composition, interactive music and note-level morphing, I shall now provide a thorough explanation of the techniques I used, beginning with the system infrastructure that is behind LEMorpheus.

109 4 LEMorpheus software infrastructure

This chapter will detail pertinent aspects of the supporting infrastructure of the software (LEMorpheus) I created to experiment with automated and interactive note sequence morphing algorithms. This provides synoptic insights into the note sequence morphing system, rather than explanations of particular algorithms. The infrastructural knowledge contributed here can be applied to note sequence morphing and, in some cases, more widely to algorithmic composition. This description of infrastructure will also assist in understanding the note sequence morphing algorithms discussed later, in chapters five, six and seven.

Firstly, an overview of the software infrastructure (4.1) explains, at a high level, how various system components for morphing relate to each other. A simple diagram is used to summarise the system workings, clarified by a written explanation. This was chosen over a diagram using the detailed modelling language UML (Unified Modelling Language), which appeared overly complex.

Following this, the aspects of the system which are easily controlled by the user are explained, from the high-level Graphical User Interface (GUI) I designed for morphing (4.2), to the layout of the note sequence editor (4.3). The “meta-morphing” parameters which are available to all compositional morphing algorithms are then detailed (4.4), including parameters to control inter- part tonal communication as well as a number of morph index transformations that operate in parallel on different parameters. Pertinent aspects of the note sequence morphing software infrastructure are then explained (4.5), including music representations, and the extensible design of the software. Lastly, the method for producing MIDI output in realtime is described (4.6).

Because compositional morphing is the focus of the research, the software infrastructure was only evaluated in simple “pass/fail” terms of whether it could sufficiently support the morphing algorithms. While it has obviously succeeded, there are numerous directions for improvement of the software infrastructure that have become apparent and these possible extensions to the architecture are expressed in the summary (4.7).

Throughout this chapter, only the system architecture components related to the topic of compositional morphing will be explained. Many other aspects of LEMorpheus required substantial development effort to implement and made certain tasks easier, however, they are not discussed anywhere within the thesis because they are not directly relevant. For example, file saving and custom-built interface components. Despite this, curious readers can refer to the source-code in the folder “4. digital appendix” on the accompanying CD.

110 4.1 LEMorpheus overview

LEMorpheus has been designed to enable investigation into interactive note sequence morphing between two MIDI sequence loops, within the musical context of Mainstream Electronic Music (MEM). LEMorpheus allows considerable realtime control over note sequence morphing and the software is flexible and extensible. It is written in the Java programming language using a personalised realtime branch of the jMusic (Sorenson and Brown 2004) open source Java music library and the Midishare (Letz 2004) open source MIDI in/out library.

Target note sequences Meta Morphed morphing meta-data Target meta-data algorithms data referred to Render as MIDI sequence Source note sequences Note sequence Morphed note Source meta-data morphing sequences Output algorithms

Figure 1 Overview of the infrastructure for morphing in LEMorpheus

Before and during morphing, the user can edit the note sequences of the source and target loops. The meta-data of the loops can also be changed, for example, the tempo (beats per minute) or the key and scale labels for a particular part. The type of algorithm that will be used to generate the notes during the morph (the “Note sequence morphing algorithm” circle in Figure 1) can be selected and various parameters specific to the algorithm can be tweaked. The algorithm for morphing meta-data (“Meta morphing algorithm” in Figure 1) is interpolation, with adjustable parameters that influence the interpolation for particular types of meta-data, as described in further detail below (4.3).

Note morphing algorithms, such as those explained in chapters five to seven, can be either realtime or non-realtime. Realtime morphing algorithms are able to respond to adjustments of parameters and source and target note sequences while the morph is progressing. Non- realtime algorithms have too much time complexity for this; instead, a list of note sequence frames is rendered beforehand and different frames are selected for playback depending on the morph index. The use of frames is explained in more detail when describing the non-realtime TraSe algorithm in chapter seven.

111 To generate the final stream of MIDI events, the morphed meta-data is applied to the morphed note sequence data. This process is different depending on which note sequence morphing algorithm is selected and the type of meta and note-level representations that are used by it, as explained below (4.5).

4.2 High level morphing controls

Figure 2 High-level user interface for LEMorpheus.

At the highest level of control the user is presented with a simple graphical interface of multi-part loops (the square boxes in Figure 2) connected to other loops via a morph (the line with a circle in the middle in Figure 2). A loop can be played by clicking a box, and morphing between loops can be initiated by clicking on a circle. While a morph is playing, the morph index is controlled and displayed by a green ball that automatically moves from one side to the other, unless dragged directly by the mouse pointer.

Loops can be arranged in progression, analogous to the way a DJ might prepare a playlist, except that any non-linear path can be created. The morph index can be moved using external input from MIDI or the reacTIVision video tracking software (Bencina, Kaltenbrunner and Costanza 2006). The morph index can also be controlled by a variable in a computer game. Both computer game and table top interfaces have been implemented, however, further discussion of these components is reserved for chapter eight, as they are more related to future applications than software infrastructure.

112 4.3 Loop editor

The loop editor within LEMorpheus allows for multi-part loops to be edited and played back and is abstracted by the layout diagram below:

Menus: file I/O, editing tools

Global meta-data that affect all layers: tempo, global key/scale

List of different layers that Meta-data for the currently selected layer: can be selected, eg: Loop length, quantize, shuffle, local key 1. Drums 2. Auxiliary percussion Note sequence editor (piano-roll) for the 3. Bass currently selected layer 4. Chords 5. Lead 6. Pads MIDI ctrl data editor for currently selected layer

Figure 3 Layout of the loop editor interface.

The loop editor (Figure 3) is similar to many other standard MIDI sequencers, with file and edit menus, global parameters such as tempo, a list of parts/layers on the left, a piano-roll note editor for the currently selected layer to the right; various meta parameters such as instrument, channel, length, quantize, shuffle, key and scale above the note editor; and a graph of MIDI controller data below the note editor.

Unlike regular sequencers, the ordering of layers within the list has an important musical effect: layers at the same level (vertical slot) in different loops will be morphed together. For example, if the drums were on the third layer in the source, the user should put the target drums, or the most drum-like part from the target, also on the third layer (see Figure 4).

113 Morph

Target Source Layer one morph algorithm Parts: Parts: Double bass

Bass guitar

Layer two morph algorithm Violin

Guitar

Layer three morph algorithm Timpani

Drums

Layer … morph algorithm … …

Figure 4 Parts that appear on the same vertical slot in source and target will be morphed together into the same layer. They may have different MIDI channels and instruments.

A similar principal follows with other musical functions such as bass, lead and pads. Unlike The Musifier (Edlund 2004), the user need not explicitly use these part-types, but any parts that have a similar musical function are able to be correlated.

4.4 Morphing parameters

Each morph (between source and target loops) has a number of parameters that can be manually configured, separately to the other morphs. Some of these parameters affect the morph as a whole, others relate to each individual layer being morphed and others are specific to the note sequence morphing algorithms selected for each layer. In this section I will discuss parameters that affect the morph as a whole and parameters that relate to each individual layer. Parameters that are specific to each note sequence morphing algorithm will be explained, along with the algorithm, in chapters five, six and seven.

114 4.4.1 Parameters affecting the morph as a whole

There are only two parameters that affect the morph as a whole – the morph length and the tonal leader/follower. The morph length, in beats, can be set to commonly occurring cycle lengths such as , , , , , , , , etc up to . Only one tonal leader can be specified, and any number of layers can be marked as a tonal follower. Each tonal follower will take the key and scale data from the tonal leader in realtime. The only other feature that affects the morph as a whole is the saving and loading of all parameters simultaneously, including those parameters that affect each layer individually.

4.4.2 Parameters affecting each layer individually

For each layer, there are 18 manual parameters that affect the morphing for that layer. Some of these are structural and one of them influences the note sequence morphing algorithm, however, most parameters are for morph index transformation functions, which are applied to interpolation of meta parameters from the source and target. In addition, the particular note sequence morphing algorithm for each layer can be selected from a list that includes parametric morphing (see chapter five), the Markov Morph (chapter six) or the TraSe morph (chapter seven).

Structural parameters

Structural parameters allow control over elements of the morph structure for a particular layer through volume, repetition and morph index quantization.

If the two parts from source and target that are being morphed together in the same layer (for example, “bass guitar” from source and “double bass” from target in Figure 4) are set to different MIDI channels, the volume of the two channels will be cross-faded and the MIDI events (notes) for that layer will be simultaneously sent to both MIDI channels. In this way, it is possible to morph between parts with different instruments1.

1 Use of the same MIDI channel and different instruments for source and target was also experimented with briefly. A program change was sent immediately before notes where necessary, however, the time taken by synthesisers to load the different instruments was inhibiting.

115 However, because of the perceived drop in loudness when using a linear cross-fade (blue in Figure 5), the gradient of the fade-in (dashed lines) and fade-out (solid lines) usually needs to be increased. For practical purposes this was sufficient, however the equal power (logarithmic) curves used in two-channel DJ mixing desks are probably optimal. In Figure 5, the red arrows show a change in gradient and the red lines show the resulting volume envelopes for the two MIDI channels. The user may prefer one timbre to enter earlier or later than usual, so that the volume envelope for each MIDI channel of source and target can be offset independently, as shown by the green lines in Figure 5, which are shifted by differing amounts. The lengths of the green arrows show different degrees of offset from the standard cross fade (blue).

max

Volume

min 0 Value of morph index 1

Figure 5 Volume cross-fade functions. Solid lines for source MIDI channel volume, dashed for target MIDI channel volume. Blue shows the linear cross-fade, red shows a change in gradient, green shows a change in offset.

Often it does not make musical sense for a morph to be completely smooth. For example, abrupt cuts are often used by composers to effectively segue into a new theme2. In response to this, a parameter can be used to quantize the morph index before it is applied to the note sequence morphing algorithm of a particular layer. This is not a complete solution, but it means that the morph can be easily split into similar sounding segments. Effectively, the morph index that is sent to the note sequence morphing algorithm remains the same throughout these similar sounding segments, even when the original morph index is changing, as shown in Figure 6.

2 As shown in the results of the web questionnaire within chapter seven.

116 quantized morph index

Input morph index

Figure 6 Example of the morph index being quantized into four discrete values. The dashed line is the original morph index, while the solid line is the quantized morph index.

The quantization does not always have a perceivable effect on algorithms that are non- deterministic and somewhat unstable. For these situations, an option to repeat the recently generated output over a section of the quantized morph index is also available. When repeat is on, the most recent output is looped, the length of the loop being specified by the length of the quantized segment. This effectively negates the instability of the non-deterministic algorithms, and contributes to a sense of structural regularity.

Morph index transformations

The morph index is split into multiple parallel morph indices which are each transformed separately before being applied to the interpolation algorithm for each meta-data parameter, as well as the note sequence morphing algorithm for that layer. These morph indices are separate so that transformation functions that are applied to them can be “tuned” independently, according to the required musical effect. The meta-data parameters are: loop length, rhythmic quantization value and rhythmic shuffle value. The morph indices for the interpolation of each of these is transformed specifically for that meta-data parameter using the simple mathematical functions like the orange, green and blue lines shown in Figure 7.

117 Value of morph index input

0 0.5 1 Value of morph index transformed for parameter interpolation

Figure 7 Example interpolation curves applied in parallel for different values of meta-data parameters: cross-over point of 1 and gradient 1 (orange), crossover point 0.5 and gradient 2 (blue), crossover point 0.5 and gradient 1 (green), exponential curve for note morph index (red).

The user specifies two values, the gradient and crossover point, for each of these morph index transformation functions, with a total of six controls over the interpolation of the three parameters: loop length, rhythmic quantisation, and rhythmic shuffle. The crossover point means the point at which the output morph index equals . That is, if the crossover point is set to , when the input morph index equals , the transformed morph index will equal . For example, the orange curve in Figure 7 has a crossover of , meaning that when the input morph index is , the transformed morph index is . The crossover point must be between and . The term ‘crossover’ was preferred rather than ‘offset’, as I perceived the values in relation to the centre of the morph rather than the start. The gradient, or slope of the curve, must be greater than (flat), for example, the steeper slope on the blue line is due to it having a gradient of .

For the morph index that is sent to the actual note sequence morphing algorithm, the transformation includes an exponential (or logarithmic, depending on the setting) curve, in addition to the crossover and gradient of the other parameters. A possible setting for this function is shown by the red curve in Figure 7. This nonlinear curve is added to provide greater control over the interpolation of the morph index for the note sequence morphing algorithm, as it is a more important morph index than the others.

4.5 Note sequence morphing algorithm infrastructure

Important elements of the supporting infrastructure used by the note sequence morphing algorithms is detailed below, including tonal representations, rhythmic representations and extensible software designs. The tonal representation includes key, octave, scale, scale degree

118 and passing note pitches. The rhythmic representation includes quantization, shuffle and loop length. The extensible design includes classes that can be extended for new morphing algorithms and non-standard tunings. These aspects constitute the supporting infrastructure for note sequence morphing algorithms, whereas the note sequence morphing algorithms themselves are discussed later in chapters five to seven.

4.5.1 Tonal Representation

As mentioned earlier, each layer within a state can be labelled as belonging to a particular key and scale; and a particular layer within the morph can be tagged as being the tonal leader. The programmer of the note morphing algorithm has the option of utilising this tonal meta-data and morphing it through a separate process to the note sequence data3. When operating in this way, the note pitch needs to be converted into scale degree. Creating the final MIDI pitch output will require the scale degree, octave4, scale (set of tonal pitch-classes) and key. This fairly common approach is made clear in Figure 8 below.

Key = C# 12 keys: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 1

Octave = 6th 12 steps/octave: 0, 12, 24, 36, 48, 60, 72, … + 60

Degree = 3rd Major scale: 0, 2, 4, 5, 7, 9, 11 + 4

MIDI pitch = 65

Figure 8 Representing MIDI pitch 65 (E#) as the 3rd of C# Major.

In this diagram, we can see how the MIDI pitch 65, or E#, can be represented by a combination of four musical components: the major scale, the third degree, the key of C# and the 5th octave.

This representation works well in theory; however, a surprising amount of tonal music utilises passing note pitches to great effect, either as part of a melodic contour or as discord. One method to overcome this could be to include information on whether the scale degree is “raised”

3 The TraSe algorithm of chapter seven is currently the only algorithm that morphs the key/scale meta-data.

4 The current implementation includes octave information within the scale degree, however they are kept separate here for simplicity.

119 or “lowered”. This would require the algorithm that converts MIDI pitch to the tonal representation to judge whether or not the passing note is a raised version of the lower scale degree, or a lowered version of the higher scale degree. However, a passing note is often perceived as neither – either as part of a run, or an “in-between” note.

It therefore would be more appropriate for the passing note to be represented as an “in-between” note, rather than a raised or lowered version of a scale degree. This way, if a judgement needs to be made concerning which pitch the passing note should resolve to in the case of some transformation, it can be made by the algorithm that renders the MIDI pitch from the tonal representation in realtime – enabling a greater, more up-to-date contextual breadth to inform the decision.

In order to achieve this, the Degree-Passing note (DePa) representation was created. Within this scheme, additional “passing note” scale degrees are added between each of the original scale degrees, which means that, in the typical case, every second degree is a passing note. Whether or not the passing note exists as a MIDI pitch will depend on the scale used.

To ensure that every possible passing note can be represented, the total number of DePa degrees, including the passing notes, is equal to the largest semitone interval between consecutive scale degree pitches multiplied by the length of the scales being used (all scales must be the same length). In the standard case of major and minor, the scale lengths are all , and the largest interval between consecutive tonal notes is semi-tones. This means that in order to represent the passing notes, degrees are needed.

120

Major Minor Degree and scale scale Passing notes

{(), {(11), Even 11 , 10, degrees (10), (9), are out of 9, 8, scale (8), (), Odd 7, 7, degrees (6), (6), are in scale 5, 5,

(), (4), th In minor scale, the 4 4, 3, degree has no pitch (3) , (), 2, 2, So, pitch selection (1), (1), will depend on 0 } 0} musical context

Figure 9 DePa enables accurate representation of passing notes. Counting from one, scale degrees that are odd are in the scale, while those that are even are passing note pitches. Starting with a passing note in the major scale, between the Major 2nd and the Major 3rd, it will be represented as the 4th degree of a diatonic DePa scale. In a minor scale, there is no passing note between the Major 2nd and Minor 3rd, leaving four options available to interpretation: keep the note as a passing note and raise or lower it to the closest existing passing note; or turn the passing note into a scale degree which is either higher or lower (in this case, “higher” has the equivalent MIDI pitch).

To further illustrate DePa, consider a major scale where the Minor is used in passing (Figure 9) and thus recorded as the DePa degree. If the scale is changed from major to minor, the minor third will no longer exist as a passing note, leaving the algorithm that renders the final MIDI pitch with a number of options. It could preserve the accidental nature of the note, an attribute especially significant in chords, by playing either of the nearest passing note pitch classes and . Or, if the note is part of a melodic contour, it may be best to choose from pitch classes or in order to match the contour as close as possible. These options are able to be manually specified in a DePa to MIDI pitch conversion function that is part of LEMorpheus.

Having the ability to represent passing note pitches within the tonal representation has worked well, however, DePa is somewhat inflexible when shifting to scales of different length and with larger tonal steps, such as the augmented tones of harmonic minor. Future research will involve a floating point scale degree representation. This will allow, for example, the harmonic minor

121 passing note pitch-classes and to be represented as degree and respectively. This is appealing due to it being in the same numerical range as the original scale degree and presents advantages to flexibility, yet to be investigated.

4.5.2 Rhythmic representation

For representation of rhythm, each note contains an onset and duration, as with most other note representations. In addition, some rhythmic transformations are included that control looping, quantization and shuffle. These are part of a chain of transformational functions that are activated when the note information of the current time slice is requested by the MIDI player thread (see 4.6). Loop crops the data to within the bar-length and the loop length parameter is also referred to outside the chain by the note sequence morphing algorithm. Quantize snaps the onset of notes into a grid as well as converting the note sequence into streamed segments. Shuffle delays notes that are on the offbeat to produce a swing feel.

Using a transformation chain for representation of rhythm affords realtime functions that can act on streamed note events. Despite this, transformation functions that deal with the entire note data of the layer can be included, but must be ordered prior to other transformations that reduce the length. For example, loop reduces the length of the sequence to the loop length, before quantize reduces it to the resolution of the play cycle ( of a beat) as part of streaming. Rhythmic representations that incorporate more descriptive metres will need to be developed in the future to enable more sophisticated morphs.

4.5.3 Extensible design

The libraries used within LEMorpheus were designed for easy customisation to unique styles of music, music transformations and morphing techniques. This includes classes that deal with non-standard tuning systems and scales, as well as classes that support the creation of note sequence morphing algorithms.

The key and scale of each layer is stored within a Tonal Manager object, which also has a number of different functions for converting between MIDI pitch and the scale degree representations mentioned above. This includes a Tonal Composite object which generates scales and pitch probabilities from note sequences as well as a Scales object which stores the names of scales, arrays of pitch-classes representing them and tracks the current scale being used. To enable a deeper level of musical flexibility, a Tuning System object is used to store the frequency ratios of each pitch-class in an array. Because synthesis occurs externally through

122 MIDI, the only data currently utilized from Tuning System is the number of steps per octave (12 by default).

New note sequence morphing algorithms can be extended from the Morpher class. The logic for note sequence morphing is implemented in a function within the extended class that is called continuously in realtime. Each cycle, it is passed: the two note sequences from source and target that are being morphed, the morph index, the current beat position and a FIFO (First In First Out) queue of notes that have been sent as output from the layer since the morph started. This information can be used in any way by the note sequence morphing algorithm to produce the morphed note sequence of the current time-slice, which is sent as an array of two. The array of two enables notes derived from the source to be kept separate from notes derived from the target, if required. The queue was included for morphing algorithms that are influenced by the recent history and is also used for the “repeat” function (4.4.2).

4.6 Rendering MIDI Output

A call-back routine has been implemented to generate the stream of MIDI events, as shown in Figure 10.

Meta-data (loop length, quantize level)

Apply the selected note Apply the sequence transformation morphing Morphed note chain (loop, algorithm for sequence segment quantize, each layer shuffle)

Morphed note sequence segment Current beat

Note Update current Convert degree queue beat to pitch history

Old beat Play cycle Morphed note sequence segment

Note events Store current events into history MIDI out

Figure 10 Overview of system for rendering MIDI data.

123 The interval between play cycles is generally a quarter beat but is automatically changed to an eighth or half beat depending on the tempo. During a morph, the note sequence morphing algorithms that have been configured for each layer are called every cycle and sent the current time. The meta-data parameters from the transformation chain (loop length, quantize resolution and shuffle delay amount) for each layer from source and target that are being interpolated will be updated with the transformed morph indices (see 4.4.2).

Realtime note sequence morphing algorithms will refer to the morphed meta-data parameters. Non-realtime note sequence morphing algorithms will apply the transformation chain with the morphed meta-data parameters to the morphed note sequence. In both cases, a segment that is the length of the current timing resolution (usually beats) results. Despite this, notes are allowed to have their onsets at any point (float value) within the segment and are allowed durations of any length.

Following this, the key and scale changes required by tonal leading/following between parts are applied (if used) and the scale degree, key, scale and octave information is combined to determine the MIDI pitch (see 4.5.1). The notes events that span each play cycle interval are converted to a sequence of Midishare events and sent to a .

4.7 Summary or morphing infrastructure

The LEMorpheus software system takes source and target note sequences and applies a user- selected note sequence morphing algorithm to create the morphed material. Meta parameters such as quantize and swing are interpolated and these influence the morphed note sequence data. LEMorpheus includes the capacity to create and arrange note sequence loops and morphs, as well as control over the morph index. There are also parameters that influence the morphing process, such as the morph length and a number of transformations that affect the morph index before it is passed to particular components.

The software infrastructure for note sequence morphing includes a novel tonal representation that retains passing note information while using scales and scale degrees, as well as a basic representation of rhythm. The design is extensible, suiting a variety of approaches to morphing and could be easily adapted to unusual tuning systems and scales. Finally, LEMorpheus renders MIDI output by calling a play cycle every quarter of a beat. During the play cycle the current beat is updated, morphed note events are generated, transformations are applied, scale degree representation is converted to MIDI pitch representation and note events are sent out to the synthesiser.

124 Having been developed in response to mainstream electronic music requirements, the system has been limited to music based on loops and layers and deals only with source-to-target morphing. The primary strengths of the LEMorpheus infrastructure appears to be its ability to easily modify or create transformations and morphing algorithms due to extensible design and supporting functions. Many parameters are configurable at the user-level and the programming specifications of morphing algorithms encourage a similar style of user configurability. The weaknesses are poor usability and learnability; however, as a piece of research software, Human Computer Interaction (HCI) is not the primary consideration.

Future experimentation for the LEMorpheus system architecture will involve floating point scale degrees for the DePa representation, more sophisticated representations of metre and rhythm, the capacity for n-source morphing and a two dimensional Cartesian interface for n-sources. The floating point scale degrees will be able to represent passing notes more accurately across a variety of different scales. Metrical representations will enable more rhythmically coherent morphs, in the same way that key/scale representations have increased tonal coherence. Just as DePa attaches tonal significance to passing notes, the rhythmic representation should flexibly deal with unusual and offbeat rhythmic patterns. Finally, it is inevitable that n-source morphing algorithms be implemented, as this will clearly add to the capabilities of the system. The two dimensional interface could be modified for more than two sources, as per The Meta-surface (Bencina 2005) or Oppenheim’s four-source Mutant Ninja Tennis Court (Oppenheim 1995).

125

5 Parametric morphing algorithm

This chapter will detail the first note sequence morphing algorithm that was developed and applied to the LEMorpheus software infrastructure: the parametric morphing algorithm. The parametric morphing algorithm was the first approach explored; both within this research and probably within note-level morphing as a whole (Mathews and Rosler 1969). Parametric morphing involves converting the note sequences of source and target into multiple continuous envelopes and combining the values of source and target envelopes, weighted on the morph index. The combined values are then converted back to note-sequence data.

Another name for this algorithm is ‘interpolation’ (Oppenheim 1995), however, the term ‘parametric morphing’ was chosen instead, because interpolation of the morph index occurs in all the morphing algorithms explored in this research. As well as this the important feature of parametric morphing is the fact that the source and target are parameterised into continuous envelopes, rather than the fact that the parameters are interpolated. That is, the choice of parameters to represent the music is a crucial decision, whereas it is obvious that interpolation of the parameters will proceed simply through weighted combination.

From a compositional standpoint, the parametric morphing algorithm described below is in the realm of mathematical experimentation; the abstract approach to composition. This is because the current low-level scheme does not afford the modelling of higher-level music composition approaches such as heuristic and trial. However, if the note sequences were parameterised into higher level musical constructs, for example, psychological descriptors such as “valence” (happy to sad) and “arousal” (intense to relaxed) (Wundt 1896), some sense of musical intention or musical goal could also be incorporated.

The musical results of the parametric morphing algorithm I developed were adequate when applied to very similar source and targets but unappealing for more difficult examples. This observation was made only through informal tests and, because of the particularly obvious lack in musical coherence, no effort was expended on formal tests. The reader can confirm this by listening to the examples in section 5.3. Despite this, some concepts for future improvement to the current parametric morphing algorithm have been developed and are included in section 5.4.

126 5.1 Overview

Parametric morphing involves the transformation of discrete note sequences from source and target into continuous parameter envelopes, the combination of these envelopes and the generation of discrete notes from the combined envelopes. Advantages of the technique are that it operates in realtime, is deterministic and simple to understand. A disadvantage is that parametric morphing is not a common musical technique and the aesthetic results are unusual. Furthermore, it is currently quite difficult to imbue the process with a coherent sense of style. It is also difficult to deal with polyphonic music, as envelopes afford a monophonic sequence.

Section 5.2 describes the detailed workings of the algorithm, including the envelope representation, combination and note generation. Following this, the section on evaluation, 5.3, provides a number of simple musical examples that demonstrate various properties of the parametric morphing algorithm. The evaluation ends with a remake of Mathews and Rosler’s original The British Grenadiers to When Johnny Comes Marching Home (1969) morph using the new parametric morphing algorithm.

5.2 Description

5.2.1 Envelope representation

Envelope information is stored in a list of nodes, each node consisting of a value for the parameter controlled by the envelope and the time at which that value exists. A continuous envelope is formed by the lines in-between each node.

Loop end point Nodes

Value

Segments of the envelope formed by nodes

Time (beats)

Figure 1 Example envelope representation

127 Because the note sequence is actually a loop, that is, the first note in the sequence is directly subsequent to the last note in the sequence, the last and first nodes will be connected by a line that wraps around from the end of the loop to the beginning. The key difference between an envelope and a discrete note sequence is that the envelope affords the generation of a value from any point on the timeline, regardless of whether or not a note (or node) exists at that time. In the current system, there is a separate envelope created for the following dimensions of note information: phase, inter-onset, duration, pitch, dynamic and polyphony.1

Inter-onset envelope

The inter-onset envelope records the distance between the start times of notes. When generating the list of inter-onset nodes, each one is made to occur at the same point on the timeline as a particular note and the inter-onset value for that node is the distance from that note to the next. A zero gradient linear function used to connect each node. When two or more nodes with the same inter-onset value are in sequence, as with a regular beat, the latter nodes are redundant and are removed.

Loop end point

Inter- Redundant nodes are removed Onset Value 1 (beats)

0.5

123 Time (beats)

Figure 2 Example of an inter-onset envelope representing a crotchet (quarter-notes) followed by four quavers (eighth-notes) spanning a three beat loop. The nodes representing the latter three quavers are redundant, as the zero gradient line continues regardless.

1 To be able to generate notes, either inter-onset or phase is needed but both is not technically necessary.

128 Another alternative linear function is the “self-synchronising” function with gradient of used by Mathews and Rosler (1969). My zero gradient technique requires constant updating during the rendering process (see section 5.2.2) and thus responds faster to realtime changes in the inter- onset envelope, but seems no more or less musically appealing than Mathews and Rosler’s. For discussion of musical examples, see the section on evaluation (5.3) below.

Pitch envelope

Each of the other envelopes: duration, dynamic and pitch, have a node for each note, with the same value. In a similar way to inter-onset envelope, when a sequence of nodes has the same value, all nodes in the sequence except for the head are removed. The function used to connect these nodes is also a zero gradient linear function. For the pitch envelope this is suitable because, unless using a degree/scale representation such as DePa (see chapter four), it does not make sense to interpolate the values between pitches because non-tonal pitches will occur.

The current system can deal with polyphonic music but not in a musically meaningful fashion. An additional “polyphony” envelope stores the degree of polyphony and, if it is greater than one, a copy of the extra polyphonic notes (pitch, duration, dynamic) are added to the node. A note is considered “extra” if it is listed in the note sequence after another note that is on the same start time. There is no specific approach to the ordering of notes that share the same start time. Obvious improvements could be to label which of the two notes is more fundamental to the tonality, or model homophony through chordal tonal representations or model heterophony through multiple parallel streams. However, as most of the tests conducted with interpolation were monophonic, it was not necessary to develop the handling of polyphony beyond this basic level.

Phase envelope

The phase envelope records positive or negative time shifts in beats, relative to the start of the loop and, like inter-onset, uses zero gradient linear functions to connect the nodes. This is analogous to the way musicians play ahead or behind the beat, or the way that the beat in EDM is “turned around” (Butler 2006) by shifting the position of the kick drum to the offbeat.

For example, if the phase shift is at a certain point, the data that is read from the other envelopes (such as inter-onset and pitch) at that point will be taken from the point directly previous by . Conversely, a shift of means the data will be read from the other envelopes at a point ahead by .

129 If is the length of the loop, is the current phase offset, ranging from to , and is the current time that is being shifted, ranging from to , the phase shift function, , is:

Equation 1 Phase shift

The is required to wrap from the range to , to the range to . 2

Currently, the phase envelope is given only a single node during the conversion of notes to envelopes. This is calculated as the time interval of the first node from the start. If the first node occurs more than halfway through the loop, the phase offset is recorded a negative interval from the end of the loop.

Phase was included because of its wide use as a transformational technique in the composition and improvisation of melody; however, considering that only one node is being extracted, phase could be applied to greater effect than at present. Algorithms for estimating phase shifts from surface note sequences would enable multiple phase nodes to be extracted and these are explained in more detail as a possible extension in section 5.4.

Duration and dynamic envelope

For duration and dynamic, zero gradient, linear functions were used; however the capacity for variable gradient linear interpolation also exists. Linear interpolation of duration and dynamic reflects musical practice more accurately, for example, if a musician had to play a sequence between a known quiet note and a loud one, we could expect that, in the standard situation, these notes would gradually go from being soft to loud. Despite this, duration and dynamic are of less fundamental musical significance for most styles than pitch and rhythm (Krumhansl 2000) and the research shifted from parametric morphing to probabilistic morphing (chapter six) before any successful algorithms for parametric morphing of only pitch and inter-onset were invented. Future research in parametric morphing will utilise the capacity for variable gradient linear interpolation that exists in the software.

2 The modulo ( ) mathematical function should not be confused with the remainder function of programming languages such as Java (where it is %). Modulo deals with negative numbers the same way time is wound back on a clock-face, whereas remainder deals with negative numbers in the same way as positive numbers, but with a negative valence.

130 Converting note sequence into envelopes

The process used to generate the envelopes for inter-onset, pitch, phase, duration and dynamic from note sequence is described in pseudocode below:

131 // Inputs: I An array of notes which have: onset, pitch, duration, dynamic. For example, the pitch of the first note in the array is I[0].pitch. s the loop length, or scope, in beats. For example, 4 beats or 8 beats // Outputs: E A collection of envelopes for each dimension. For example, E.pitch is the pitch envelope, derived from the note pitches of I. Each envelope is an array of nodes. Each node has time, t, and value, v. For example, the time that the first pitch node occurs would be E.pitch[0].t and the pitch value would be E.pitch[0].v. For any envelope, add() appends a node to the end, given position and value. // Global functions: ADD-ALL creates a node for each of the envelopes for pitch, duration and dynamic from a given note. The first argument to ADD-ALL is the envelope object (E) that the nodes will be added to, the second argument is the position of the nodes, and the third argument is the note from which the node values are to be taken.

NOTES-TO-ENVELOPES(note array I, double s) returns E { QUICK-SORT(I) // sort according to onset, from first to last

// phase offset IF(I[0].onset <= s/2) { // if it is less than half-way E.phase.add(0, I[0].onset) // phase offset will be positive } ELSE-IF (I[0].onset > s/2) { // if it is more than half-way, E.phase[0].add(0, s - I[0].onset) // it will be negative }

pno = 0 // previous note onset cno = 0 // current note onset i = 0 // a counter

FOR(; i + 1 < LENGTH(I); i++) { pno = cno; // update cno = I[i+1].onset

posi = MODULO(pno - E.phase[0].v, s) // find the current position

IF(cno == pno) { // if the current note onset is same as previous E.polyphony.add(posi, COPY(I[i])) } else { // otherwise, it will be a new onset E.interoffset.add(posi, cno – pno) ADD-ALL(E, posi, I[i]) } }

pno = I[i].onset cno = I[0].onset + sco posi = (pno – E.phase[0].v)%s

E.interoffset.add(posi, cno – pno) ADD-ALL(E, posi, I[i]); }

Figure 3 Pseudocode for the algorithm that converts notes into envelopes

132 5.2.2 Envelope combination

As part of the morphing process, the envelopes from source and target are combined, and weighted on the morph index. This is essentially linear interpolation:

source

value morph target target morph

source

time

Figure 4 An example of weighted combination of a source (round dots) and target (solid line) envelope, to produce a morph envelope (dash-dot). The morph index is , which is why the morph envelope is always exactly half way between the source and target.

Let be the morph index, which ranges between and . Let be the current time at any beat greater or equal to the start: . Let provide the inter-onset value from the source inter- onset envelope at the given . Let be the same but for the target inter-onset envelope. Let provide the morphed inter-onset value at . is derived from a combination of the source and target inter-onset envelopes, weighted on the morph index: . The morphing of the other envelopes – pitch, phase, duration and dynamic – operate in exactly the same way as inter-onset.

5.2.3 Envelope playback

Playing notes from the morphed envelopes in realtime requires the playback algorithm to judge when to create a note and what values for pitch, duration and dynamic to give it. In realtime, an envelope playback routine called the play cycle is repeated at regularly timed intervals, usually at a quarter of a beat. The current time since playback started, in beats, is updated every cycle and passed to the envelope playback routine of the parametric morphing algorithm. During any particular play cycle, the envelope playback routine may or may not create a note and send it out. This depends primarily on the value of the morphed inter-onset envelope and the morphed phase envelope.

133 Interpolating loop length for the morphed envelopes

The loop length for the morphed envelopes, let it be , is interpolated from the source and target loop length, through a combination of the loop length of the source and the loop length of the target, weighted on the morph index. That is, if is the loop length of the source, is the loop length of the target and is the morph index, the loop length is derived thus:

Equation 2 The length of the loop for the morphed envelopes, derived through weighted combination of source and target loop lengths

The source and target loop lengths are defined by the user directly.

Phase shifting the current time

As mentioned, the value in the phase envelope represents a shift ahead or behind the beat by a certain amount. Therefore, before any information from the other envelopes (inter-onset, pitch and so on) can be read, the current time must be shifted by the current value in the phase envelope. Let be the current time, where and let be the current time shifted by the current value of the phase enveloped.

This is analogous to shifting the play head ( ) forward or back while reading from a tape reel (envelopes other than the phase envelope). The precise position of this primary play head ( ) is controlled by a separate tape reel (the phase envelope) and separate play head ( ) that is not shifted.

Let be the function that returns the morphed phase envelope, that is, the weighted combination of the phase envelopes of source and target. The range of values for phase remains to .

Recalling that is the length of the loop, is the current time since playback and is the function that returns the morphed phase envelope, the phase shifted time, , is thus:

Equation 3 The current time, phase shifted and bounded within the loop length.

134 Determining when to play a note: in theory

It is clear that, optimally, notes should be generated when the average morphed inter-onset envelope value since the most recent note that was played is equal to the time that has lapsed since the most recent note was played. In this way, the actual inter-onset between the most recent note played and the new note would be equal to the average value in the morphed inter- onset envelope over the same period.

Let the average level of the inter-onset envelope since the most recent note be . Let the time since the most recent note was played be . The condition for playing a note in any given play cycle is therefore:

Equation 4 Condition for playing a note during any given play cycle. is the average value in the inter-onset envelope since the most recent note played. is the actual interval between the most recent note played and the current position.

This begs the question: how is calculated? Let be the current time, bounded within the loop length, , such that . Let the function that returns the value of the morphed inter-onset envelope at any particular time be . Because this is a continuous function, rather than a set of discrete points, the average, , must be calculated through the integral of the function (the area underneath it) from the most recent note to the current time, divided by the period of that integral, . This way, the length of time spent at any particular value in the continuous function is weighted appropriately into the average.

For the case where the onset of the most recent note played requires no bounding, that is, when , we have:

Equation 5 Determining , the average value in the inter-onset envelope since the last note. is a function that returns the value in the inter-onset envelope at particular points in time. is the current time, bounded within the loop length. is the time interval between the current position and the most recent note that has been played. This is only for when .

135 For the situations where , two spans will be needed, one for to and another for to , recalling that is the loop length. That is:

Equation 6 Determining , as per the previous equation, but in situations where . is the length of the loop.

Determining when to play a note: in practice

An example of how note generation relates to the inter-onset envelope can be shown diagrammatically (Figure 5). Each play cycle, the area under the inter-onset envelope spanned by that cycle is accumulated and the distance since the last note was played is incremented. When checking to see if a note should be generated, the accumulated area is divided by the time that has lapsed since the last note was generated in order to find the average inter-onset value over that period, as per Equation 5 and Equation 6. If the time that has lapsed equals or exceeds this value, a note is generated. The relationship between the area under the inter-onset envelope, the time interval from the previous note and the point at which notes should be generated is made clear in Figure 5 (below).

Inter- Onset Envelope Inter- Onset Envelope

Inter- Inter- Current Onset Onset 0 1 2 3 4 time slice Value Value In In Notes beats beats

Time in beats Time in beats

Figure 5 Top: an inter-onset envelope (dotted line) with the notes (red) that were used to create it. Bottom: Generating a note from the inter-onset envelope – on the fourth play cycle, the area under the inter-onset envelope since the last note was generated (shown by the grey) will be equal to the distance from the previous note to the current position squared, thereby generating a note (shown by the red arrow). That is, combining Equation 4 and 5, .

When a note is generated, it is initialised with values in the pitch, duration and dynamic envelopes at the time of the current play cycle. After it has been generated, a user defined

136 switch controls whether the accumulated area is reset to , or if it has the square of the distance between the previous note and the current position removed, leaving a remainder.

Another user defined parameter controls the incidence of note generation by slightly offsetting the accumulated area on a range from to . Increasing this parameter will increase the number of notes being generated and decreasing it will decrease the number of notes generated.

During the morph, the result of envelope combination might change rapidly, meaning that the accumulated area at one point is not necessarily what it would be if playback had started at that particular morph index. Because of this, the user can specify an alternative option: to recalculate the accumulated area from simulated playback at each morph index value, rather than updating the area and resetting it when notes are generated. This way, a complete note sequence with a constant morph index can be created for each possible position of the morph index. However, because the morph index usually changes over time before any of these complete sequences can be heard, the listener usually only hears a “window” into each one.

Direct interpolation in absolute MIDI pitch space often yields notes that are foreign to the tonality of both the source and the target. In order to counteract this, the user can choose to constrain the interpolated pitch to the set of pitch-classes that are used by source and target. This involves searching the set and choosing the pitch-class that is closest to the interpolated pitch. When the interpolated pitch is halfway in between two pitch-classes from the set, the statistical occurrence of pitch-classes is considered and the more common is selected. When creating the final pitch from the pitch-class and the octave of the original pitch, the pitch can sometimes be shifted by an octave, as when the interpolated pitch is above the highest pitch-class in the set and closer in circular pitch-class space to the lowest pitch-class. This is easily overcome by comparing the distance of the unconstrained interpolated pitch to the new pitch-class at the octave one above, the octave one below, and the original octave, to find the closest.

Often, slight changes in the inter-onset envelope between the source and target will effectively put the music out of phase and adopt a rate of play that is unusual for the time signature. In order to counteract this, another user option is to reset the accumulated area at the start of each loop. This maintains a degree of rhythmic coherence, effectively removing the phase shifts and stopping rate changes short before they begin to imply another time signature.

The pseudo code for the morphing algorithm that combines the envelopes and renders them into notes is in the appendix (9.5.1).

137 5.3 Informal Evaluation

The evaluation of the parametric morphing algorithm described above consisted only of informal testing and listening. This convinced me that, while the algorithm can generate coherent and smooth morphs when the source and target are similar, it fails with patterns that are very different. I considered this to be self-evident from the musical examples that were created with it and so I expended no effort on empirical tests for this algorithm (subsequent algorithms were tested more rigorously, as described in Chapters six and seven). Some of the musical examples that I generated using the parametric morphing algorithm will be discussed below, showcasing various properties of the algorithm that include phase-offset detection; pitch and duration interpolation; cancellation when morphing inverse melodic contours; monophonic music limitation; and onset resolution limitation. The final examples demonstrate interpolation between source and target music of various styles.

5.3.1 Phase offset

Phase offset is currently calculated from the onset of the first note in the loop. Figure 6 shows the transcript of a simple, four beat long pattern that is morphed into a pattern that is identical except for being shifted behind by one beat. The MIDI example is included (~5.1). Because the first note of the target pattern (bar five), starts on the second beat, the phase shift recorded for this pattern is . The source music has no phase shift. During the morph (bars two to four), it is easy to see that the phase is being interpolated from , to , and because the pattern is shifting behind the beat by in each bar.

Figure 6 Phase shift interpolation, made obvious by morphing from a pattern starting on the first beat to the same pattern starting on the second beat.

The limitations in the way the phase envelope is currently calculated are made obvious in the example shown in Figure 7, and heard in this example (~5.2). As with the previous example, the target pattern is the same as the source, except it is shifted behind by beat. Unlike the previous example, there are notes within the last beat of the source that wrap around to the beginning when the target is phase-shifted. Because the first note in the target is on the first beat, the value in the phase-offset envelope of the target is zero. As a result, interpolation occurs in pitch rather than phase, even though the two patterns are identical except for a phase-shift.

138

Figure 7 No phase shift detected - pitch interpolation is used.

5.3.2 Inverse melodic contours

The problem with phase offset is made even more apparent when we consider a source and target that are direct inversions of each other (~5.3). In this case, most of the pitches during the morph will be that of the pitch centroid, which is not particularly relevant or interesting to listen to, as shown in Figure 8.

Figure 8 Interpolating between inverse melodic contours creates a mostly flat ‘unmelodic’ contour.

The details of a much more sophisticated approach to phase detection that might be implemented in the future will be presented below in 5.4.

5.3.3 Pitch and duration interpolation

As opposed to phase and inter-onset, the envelopes for pitch and duration are much more straightforward to generate. The linear interpolation of these two dimensions is clearly shown in the example transcribed in Figure 9 and audible in this example (~5.4). The pitch pattern in the target (bar six) is same as the source (bar one) except that it is transposed up a perfect fifth. The durations of notes in the source are increasingly short, whereas in the target they are increasingly long.

139

Figure 9 Pitch and duration interpolation. In the source pattern (bar one), the notes start with a long duration and become shorter, while in the target (bar 6), the notes start short and get longer. The pitches of the target are the same as the source, but transposed up a perfect fifth.

However, on listening, it becomes obvious that the linear interpolation of pitch is lacking in tonal musicality. To overcome this, tonal constraints have been implemented that shift the interpolated pitch to one of the pitches that occurred in the source and target, as shown in Figure 10 and heard in this example (~5.5). When constraining a pitch that is exactly half way between two of the allowable values, the value that has occurred over a greater time-span of the source and target is preferred. This is evident in Figure 10 where G or B could be equally chosen and B is preferred (see the notes at bar three, beat three and bar five, beat two). All of the B’s occupied beats, while the G’s only occupied around in the source and target collectively.

Figure 10 Constraining the pitch to values that occur in source and target, favouring the more commonly occurring values.

5.3.4 Music of different styles

When applied to various loops of different styles, parametric morphing had mixed results. If the two patterns share the same tonic and have related scales, it can work quite well, despite other

140 stylistic differences. To illustrate this, I have included an example that morphs between Freŕe Jacque a simple French children’s song and a well-known Funk lick from Chameleon, by Herbie Hancock. This is transcribed in Figure 11) and is audible in this example (~5.6).

Figure 11 Morphing between a two bar loop from Freŕe Jacque to a two bar loop from Chameleon (Herbie Hancock).

The morphing example above is partially aided by the fact that the two patterns share the same tonic and have fairly similar scales (C Major and C Pentatonic Minor). When the target is transposed to the much more distant tonic of F#, the interpolation tends to make less sense, in terms of tonality, as shown in the example below (see Figure 12) and heard in this example (~5.7)

Figure 12 Interpolation between different styles is harder when they are also in different keys, in this case, C Major and F# Pentatonic Minor. Source and target are two bars long, as in the previous example.

The first parametric morphing algorithm that was developed by Mathews and Rosler (1969) morphed between The British Grenadiers to When Johnny Comes Marching Home and back. Although the music is more within the context of ‘marching-song’ than MEM, I applied the example to my own implementation of interpolation out of curiosity (listen to ~5.8). While it seems no better than the original, it is interesting to note how different they sound. Part of this is a technical issue – the current implementation of interpolation is limited to onset quantisation of beats (because of the realtime implementation of quantise and the beat play cycle resolution) which makes it difficult to morph to When Johnny Comes Marching Home, which is

141 based on triplets, that is, . This was overcome by using beats as though they were beats and slowing the tempo of the target down to of the source. This introduced another problem, in that rather than morphing from one beat loop to another beat loop, the morph was now between a beat loop and a beat loop and similarities of melodic contour in relation to the start of the phrase would be lost.

These informal tests indicated some of the major limitations to a low-level parametric approach. Essentially, it is a suboptimal compromise between source and target. Rather than retaining the compatible musical elements of the source and target and splicing them together in a way that is reminiscent of both, the current implementation of parametric morphing tends to blur the elements together so that the result can hardly be recognised as originating from either, as is particularly apparent when the morph index is . These results have spurned some ideas for further developments that are outlined below, as well as the investigation of other algorithms that are discussed in Chapters six and seven.

5.4 Extensions

While the current implementation of parametric morphing is fast enough to operate in realtime and displays some elements of a logical musical progression, it lacks musical style and coherence. Some ideas for overcoming this are: alternative and varied envelope representations, interpolation in alternative pitch spaces, higher-level musical representations, rhythmic constraints and more detailed phase offset extraction.

5.4.1 Self-synchronising inter-onset function

The current envelope representation assumes a flat (zero) gradient for each segment in between nodes. This may be a suitable representation for pitch envelopes, however, duration and dynamic would be better served with a straight-line connection (variable gradient, depending on relative position of nodes), for reasons mentioned above. Note-generation could be simplified by using a “self-synchronising” style inter-onset envelope with a gradient of , as originally used by Mathews and Rosler (1969) and shown in Figure 14.

142

Figure 13 Mathews and Rosler’s “self-synchronising function” (reprinted with permission)

However, this technique queries the envelope for a new value only when a new note is being generated which may sometimes leave out important details. For example, during the morph, if one inter-onset distance was particularly large and the values in the interpolated inter-onset envelope suddenly became small, these notes would be missed. An algorithm, such as the current one, that is updated each play cycle makes it possible for these changes to be heard as they occur. However, this may or may not be desirable, depending on the context. Therefore in future developments it would be interesting to compare the “self-synchronising” envelope to the current one in a variety of situations.

5.4.2 Higher-level musical constructs

Incorporating higher-level musical constructs, such as scale degree, scale and metre, could provide a consistent musical framework and be used to define a more coherent musical style. In the same way that interpolated pitches can be locked to known pitch-classes, a set of inter- onsets or onsets could be used as a constraint, increasing the rhythmic coherence by using only known onset intervals or onsets. Despite the gains in coherence, it would be especially difficult to generate music for sequences that have a limited variety of onset-intervals and onsets. As well as this, there is a possibility that too much of this kind of constraint could lead to a mostly static morph that changes suddenly at the halfway point, when the morph index is . The pitch and rhythm constraints could instead be expanded to include onset-intervals, onsets and pitch- classes that are drawn from music theory, rather than the source and target. For example, relevant modes and scales will have pitches that sound pleasing, even though they did not exist in the source or target.

However, constraints are ultimately limited by the fact that the process of interpolation occurs using a primitive representation, for example absolute pitch space, before the constraints are

143 applied. In this case, while the musical surface may exhibit certain stylistic features, the underlying processes do not have much musical stylistic influence because the representations being used are void of the bias of musical style or, conceived another way, low-level representations afford stylistically uninformed music that is often difficult to relate to. For example, it would make sense musically to interpolate pitch in a space that is a combination of the circle of fifths, circle of chroma and linear pitch, rather than just linear pitch, because these musical constructs are at the basis of much tonal music, which is a significant part of MEM. Similarly, pitches could be interpolated in scale degree space, although because the scales themselves exist more as discrete symbols than within a continuous space, interpolating deterministically between them is more difficult.

These kinds of representations have been successfully applied to the TraSe (Transform-Search) morphing algorithm and, to a lesser extent, the Markov Morph algorithm (discussed in chapters six and seven). In future research it would be interesting to see how much of a positive effect they have on interpolation.

5.4.3 Phase offset detection

In a similar way to the higher-level representations mentioned, the phase offset envelope provides an opportunity to extract from the music another dimension which, although not technically necessary, has musical implications. Phase-shifting the beat is a common technique in electronic dance music (Butler 2006) and other styles to renew interest through creation or elimination of syncopation and may be related to the ease with which note sequences can be shifted in sequencers. The current implementation of phase does not utilise the envelope representation effectively, only storing one node, the value of which is calculated from the onset of the first note in the sequence. This single phase value is needed so that the position of the first generated note is known and the interpolation of this value has a noticeable musical impact. However, often rhythmic changes are felt more as a phase-shift, rather than a change in inter- onset rate. Therefore another interesting area for future research of interpolation could be the extraction of a phase-shift envelope and the simultaneous changes to the inter-onset envelope that will be necessary.

Modifying the level of variation in the inter-onset envelope

A user defined parameter can be conceived that controls the relative level of variation in the inter-onset envelope. Let be the function that takes the time, , and returns the inter-onset value in the inter-onset envelope at that time. The values of both and occur within the range of the loop. That is, if is the loop length, .

144 Let be the value of a horizontal line that runs through the exact centre of the inter-onset function, such that the area under is the same as the area under the inter-onset function:

Equation 7 The line that goes through the exact centre of the inter-onset envelope

145 Inter onset

value in beats

Time in beats

Figure 14 Diagram showing one possible example of the line that goes through the exact centre of the inter-onset function . The loop length, , is the period of the function.

Let be the user defined parameter that controls the level of variation in the inter-onset envelope. The function that reduces or expands the variation in the inter-onset envelope is :

Equation 8 A function that reduces or expands the variation in the inter-onset envelope. recovers . , .

A more natural technique perhaps could be to smooth the waveform and this will need to be examined in the future.

Phase adjustments to suit inter-onset modifications

With variation in the inter-onset envelope reduced or increased, it should be possible to reduce or increase the variation in the phase envelope in such a way that the original note onset pattern is maintained. The phase offset envelope should be created with a minimum number of nodes and a minimum amount of variation. If minimising variation without minimising the number of nodes, the function would include extremely short pulses of phase-shift which are not musically realistic. However, if the number of nodes are minimised in addition to minimising the variation, the phase function will be smoother and thus possibly reflect the musical applications of phase- shift more naturally. How is such a phase function calculated? The following three diagrams (see Figures 16, 17 & 18) demonstrate what is required through example.

Firstly, the note onsets from the original inter-onset function and the scaled inter-onset function must be generated. In the example shown (Figure 15), the scaled inter-onset function has been scaled down to a completely flat line (the user defined variation, ). Let

146 the set of onsets generated by be and let the set of onsets generated by be . In Figure 15, the positions of the grey squares on the x-axis are an example of while the positions of the white squares on the x-axis are an example of .

Inter onset Note onsets from original

value scaled to variation (flat) in beats Note onsets from original

Time in beats

Figure 15 Example of note onsets generated from a note onset envelope (white squares) compared to the note onsets generated by the same envelope that has been transformed into a flat line with no variation (grey squares).

Having generated and , the second step is to analyze to see how a phase shift function, let it be , might be applied to so that will generate the original onsets rather than . To do this, each onset value from or that is absent in the other, , will need to be considered as a potential point where a phase-shift node could be added to shift the onsets into synchrony. This is shown diagrammatically in Figure 16.

Note onsets from original Note onsets from original scaled to variation (flat)

Phase offset value in beats Potential new node points

Time in beats

Figure 16 With the objective of phase shifting the modified note onsets in such a way that results in the original note onsets, this diagram shows all of the potential points at which a new phase offset envelope node should be considered for the example note onsets given.

147 Finally, each of the potential points where a node might be added to the phase offset envelope needs to be considered and, if applicable, a phase offset value needs to be assigned. For each potential point, the value of the phase offset will be the interval that “shifts” an onset in onto the closest onset in (see Figure 18).

Note onsets from original Note onsets from original scaled to variation (flat)

Phase offset value in beats

Time in beats

Figure 17 The final phase offset envelope should be calculated to have the least variation, as with this example.

5.5 Summary of parametric morphing algorithm

Parametric morphing involves the conversion of note sequences into multiple continuous envelopes, the weighted combination of source and target envelopes and the rendering of multiple continuous envelopes back into notes.

An envelope is used for each dimension of inter-onset, pitch, phase, duration and dynamic. When combining the source and target envelopes, the morph index is used to weight each of them, such that the morphed envelopes match the source initially and become closer to the target as the morph index increases.

For playback of envelopes, a note is created whenever the average value in the inter-onset envelope since the most recent note was played equals the time interval that has lapsed since the most recent note was played.

Informal evaluation of the parametric morphing algorithm has demonstrated the application of particular features, including the interpolation of phase offset, the cancellation of inverted melodic contours as well as interpolation of pitch and duration. Experimenting with music of different styles found that, although reasonable results occur when the source and target share crucial musical elements such as key, source and targets that are even moderately dissimilar, are much more dissatisfactory.

148 Potential extensions to parametric morphing include: a self-synchronising inter-onset function, in order to allow historical comparisons; higher-level musical constructs, such as scale degree, scale, key and metre; and phase offset detection, where the user could control the relative levels of variation in the inter-onset and phase envelopes.

From the detailed explanation of parametric morphing, the differences with other historical approaches become obvious, justifying the claims to novelty. Despite this, the parametric morphing algorithm described here fell short of an acceptable level of effectiveness in musical output. Although a number of improvements to the current parametric morphing algorithm were conceived, a decision was made to shift the research towards a fundamentally different approach that had not yet been investigated to any great extent: probabilistic morphing. The rationale for shifting the research in this direction was that probabilistic techniques offered random access to the source and target notes, thus increasing the contextual breadth available to the algorithm. As well as this, it appeared more relevant to broaden the approaches to compositional morphing, rather than deepening a single approach (parametric morphing) that was conceived many decades ago.

149 6 Probabilistic morphing algorithm

Having discovered only mediocre musical results from parametric morphing, another approach utilising probability, prediction and similarity measures was developed instead. The particular probabilistic morphing algorithm I developed is called the Markov Morph and is the topic of this chapter. The Markov Morph exhibits more musical coherence than the parametric morphing algorithm described in the previous chapter, which is not to say that parametric morphing is an approach that is flawed in general. The Markov Morph is also a more original development within the field of compositional morphing, as the parametric approach had been explored originally by Mathews and Rosler (1969). Musically, the Markov Morph is distinguished by a characteristic unpredictability in style. The Markov Morph was initially inspired by Markov probability techniques, but has since been modified, fairly drastically, to suit the particular conditions of realtime operation and small sample sizes.

The Markov Morph involves the weighted selection of source or target, creation of a probability distribution and the generation of notes. The technique is fairly efficient for the current requirements and thus possesses a high degree of realtime flexibility. It is also able to generate musical extensions and, being stochastic, the morphs are able to continually change even when the morph-index is fixed. Sophisticated note similarity measures allow elements of musical style to be controlled. While some previous research has been conducted into probabilistic techniques for morphing (Didkovsky 1997; Momeni and Wessel 2003; Oppenheim 1995; Polansky 2006), the Markov Morph contains developments that are novel to note sequence morphing, including conditional probability and note comparisons in a continuous space.

From a composer-agent standpoint (explained in 3.1.3), the Markov Morph models the heuristic approach to composition. The music is composed bottom-up, note-by-note, estimated from the recent output as it is generated and using probabilistic rules that are extracted from source or target. There is an intention for the music to be based on the existing styles but not necessarily a clear musical goal. There is no ‘perspiration’ that is indicative of the trial approach, nor is the focus on exploration of mappings, as with the abstract approach. The Markov Morph utilises analytic, transformational and generative algorithms, with a medium level of contextual breadth.

The first section, 6.1, will describe the techniques at the basis of the Markov Morph algorithm, including weighted selection, probability distributions and note similarity measurement. The process can be summarised thus: within each play cycle (usually at quarter beat intervals) either the source or target is selected through random selection weighted on the morph index. The

149 recent history of note output is compared to each note in the selected source or target and a list of similarity-to-history ratings is created. The similarity ratings are used as a probability distribution to generate (or not generate) the next note from the selected source or target for that play cycle.

Musical outcomes from the Markov Morph were subjected to the scrutiny of a focus group and a focus concert, both of which are detailed in section 6.2. The former study was developmental, aiming to streamline formal evaluation processes. The latter was a functional test that established the competitiveness of morphing when compared to DJ cross-fading, particularly in situations where the source and target music are very different. Interesting stylistic characteristics of morphing became apparent through the comments of the participants and some ideas for further improvements to the algorithm were obtained. The focus concert itself appears to be a fairly original evaluation methodology for computer music, particularly for compositional morphing (Clarke and Cook 2004). While the data gathered in the focus concert was mostly valid, the experience was also useful for improving the evaluation techniques in future studies (see Chapter Seven).

Overall, the probabilistic morphing algorithm is an important part of the research into automated and interactive compositional morphing due to it being a novel approach with some advantages over traditional techniques such as DJ cross-fading. As with parametric morphing, there are a number of extensions to the current probabilistic algorithm that may be pursued to increase its musical efficacy and these are discussed in section 6.4.

6.1 Description

Firstly, it should be noted that for all of the note sequences described below, notes that occur with the same onset are grouped into the same event, rather than existing as separate elements within the list. Therefore, I will often refer to ‘note-groups’ rather than notes. The note-group is a vertical grouping (like a chord) and should not be confused with note sequence, which is a list of notes. Instead of note sequences, I will also refer to note-group sequences, which are lists of note-groups.

During each play cycle of the Markov Morph algorithm, the morph index influences a random selection of either the source or target. This process, called weighted selection, is at the basis of the Markov Morph and is described in more detail below (6.1.1). In the simplest case, with the user specified order (as in, ‘Markov order/depth’) at , each note-group in this selected sequence is compared to the last note-group that was played and assigned a rating of similarity to it. The exact similarity measures are detailed in 6.1.3. With order above , each similarity

150 rating will be conditioned on (multiplied by) the similarity rating of the notes immediately prior, so that short sequences of note-groups are compared, rather than just single note groups. The final similarity ratings are then used as a probability distribution to select the next note that will be played, which assumes conditional independence.

151 6.1.1 Weighted Selection

Weighted selection is a morphing technique that selects between the source and target during each play cycle. It was first introduced by Oppenheim (1995). Let be the morph index that is normalised such that when , the source is playing and when , the target is playing. During each play cycle (usually distanced at quarter beat resolution), the probability of selecting the source is and the probability of selecting the target is . Any notes in the selected source or target that start within the bounds spanned by the play cycle are then played.

The effect is that small segments of music are spliced together haphazardly. It produces quite interesting, although often bizarre, transitions between parts that have many notes and a strong sense of metre, such as drums. However, in sparser parts, important notes can easily be missed, greatly reducing the ability to perceive coherent phrases in the music. This problem is partially solved in the Markov Morph by using various statistical properties of notes as a selection measure rather than purely the occurrence of note onset.

6.1.2 Markov Morph overview

As with weighted selection, the Markov Morph starts by selecting either the source or target. However, rather than playing all of the notes in the selected pattern that are within the span of the current play cycle, the whole note-group sequence is compared with the recent output, using note-group similarity measures to generate a probability distribution to select the next note- group. The probability distribution is regenerated each play cycle because the source and target music, as well as a number of user-defined parameters, may change in realtime. The current implementation of the algorithm is able to function in realtime, given moderately sized source and target loops of under twenty note-groups in each part.

This approach deviates somewhat from the standard statistical approach of Markov Chains, where the similarity measure would return either only or , depending on whether an exact match was found or not. The standard method only works when there are many items to form a comparison with, for example, a database of music with over note-groups would probably be sufficient. In the context of morphing short loops of music in LEMorpheus, the sample space is clearly insufficient, often being less than note-groups. With this small amount of note-groups, situations when the source and target do not contain any of the same notes are quite likely. In these situations and with a discrete or similarity measure, none of the notes would be given a probability of occurrence. In which case, the Markov Morph would fall back on weighted selection. In response to this, the more complex, continuous similarity measures that are currently used were developed.

152 In order to fully explain the Markov Morph, I will now describe in detail what constitutes the seed, how segments are compared to the seed and how notes can be generated using the similarity measurements.

Defining the seed

The seed is the list of note-groups of the most recent output, with length equal to the user- defined variable, ‘depth’ (Markov order). To define this formally, let the list of notes in the entire history of playback be . Each play cycle, as note-groups are generated, they are appended to the end of . Let the length of at the current point in time be and let the user-defined depth be , which ranges from . Let the list of notes that constitutes the seed be . Provided that , consists of the most recent elements of , or . In this case, also serves as the length of .

If the morph starts playback before any notes have been played, is initialised with the notes from the source, , or target, , depending on the direction of the morph: forward or backward respectively.

Comparing segments with the seed

Let the list of note-groups selected through weighted selection be . Recalling that is the morph index, the chance that is and the chance that is . Each note-group in , along with the note-groups preceding it, will be compared to the seed (see above). Let contain similarity measurements between and all the note-group sequences in that are of the same length as , wrapping where needed. For example, if the depth is two, , and the length , then , that is, , would be compared to each of , , and . Each of these comparisons would be a value in .

Let be the length of and , which are necessarily the same length. Let be an index with the range that points to the tail (last note-group) of a note-group sequence in that is being compared to . When , the head and body of the segment wrap around through the end of . Recall that is the list of seed note-groups from the recent history of length , the user specified depth/order. will hold the similarity between and the string of notes in of length that ends on .

153 1 2 3 4 5 The selected source or 60 62 55 50 60 pitches target sequence

1 2 The seed from the pitches 60 62 history

1 2 3 4 5 Example similarity 0.8 1.0 0.1 0.2 0.3 similarity measurements

Figure 1 An example of how similarity measurements between segments of the selected source or target sequence and the seed are created. In this simplified case, only monophonic pitches are being compared and the order (length of seed) is only two. It is clear that the first two note pitches are the same as the seed, and , so the similarity measure relating to the tail of the segment (index ) is . None of the other two-note segments in match the seed and so the other similarity ratings are less than . Note that the sequence wraps around – the segment to , and , is fairly close to , and , which is apparent in the relatively high similarity rating for index of .

Recall that is the length of and , is the length of the seed and that is an index pointing to the tail of the segment in that is being compared. Allowing for wrap-around, the sequence is being compared to the sequence to generate the similarity measurement in . If , the sequence would need to be looped until , however, this has not been implemented as such a case is quite rare.

Factoring each element in the segments being compared

Let be a function that measures the similarity of two note-groups, returning a value between , least similar, and , most similar. The details of the similarity measure used will be provided later. Recall that is the tail of the note group sequence being compared to the history. The similarity measurement of the most recent note-group in and is given by . For cases where the user sets , the similarity matrix can be defined thus: . For cases where , will also be factored by the similarity of the note-groups preceding to the note-groups preceding . Let be an index with the range . Recall that is the index pointing to the head of the segment in that is being compared to . Let be a normalisation constant which ensures that sums to . Using the notation ‘ ’ to indicate the cardinality (length) of the sequences, the similarity ratings are defined thus:

154

Equation 1 Function to calculate the similarity matrix . is the head of the segment of length in that ends on , which is to be compared to . The is to allow for wrap around. is a normalisation constant.

Generating notes from the similarity matrix

Given that holds the similarity between the seed, , and the segment ending on , how can this information be used to generate new notes? It is logical that the note-group which directly follows the segment in that is the most similar to should have a high likelihood of being played next – this way, if the music that is played would resemble . Because of this, the similarity matrix can be treated as a complete (sum to ) probability distribution matrix – the similarity value in is the probability that will be selected for playback. To select notes for playback, a note-group is chosen randomly according to the probability weights in .

6.1.3 Markov Morph details

Combined similarity measure

The similarity measurement function that compares two different notes is a weighted combination of inverse distances between note pitch, duration and start-time. Dynamic, being less crucial, has not been included although it would be a beneficial addition for future work. The user controls the weights that affect the balance of each distance in the final value for similarity. Let be a similarity measure function that compares two notes, and , and returns the similarity rating. Let , and (these are detailed below) be similarity measures functions for pitch, duration and onset respectively. Each of these takes two notes and returns a similarity rating between and . Let represent three user-defined weights for each of these similarity measure functions, each weight ranging from to . Letting be a factor such that the result of is normalised between and , we have:

Equation 2 The similarity between notes as the combined similarity of pitch, duration and onset, weighted by the user-defined weights.

155 The similarity measurement function is used to derive the similarity matrix, , as explained in Equation 1.

Variable levels of similarity measurement contrast

I included a variable that controls the level of contrast in the similarity measurements, let it be , which ranges from to . It is used in a function, , that transforms the similarity ratings in the similarity matrix . When , the output of will be effectively discretised to either or . When , there is no difference. The musical effect of increasing the contrast is to reduce the level of randomness in the music being generated, by reducing the number of candidate note-groups. However, reducing the candidate note-groups also increases the likelihood of ‘null predictions’, where .

The contrast function takes to a power between and , defined by the user. The result is then normalised again between and . With as a normalisation constant:

Equation 3 The contrast function. As increases, the difference between low and high similarity ratings are compounded.

Values that are smaller than the arbitrary cut-off of are quantised to . While using an exponential function is currently sufficient, in future work, a sigmoid function would be more suited to accentuate the contrast, as it reduces low values and increases high values, instead of reducing all values (albeit, reducing the low values more than the high values).

Combined pitch similarity measure

The function that determines the similarity of note pitch, , is a weighted combination of the similarity in linear pitch space (as per MIDI, from to ), the Circle of Fifths (CF) and the Circle of Chroma (CC). This pitch space is similar to that originally proposed by Shepard (1982) and therefore has some basis in music psychology. Let the linear pitch similarity function be ; the similarity in Circle of Fifth be ; the similarity in CC be . Let hold three user-defined weights and let and represent two MIDI pitches being compared. With as a normalisation constant, we have:

156 Equation 4 The similarity between two note pitches, and , as the combination of similarity in linear, CF and CC pitch spaces, each weighted by user-defined weights.

Linear pitch similarity

The linear pitch similarity function, is the absolute difference between the two input pitches, normalised to the range from to , taken to the power in order to magnify the resolution of smaller distances and finally subtracted from so that is totally similar and is totally dissimilar. This simple similarity measure can be generalised. Let the range be , which in the case of MIDI pitch equals , and let the magnification be , which in this case equals . We have:

Equation 5 Generalized similarity in linear space is the inverse of the difference, normalised by the range of possible values, , and taken to the power, in order to exaggerate the smaller differences. For MIDI pitch space, . I settled on the magnification after experimentation.

The power was chosen, after some experimentation with other powers, because it changed the curve of distances to an order of magnitude that seemed to roughly reflect my own intuitive musical understanding of the similarity measure more accurately, as well as making it more easily comparable to other similarity measures. For example, an octave difference, without the magnification, would rate a similarity of . With the power, this drops down to less than . The difference of a semitone, the most similar interval in linear pitch space, would be without the power and with it – this appeared to be suitable. Using a power that deviated from by a small amount would be unlikely to have a significant effect on the algorithm.

CC and CF similarity measure

The similarity in the CC is the distance in pitch class, normalised between and , and subtracted from . Let represent the number of steps per octave. The default is , due to there usually being steps per octave. In implementation, LEMorpheus does not use any other value than this default, however, it is described as a variable here so as to theoretically allow for other possibilities. Let and be the two input MIDI pitches (range to ) and let be a function that compares them in the CC and returns a similarity rating. is a normalisation

157 constant that ensures the result is between and . The CC similarity measure can be defined thus:

158

Equation 6 Similarity measure as shortest distance between pitches in the CC1.

Because this equation can be a similarity measure in any circle space, the formula for the CF is the same, except that and need to be transformed into the CF before they are compared. Let the interval of the fifth be . Recalling that is the number of steps per octave, the fifth is calculated thus: . The is needed because the interval is the tonic, rather than the . Usually, which means: .

Transforming a pitch into the CF involves multiplying by the fifth interval, , and modulating by the number of intervals per octave, . The CF similarity measure, , is thus:

Equation 7 Similarity in CF space. The function measures the shortest distance between the inputs in chromatic circle space and is explained in the previous equation.

Note duration similarity measure

The function that determines the similarity of note duration is a combination of two functions: relative distance and factorial difference. The relative distance function was developed in response to the fact that duration does not have a limit – rather than normalising the difference between and an arbitrary limit, the relative distance function is based on a ratio between the inputs.

Let and be the durations of the two notes being compared and let be the relative distance function. The variable indicates the value of the difference, , at which point the similarity measure is half. That is, when , . The mid-point variable, , thus controls

1 Note that modulo ( ) operates like a clock face, always returning a positive number , as opposed to similar operators in some programming language, for example the remainder operator (%) in Java which returns negatives.

159 at which level the measure will be most sensitive. In LEMorpheus, . The relative distance function for duration is thus:

160

Equation 8 Function that finds the relative distance between two durations, without any upper limit imposed. It is centralized on .

Another component of the duration similarity measure is the ‘common factor’ difference function. The purpose of this function is to determine how closely note durations are related by a common factor. For example, the duration of short notes that are swung may be factors of long notes that are swung, for example, , , and . In this case, a duration of should bear less similarity to than , even though it is closer. To measure the common factor distance, the remainder from dividing the higher number by the lower number can be taken, centred on , made absolute and then normalised to be between and :

Equation 9 A measure of the common factor difference between two durations.

The duration similarity measure, let it be , is a combination of the relative distance function, , and the factorial difference function, . As the distance between the two durations has much greater impact on their being perceived as similar compared to the factorial difference, the latter is weighted with less influence in the combination. It currently only contributes to the duration similarity function. This weighting that was determined though informal trial and error analysis of test-cases:

Equation 10 Duration similarity is the inverse combination two separate distance measures: relative distance and factor distance (described in previous two equations). The latter is weighted less due to it being a less important measure of similarity.

Onset similarity measure

The onset similarity function is a weighted combination of various inverse circle distances, each representing a different metre of loop length. Let be a circle distance function, (similar to Equation 6), that takes three arguments: the two onset values to be compared, and , and the

161 length of the circle space within which they are to be compared, . The onset values are decimals with a range from to the length of the loop. We have:

Equation 11 Circle distance function. and are two values being compared, and is the length of the circle space they are being compared within.

This circle distance function is then applied to find the distance from a combination of circle spaces within the bar. Let contain five different user-defined weights, one for each circle space length of , , , and (in beats). With the normalisation constant to keep the result within the range of to , the onset similarity function, , is thus:

Equation 12 The onset similarity function is the inverse of a weighted combination of distance in five loop spaces of different lengths.

This technique allows stylistic notions of rhythmic similarity to be specified by the user. For example, within a straight 4/4 musical style of morph, it might make more sense to increase and above the others, as they relate to lengths of and respectively. should be weighted heavier when similarity on the micro-level needs to be exaggerated, for example, between and . For the over rhythmic influence, , the weight favouring the loop-space of beats can be used. For future work it would make sense to also include a weight for a beat circle space.

Polyphony

When the similarity measurement function is given note-groups with more than a single note, each note within one group is compared to each note within the other and the comparison with maximal (the highest) similarity is the one that is used. Let and be two note-groups (sets of notes with the same onset) being compared, with the number of items in each set being larger than . Let the function return the similarity between two note-groups and let return the similarity between two individual notes, as per the ‘combined similarity measure’ discussed above (Equation 4). Polyphony is dealt with thus:

162 Equation 13 The similarity measure for two polyphonic note-groups and . is the similarity measure for individual notes within the note groups.

This is not an optimal measure – if two note-groups each have a note that is maximally similar to the other, then removing or adding other notes will have no effect on the similarity rating between the note-groups. Some ideas for improvements to the approach that would be trivial to implement are mentioned in section 6.4.

Realtime implementation

The note selection process outlined above occurs every play cycle, so as to allow the system to be responsive to changes in realtime. If the inter-onset between the most recently played note- group and the time at the current play cycle is not equal to the inter-onset between selected note-group and the note-group previous to it in sequence then nothing is played. If nothing is played and the gap between the current time and the recent note onset is larger than the largest inter-onset interval in the selected source or target, , then it will be impossible to select a note for playback. This is called stream loss, whereby the default fallback of weighted selection is used.

In order to control the occurrence of stream loss, I have included a user-defined weight called ‘anticipation’, which ranges from to . When checking for stream loss, the anticipation variable is multiplied against the value of the largest inter-onset interval in to make it smaller or larger than it really is. If anticipation is smaller than , stream loss may be ‘forced’ when it otherwise need not occur. Another alternative would be to predict the next note only once – after a note has been created, rather than each frame, and keep this selection until the appropriate point to play it is reached. This would ensure that weighted selection is never used as a fall back, however it would be less responsive in realtime, particularly when very long notes are used.

The time complexity of this algorithm is for monophonic music. It is also in polyphonic music if the number of voices is limited. This enables realtime adaptability to changes in source and target music. The complexity was determined thus: if is the number of note-groups in the selected sequence, is the number of layers of polyphony in each note-group and is the Markov depth, in each play cycle there is: . Since is limited to below and is most often no larger than , they can be considered constants, thus leaving .

163 6.2 Informal analysis

6.2.1 Weighted selection

The informal analysis of weighted selection was conducted by creating numerous musical examples and reflecting on the impact that the weighted selection algorithm had on how they sounded. To recap, weighted selection operates by selecting between either source or target, weighted on the morph index, at each play cycle. Two examples of weighted selection morphing between drum patterns are included within this section.

Unlike parametric interpolation, the weighted selection deals particularly well with rhythmic and polyphonic parts, rather than monophonic and melodic parts. Like all morphing algorithms it works better with source and target patterns that are already similar, however similar note onsets in particular are useful for providing consistency. In the current implementation, because the notes are being copied as a whole group of notes within the beat span, rather than being generated as a single note, the possible onset resolution is much smaller than the play cycle resolution. That is, many notes within one span can be sent to the scheduler and thus more complex rhythmic possibilities emerge.

A demonstration of the ability of weighted selection to deal with rhythmic and polyphonic parts can be shown by morphing between two drum patterns, as in the following example (~6.1), transcribed below (Figure 2).

Figure 2 Morphing, through weighted selection, between two different drum rhythms of the same length. The audible notes are in red, while the semi-transparent black and white notes underneath are provided as a reference to the source and target respectively. The first bar is the source and the last bar is the target.

164 In all the morphing algorithms, swing is handled easily because it is treated as a meta-parameter that can be linearly interpolated. A technical problem of the current implementation is that for swing to occur, quantise must also occur which means that the target of the previous example would not have been able to represent the snare roll on beat (counting from ). The following example (Figure 3) is similar to the previous, except the source is hard swung and the target is not (~6.2). In terms of swing, this transition appears to be handled adequately.

Figure 3 Swing can be dealt with easily, but different dynamic levels make the random nature of the weighted selection more obvious.

A major problem with weighted selection is that any similar sections within the source and target that are not within proximity are overlooked as pivot points. That is, even when sections that might be spliced together well exist, if they are not in the appropriate location within the bar, they will never be spliced together. Another major problem with weighted selection is that the resulting output tends to be randomly spliced at the frame resolution, rather than the length of phrases as would be more musically appropriate. This becomes especially obvious when the source and target have substantial differences in their pitch, dynamic and duration content. For example, the rendering in Figure 5 shows how the dynamics can jump unexpectedly from high to low or visa-versa. These two problems, pivot points and phrase boundaries, were addressed by the Markov Morph and TraSe algorithms respectively.

6.2.2 Demonstrating Markov Morph through variation

For this section, the effect of the different parameters described above will be demonstrated through a number of variations (as in, ‘variation on a theme’). While not the focus of the

165 research, variation is an effective technique for examining the properties of the algorithm because it is essentially the ‘easiest’ kind of morphing – from a source to itself.

To recap on the Markov Morph process, the note-groups are probabilistically selected from within a note-group sequence (source or target through weighted selection), favouring note- groups where the preceding note-group (in the sequence) is similar to the most recent note- group output. This approach enables interesting connections to be drawn between sections of the source and target that are similar. The weighting of the similarity measure enables different styles of generation to occur, including: rhythmic, melodic, fifths-oriented melody or chromatically-oriented melody. Despite these gains, the current implementation has a range of limitations and problems, including illogical handling of polyphony, the disregard of pitch interval data and discontinuity when inter-onsets in source and target are very different. These problems are overcome in the TraSe (Transform Select) morphing algorithm that is discussed in the next chapter.

Controlling the level of variation

The Markov Morph algorithm includes a user-defined parameter, ‘contrast’, which can exaggerate the level of contrast within the probability distribution. The musical effect of this is to control the amount of overlap and randomness that is possible. When it is high, many different notes could be mistaken for the seed (like a composer with blurry vision) and thus used to generate the subsequent note, whereas, when it is low, only the notes that match the seed very specifically will be recognised. For musical variations, it essentially controls the extent of deviation the variation has from the original. In the following example (Figure 4), the contrast is decreased during bars to and increased over bars to (~6.5).

Figure 4 Example of a variation of a two bar bassline of cheesy nineties dance (first two bars). The top piano roll contains bars one through sixteen and the bottom piano roll continues with bars seventeen to thirty-two. The contrast is decreased from bars three to seven and it remains low until it is increased again over bars twenty- four to twenty-eight.

166 This example also shows that notes are only allowed to be played if the inter-onset with the note preceding it matches the time since the last note was played. By bar , we can see that the original pattern had emerged but was behind the beat by . This has occurred because the contrast has been increased far enough to strongly favour notes that fit the original pattern, but the onset of the note is relative to the previous note, which in this case happened to be behind the beat. At this point the contrast has not been increased as far as to force weighted selection through stream loss and the onset intervals at that point are fairly short, further reducing the opportunity for stream loss, which would have shifted the pattern back onto the beat.

Comparing the Circle of Fifths and the Circle of Chroma

The differences between using the CF rather than the CC for similarity of pitch becomes much clearer when applied to simple scales rather than any particular melody. With a diatonic scale, the CF space will, on average, yield a greater variety of note choices than CC, indicated by a higher average value for the sum of the similarity matrix at each frame (compare Appendix C Figures C-5 and C-6). With a short chromatic run that does not have the entire set of chromatic pitch classes (for example, only seven pitch classes, as with the diatonic scale), the opposite is true. With a chromatic scale, containing all of the pitch-classes, this value will be fairly similar for both CF and CC.

The following two examples (Figure 5) are variations on the major scale with pitch weighted maximally, applying the CC (~6.6) and the CF (~6.7) respectively. The contrast for the CF example is increased so that the two examples have a similar overall rate of variation, making the harmonic differences more audible. The moving average shown in the print-out in Appendix C Figures C-7 and C-8 verifies that there is only an average difference in overall similarity of , confirming that the two have a comparable rate of variation.

167

A.

B.

Figure 5 Transcripts of two different variations on a simple major scale run, up and down. “A” uses only the CC for the similarity measure and “B” uses only the CF. The depth for both is . For “B”, the contrast was increased so that the two could have a comparable rate of variation, as the pitches of diatonic scales are more closely related in the CF than the pitches of chromatic scales.

Multiple perfect matches through onset similarity

On some settings, the similarity matrix contains multiple ‘perfect’ matches, that is, maximum similarity with seed. In this case, even the maximum setting of contrast will not be able to provide a completely accurate reproduction of the source because these values cannot be ‘squashed’ any further and multiple options for note generation will remain. This can be seen when using any similarity measure that judges two different notes to be equally similar.

For example, if onset similarity is weighted maximally; the modulus beat space component of onset similarity is weighted maximally; the length of the loop is beats; and some of the notes in the first two bars are on the same onset (relative to the beat space) as the notes in the second two bars, then these notes will have equal weighting and thus both be candidates. If the contrast is maximal, this would be equivalent to weighted selection between the first two bars and last two bars, as demonstrated in the following example. The musical example of Take On Me by a-ha was chosen because the onset pattern in the second two bars is exactly the same as the onset pattern in the first two. The Markov order has no effect because the note onsets in

168 the modulus space are exactly the same for both sections. The first four bars contain the original pattern. After this point, only the modulus beat space similarity measure was used and it is clear that the pitches from the first two bars of the original loop are mixed with the pitches of the last two bars (~6.8).

Figure 6 Variation of a four bar loop from Take On Me by a-ha, from bar five and onwards. Due to using only modulus eight beat space as the similarity measure, the notes from bars three and four of the original loop are measured as a perfect match to the notes in bars one and two. The notes highlighted in green are the actual notes played. For reference, the notes in black are the notes of the second two bars and the notes in white are the notes of the first two bars of the loop.

This is equivalent to a weighted selection between the first and second halves of the sequence. For confirmation, see Figure C-1 in Appendix C which is a snippet of the print out that shows the similarity matrix created at each quarter beat.

Modulus three and four beat onset space

A similar effect is can be created, with a notably different musical result, by using modulus beat space and modulus beat space together. In this case, with a Markov depth of , some of the time there will only be one single match found (the original note will always provide a perfect match, no matter what space it is in) and at other times, two onsets coexist within the macroperiod of modulus and space. The example used here is the a-ha sequence of the previous example, with a length of beats, however, if the length of the original loop was longer than , there would be more than two such sections and thus more than two matches possible. The contrast remains high in this example (~6.9).

169

Figure 7 Top is a numbered transcript of the Take On Me loop that can be used as a reference for the occurrence of the original notes in the generated material. In the two panels beneath this is a variation generated from the Take On Me loop of the previous example, using modulus three and four beat space for onset similarity. The actual notes played are highlighted in green. For reference, the notes from the first two bars of the original are overlaid in black while the notes from the second two bars of the original are overlaid in white. New notes that do not appear in either are red. Numbers are written in above or below notes refer to the position in the note-list (counting from zero) of the original sequence. “L” indicates a note generated through stream loss. The blue band at the top can be used as a reference to modulus three beat space.

In the diagram, it is possible to see in bar and that many of the notes are generated through combined similarity in modulus and beat space (they are annotated with numbers). Effectively, notes that occur where the beat space is synchronous with beat space (bar and

170 ) are randomly substituted. See Figure C-2 in Appendix C for an example of the print out of similarity matrices generated for each quarter beat.

Many multiple perfect matches through pitch similarity

In the previous two examples, the onset similarity is weighted maximally and this tends to favour rhythmic coherence or, on some weightings of modulus , , , , or beat space, rhythmic complexity, within the music generated. For more of an ‘improvised melody’ effect in the music, the pitch similarity can be weighted maximally as is the case in the following example (~6.10 and Figure 8). Using the same a-ha tune, a Markov order of , maximal contrast and weighting linear pitch space highly, the musical difference is immediately noticeable. The patterns that emerge deviate much more from the original, because there are many more choices available at each quarter beat.

171

Figure 8 Example of variation of Take On Me by a-ha where linear pitch similarity is weighted maximally, Markov order is one and contrast is maximum. To assist analysis, the number of the original note, from which the generated note is copied, is notated in red above each note. If a note is generated through weighted selection due to stream loss, an “L” is used instead. For reference, the numbers of each original note is included above.

In this example, because the contrast was decreased to a moderate level, the generative capacity of the algorithm is significantly higher, in that there are many options at any particular time to choose from. This is evident from the accompanying print out in the Appendix C, Figure C-3.

172 This is essentially an example of the well known and commonly used technique of generating note pitches using Markov chains, except that it is in realtime and the current implementation allows stream loss to occur, rather than making a single decision at each step of note generation. While some of the phrases that are generated sound believable, there is not much coherence in their placement in relation to each other or within a larger structure. For example, bar sounds like it could be a worthwhile phrase by itself, but would be more suitable at the end of bar and the beginning of bar where most of the notes originated.

Stylistic constraint

Constraints that favour placement of phrases in some way more similar to the original can be introduced by increasing the weighting of onset similarity, for example, to , while keeping pitch similarity high at . In the following example (~6.11 and Figure 9), the modulus , and beat spaces were weighted high in order to reduce the similarity of notes that are close to each other, while still favouring notes with onsets that fit the original pattern. The CF similarity was weighted maximally, order was , and contrast was moderate.

173

Figure 9 The Take On Me variation using pitch similarity at ; onset similarity at ; Markov order ; moderate contrast; inter-onset space of modulus , , and ; and CF pitch space. The cycle of is included in blue numbering along the top for

174 reference, as is the original pattern (red notes, top). The numbers indicate the original note. The L indicates stream loss.

Although, the structure remains random, it is clear that the timing of the original pattern tends to occur more strongly. Some of the generated patterns, even ones that are only vaguely similar to the original, tend to work well harmonically, for example, bar , , , , and . This can be attributed to the transformational effect of the CF space used as a similarity measure because, in these examples, while the note onset similarity is not particularly high, the pitch similarity in the CF is. Repetition of short phrases does not always work but can sometimes be reminiscent of the free structures of improvisation, for example, , , and . This is can be attributed to the transformational effect of the combination of modulus , , and beat space for onset similarity, as is evident in the print-out in Appendix C, Figure C-4.

The similarity weights are user controlled parameters that influence the compositional processes within the algorithm, and their moderate success in the Markov Morph partially inspired the development of the TraSe (Transform-Select) algorithm (Chapter Seven). In the TraSe algorithm, such parameters play a more central role, with a great many compositional techniques being controlled through weights.

A problem with decreased contrast and more options for note selection is that stream loss tends to occur more often. This is because a variety of values for inter-onset to the previous note will exist and only one of these inter-onsets will fit the current time interval since the last note was generated. This is not a problem with the technique, rather the implementation, which has not been refined due to the focus of development having shifted to the different approach of the TraSe algorithm (Chapter Seven).

6.2.3 When Johnny comes morphing home

Although application of the Markov Morph to variation simplified the context and highlighted interesting properties of the algorithm, it was also necessary to test it according to how well it morphed (this being the ultimate aim). Morphing between the British Grenadiers and When Johnny Comes Marching Home (Mathews and Rosler 1969), the performance remained fairly poor, but arguably better than weighted selection or linear interpolation. For this example (~6.12 and Figure 10) Pitch is weighted ; onset weighted ; Markov order ; contrast is moderately high; anticipation is at ; modulus , , , and onset spaces all weighted ; linear pitch weighted ; CF pitch-space weighted .

175

Figure 10 Transcript of a Markov Morph between The British Grenadiers and When Johnny Comes Marching Home. The source and target occupy the first four and the last four bars respectively.

The primary problems facing the Markov Morph are stream loss and small sample space. Stream loss has been only a minor problem in most of the examples shown above, because the source and target are the same – weighted selection can only serve to bring the music back more closely to the original. As well as this, most of the music has had a small set of possible note inter-onsets. However, when morphing between a source and target with a very different set of inter-onsets (Figure 10) it is clear that the problem becomes much more frequent and interferes substantially with the musical outcome. In this example, the margin for stream loss occurrence was reduced through the ‘anticipation’ variable (see Realtime implementation in section 6.1.3, above), thus inducing a more frequent occurrence of stream loss in order to maintain continuity within the music. If this was not done, there would be many more discontinuous breaks in the melody which sound awkward. From the print out, the average stream loss rate over the entire morph was extremely high, at . This meant that many of the notes chosen resulted from weighted selection rather than Markov Morphing. This is an implementation problem which could be easily overcome in future work by making a single selection for the next note whenever the current note has just been generated, instead of making the selection every frame.

176 In statistical terms, the sample space of two short note sequences is too small to be useful. This has been partially overcome by incorporating similarity measures which allow fairly high depths while still retaining a diverse probability distribution. However, there appears to be a fundamental limit to the amount of coherent material that can be generated without incorporating some kind of external source of musical stylistic information. Some preliminary trials using long pieces of classical music with Markov Morph were attempted, but the time and memory limitations with creating a huge note probability distribution on each play-cycle made this intractable with the current algorithm. Another approach would be to use a database and pre-render the probability distributions, however, this direction would shift the focus of research further from morphing as a compositional technique and put the emphasis too heavily on machine learning. Rather than improving the Markov Morph algorithm, the TraSe approach was developed instead as it held the promise of being able to more easily and explicitly incorporate music composition knowledge into the system and seemed like a more realistic model of compositional processes related to morphing.

6.3 Formal evaluation

Two formal evaluations of the Markov Morph took place: a focus group and a ‘focus concert’. The focus group included a prototypical questionnaire and was designed to explore how the morphing systems might best be evaluated, rather than to generate valid data. The focus concert was aimed at assessing the Markov Morph in the realistic context of a concert while applying a focus-group style questionnaire. Live morphing using the Markov Morph algorithm was compared to the live mixing of a professional DJ, who used the same source and target material. The emphasis on live delivery was to provide a degree of realism to the focus concert.

6.3.1 Focus group

The aim of the focus group was to obtain methodological criticism regarding the focus group process itself and the questionnaire used and to a lesser extent, musicological feedback on the transitions generated by morphing algorithms. Smoothness, coherence and ‘danceability’ were posed as quantitative attributes of the morph, while mood required qualitative feedback. While a range of useful improvements to the methods of designing and executing the questionnaire were obtained, one of the sections was invalidated due to technical errors and as for the others, only some fairly obvious musicological evidence was generated.

177 Method

I will now explain the method of the focus group evaluation which included: the algorithms and algorithm settings that were tested, the source and target music, the content of the questionnaire and the procedure that was enacted to obtain information from the participants.

Four different algorithms and parameter configurations were tested:

1. Cross-fade

2. Weighted selection

3. Markov Morph favouring pitch and with Markov order of two

4. Markov Morph favouring pitch and onset evenly with depth of four and low contrast

For the pitch similarity measure, the CC was used because the CF had not been implemented at that stage. The cross-fade was included as a benchmark to represent the ‘industry-standard’ approach currently being applied in the context of computer games or DJ mixing. From the informal tests it was clear that weighted selection would perform poorly. Despite this, it was included so that comparisons could be made between it and the Markov Morph, which, due to stream loss, would fall back on weighted selection some of the time.

The original plan was to also include Markov Morph examples with:

1. Duration weighted maximally and Markov order of two

2. Pitch, duration and onset weighted evenly and a Markov order of two

3. Pitch, duration and onset weighted equally, a low contrast and Markov order of four.

Unfortunately, technical problems occurred while running the test so that duration could not be weighted without incurring null predictions. As a result, the settings were modified after the problem had been discovered, so as to avoid duration. The primary reason for using the different Markov settings was to confirm my own subjective opinion that using a high order and low contrast was more musically appealing than high contrast and low order.

Three different sets of source and target music were created, each within a different ‘style’: electro, psychedelic trance and drum’n’bass. The music was based on the tracks Emerge by Fisherspooner (Dave Clark remix) and Flatbeat by Mr Oizo for electro; Planet Dust by Bad Company and Decoy by Stakka and Skynet for drum’n’bass. It was more difficult to find trance

178 tracks in which note-level composition rather than spectromorphology was the dominant feature. The final examples created for trance were loosely based on Wunderbaum by Highpersonic Whomen, Computers & Microprocessors by Logic Bomb and Solar by Miraculix. All of these tracks were selected from a pool of thirty for each genre of electro, drum’n’bass and trance (ninety in total) and were chosen by scoring each according to how indicative the music was of my own subjective notion of the genre, and how easily it might be replicated. The main sections of the songs were replicated, with a little elaboration, using the LEMorpheus MIDI sequencer, driving the ReasonTM synthesiser.

Throughout the focus group each morph occurred from two different loops, both belonging to the same musical style. Each algorithm and/or parameter configuration was used to create a different morph in each of the three styles. This meant that because there were four different algorithms/settings and three different styles, twelve examples were played overall.

The questionnaire sheet was designed with a page for each morphing algorithm/configuration, with three sections per page for each of the three styles. Each section had the same format of four questions regarding: smoothness, coherence, danceability and mood. Smoothness was defined as moment to moment continuity. Coherence was defined as the degree to which the music sounded as though it was effectively communicating what the composer intended. Danceability was defined as how easy and appealing the music would be to dance to. Mood was defined as the subjective aesthetic experience induced in the listener by the music. The smoothness, coherence and danceability were considered as quantitative criteria which required a rating from to , but additional space was left for comments. The mood was considered as qualitative and a space for descriptive terms was provided for each source, morph and target. At the bottom of each page was space for overall comments on the algorithm.

It was envisaged that, as a practice study, the quantitative criteria could have statistical techniques applied to discover trends from example to example. The qualitative question was included to confirm that the source and target were musically realistic in that they were able to express some kind of mood, and also to examine if the mood projected during the morph was a hybrid of the source and target or if it were something else more common to all the examples regardless of source and target mood and style.

There were six participants in attendance which is not a statistically significant sample size. After signing the consent form, the goal and structure of the focus group was introduced and the ground rules were made clear. The meaning of the criteria was discussed and refreshments were made available. Notes were taken throughout and the event which was recorded onto video-tape. The structure as follows was repeated for each of the algorithms/parameter configurations:

179 1. Three examples with different source and target material were played.

2. The group listened to each example.

3. Each participant answered the questions relating to the morph.

4. After all examples were played, the group discussed the musical properties of the algorithm overall.

Feedback on the questions and criteria were noted during the discussions, and more extensive feedback on this was had at the end of the session.

The ground rules of the focus group (McNamara 2007) were to keep focused, sustain momentum and obtain closure for each question.

Results

The most useful and reliable results from this study was feedback and ideas for improving the definitions, quality of data and methods. As far as the actual data is concerned, the most striking feature was the amount of variation from person to person, from algorithm to algorithm and from style to style. The results were inconclusive.

Overall, the smoothest, most coherent and most danceable style was perceived to be drum’n’bass, and the most popular morphing algorithm was the Markov Morph on pitch and onset with high-order and low distinction. However, this slight trend may be entirely due to a methodological problem that is described further below. The criteria seemed to be closely correlated in that for each example marked, the three scores were rarely different by more than two points (on a scale of one to ten). In general, electro was less liked than trance which was less liked than drum’n’bass, although, there were often exceptions. Most people gave the cross- fade and the Markov Morph with pitch only and depth of two a low rating on all criteria when applied to the electro example, while the trance and drum’n’bass fared moderately well.

From the discussions with the participants at the end of each algorithm/parameter configuration, it appears that different people would listen to different aspects of the music and this would define to a large extent how they perceived the transition, thus partially accounting for the large amount of variation. For example, if the melody happened to be performing quite well while the drums were erratic, some listeners would rate the example very high, without noticing or giving much weight to the drums, which were perceived by them to be peripheral. Other listeners would pay attention to the drums only and mark the example very low accordingly. This highlights the importance of participant instructions.

180 Numerous feedback items regarding the methods employed within the focus group study were obtained:

o Either the algorithms need to be more deterministic and reliable, or a greater number of examples for each need to be played. This controls the chance that unlikely morphs overly influence the results.

o The examples should be recorded prior to the test to ensure that no software problems occur during the test and threaten the integrity of the data. If playing live, the software needs to be tested sufficiently prior.

o Either more specific listening instructions or additional questions regarding what the participant is listening to within the music is needed. This would act as a control for participants tuning in to different parts such as the bass, melody or drums.

o Questions that confirm whether or not the source, target and morph are musically ‘realistic’ need to be clarified. Realism in this case means how likely it would be for the music to occur in a real world context such as a club or a computer game.

o The range for the quantitative criteria should be odd rather than even, so that there is a number that represents the middle.

o Other methods of judging one example against another, such as ranking, need to be investigated.

o The size of the group must be much larger if statistically significant findings are to emerge from the quantitative data.

o The order of examples should be structured randomly so the participants have no expectations that might be projected from a logically structured questionnaire. For example, the questionnaire used followed a progression from cross-fade, weighted selection, Markov depth two and Markov depth four with low distinction. The styles followed a progression from electro to trance to drum’n’bass. This ordering might explain the trend to prefer drum’n’bass.

o The problem of biased expectation could also have been reduced by leaving each example untitled.

o There was a significant drop in volume during the middle of each morph that hindered analysis. A better approach to cross-fade would be through the equal power (logarithmic) curve as used on DJ mixing desks.

181 o The length of the morphs was too short to judge them confidently. The test should have morphs of different lengths or be conducted online so as to allow the participants to replay the morph.

o It was suggested that a way of obtaining feedback could be to predict the effect that the algorithm is likely to have on unheard musical examples or discuss which aspects did and didn’t work.

Summary of formal focus group evaluation

The primary objective of the focus group – to obtain thorough methodological feedback regarding the gathering of musicological morphing data – was clearly achieved. The secondary objective – to gather useful musicological data – was only partially achieved, due to the many problems highlighted above. Despite this, the findings did suggest that morphing was at least competitive with the industry standard of cross-fading and also that the technique of Markov Morphing was promising and at least marginally more successful than the technique of weighted selection. Some of the feedback gained was directly useful in highlighting aspects of the software in need of technical improvements. Overall, the results helped clarify the directions for further development and for more robust, conclusive studies to occur, as shown below.

6.3.2 Focus concert

The objective of the focus concert was to gauge the potential of morphing divergent styles in a live Electronic Dance Music (EDM) context and to receive musicological feedback on differences between morphing and current EDM practice that could feed into future developments. The term ‘focus concert’ was used because it was conducted with much of the same context and structure of a live concert but with the intention and apparatus of a focus group. Participants were asked to provide written responses to live music, some of it performed by me using LEMorpheus, and some of it performed by a DJ using standard mixing and sampling equipment.

The results indicated that, for divergent styles and particularly for long durations, morphing is competitive and, for some individuals, preferable to DJ mixing. Particularly noticeable aesthetic features that were contrasted against standard DJ mixing were the erratic structures of probabilistic morphing and tempo interpolation.

Method

The approach of the focus concert was to play live morphs and elicit written responses from the audience regarding various aspects of the music that was heard. Both long and short morphs

182 through both LEMorpheus morphing and DJ-style mixing occurred. The morphing was performed by myself and the DJ mixing was performed by an experienced professional DJ who was paid for the gig. The morphs were between three quite different styles of EDM. Questions regarding musicological aspects were broad and qualitative while other questions that were directed at gauging the morphing algorithm were quantitative. Comments and feedback regarding methodological aspects of the focus concert approach and the questionnaire content were gathered at the end, both in written form and verbally.

Three divergent tracks were selected and recreated, in an approach that was very similar to that used in the focus group described above. From a pool of thirty for each style of electro, drum’n’bass and trance (ninety in total), the tracks were chosen according to how indicative the music was of the genre and how easily it might be replicated. As well as this, particularly contrasting pieces were chosen so as to test the ability of the morphing algorithms to cross stylistic boundaries. The songs were Flatbeat by Mr.Oizo for electro, Decoy by Stakka and Skynet for drum’n’bass, and a fusion of Wunderbaum by Hypersonic Whomen and Computers and Microprocessors by Logic Bomb for trance. The main sections of the songs were replicated, with a little elaboration, using the LEMorpheus sequencer driving the ReasonTM synthesiser. The examples are (~6.13) for electro, (~6.14) for trance and (~6.15) for drum’n’bass.

The DJ was provided with music with which to practise and familiarise himself, two weeks before the concert date. These were seven-minute versions of the tracks that were recorded through automating the muting and un-muting of various layers within the main section. These are (~6.16) for electro, (~6.17) for trance and (~6.18) for drum’n’bass. My own form of preparation was to adjust the various parameters of the Markov Morph algorithm. The DJ was questioned informally, confirming that the material was of a realistic standard, albeit an odd selection of music to play, back to back.

On the day, an audience of sixteen (only eleven completed the response form) volunteers from various musical backgrounds were situated in a concert room as if to watch an event, but with pens, table-chairs and the questionnaire response form. The background of the project, the goals of the focus concert, the structure and ground rules were all explained before participant consent forms were signed. The planned structure was as follows:

183 1. 7:00 - 03 Get pens, read information pack, sign consent form

2. 7:03 - 05 Answer question 1: Electronic music background

3. 7:05 - 06 Read through the structure and the questions that will be asked, so as to have an idea of what to listen for during the music.

4. 7:06 -11 Section 1: Short transitions

a. DJ mixes from track A to track B to track C (two short mixes)

b. Researcher morphs from track A to track B to track C (two short morphs)

5. 7:11 - 16 Answer the questions for section 1: short transitions

6. 7:16 - 26 Section 2: Extended transitions

a. DJ mixes from track A to track B to track C (two long mixes)

b. Researcher morphs from track A to track B to track C (two long morphs)

7. 7:26 - 31 Answer the questions for section 2: extended transitions

8. 7:31 - Finish

Note that the music was labelled anonymously. In reality, the focus concert went overtime by almost half an hour. As with the focus group, the ground rules reminded the participants to stay focused, sustain momentum and obtain closure for each question.

Before listening to music, the participants filled out a number of questions pertaining to their background experience in EDM and the breadth of the participant’s familiarity with EDM genres. This involved selecting which genres of music were played at the dance music events they attended, including: house, hardcore, techno, drum’n’bass, trance, disco, old school, electro, breaks, industrial, glitch-tech and underground. Room was provided to name genres that were not on this list. The location of the events was similarly gauged, including: pub, night club, friend’s house and outdoor party. The participants were then asked approximately how many EDM events they attended over the last year, as well as how long it had been since they last attended such an event.

Each performance consisted of two morphs/mixes played back to back, starting with the trance track (A), proceeding to drum’n’bass (B) and then electro (C). During the first section of the focus concert, the morphs/mixes that were analysed were short, the DJ mixing within seconds while

184 I morphed in around seconds. Originally, it was planned for the DJ mixing to also be seconds; however, it was subjected to the spontaneity of live performance. The Markov Morphing algorithm with the CF similarity measure was used for the tonal parts, while manually specified note layering was used for the drums. After two short performances the audience participants were asked various questions. Firstly, they were asked whether or not each transition played would ever be heard at an EDM event (separate questions for morphing and mixing) and if not, why not. Following this, it was asked which technique for short transitions would be preferred to listen to at the kinds of EDM events that they attended. A vote for ‘undecided’ could also be given. The participant was then asked to justify the preference. They were then asked to give a rating from one to seven on how applicable the short morphs would be in a live electronic music context. Following this, they were asked to comment on the musical qualities of the DJ mixing and then the morphing software. At the end of the section there was room for any additional comments. The second section of the focus concerts was identical to the first section except that the morphs and mixes were longer and the questions referred to long transitions rather than short transitions. The transitions were to seconds for the morph and, again due to the unpredictable nature of the live performance, seconds for the DJ mix. Some Markov Morph examples which were recorded on a separate occasion are included and are indicative of those that were played at the focus concert (~6.19, ~6.20, ~6.21, ~6.22, ~6.23).

Results

The focus concert has been recorded on video (~6.24). I analysed the background EDM information given by each participant and gave each one an ‘EDM credibility’ rating from to . This was based primarily on frequency with which they attended EDM events and the range of genres they were familiar with. The average grade was , with a standard deviation of . The minimum was and maximum . This was used to weight results, as discussed further below, but without substantial effect.

When asked if they would expect to hear DJ mixing similar to that which was heard during the study at an EDM event, participants unanimously said yes.

When asked whether, at an EDM event, they might hear a short morph similar to the one played, said yes, said no and was undecided after the short morphs.

When asked whether, at an EDM event, they might hear a long morph similar to the one played, said yes, said no and were undecided. When asked to comment, a common observation was that DJ mixing always maintains tempo during the transition in order to perform beat matching, while the morphing software interpolates it. For example:

185

186 “It seemed like the DJ actually beat-matched the two tracks, so that the tempo of the beat remained the same throughout - whereas it sounded like the morphing software sped up and one slowed down the other so that it ended faster than it started. This could be good sometimes but not always.”

For short transitions, no one preferred morphing, preferred DJ mixing and were undecided. Most of those who were undecided liked both, but for different reasons. For example:

“I liked both - I like to watch skilled DJ's and find their mixing choices interesting on a personal level. The morphing had some cool rhythmic stuff happening - also, it goes beyond what I'm used to hearing in mixing and so was interesting in that respect.”

It should be noted that, during performance of the short morphs, the DJ spontaneously increased the length of his mixes, meaning that, for the short morphs, the DJ mixing was perceptibly longer than the LEMorpheus morphs, which is likely to have influenced the preferences. This induced some comments, for example:

“Perhaps the DJ mixing appealed more because the excerpts were longer. It was difficult with the very short morphing excerpts to compare it to the DJ.”

For long transitions, the preferences were different to those for short transitions. preferring the morphing, preferring the DJ mixing and were undecided. When these preferences were weighted on the ‘EDM credibility’ of the individual participant, they did not change significantly. One comment from a participant who preferred the morphing was:

“Wow, very impressive. There is so much added musicality. Would keep people wondering where each sound is from.”

The comment about the sounds relates to the fact that the synthesiser creates new sounds when rendering new MIDI notes, whereas this is unable to occur through the DJ-mixing.

Short morphs were rated only moderately on applicability to real electronic music contexts, scoring a average with standard deviation of , while the long morphs were judged highly applicable, with a rating and standard deviation of . Weighting these results on EDM credibility did not change them significantly.

Discussion

The fact that all participants agreed the DJ mixes were similar to those in real EDM contexts implies that the benchmark comparison was well-founded and likely to apply to real world

187 settings. However, because the tracks were selected according to how different they were, not how similar they were, the study purposefully ignored the ‘art of track selection’ which is part of the skill of being a DJ. From conversations afterwards it appear that the question could have been reworded in more direct language, such as “do you think the DJ demonstrates technical ability similar to the professional DJs you have heard at EDM events”. Future focus concerts could morph between songs that were selected by the DJ, however, tracks that are similar tend to be much less challenging for the morphing software to deal with and so it is unlikely that such a test would provide useful new musicological insights that improve the software. Another approach could be to allow the DJ to use their own record collection to ‘construct’ a lengthy morph, or to raise the benchmark further by employing a producer rather than a DJ.

The fact that people generally did not hear morphing-style transitions in the EDM events that they attended indicates that morphing is a practice that is alien to current methods. From comments, two factors are primarily responsible: the erratic structure produced by the unpredictable probabilistic techniques and the ramping of tempo due to interpolation. Future algorithms might use a database of musical patterns against which the structure can be fitted, or any number of improvements (see 6.4) while the option of tempo matching rather than interpolating will need to be added.

The preference the audience showed towards morphing in long transitions, as well as the applicability ratings, indicated that morphing is a viable new approach. It is particularly interesting to consider the high preference ratings even though morphing was clearly judged as being atypical to the EDM contexts the participants were familiar with, implying that there is a musical niche that may be filled. The preferences directed towards the DJ mixes for the short morphs should be tempered by fact that the DJ mixes were longer and comments that cited length as a positive factor. However, before the ‘long’ mix, the DJ verbally expressed his misgivings at attempting a difficult mix which may have suggested a kind of dominance of the morphing software in the minds of the participants.

Conclusion

In conclusion, it is clear from the results that morphing of EDM is applicable and competitive with status quo EDM mixing techniques when applied to divergent styles of music. Despite this, it results in an unfamiliar aesthetic that could undermine people’s expectations and subsequent judgements, mainly due to the interpolated tempo during transition and the aleatoric nature of the Markov Morph. While adding an edge of realism, the focus concert approach was difficult to control. Ideas for improvements included:

188 o A more controlled environment, either with better rehearsal or in a context that is not live.

o More explicit and detailed wording of instructions and explanations that seeks qualitative rather than quantitative responses.

o More time for the participants to properly word their responses.

o Putting the section on participant background at the end of the survey so they do not feel compelled to live up to their own descriptions.

o Raising the benchmark by comparing the morphs with human composed and produced transitions or other performance software such as Abelton live! rather than DJ mixes.

o A larger body of participants to add variety and validity to the results.

6.4 Improvements to probabilistic morphing

There are a number of possible improvements to the current algorithm, including: phrase detection, inter-onset similarity measure, note loudness/dynamic (MIDI velocity) similarity measure, multiple note similarity comparisons, metric constraints, better handling of stream loss and sample size.

Musical phrases that are generated during the morph are currently spliced together from play cycle length chunks of music from the source or target. Musically, there is no reason for phrases to be composed of these segments and the chance of a short musical phrase ending prematurely is higher than the chance of the phrase ending at a natural end point. This problem might be improved using phrase detection to find the start and end points of phrases and purposefully increase the chances of selecting notes from the phrase – the value of such a technique would depend on the number of phrases used in the source and target music.

Currently, the similarity measure from which the probability distribution is constructed uses a combination of pitch, onset and duration similarity. Perhaps more suitable notes might be favoured by the probability distribution if some additional measures were combined within the current similarity function, such as inter-onset and dynamic/velocity. While the current onset measure, that assesses the onset similarity relative to the various loop lengths, is useful for enforcing a kind of metre, an inter-onset measure would increase the continuity of phrases, regardless of their position within the loop. The inter-onset similarity function would be similar to the duration similarity function described in Equation 10. Including a similarity measure for

189 dynamic would provide an additional dimension of detail. The appropriate function for dynamic would be a version of the linear similarity shown in Equation 5. The range would be to , and the magnification would likely be smaller than , as large distances generally seem more common in velocity than in pitch.

Currently, some detail is lost when comparing polyphonic note-groups. The approach takes only the most similar notes within the note-groups and the similarity rating of these is used to represent the similarity rating between the two note-groups as a whole. This measure was used because it ensures that when two different note-groups have exactly the same notes, they will be considered to be the same. This is not the case using some other measures, for example, taking the average note similarity between all possible note comparisons. Since moving on from the Markov Morph, I have developed a Nearest-Neighbour distance (detailed in the following chapter) which also has the ability to more accurately judge two polyphonic note-groups, while providing an additional level of accuracy. A trivial future development that would assist the ability of Markov Morph to deal appropriately with polyphonic music could be to apply this Nearest Neighbour distance to note-group comparisons.

Currently, the user increases or decreases the ‘contrast’ variable at will, effectively to control how similar the notes need to be to the seed to be candidates for playback. Instead of the user controlling the contrast, with it remaining fixed for much of the time as a result, it should automatically increase if the selected source or target is the pattern that generated the seed and decrease if it did not generate the seed. This way, connections based on similarities between very different source and target patterns would be accentuated.

Musically, perhaps the biggest problem with the Markov Morph is the sense of randomness in the interjection of notes. Therefore if some kind of grid was applied that constrained notes to start times that are coherent within a specified metre, this problem might be alleviated somewhat.

As mentioned above, the case of stream loss occurs when the largest inter-onset interval in the selected source or target is smaller than the interval between the previously generated note and the current frame. The current strategy for stream loss is to revert to weighted selection. Perhaps a better strategy would be to predict a note only once each time – after a note has been created, rather than doing it again each frame. This would ensure that a note which has been selected according to the probability distribution is used; however, it would be less responsive to realtime changes. The realtime responsiveness could be maintained if combined with a ‘listener’ that polls for significant change, triggering a re-prediction when they occur.

190 A quite severe problem is that the amount of musical information provided to the algorithm is usually not enough to build a working probabilistic model with a depth/order greater than two, depending on the amount of repetition in the material. Statistical techniques such as Markov chains are more commonly performed on large databases. This problem has been reduced by the use of a continuous similarity measure, however, it remains significant. To overcome it more effectively, a database of music could be incorporated from which a more comprehensive model is built. The Continuator provides a good example of how this could be implemented efficiently (Pachet 2004). The reliance on music of different styles could be weighted differently by the user.

Fundamentally, the Markov technique, as currently implemented, reflects only the musical surface and is fundamentally different to my own compositional practice, those of other composer/producers I know, and to compositional techniques described in music textbooks. Because of this, rather than developing the increasingly complex layers mentioned above, it seemed more fruitful to first attempt a more convincing morph that was able to directly incorporate compositional techniques without the restrictions of realtime operation. It was also noted that no evolutionary approaches to morphing had been attempted before. If this processor-intensive technique proved successful, realtime adaptation could be explored later. This new approach was the TraSe morphing algorithm, which is detailed in the following chapter.

6.5 Summary of the probabilistic morphing algorithm

To summarise, a probabilistic morphing algorithm has been developed that operates through the weighted selection of source and target which is used to generate a probability distribution for note generation conditioned on previous output as a seed. Each play cycle, the probability distribution is used to generate a note and if the inter-onset distance between the most recent output and the currently generated note is correct, the note is played. The length of the seed and Markov depth is determined by the user. The probability distribution is generated by comparing the similarity of the seed with each segment of the same length in the selected source or target note-group sequence.

The Markov Morph uses the similarity measurements to obtain continuous probabilities, thus compensating for the statistically insignificant number of note-groups in the source and target. The similarity between note-groups is derived from a weighted combination of similarities of pitch, duration and onset. The pitch similarity is a weighted combination of distance in linear pitch space, the CC and the CF.

191 Informal evaluation of the Markov Morph demonstrated how the level of variation and various stylistic aspect of the music can be controlled by the user. The first formal focus group evaluation provided a number of ideas and refinements for future tests. The second formal ‘focus concert’ evaluation benchmarked the software against a professional DJ in a live performance context. The Markov Morph was competitive, particularly with long durations between source and target, possessing a somewhat erratic and unusual style.

Future research in probabilistic morphing could incorporate a number of refinements to the current Markov Morph. This includes a more musical approach to phrase lengths, rather than using the play cycle length. More similarity measures such as inter-onset distance and dynamic are needed. Polyphonic note-groups could be compared more accurately using a Nearest Neighbour similarity measure. Automatic adjustment of the contrast of similarity measurements is needed to obtain a consistent level of variation in the probability distribution. Additional constraints of music representation, particularly for rhythm, could help to reduce the erratic nature of the music. Stream loss could be avoided if the notes were predicted only one at a time, rather than each play cycle.

Rather than implementing and testing these potential improvements, however, an even more novel approach was envisioned, involving evolutionary processes. After some initial scoping, this approach was found to be practical and the remaining research effort was directed towards it, as explained in the following chapter.

192 7 Evolutionary morphing algorithm

After probabilistic morphing, the focus of research turned to evolutionary morphing, which was explored through the development of the TraSe (Transform Select) algorithm. The hypothesis behind TraSe is that effective morphing can result from a series of compositional transformations applied to the source. TraSe allows greater control over stylistic elements of the music than the previous morphing algorithms, through user-defined weighting of numerous compositional transformations. Informal evaluation of TraSe involved subjective analysis of a range of audible examples. The formal evaluation of TraSe was mostly qualitative and occurred in the controlled environment of an online questionnaire. TraSe deviates substantially from any other approach in the field of compositional morphing. As a morphing technique, it directly fulfils the research objective to develop new techniques for automated and interactive compositional morphing. The music generated by TraSe was a substantial improvement on the parametric and probabilistic algorithms, in some cases being seen as more creative than a human composed benchmark.

As an evolutionary approach, TraSe is related to previous algorithmic composition systems such as GenJam (Biles 2002), Vox Populi (Moroni, Manzolli, Van Zuben and Gudwin 1990) and others (Marques, Oliveira, Vieira, and Rosa 2000; Towsey, Brown, Wright and Diederic 2001; Miranda and Biles 2007). A bibliography of evolutionary computer music is maintained by Al Biles (2007). TraSe is distinguished from other evolutionary music systems by features that are designed specifically for morphing.

In terms of modelling music composition and creativity, the evolutionary approach is clearly a trial approach to composition. This is evidenced by the generation of a large number of candidates in a selection pool, from which only one is selected according to specific criteria, in this case, the similarity with the target note sequence. This can be compared with the abstract approach of parametric morphing and the heuristic approach of probabilistic morphing.

An overview of the TraSe morph algorithm is provided in section 7.1. More detail on how the algorithm transforms and selects the musical sequences is outlined in 7.2. Section 7.3 specifies each of the nine different compositional transformations that are used to transform the source music. Both informal and formal evaluations of the music are provided in 7.4. The informal evaluation highlights various features of the algorithm through the analysis of musical examples. The limits of the algorithm are tested with many randomly generated source and target samples. Subsequently, the formal evaluation is presented in the form of a comprehensive critical listening questionnaire that was developed and applied to asses the musical coherence and stylistic qualities of the morphs. The questionnaire found that music composed by a human was usually

193 more acceptable, however, there were a number of notable exceptions. The morphs were almost unanimously said to be applicable to electronic dance and computer game music. The results also challenged the feasibility of morphing outright; in particular, the fundamental notion that morphs should be ‘smooth’ or ‘continuous’.

TraSe has by far the highest time complexity compared to the other algorithms at , due to the exhaustive searching. Because of this computation load, the morphs need to be rendered prior to realtime operation. Although realtime interactivity in playback is possible through control of the morph index, changes to the source and target music do not influence the morph in realtime.

A number of possible extensions to the current evolutionary morphing algorithm have been identified and are discussed in 7.7. Notably, algorithm designs that would significantly reduce the complexity from to were comprehensively scoped out, despite not having been implemented. This means the algorithm has definite potential for true realtime operation in the future.

Overall, TraSe is the most novel and effective compositional morphing algorithm developed through this study. Despite this, there is still much to be done before it can consistently compare favourably to a human composer, particularly in terms of coherence. The final results were constructive and produced a number of ideas for future research.

7.1 Overview

TraSe takes two note sequences, source, , and target, , and produces morphed material, , which is a hybrid transition between the two. The morph, is a list of note sequence loops that I call frames, the first and last of which is and respectively. If we let be the length of the frame list, and . The frames of the middle constitute a sequential progression.

During playback, the morph index, , determines which frame in the series is playing. Letting be the index of any frame, ranging from , and noting that is normalized between and , will hold the note sequence that is looped when . Logically, the morph will be smooth if each frame is somehow made similar to its neighbours. Smoothness is the quality of continuity, whereby each segment is perceived as being more related to the points directly ahead and behind than the others which are further away.

The musical representation includes the loop length, note onsets, durations, dynamics, scale

194 degrees/passing notes, scale and key; as per the Degree Passing note (DePa) scheme described in Chapter Four. There are two separate TraSe algorithms that operate in parallel – one for key and scale and another for note-level data (note onset, duration, dynamic and scale degree/passing note). I refer to the former as a key/scale morph and the latter as a note morph. Both the note morph and key/scale morph have a separate which can be of different lengths. Unless stated otherwise, the TraSe processes described in this chapter are for the note morph.

During playback, the current frame of the key/scale morph and the note morph is combined to produce output with a standard note representation of pitch, duration, onset and dynamic. Having the key/scale and note morphs separate affords more specific control over the tonal elements of the morph. Early versions of TraSe that did not separate key/scale were more difficult to use.

If and are different lengths, each frame will have a length equal to the lowest common multiple. For example, if was beats and was , the loops in each frame of would be beats long. Requiring the frames to be the same length is practical; a uniform length affords automated comparisons between sequences, due to their notes occupying an identical space. It also affords the combination of material.

Recall that is an index to . TraSe sequentially fills each frame of , starting with , and finishing with . Each frame, , is created by passing the previous frame, , through a chain of compositional transformation functions: let this be the list . The functions in the chain perform musical transformations that a composer might attempt, for example harmonise creates or removes harmonies at particular intervals, while rate speeds up or slows down the music by particular commonly used ratios, looping when necessary.

To generate the frame , the first transformation, , is passed the previous frame, . The second transformation is passed the output of the first, and this continues until the last transformation. The last transformation is fed the output of the second last transformation to create . Letting equal the length of , or , this means: . TraSe stops generating frames when the current frame is judged as being similar enough to the target, , by a variable for ‘cut-off’ that is specified by the user. Each frame is part of a sequence of frames that is the morph: .

The evolutionary component of TraSe is in the fact that each transformation produces a range of different sequences from which only one is selected, through a comparison with the target sequence. This is essentially a Genetic Algorithm (GA), as the transformations are ‘mutations’

195 and the comparison to the target is a ‘fitness function’. This transform-select process is quite complex and will be explained in more detail in the following section 7.2. Allowing specific compositional transformations and dissimilarity measures (see 7.3) to be designed and plugged into the algorithm enables elements of compositional style to be specified. For now, a simple high-level overview of the TraSe process has been summarised in pseudocode:

// inputs: S is the source note sequence T is the target note sequence C is a list of compositional transformations that provides data and functions used in the TRANSFORM-SELECT process. cutoff defines how similar the current frame needs to be to the T to end the process // functions: MAKE-SAME-LENGH Loops the inputs until they are as long as the lowest common multiple of their lengths so that they are the same length FIND-DISSIMILARITY Rates the dissimilarity of two inputs, from the same (0) to most dissimilar (1) TRANSFORM-SELECT Applies a transformation and selects an output, more similar to T than the input. // output: M List of note sequences where the first item is the same as S, subsequent patterns are more and more similar to T and the last item is the same as T function TRASE (note sequence S, note sequence T, transformation list C) returns note sequence list M { MAKE-SAME-LENGTH(S, T) i = 0; // index of current frame in M M[i] = COPY(S) WHILE( FIND-DISSIMILARITY(M[i],T) < cutoff ){ i = i + 1 // increment index to the next frame M[i] = COPY(M[i-1]) FOR(j = 0; j < LENGTH(C); j++) { M[i] = TRANSFORM-SELECT(C[j], M[i], S, T, i); } } return M } Figure 1 Pseudocode overview of TraSe.

7.2 Transforming and selecting

Each transformation has a set of possible parameter configurations. A pool of candidate note sequences are created using each possible parameter configuration. A single candidate is then selected, using a fitness function that compares the candidate note sequences with the target note sequence.

Recall that the number of transformations is , that is . Let be an index into the list of transformations, , ranging over . For , there are a related set of parameter configurations, . The parameter configurations within are used by to create transformed note sequences within a ‘selection pool’; let it be . The item in each parameter set is “bypass”, , thus enabling the unmodified input to also be an item in the pool.

196 For example, the rate transformation multiplies the start-time of each note by a ratio from the set and . If rate is the transformation, that is, , the corresponding parameter set is: . While rate has a single parameter, more complex transformations include particular combinations of multiple parameters. Having a fixed set of values may seem like an unnecessary limitation when dealing with parameters that are otherwise continuous; however, it also affords precise control over parameter values and thus musical style. As well as this, the parameter sets can be any size and requiring discrete specification encourages careful consideration of the musical ramifications of each parameter value.

Each transformation has an input note sequence; let it be . Let be an index to parameters in the set and the corresponding output in , that is, . Thus, the creation of the note sequence in the pool for the transformation is: . Continuing the rate example, where , creating the note sequence in the selection pool ( ) would be: . In this example, the start time for each note would be multiplied by and the result looped four times to make it the same length as the input. The diagram below provides an example of the complete selection pool for this rate example:

Example of an input to rate, . Zeroth pattern in the selection pool, where (bypass).

First pattern in the selection pool, Second pattern in the selection pool, where . , where .

Third pattern in the selection pool, The fourth pattern in the selection pool, where . where .

The fifth pattern in the selection pool, The sixth pattern in the selection pool, where . where .

Figure 2 Each of the seven note sequences that would be in a selection pool for rate (j=1), given the original (top-left).

197 A fitness function determines which note sequence in the selection pool, , is selected for output. Let the index of this selected note sequence in be denoted by . The input to the first transformation, , is the previous frame (in the first cycle, this will be the source). Recalling that is the number of frames and that , this means: . For transformations other than the first in the chain, the input is the selected output of the previous transformation in the chain, that is, for , . Each frame other than the first is derived from the selected output of the last compositional transformation in the chain, that is, for , . The first frame is the source music: .

7.2.1 Selection through dissimilarity measures

Each element in the selection pool is compared with using a dissimilarity measure1. The selected note sequence, , has a dissimilarity rating that is the closest to a particular target value. The process for determining the target value of dissimilarity with is described further below. The application of user-defined weightings and “convergence” settings are also involved.

The dissimilarity measure for a particular transformation , denoted by , is designed to complement that transformation. For example, if , calculates dissimilarities between the inter-onset envelopes, which are directly affected by the rate transformation. Each dissimilarity measure takes two note sequences and returns a value between (similar) and (dissimilar).

Each dissimilarity measurement between elements in the selection pool and the target is factored by a user controlled weight. The weights allow various parameter configurations to be unnaturally favoured which means the style of compositional transformations used throughout the morph can be influenced by the user.

1 Discussing the measures in terms of ‘dissimilarity’ rather than similarity is an arbitrary choice, however, it is useful as ‘dissimilarity’ is analogous to ‘distance’, which is a natural method of comparing two values. The dissimilarity measure is not a distance metric in the formal mathematical sense, because triangle inequality is not necessarily upheld.

198 Let hold the user-defined weights. Recall that is an index to the items in the parameter set and the corresponding selection pool . Let each item in the selection pool have a related weight, . Let hold the result of the dissimilarity between and , scaled by . This means: . Each measure in the set of weighted results can be sorted from lowest to highest. Let the result of this sorting be . Thus, within this list, refers to the smallest value (least dissimilar), and refers to the largest (most dissimilar).

The selection pool can be arranged in the same order, so that refers to the that minimises .

7.2.2 Speeding through the scenic route

If the goal of TraSe was to converge with in as few frames as possible, then it would make sense to not use weights, and always choose , however, this is not the case. The aim instead is to find a musical progression and this involves taking the “scenic” route. The transform speed is a user-defined parameter, , which influences the extent to which the number of frames, , is minimised, for each of the transformations in the chain. is between , for the maximised or “slowest” setting, and , for the minimised or “fastest”.

The inverse of represents how many frames, , can be expected using a single perfect transformation. A perfect transformation is a theoretical transformation that has the capacity to produce an infinite number of patterns in the selection pool, with an even spread of dissimilarity to throughout. In a perfect transformation, when , there will be cycles before convergence. In practice, most transformations are imperfect and with some combinations of transformations and settings, convergence is never reached.

Target dissimilarity and transformation speed

From the user-defined transformation speed, , a target value of dissimilarity, let it be , needs to be established. This involves estimating how much the dissimilarity should be reduced by each step to ensure the specified number of steps, , and multiplying it by the number of steps that have occurred already, , to find the appropriate target, , for this step. The estimate of the change in dissimilarity required for a single step is obtained by normalising between the difference in least dissimilar, , which would be with a perfect transformation, and the rating of the current note sequence that is the input, (recall that ). Considering that it would be unfeasible to pick a target that is below the lowest dissimilarity, and recalling that

199 is the current frame, is defined thus:

.

Equation 1 Defining the target value of dissimilarity . is the user-defined ‘speed’ ( is

slowest). is the lowest dissimilarity rating. is the dissimilarity rating of the unmodified input, that is, the previous frame.

This definition of can be understood more intuitively thus: when the speed is set to slow,

, it becomes . In the perfect case, where and

, this becomes . Note also that convergence does not occur in a single frame.

With this target (the most dissimilar), the algorithm will never converge, which is appropriate considering that as approaches , (the number of steps) will approach . If the speed is set to fast, we obtain , which, in the perfect case, becomes

.

In this case, for , the target similarity will be . This means that, in the perfect case, the algorithm would converge in a single frame, confirming that .

Aiming for target dissimilarity while maintaining consistency

An appropriate (the index of the selected sequence) will minimise the difference of measured dissimilarity, , to the target dissimilarity, , while at the same time minimising the dissimilarity of the selected sequence to the source, . This ensures that in situations where the candidates are rated equally close to , the one that is the most similar to will be selected, providing a greater sense of musical continuity. This will involve interpolating between the consistency with source and target for the currently desired dissimilarity. The difference between and the target dissimilarity is: . The distance from the source is . Let be a user-defined weight, called ‘tracking VS consistency’, which controls the balance between tracking , when , and being consistent with , when . This effectively controls the influence of or . Usually, so the difference to the target dissimilarity is the principal component. All together, is determined thus:

200

Equation 2 The index to the selected note sequence, , is determined by minimising the difference of the dissimilarity of that note sequence with the target dissimilarity (first term above), while minimising the dissimilarity between that note sequence and the source (second term). The user can control the influence of each of these terms using - the ‘tracking VS consistency’ variable.

Capping the number of transformations per cycle

The ‘mutation limit’ is a cap that can be placed on the number of transformations applied in each iteration of the transform-chain and is controlled by the user. This should not be confused with the ‘dissimilarity cutoff’, which is essentially a threshold of fitness which stops the whole process, rather than just the transform-chain of a single cycle.

During each iteration of the chain, the number of transformations that result in a sequence other than the input, , are counted. If the count exceeds the mutation limit, the chain finishes prematurely. Without the mutation limit, the extent to which the transform-chain reduces the dissimilarity of the input to is unpredictable, as each subsequent transformation starts from the point at which the previous transformation left off. This effect can be reduced by the mutation limit. For example, a mutation limit of will guarantee that only one transformation is used per cycle. Although this is currently sufficient, other improvements to the chain structure have been envisaged and are discussed below (7.7.3).

7.2.3 Putting it together

In pseudocode

A summary of the Transform-Select function of the TraSe algorithm is provided in pseudocode below (Figure 3):

201 // input: C a compositional transformation. C.TRANSFORM takes a note sequence and a parameter setting and returns a transformed note sequence. C.P is an array of different parameter settings that can be passed to C.TRANSFORM. Note that one of the parameter settings is “bypass”. C.D measures the dissimilarity of two note sequences, returning values between 0(most similar) and 1 (most dissimilar). I is the input note sequence that will be transformed by C Src,Tar source and target note sequences(read-only). i the number of frames that have been generated so far // user-defined global parameters: W an array of weights used to favour certain parameter settings in C.P over others s controls convergence speed by influencing the target level of dissimilarity. v “track VS consistency”: controls the balance between minimising the difference of the dissimilarity rating of the output to the target dissimilarity and minimising the dissimilarity of the output to the source. // output: Out a transformed sequence. If W is not used, C.D(Out, T) <= C.D(I, T) function TRANSFORM-SELECT (transformation C, note sequence I, note sequence Src, note sequence Tar, int i) returns Out {

SP = new note sequence array, length of LENGTH(C.P) // the selection pool SP[0] = COPY(I)

R = new double array, length of LENGTH(C.P) // the ratings R[0] = C.D(I,T)

min = 0; // the index of the lowest rating

FOR ( j = 1; j < LENGTH(P) ; j++ ) {

SP[j] = C.TRANSFORM(COPY(I), C.P[j]) // add a transformed sequence R[j] = C.D(SP[j], T) // rate the transformed sequence R[j] = R[j]*W[j] // weight it

IF(R[j] < R[min]) min = j // update the index to the smallest rating }

// derive the target value of dissimilarity to T that is to be aimed for t = MAX(R[min], R[0] – i*(R[0] - (1 - s)(R[0] - R[min])))

dmin // index of the pattern to be returned

FOR( j = 0; j < LENGTH(P); j++) {

// factor in the distance to t and dissimilarity with source R[j] = v*ABSOLUTE(R[j] - t) + (1 - v)D(SP[j], S)

IF(R[j] < R[dmin]) dmin = j // update the reference to the minimum }

Out = SP[dmin] return Out } Figure 3 Pseudo-code description of the Transform-Select function that is used in the TraSe algorithm.

Techniques for dealing with too many frames

Often there are too many frames in , due to the transformations in being far less able to converge than the theoretical perfect transformation. Imagine if the required morph is beats

202 long and frames were generated, or, . This means that, in the default approach, the entire morph will be created by beat segments from each of the frames, as , which is unlikely to exhibit much coherence.

A simple method to counter this is to quantise the morph index, , so that only a particular number of frames will be played. For example, if the user specifies sections, will only ever be one of the four numbers in the list: . The is added so that will be central to each section, not at the beginning of it. In this example, the only frames that will be played are , , and . From each of these frames, beats will be played, which allows more coherence than the beats that would occur without quantization.

However, with so many cycles separating each frame ( in the example given), the sense of a logical progression may be lost. This can be minimised somewhat with carefully designed compositional transformations. For example, the add/remove transformation (see 7.3.8) is a special transformation that is always included at the end of the compositional transformation chain in order to ensure this. Add/remove is designed in such a way as to guarantee an output, , that is closer to the target, , than the input, . In fact, this transformation is necessary due to the potential lack of convergence of the algorithm.

7.2.4 Summary of transforming and selecting

In summary, TraSe involves passing the source through a series of transformation and selection cycles until it becomes identical, or almost identical, to the target. The architecture provides the ability to favour certain parameter configurations which in turn allows greater control over musical style. The limitations inherent in TraSe are closely aligned with morphing:

o Smooth transitions are afforded through the fact that any one frame is more closely related to its neighbours than any other, in terms of the number of applications of the transformation chain.

o Transitions that are coherent for a particular style of music are afforded through the ability to include and favour transformations and parameter configurations that are indicative of that style.

o Musical material beyond source and target is not required.

During the above explanation of transforming and selecting, I have alluded to various

203 compositional transformations such as rate and add/remove. The following section will detail each of the different compositional transformations that have been developed and are implemented in TraSe.

7.3 Specific compositional transformations: their process, parameters and dissimilarity measures

The transformations that have been implemented are described here in order of their position in the transformation chain: divide/merge, rate, phase, harmonise, scale pitch, inversion, octave, add/remove and key/scale. All of these perform large scale transformations to the note sequences, except for add/remove, which deals with individual notes. Key/scale morphing is a special transformation which operates in parallel to the other transformation, using key and scale data rather than note sequence data. This set of compositional transformations were chosen due to their capacity to affect musical dimensions such as note density, rhythm, harmony, melody, register, and tonality. There are no doubt other compositional transformations that may improve the performance of the algorithm, however, the current set is sufficient.

Each transformation has a range of parameter configurations which are used to generate a pool of potential note sequences for that transformation, each time it is called. The candidate note sequences are rated according to their dissimilarity with the target, using a dissimilarity measure which is specific to that transformation.

7.3.1 Divide/merge

The divide/merge transformation was envisaged as a technique for affecting the note density of the input without dramatically altering the music. Divide/merge has five parameter configurations, apart from ‘bypass’, and uses a ‘Nearest Neighbour’ dissimilarity measure to compare the transformed items in the selection pool with . The five configurations are: merge forwards, merge backwards, split , split and split .

‘Merge forwards’ iterates through the note sequence, merging notes that overlap into one, retaining the pitch and dynamic of the earlier note. The end-time of the earlier note will increase to become the end-time of the later note and the later note will be removed. ‘Merge backward’ is similar, except that it proceeds from the last note to the first and the scale degree and dynamic are copied from the later note, deleting the earlier note.

204 // inputs: I the input note sequence // output: M the note sequence that is the result of the merging process on I MERGE-FORWARDS(note sequence I) returns M { M = COPY(I) Note curr, next // a reference to the current and next notes

FOR(int i = 1; I < LENGTH(M); i++) { curr = M[i-1] // update the current note next = M[i] // update the next note

// see if the duration of the current note exceeds the inter-onset IF(curr.onset + curr.duration >= next.onset) {

// set the duration of the foremost note SET-DURATION(curr, next.onset – curr.onset + next.duration);

REMOVE(M, next) // remove the note that was merged } } } Figure 4 Pseudocode of the Merge-Forward algorithm

‘Split’ finds the longest note and splits it in two. The three different parameter settings, , and refer to the point at which the note is split. For example, splitting a whole note (four beats) with would turn it into one crotchet followed by a dotted minim. The scale degree and dynamic for both of these notes would be the same as the original.

To compare each note sequence generated by the difference parameter settings, Divide/merge uses the NN dissimilarity measure (described below). ‘Average number of notes’ was trialled originally, as divide/merge was aimed primarily at controlling the number of notes. However, this was not discerning enough to enable convergence with the target and so NN has been used satisfactorily instead.

A more flexible and natural design for divide/merge might be achieved through an envelope representation, as described in the parametric morphing algorithm chapter. With the input as envelopes, merging would be akin to smoothing, while dividing would be similar to interpolating between the points.

The Nearest Neighbour (NN) dissimilarity measure

The Nearest Neighbour (NN) dissimilarity measure is the most thorough of all dissimilarity measures within this research. For each note in the two note sequences being compared, the NN measure finds the distance of the closest (neighbouring) note in the opposite sequence. The final distance returned is the average of all such distances. This is an inefficient process and some ideas for optimisation are explained below (7.7.6).

205 The process for NN is bidirectional, comparing the target sequence of notes, , to the input, , as well as to . These two calculations will be referred to as backward and forward respectively. The bi-directionality is necessary because the NN of one is not necessarily the NN of the other. This is made clear in Figure 5 where the first note in is closest to the first note in but the first note in itself is closer to the second note in the input than the first:

Input notes 1 2 distances 3 1 1 3

Target notes 1 2

Figure 5 A bi-directional NN comparison. The red box holds two notes (red 1 and 2) from the input sequence. The blue box holds two notes (blue 1 and 2) from the target sequence. The blue arrows show the NNs from the backward computation and the red arrows show the NNs from the forward computation. The distances between NNs are adjacent to the arrows.

Let and be two note sequences and let be the average distance between each note in and its NN . Let be an index to that ranges over ( indicates length) and let be an index to that ranges over . Let be a function that finds the distance between two different notes primarily in terms of pitch and note onset (described above in 7.3). The function is thus:

Equation 3 The average distance between each note in and its NNs in .

The forward calculation will be and the backward calculation will be . Letting be the NN dissimilarity measure, we have:

Equation 4 The NN dissimilarity measure.

The note distance function for the NN measure is a combination of distance between note onsets within the loop and distance between “octavised” scale degrees. That is, the DePa scale degree, , plus the octave for that pitch, , multiplied by the number of DePa scale degree

206 steps per octave, : . This scheme favoured the use of source and target note sequences within the same octave. In future work it would be simple to overcome this by calculating the octave and scale degree distance separately and combining them afterward, weighted on a user-defined variable. Other measures that compare the similarity of scale degrees and passing notes could be applied. Inclusion of duration and dynamic and the Circle of Fifths (CF) distance, as explained in the previous chapter, would also be simple.

Pseudocode of the NN dissimilarity measure is included below:

// input: M A note sequence being used for comparison (often the result of a transformation) T A second note sequence being used for comparison (often the Target) // global functions: FIND-DISTANCE takes two notes and returns the distance between them in octavised scale degree and note onset. // output: avd The average distance to nearest neighbours NND(M, T) returns avd { avdM, avdT, avd// stores average NN distance for M, T and the final average

FOR (j=0; j < LENGTH(M); j++) { // find the neighbour of each note in M, in T

min_d // reference to minimum distance

FOR (i=0; i < LENGTH(T); i++) { distance = FIND-DISTANCE(T[i], M[j]) IF(distance < min_d) min_d = distance } avdM += min_d // accumulate the average }

avdM = avdM/LENGTH(M) // normalise

FOR (i=0; i< LENGTH(T); i++) { // find the neighbour of each note in T, in M

min_d // reference to minimum distance

FOR (j=0; j< LENGTH(M); j++) { distance = FIND-DISTANCE(T[i], M[j]) IF(distance < min_d) min_d = distance } avdT += min_d // accumulate the average }

avdT = avdT/LENGTH(T) // normalise

avd = (avdM + avdT)/2 // final average } Figure 6 Pseudocode of Nearest Neighbour dissimilarity measure

7.3.2 Rate

Rate multiplies the onset of each note by a ratio, removing notes that exceed the loop length, or looping the sequence as many times as needed to fill the loop. The ratios are based on patterns from MEM (Mainstream Electronic Music) genres: , , , , and . For example, a straight beat

207 transformed by or yields a common three against four rhythm. Transformations by , and are notorious in breakdowns, while a rate change of or often increases the intensity of build- ups.

The dissimilarity measurement used for rate is the difference in area between the onset envelopes, combined with the difference in area between the pitch envelopes (for an example, see Figure 7, below). The difference for inter-onset is normalised by the maximum possible difference area, which is the length of the loop squared. The difference for pitch is normalised by an arbitrary maximum of multiplied by the loop length, which is equivalent to one note in each pattern, semi-tones apart. If the difference happens to be greater than this, the maximum dissimilarity of is returned. These measures are sufficient, but could be improved by normalising by the maximum (for pitch only) and magnifying the lower end of the spectrum through a logarithmic function, as implemented in the “linear pitch similarity” measure of the Markov Morph.

pitch

time

Figure 7 Two pitch envelopes (red and blue) and the difference in area between them (in pink). The difference in area between the onset and pitch envelopes are used as the dissimilarity measure for rate.

The result is probably similar to a NN measure. A comparison of the two measures is yet to be thoroughly investigated, however it is imagined that because rate preserves the contour of the envelope, the dissimilarity function that measures envelope differences is more relevant than the NN distances.

7.3.3 Phase

The phase transformation shifts the start time of each note within the input by a certain amount, either ahead or behind. Onsets that exceed the loop boundary are wrapped around. The dissimilarity measure for phase is the NN dissimilarity measure, because of the high-level of accuracy in absolute difference between notes. The effect of phase is that the input pattern is “nudged” ahead or behind a certain amount, so as to maximise the absolute similarities between

208 the patterns. This can have the musical effect of highlighting the similarities of notes relative to one and other (rather than relative to the metre) in the input and target patterns. The parameter space for phase is the series of quarter-intervals from through to , that is, , , , , , , . This makes different parameter settings, not including or “bypass”.

7.3.4 Harmonise

Harmonise works by either adding or removing parallel harmony at each of the tonal intervals from the through to the octave. This means different parameter settings in all, excluding “bypass”: remove , , , , , ; or add , , , , or . While not all of these intervals are particularly common, they are included to provide flexibility.

To remove a harmonic interval, the note data is analysed to find the notes that occur on the same start time but with a different pitch – “harmonic clumps”. Within each clump, notes that are the specified interval above the lowest pitch are removed. For the addition of harmonic intervals, monophonic notes are doubled with a parallel harmony at the specified interval.

The dissimilarity measure used by harmonise is a weighted combination of the difference in average harmonic interval and the difference in average clump size. The average harmonic interval is the sum of the average harmonic interval of each clump, normalized by the number of harmonic clumps (including clumps with a single note).

The weighting between average harmonic interval and average clump size is and respectively. This is because the most important effects rendered by harmonise are the changes in the harmonic interval, but clumps of various sizes can have the same average interval and so the average clump size is included so as to help differentiate between cases such as this.

7.3.5 Scale pitch

The scale pitch transformation expands or reduces pitch range, while maintaining contour. The pitch of each note is scaled by a certain ratio, relative to a Central Tonic (CT) pitch. There are fourteen ratio values, excluding “bypass”, in the set of possible parameter configurations. These are all the intervals of from to : , , , , . This was chosen so that it would be possible for an octave to be shifted to any of the other scale degrees above or below it.

The CT pitch is the tonic in the most central octave covered by the pitches of the input sequence. Firstly, the average pitch of all the notes is calculated and rounded down to the

209 nearest tonic. The pseudocode to find the CT pitch is as follows:

// Inputs: I an array of pitches spo the number of steps per octave. // Output: ctp the tonic pitch of the most central octave FIND-CENTRAL-TONIC(integer array I, integer spo) returns integer ctp { avp = SUM(I)/LENGTH(I) ctp = avp – MODULUS(avp, spo) return ctp } Figure 8 Pseudocode for the algorithm to find the tonic of the central octave.

For each note pitch in the pattern, the interval with the CT is found by subtracting the CT from it. This interval is multiplied by the scaling ratio and the fundamental is added back to it. The scaled pitch is then adjusted to be in-scale or out-of-scale, to be consistent with the original DePa value.

Finally, a bounding function sets the pitch to be the maximum if it is above maximum, or sets it to be the minimum if it is below minimum. The default bounds used in the DePa representation is octaves: between and the number of DePa steps per octave multiplied by . This would be for a standard church mode. Pseudocode of scale pitch follows:

210 // Inputs: I an array of pitches (in DePa) r a ratio which controls to what extent the pitches are scaled spo the number of steps (in DePa) per octave. hs the largest number of consecutive semitone steps between each tonal semitone in the current scale // Global functions: BOUND takes a DePa pitch and spo and constrains the pitch within the range 0 and spo*10 // Outputs: I the original array of pitches that has been scaled SCALE-PITCH(integer array I, double r, integer spo, integer hs) {

ct = FIND-CENTRAL-TONIC(I); // (defined above) FOR(int i = 0; i < LENGTH(I); i++) {

// record the scale remainder. 0 is in scale, others are out of scale

ton = MODULUS(I[i], hs) I[i] = I[i] – ct // find difference with the fundamental

I[i] = I[i] * r // scale the difference

// find the scale remainder of the scaled pitch nton = MODULUS(I[i], hs)

// shift to the nearest point with a similar remainder IF( ABOLUTE(ton - nton) < ABSOLUTE(ton + hs - nton) ) { I[i] += ton - nton } else { I[i] += ton + hs - nton } I[i] = I[i] + ct // add the fundamental back in BOUND(I[i], spo) // constrain it within bounds return I[i] } } Figure 9 Pseudocode for the scale pitch compositional transformation.

An alternative scale pitch technique that avoids the need to bound the pitches is to determine the maximum possible ratio without exceeding the bounds, and weight this ratio with a parameter to determine the final scaling ratio that is used. This approach was trialled, however, the current approach was favoured as it was more predictable and the exceeding of bounds was rarely a problem.

The dissimilarity measure used for scale pitch is the difference in average interval from the CT, . Recalling that is a sequence of note pitches, is the length of , is an index to with the range and is the central tonic pitch, we have:

Equation 5 The average interval from the central tonic. is the sequence of note pitches, is the number of note pitches in (the cardinal). is an index to and is the central tonic.

211 This measure is used because the average interval from the central tonic is directly affected by the scale pitch function. One improvement to this measure could be to store and use the CT of the original sequence, rather than recomputing it for each of the scaled sequences. This is because in some situations the CT itself may be shifted, resulting in overly dramatic scaling ratios, as evident below (7.4.1).

7.3.6 Inversion

Inversion operates in a similar way to “chord inversion”, but inverts the pitches of the input sequence, , rather than a chord. The degree of inversion is controlled by a parameter, let it be , that ranges between . The number of octaves, , by which to shift the pitches that are to be inverted must be greater than the octave range of the sequence, so that inverted notes do not clash with existing notes. Letting the number of steps per octave be denoted by , we have , rounded down2.

The parameter controls which pitches will be shifted and the direction. is the fraction of the range that will be shifted, and determines whether the percentage is taken from the top or bottom of the range and which direction it will shift. When the selected pitches are taken from the bottom and shifted up and when they are taken from the top and shifted down. For example, when , all of the pitches that are below half the pitch range of the input pattern are shifted up by octaves and when , all of the pitches that are above half the pitch range are shifted down by octaves. The inversion transformation is in pseudocode below:

2 Adding and rounding down is chosen over simply rounding up, as it includes the octaves, which would not normally be rounded up.

212 // Inputs: I an array of note pitches that are to be inverted p fraction of the range of pitches and the direction of the inversion spo the number of steps in each octave // Outputs: I the array of note pitches that has been inverted

INVERT(integer array I, double p, integer spo) { returns integer array I lop = FIND-LOWEST(I) hip = FIND-HIGHEST(I)

shift = (ROUND-DOWN((hip-lop)/spo) + 1) * spo // the amount to shift

FOR(integer i = 0; i < LENGTH(i); i++) { IF( p > 0) { // inverting upwards IF( I[i] <= ((hip-lop) * p + lop) ) I[i] = I[i] + shift } ELSE-IF(p < 0) { // inverting downwards IF( I[i] >= (hip-lop) * (p + 1) + lop ) I[i] = I[i] - shift } } return I }

Figure 10 Pseudocode of the inversion transformation.

To select an inversion, the dissimilarity measure used is the difference in pitch envelopes, as described above in Rate (7.3.2). This was chosen because the inversion process has a substantial effect on the contour of the pitch envelope.

7.3.7 Octave

The octave transformation shifts the whole note sequence by a number of octaves, either: , , , , or . The dissimilarity measure used for octave is simply the difference in average pitch.

7.3.8 Add/remove

The add/remove transformation guarantees that the output will be closer to the target, , than the input, . Each note in is considered for removal and each note in is considered for addition3 and because of this, the parameter space for add/remove is not fixed like the other transformations. The NN dissimilarity measure is used to compare each result to , so as to ensure accurate results. Add/remove is always the last transformation in the chain, enabling

3 Adding notes from the target is a kind of ‘cheat’. Ultimately, TraSe aims to function without deriving material directly from the target.

213 sequences that are not brought closer to by any other transformation to eventually converge. Add/remove can be cycled a number of times specified by the user, feeding the output immediately back into the input. This is the ‘add/remove cycles’ parameter.

Add/remove operates in either ‘polyphonic’ or ‘monophonic’ mode. In monophonic mode, if a note is added at an onset that is already occupied, the existing note is replaced by the new note. Specifically, the pitch of the old note is replaced by that of the new note whereas the values of duration and dynamic for the new note are derived through combination between the old and new notes. The weighting of the combination is determined by the user. This feature was added after I had trialled source and target examples with very different durations and dynamics. It was found to add a degree of smoothness to the transition.

In polyphonic mode, the new note is overlaid, regardless of whether a note exists on the same onset. If a note exists on the same pitch (and onset), the attributes of the existing note are merged with the new note, using the same weighted combination parameter as monophonic mode.

Monophonic mode has greater transformational impact than polyphonic mode because, through replacement, an addition and removal can occur in a single step. Musically, especially in contexts that are mostly monophonic, the harmonies generated by polyphonic add/remove can sound unrealistic. A possible improvement could be for polyphonic mode to perform complete vertical replacements. That is, add or remove vertical note groups rather than single notes.

7.3.9 Key/scale morph

The key/scale morph deals with key and mode data, rather than sequences of notes, and operates in parallel to the note sequence transformations described above. In other words, each TraSe morph consists of a sequence of note pattern frames and a separate sequence of key/scale frames, which can be of a different length.

The frames generated by key/scale morph each contain a key which can be from C, C#, and so on, through to B, and a scale which can either be ionian, dorian, phrygian, lydian, mixolydian, aeolian, lochrian or harmonic minor. The selection pool at each step of the key/scale search consists of the different combinations of keys and scales.

If the parameter controlling the speed of transformation is set to , the process will converge in a single step because the parameter space covers the complete range of values that are possible. This is close to being the perfect transformation mentioned in 7.2.2. The transform speed is

214 typically set at less than so that a progression of key and scale changes is generated.

The dissimilarity measure used for comparing of each of the key-scales with the target is a weighted combination of scale dissimilarity, key-scale dissimilarity and key-root-distance. The user specifies the weights manually. Scale dissimilarity takes two different scales, each consisting of a set of allowable pitch classes, and counts how many are not shared, normalizing the result between and . This is a similar technique as used by Polansky (1987). Key-scale dissimilarity is similar to scale dissimilarity except that the pitch-classes in the scales are first transposed by the key. For example, with scale dissimilarity, C-Ionian and A-Aeolian would be considered different by a factor of (the comes from the difference between Ionian and Aeolian in the , and ), whereas with key-scale dissimilarity there would be no difference at all, because A-Aeolian is the relative minor of C-Ionian and the two pitch class sets are identical.

Key-root distance is a weighted combination of distance between keys in the CF and the Circle of Chroma (CC), both of which were explained in detail in the previous chapter. Let represent the number of steps per octave, and thus the number of keys also. The default implementation is , with the keys roots from (C) to (B), however, it is possible to use other tuning systems and scales. If and are the two keys roots being compared, then the distance between them in CC space is an application of the circle distance function, . The CF distance is similar, except that and are first transformed into the CF by multiplying by a fifth, modulo .

In all, there are five different weights that influence dissimilarity measurement: scale dissimilarity, key-scale dissimilarity, key-root distance, CC key-root distance and CF key-root distance. Tweaking these weights, along with the morph speed, is the primary method for the user to control the musicality of key/scale morph. These parameters enable a surprising degree of musical control. For example, a “pop” style key change which retains the scale and shifts by chromatic increments can be achieved by favourably weighting scale dissimilarity, key-root distance and CC key-root distance. A more jazz-oriented key modulation, that walks through the CF and utilises pivot notes can be achieved by favourably weighting key-scale dissimilarity, key- root distance and CF key-root distance.

7.4 Informal evaluation: listening and analysis

Informal evaluation of TraSe was an iterative process of morph generation, reflection, parameter adjustment and further morph generation. From this, pertinent examples have been selected and

215 are discussed below. Firstly, a series of morphs that are produced from the same source and target note sequences but with different parameter tunings are analysed and compared, providing some insight into the TraSe process and the effect of the parameters. Following this, some differences between morphing forwards and backwards are examined. Some examples of key/scale morphing are also shown, including ii-V-I style modulation as well as direct chromatic shifts. The section concludes with a morph tribute to Mathews and Rosler (1969), which, due to the length of the note sequences is a particularly difficult example.

7.4.1 Tuning TraSe parameters

By tuning the parameters of TraSe, a range of different musical outcomes are possible. The first example below does not bias any particular transformation, however, it appears to have split too many notes with the divide/merge transformation, resulting in a degree of chaos. The second example biases note merging in order to balance note division. The third example demonstrates the emergent nature of TraSe: the transform speed is reduced, however, more frames are generated unexpectedly. The final example shows how smoother morphs can be obtained by limiting the number of transformations per cycle.

The source and target sequences that were used for all morphing examples have the same key and scale labels of C Mixolydian, and one bar in length. Although this automatically increases the similarity of the two, it was done so that the focus is on note sequence morphing rather than key/scale morphing or other aspects.

The source (~7.1) is a playful tune with short staccato notes played on beat intervals which, against the 4/4 drums, creates a “three over four” type polyrhythm with a short macrocycle. It contains pitches C, D, F and Bb and the tonal movement is I-IV. In contrast, the target (~7.2) is an ambiguous, rising tonal wash with long notes that are mostly syncopated, a greater pitch range and a mostly upward contour. Two pitches are different to the source, E and G, while F is not included. The tonal movement could be interpreted as ii-I, VII-I, V-I or possibly other combinations. They both have six notes.

216 A B

Figure 11 Close up of Source (A) and Target (B) .

Excessive note division inducing chaos

For this example (~7.3), all of the parameter weights were neutral, however, a slight excess of note division from the divide/merge transformation appears to have occurred. In order to obtain convergence at frames4, the cut-off similarity rating for convergence was increased to and the transform speed of add/remove was adjusted to . Add/remove was set to cycles.

Figure 12 TraSe morph with all weights equal. Add/remove has two cycles, the transform speed of add/remove is and the threshold of similarity to target is .

In the first frame, scale pitch shifted the G5s on and to F5, the C6 on to A5, the A5s at and to G5 and the F5 at to E5. Following this, the inversion transformation was applied upwards to a small degree, which had the effect of shifting E5 (the lowest pitch) at up an octave to E6. The first cycle of add/remove replaced the F5 at with the G5 of the target which also has a longer duration. The second cycle of add/remove removed the A5 on .

4 More than frames becomes difficult to present.

217 In the second frame, divide/merge split the G5 on into two notes, the second one appearing at . Pitch stretch shrunk the pattern, shifting all of the notes down by one tonal step, except for the E6 at which was shifted down by three, to Bb. The first cycle of add/remove added the C5 at , while the second added the E5 at .

In the third frame, the E5 at was split by divide/merge, resulting in another E5 at . A slight scale pitch was applied, shifting most of the notes up one tonal step and the Bb5 at up two tonal steps to D6. The C5 at , remained in place. Add/remove added the D5 at and replaced the new F5 at with C6.

In the fourth frame, divide/merge split the C6, creating a new C6 at . Scale pitch applied a slight shrink, shifting the G5s at , , and down to F5, the C6s at and down to Bb5 and the D6 at to C6. The first cycle of add/remove replaced the F5 at with E5 and the second cycle of add/remove replaced the F5 at with a G5.

In the fifth frame, the G5 at was split by divide/merge, resulting in a new G5 at . The first cycle of add/remove replaced the Bb5 at with C6, while the second cycle removed the F5 at .

In the sixth and final frame, the E5 at was divided, resulting in another E5 at . The harmonise transformation removed the C6 at . The first cycle of add/remove removed the F5 at and the second cycle removed the F5 at . If the similarity cut-off for convergence had not been increased to , the process would have continued for an additional cycles.

The shrink applied by scale pitch in the first frame, in combination with the removal of the A5 at , seems to change the harmonic movement from I-IV to IV-V. The first frame also provides more potential for change due to the lower note density. In the second frame the harmony reverts to I-IV, with some additional highlighting of the third and seventh intervals brought about by note addition and pitch-shrinking respectively. In the third frame, the tonal centre becomes the ii and a degree of ‘openness’ results from the combination of whole-tones, perfect fourths and the greater pitch range. Rhythmically, the pace of frame three is more frantic, as the density increases by two from the original six, to eight. This increases further in frames four and five, which have ten notes, while the harmony becomes more ambiguous. In frame six, the melodic contour is essentially the same as the target, but with repeated notes on the G5 and E5. Overall, this morph appears to be somewhat chaotic due to the excessive note density created through note division.

218 Balancing note division with note merging

This example (~7.4) demonstrates how the excess note division of the previous example can be counter-balanced by adjusting weights to bias merging over splitting. In this case, all ‘splitting’ weights for divide/merge were set to , while ‘merging’ weights remained at . The transform speed for add/remove was and the dissimilarity cut-off was to ensure convergence in frames.

Figure 13 TraSe morph that demonstrates the use of weights that favourably bias the compositional transformation of “merging” notes over that of “splitting” notes, thus creating a less chaotic morph compared to the previous example. Add/remove transform speed is and dissimilarity cut-off is .

In the first frame, scale pitch shifted all of the notes down one tonal step, except for the C6 at which was shifted down by two. Inversion was then applied, shifting the E5 at up an octave to E6. These two transformations are identical to those of the previous example. Unlike the previous example, add/remove added two notes, the C6 at and the Bb5 at .

In the second frame, divide/merge applied a forward merge, fusing the E6 at into the C6 at and the G5 at into the Bb5 at . Following the merge, scale pitch shifted all of the notes down one tonal step, except for the C6 on and the Bb5 on , which were shifted down two tonal steps to A5 and G5 respectively. The first cycle of add/remove added the D5 at and the second cycle replaced the G5 on with Bb5.

In the third frame, divide/merge merged forward again, consolidating the G5 at into the D5 at . This was followed by a severe scale pitch that shifted the E5 at to F5, the D5 at to E5, the E5 at to F5, the F5 at to A5, the A5 at to E6 and the Bb5 at to F6. This

219 severe stretch occurred because it shifted the central tonic to be C6 rather than C5. Recall that the similarity measure used by scale pitch is based on the average distance of pitches from the fundamental pitch. The average distance from C6, post stretch, was calculated as being closer than that of the previous average distance from C5. Following this, the whole pattern was then shifted down by an octave. Add/remove then replaced the F4 at with G5 and the E4 at with D5.

To generate the fourth frame, a slight downward scale pitch was applied, shifting the F4 at down to E4, the G5 at to A5, the E5 at to F5 and the F5 at to G5. The first cycle of add/remove replaced the E4 at with C5 while the second cycle replaced the F5 at with Bb5.

In the fifth frame, scale pitch shrunk the pitches slightly, resulting in the A5 at shifting to Bb5 and the Bb5 at shifting to C6. The first cycle of add/remove replaced the newly shifted Bb5 (at ) with G5, while the second cycle added an E5 at .

In the sixth and final frame, a divide/merge applied a merge-forward, consolidating the D5 at into the C5 at , the A4 at into the G5 at and the F5 at into the E5 at . The first cycle of add/remove added the D5 at , while the second cycle replaced the C6 on with Bb5.

The favourable weighting of merging over splitting in this example, and possibly the higher transform speed of add/remove, sets a different morphological trajectory to that of the previous example. The first frame is similar to that of the previous example, with an identical scale pitch and inversion. However, from the second frame onwards, there are numerous examples of merging where the previous example would divide, resulting in a less chaotic morph overall. The drastic scale-pitch of the third frame could be solved fairly trivially, as explained in 7.3.5, by referring to the original CT (Central Tonic) pitch of the source, rather than recalculating it for each transformed example. Alternatively, an additional transformation could be added after scale/pitch to compensate for the change of CT pitch (while maintaining the average distance from CT that was introduced by scale/pitch).

Emergent phenomena

This example (~7.5) demonstrates a situation where convergence unexpectedly occurred in fewer frames after the transform speed was reduced. In the previous example, a transform speed of with dissimilarity cut-off of yielded convergence in frames, whereas, in this

220 example, when the transform speed was reduced to from , it converged in :

Figure 14 Example of a slower transform-speed yielding faster convergence. Settings are the same as previous example, except for the transform speed on add/remove which

was reduced to .

In the first frame, as with the previous two examples, scale pitch applied a slight shrink, followed by an inversion that put the E5 at up to E6. Unlike the previous morphs, add/remove replaced the F5 at with C5 and the F5 at with G5. This was entirely due to the difference in transform speed. In the previous example, patterns with the addition of E5, C6 and Bb5 were all judged to be equally close to the target dissimilarity of specified by the transform speed, whereas, in this example, the G5 was the only real candidate, fitting the target dissimilarity of very closely at . The next closest was the C5 at (this is observable in Appendix D).

In the second frame, a divide/merge merged forward, as with the previous example. However, unlike the previous morph, the E6 on and the G5 on was unaffected while the A5 at was merged into the C5 at and the G5 at merged into the G5 at . Because of this, the subsequent scale pitch was milder, shifting all the notes down a tonal step except for the C5 on which remained there because it is the tonic pitch of the most central octave and thus the centre of the scaling operation. The two cycles of add/remove resulted in the C6 at and the Bb5 at .

In the third frame, the forward merge occurred again, as with the previous morph. Unlike the previous morph, the effect was to merge the F5 at into the Bb5 at , rather than the G5 at into the D5 at (both of which did not exist in this case). Following this, scale pitch applied a slight shrink, shifting the C6 at to Bb5 and the Bb5 at down to A5. As a result, the

221 complete octave shift that occurred in the previous example was not needed. Add/remove added the D5 at and the E5 at .

In the fourth and final frame, a slight pitch stretch was applied, shifting the Bb5 at back up to C6 and the A5 at back up to Bb5. Following this, only a single cycle of add/remove was required to replace the F5 at with G5.

With only a slight change to the ‘speed’ parameter, from to , this example is substantially different to the previous, thus highlighting the emergent nature of TraSe. The biggest difference occurred in frame three, with a slight pitch-shrink as opposed to a dramatic stretch. Compared to the previous morph, the overall structure at this point was maintained much more effectively and because of this, fewer frames were needed. This highlights some of the complexity in dealing with multiple transformations and dissimilarity measures, especially in terms of emergent flow-on effects.

Streamlining the transformations

A musical problem with TraSe is that mutation tends to be greater during the beginning of the morph, where multiple transformations are applied within a single frame, compared to the end of the morph, where the add/remove transformation performs slight adjustments. In response to this, the ‘mutation limit’ caps the number of transformations that may occur each frame. The effect is smoother changes during the beginning, however, more frames are generated.

This example (~7.6) has similar settings to the examples above, except the add/remove transform speed is set to and the mutation limit is set to transformations per frame.

222

Figure 15 The number of transformations per frame has been limited to two. Merge is biased over split. Add/remove transform speed is .

In the first frame, a pitch-shrink and upwards inversion was applied, as with the first frame of the previous example, however, the limit of two transformations per frame obstructed the add/remove transformation entirely.

In the second frame, scale pitch shifted every pitch down by a tonal step, except for the E6 at which was shifted down to Bb5. Add/remove added the D5 at and the C6 at .

In frame three a dramatic scale pitch occurred, shifting the E5s at and to G5, the D5 at to E5, the G5 at up to D6, the F5s on and up to Bb5, the C6 at up to C7, and

223 the Bb5 at up to A6. The octave transformation then shifted the whole pattern down one octave.

In frame four, the only transformation applied was add/remove, which added the Bb5 at and replaced the G4 at with G5.

In frame five, a divide/merge applied a forward merge, consolidating the D5 at into the E4 at , the Bb4 at into the G5 at , the Bb5 at into the C6 at and the Bb4 at into the Bb5 at . Scale pitch then applied a pitch-shrink, shifting the G4 at up to A4, the E4 at up to G4, the G5 at down to F5, the C6 at down to A5 and the Bb5 at down to G5.

In frame six, add/remove was the only transformation applied, which added the E5 at and replaced the G4 at with D5.

In frame seven, scale pitch shifted the A5 at up to Bb5 and the G5 at up to A5. The first cycle of add/remove replaced the A4 at with C5 and the second cycle replaced the Bb5 on with C6.

In frame eight, scale pitch slightly shifted the C6 at up to D6 and the A5 at up to Bb5. The first cycle of add/remove reasserted the C6 at and the second cycle replaced the F5 at with G5.

From listening to this example, it is clear that imposing a limit on the number of transformations that occur each frame reduces the severity of mutation, particularly in the first few frames. The limit ensured only two transformations per frame, while the other examples above averaged three or more.

7.4.2 Morphing Backwards and Forwards

TraSe is currently a unidirectional morph, in that a morph generated from source to target will be different to a morph generated from target to source, even when using the same parameters and played in reverse. The first few frames result from large scale transformations while the last few frames are influenced mostly by add/remove.

In order to informally ‘cross-examine’ this effect, an example is included (~7.7) that has been generated from target to source (the same source and target as the previous section):

224

Figure 16 Morphing from target to source. The add/remove transform speed is set to and merge is bias over divide. The dissimilarity cut-off is zero.

In the first frame, a divide/merge applied a forward-merge, which consolidated the D5 on into the C5 on , the E5 at into the G5 at and the Bb5 at into the C6 at . Following this, rate doubled the speed of the pattern. Scale pitch lifted the G5s on and up to A5 and the C6s on and up to E6. Inversion shifted the E6s on and down by two octaves to E4. Octave shifted the whole sequence up an octave. The first cycle of add/remove replaced the C5 at with a staccato G5. The second cycle replaced the C5 at with a staccato A5.

In the second frame, the forward-merge occurred again, this time consolidating the E5 at into the A6 at and the E5 at into the A6 at . A slower rate increase to that of the previous frame was applied, shrinking the onsets by a factor of rather than . The result brought the A6 forward to , the A5 at to , the A6 at to , with a repeat of G5 at , and A6 at . Scale pitch was then applied, shifting the A6s on , and to Bb6. An upwards inversion was then applied, shifting the A5 at up two octaves to A7 and the G5 at up two octaves to G7. The entire sequence was then shifted down an octave. The first cycle of add/remove added F5 at while the second cycle added the F5 at .

In the third frame, scale pitch raised the G6s at and to E7, the Bb5s at , and up to E6, the A6 at to F7, the F5 at to G5 and the A5 at to C6. This was followed by a severe upwards inversion, which shifted the E6s at , and up two octaves to E8, the G5 at up two octaves to G7, the E6 on up to E8 and the C6 at to C8. This pattern was then shifted downwards by whole octaves. The first cycle of add/remove added the A5 at

225 , while the second cycle added the C6 at .

In the fourth frame, the rate was reduced by a factor of , shifting the E6 at to , the C6 at to , the F5 at to , the E6 at to , the A5 at to ; and remove the E5 at , G5 at , E6 at and C6 at altogether. This was followed by a severe scale pitch which shifted the E6 at down to A4, the E6s at and up to G6, the F5 at down to C5, and the A5 at down to G5. An upwards inversion then shifted the A4 at to A6 and the C5 at to C7. The sequence was then dropped by a whole octave. The first cycle of add/remove replaced the C6 on with an A5, while the second cycle added the G5 at .

In the fifth frame, a merge consolidated the C5 at into the G5 at and the G4 at into the G5 at . No other transformations were applied on this frame except for add/remove. The first cycle replaced the G5 at with C6, while the second cycle replaced the G5 at with F5.

The sixth frame completed the morph with two cycles of add/remove, adding the A5 at and replacing the A5 at with G5.

Once again, this example demonstrates dramatic, yet fairly coherent, transformations during the beginning and middle of the morph, tapering away to very minor adjustments towards the end.

To provide another comparison, this morph was also played back in reverse order of frames, from target to source (~7.8):

Figure 17 The same morph of the previous example, played in reverse order of frames.

The second half of this ‘reverse’ morph (Figure 17), appears to me to be somewhat less

226 coherent than the first half of the corresponding ‘forward’ morph (Figure 16), even though the frames are identical. This effect might be explained by the differing musical contexts and expectations of the two examples. Electronic music often consists of cycles of break-down and build-up, where the break-down occurs suddenly and the build-up occurs gradually. In the ‘forward’ morph, the quick cut away from the source can be considered analogous to a break- down, in that the continuity is broken. This also serves as a structural indicator, suggesting a new section. From the ‘pseudo-breakdown’, frames three, four and the target are a gradual ‘build-up’ of stability to the target sequence, thus fitting the standard pattern of break-down to build-up more closely. With the reverse order of frames, the contour of continuity is also reversed and this does not fit the standard ‘break-down, build-up’ structure. As well as this, there is no clear indication of the start of the morph.

Despite these interpretations, my subjectivity is biased from listening to more ‘forward’ than ‘reverse’ morphs over the period of this study. A formal test of ‘reverse’ and ‘forward’ morphs with multiple participants would be needed before any conclusive claims can be supported.

7.4.3 Key/Scale morphing with TraSe

The TraSe algorithm computes the morph for key and scale data separately to note sequence data. Examples of key/scale morphing are included here that cover modulating to a distant key and modulating through function change. A significant degree of flexibility and control is demonstrated by the key/scale morphing algorithm.

To recap the process: to generate a key/scale frame at each step, all the possible keys and scales are considered and rated according to their dissimilarity with the target. The dissimilarity rating is a weighted combination of scale dissimilarity, key-scale dissimilarity and key-root distance, which itself is a weighted combination of distances in the CC and the CF. Because all possibilities are considered, the key/scale transformation is almost a perfect transformation, which means that by adjusting the transform speed, the user can specify how many frames are generated.

Modulating to a distant key

Between C Major and F# Major only two of the pitch classes (F and B) are shared, as is evident in the table below:

227

C Maj F# B A# A G# G F# F E D# D C# C Figure 18 Comparing the pitches in C Major with the pitches in F# Major

We will now examine select examples of morphing between C# and F# Major. Most of the examples below consist of frames, which results from a transform speed of .

If the key-root distance, CC and scale dissimilarity are weighted maximally, the expected result is a series of linear transpositions while maintaining the scale. This is confirmed with the following print-out:

frame 0 : root = 0 scale = ionian frame 1 : root = 1 scale = ionian frame 2 : root = 3 scale = ionian frame 3 : root = 5 scale = ionian frame 4 : root = 6 scale = ionian Figure 19 Key/scale morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using CC, key-root distance and scale dissimilarity.

If CF is used instead of CC, the transpositions will be intervals of a fifth:

frame 0 : root = 0 scale = ionian frame 1 : root = 7 scale = ionian frame 2 : root = 9 scale = ionian frame 3 : root = 11 scale = ionian frame 4 : root = 6 scale = ionian Figure 20 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using CF, key-root distance and scale dissimilarity.

If the key-scale dissimilarity is used exclusively, we obtain what at first appears at first to be an illogical result, ending on D# Aeolian, rather than F# Ionian:

228 frame 0 : root = 0 scale = ionian frame 1 : root = 9 scale = mixolydian frame 2 : root = 1 scale = aeolian frame 3 : root = 11 scale = ionian frame 4 : root = 3 scale = aeolian Figure 21 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using key-scale dissimilarity as the only measure.

On closer inspection, this is consistent with the specified task of finding the a sequence of key/scales that are related to each other by the pitch classes they contain; D# Minor (root = , scale = Aeolian) is the relative minor of F# Major and thus contains all of the same pitches. The logic of this progression is made clear in the following table:

C Ioni A Mixo C# Aeol B Ioni D# Aeol B A# X A X G# X G X F# X F X X E X D# X D X C# X C X

Figure 22 The pitch-classes (rows) present in each key/scale frame (columns) during a morph from C Ionian to F# Ionian, unexpectedly finishing on D# Aeolian. The “X” marks where there is a difference between the previous scale and the scale with the X, in terms of the pitch-class that the X is on.

In order to converge with the specified target, a very small fraction of key-root distance can be included:

229 frame 0 : root = 0 scale = ionian frame 1 : root = 6 scale = phrygian frame 2 : root = 6 scale = dorian frame 3 : root = 6 scale = mixolydian frame 4 : root = 6 scale = ionian Figure 23 Output from morphing between C Major (Root = 0, scale = Ionian) and F# Major (root = 6, scale = Ionian), using absolute pitch-class similarity combined with a small (0.01) weighting of key-root distance.

Compared with the previous example, this contains exactly the same set of pitches, but ends on the specified target of F# Ionian:

C Ioni F# Phry F# Dori F# Mixo F# Ioni B A# X A X G# X G X F# X F X X E X D# X D X C# X C X

Figure 24 The pitch-classes (rows) present in each key/scale during a morph from C Ionian to F# Ionian using absolute pitch-class similarity combined with a small (0.01) weighting of key-root distance. The “X” marks where there is a difference between the previous scale and the scale with the X, in terms of the pitch-class that the X is on.

It is interesting to note that the pitch class of F is absent during the morph, even though it is present in both the source and the target. This demonstrates how compositional transformations indirectly influence the musicality of the morph. Also worth noting is the immediate shift from C to F#. Musically it may be more effective for this shift to occur half-way through the transition, however, currently it is not possible to control this aspect. Given options with the same key-scale dissimilarity, the algorithm will select the key of the target.

230 Modulating though function change

When analysed in terms of harmonic function, the key/scale morph can demonstrate some common forms of modulation. For this, a source of C-Ionian and target of Bb-Ionian has been chosen due to similarity and common usage, rather than dissimilarity and difficulty (see previous examples for this).

With key-scale dissimilarity weighted , key-root distance weighted , CF weighted maximally over CC and scale distance weighted , a I-IV-ii-V-I progression is obtained where the I-IV is relative to the source key and the ii-V-I is relative to the target key:

frame 0 : root = 0 scale = ionian frame 1 : root = 5 scale = lydian frame 2 : root = 0 scale = dorian frame 3 : root = 5 scale = mixolydian frame 4 : root = 10 scale = ionian Figure 25 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , track VS consistency and transform speed .

This kind of ii-V-I progression is commonly cited by jazz musicians, for example see (Pachet 1997; Pachet 1999).

The key/scale morph can also reveal coherent modulation pathways that are not particularly obvious. For example, the following progression of I-vii-ii-I , where the vii-ii-I is in the key of the target. This is arguably a coherent, if a little edgy, progression:

frame 0 : root = 0 scale = ionian frame 1 : root = 9 scale = lochrian frame 2 : root = 0 scale = dorian frame 3 : root = 10 scale = ionian

Figure 26 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed .

When the “track VS consistency parameter” is increased to the upper limit ( for this example) a smoother progression is generated, because the measure will include dissimilarity with the source, rather than purely the target. This is I-V-ii-I, where the V is in the key of F ( ) and ii-I is in the key of the target, Bb( ):

231 frame 0 : root = 0 scale = ionian frame 1 : root = 0 scale = mixolydian frame 2 : root = 0 scale = dorian frame 3 : root = 10 scale = ionian Figure 27 Modulating from C Ionian to Bb Ionian. Key-scale dissimilarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed .

A slow transform speed can be applied to generate extended chord progressions. In the following example, a transform speed of yields: I-ii-vii-V-ii-IV-V-I where I is in the key of C, the following ii is in F, the vii-V-ii is in the target key of Bb, the IV is in F, the last V is in Eb before concluding with the I of Bb:

frame 0 : root = 0 scale = ionian frame 1 : root = 7 scale = dorian frame 2 : root = 9 scale = lochrian frame 3 : root = 5 scale = mixolydian frame 4 : root = 0 scale = dorian frame 5 : root = 10 scale = lydian frame 6 : root = 10 scale = mixolydian frame 7 : root = 10 scale = ionian Figure 28 Modulating from C Ionian to Bb Ionian. Key-scale similarity is , key-root distance , CF , CC , scale distance , tracking VS consistency and transform speed .

In summary, key/scale morphing is an effective application of TraSe because the entire search space is known, thus affording control over the number of frames generated. Flexibility has also been demonstrated, for example, through a range of solutions to the ‘difficult’ modulation from tonic to tritone. With the ‘easier’ modulation task of stepping down a whole tone, well known progressions such as ii-V-I were able to be generated.

7.4.4 When Johnny comes morphing home

Morphing between long, continuous melodies is difficult, because the large scale transformations are unable to perform the small-scale modifications that are necessary. This is evident in the following example from The British Grenadiers and When Johnny Comes Marching Home. This morph relies almost exclusively on the add/remove transformation, the other transformations being unable to offer any suitable modifications. The inadequacies of this are obvious in this example (~7.9). While this clearly demonstrates some problems, it is rare for extended tunes to occur in the short loops of MEM that are the research context. Nonetheless, the need for phrase analysis as a preliminary step to TraSe morphing is clear, as mentioned in more detail below (7.7.2).

232 7.5 Automated evaluation

Automatic evaluation of TraSe involved programmatically generating a statistically significant number of source and target test examples and, with default parameter settings, generating morphs for them. This technique was used to verify the time complexity and to investigate the correlation between the number of frames generated and the number of notes in the source and target note sequences. The time complexity has relevance to realtime operation, while the correlation between the number of notes in source and target and the number of frames generated is relevant to the issue of over-generation of frames. For these tests, the results using the whole transform-chain were compared to the results using only add/remove as a base-level comparison.

7.5.1 Computation time for a single frame

The time complexity of TraSe is , due to the add/remove transformation and the NN dissimilarity measure. In order to confirm this empirically, the time it takes to compute one cycle of add/remove for a single frame has been measured for different numbers of notes. No bias weightings were applied and the notes were equally distributed between the source and target.

The note sequences were generated randomly, with pitches ranging between - and onsets ranging from the start to the end of the loop. Velocity and duration was constant. The computation time was measured for notes, then again with notes, and so on up to notes, adding notes to source and target each time. The time was measured three times on each occasion to confirm the accuracy of measurements:

233 Computation time for add/remove vs the number of notes processed 180

160

140

120

100 Series1

Time (s) Time 80

60

40

20

0

4 2 6 0 4 8 2 6 0 4 8 2 6 0 4 8 2 6 18 32 46 60 74 88 7 8 6 8 9 10 11 13 14 15 1 1 200 214 228 242 256 27 28 29 31 32 34 35 3 3 3 Number of Notes

Figure 29 The computation time (y) for a single cycle of un-biased Add/remove, applied to various numbers of notes (x), which were randomly generated and evenly distributed between source and target.

These measurements were then fitted to polynomials from degree to degree in order to determine whether this curve was the result of , , or higher. It was found that the first and second degree polynomials could not sufficiently describe the curve, while polynomials of third degree and above fit it perfectly, providing empirical evidence that the complexity of add/remove is (see Appendix D, D-1).

While this justifies predictions concerning the time complexity of add/remove, less consistent results were discovered when the entire transformation chain was applied to the same randomly generated sequences:

234 Computation times for one cycle of the entire TraSe Chain 80

70

60

50

40 Time taken (s) taken Time 30

20

10

0 2 16 30 44 58 72 86 100 114 128 142 156 170 184 198 212 226 240 254 268 282 296 310 324 338 352 366 380 394 408 422 436 450 464 478 492 Number of Notes

Figure 30 The computation time (y) for a single cycle of the un-biased transform-chain, applied to various numbers of notes (x), which were randomly generated and evenly distributed between source and target.

First of all, it is clear from the peak time of seconds, compared to the seconds of the previous example, that a single cycle of the transform-chain will usually take less time to compute than add/remove only. Another difference is that, although the computation time increases with the number of notes, there is a huge amount of variability.

This can be attributed to the emergent flow-on effects of TraSe, due to the variety of large-scale transformations. While predictions can be made regarding the critical aspects of complexity, the efficiency experienced during execution of TraSe will usually deviate substantially. This indicates a chaos-like susceptibility to initial conditions.

It should be noted that the data obtained above (Figure 29 and Figure 30) was for only a single frame of the morph – various combinations of music and transformation parameter configurations will require different numbers of cycles before convergence is reached, which also has an effect on the time complexity.

7.5.2 Number of notes and number of frames generated

The computation time of TraSe clearly increases with the number of frames generated. As well as this, the coherence of the morph can be adversely affected. For example, with many frames,

235 the default operation is for only small snippets from each frame to be played and it can become impossible to hear the logic of the entire progression. The automated tests presented below have been selected so as to best demonstrate the correlation between the number of frames generated and the number of notes in the source and target.

As a baseline for comparison, the first example shown below uses only add/remove to generate the morph. The randomly generated material is monophonic and quantised to intervals within a single bar. To minimise the extent by which identical notes are generated in source and target, and thus to further challenge the algorithm, the pitches are generated in different octaves: between C5-C6 in the source and between C6-C7 in the target. The add/remove transformation was set to cycles, monophonic mode and the transform speed was . There were sample morphs randomly generated for each number of notes from to (equally in both the source and target):

Generated using Add/Remove only, with source and target in different octaves 9

8

7

6

5

4

Numberof frames generated 3

mean 2 quadratic fit to mean mean plus standard deviation mean minus standard deviation 1 samples minimum maximum 0 0 2 4 6 8 10 12 14 16 Number of notes each in source and target

Figure 31 Number of frames generated with two cycles of add/remove only, monophonic, quantised to , source constrained to C4 octave and target constrained to C5. For each number of notes there were samples.

236 The logarithmic curve can be explained by the way add/remove functions in monophonic mode, replacing notes in a single operation if they are on the same onset as the note being added. With sixteen notes in both source and target, quantisation at and the loop length of beats, each cycle of add/remove will be twice as fast than if the notes had to be removed and added separately.

When the entire transform-chain is applied to source and target material that is generated in the same way – that is, in separate octaves so as to eliminate the possibility of exactly matching notes in source and target– many more frames are generated, on the whole:

237

Figure 32 Number of frames generated using the entire transform-chain, versus the total number of notes in the source and target patterns. Both source and target are monophonic and quantised to 0.25. Source is constrained to C4 octave and target constrained to C5. MAD is ‘Median Absolute Deviation’. AR is ‘Add/Remove’.

238 However, generating morphs with the transform-chain outperforms add/remove by itself in terms of minimum number of frames produced in each sample group. The comparison is made clear below:

add/remove only 1 2 2 3 4 4 5 5 6 6 7 7 7 7 8 8

All trans. 1 1 2 2 3 4 5 6 6 7 6 6 5 7 7 6

Figure 33 With source and target being generated on separate octaves, comparing the minimum number of frames produced at each number of notes when using “add/remove only” VS “all transformations”. The yellow highlighted columns refer to occurrences where the minimum “all transformations” was smaller than the minimum of “add/remove only”.

Some of the gains in terms of minimum frames generated could be attributed to the “octave shift” transformation, which would shift the entire source into the same octave as the target, thereby increasing the chance of notes being the same. As well as this, a source and target combination will occasionally occur where the large-scale effects of the compositional transformations happen to produce a rapid mutation into the target. It is unlikely to be purely random considering the already large sample size of . In addition to this, the extent of the difference is significant, with the ratio of smaller minimums for “add/remove only” : “all transformations” being : .

These automatically generated examples demonstrate the difficulties that TraSe has without manual parametric control from the user. In particular, the problem of over generation of frames remains evident, with the morphs that could not converge after frames peaking for notes at of the samples. Despite this, the potential of compositional transformations to assist convergence in fewer frames than add/remove has also been illustrated by comparing the minimum number of frames generated in each.

7.5.3 Future improvements to automatic testing

A number of possible improvements to the automatic testing procedure have become evident. Another test that is likely to show interesting results would be to apply the entire transform-chain on ‘musical’ loops, compared to randomly generated loops within the same general constraints. Also, to support comparisons, the same material should be applied in each test, rather than generating fresh material each time. Another useful set of tests would be on source and target with different numbers of notes, rather than evenly distributed between them. Although larger-

239 scale tests — such as applying the entire transform-chain to large sample sizes, un-quantised material and pitch ranges greater than an octave — could be executed over many days or on a super-computer, the main problem of too many frames before convergence, as evidenced in the current tests, will still remain and no solutions are likely to emerge. The research effort could be better spent on implementing some improvements to the current algorithm that are explained in 7.7.

7.6 Formal qualitative evaluation: web questionnaire

A web questionnaire was developed as an empirical qualitative musicological investigation into morphing between MEM loops of various styles. The two primary aims were:

1. Collect knowledge that can be used to improve future morphing algorithms.

2. Establish if LEMorpheus is likely to be successfully applied to real-world contexts.

These were broken down further into more specific objectives:

1. Discover elements of LEMorpheus generated morphs that are perceived to have a positive or negative impact, benchmarking against human composed morphs.

2. Obtain techniques for morphing.

3. Determine if there is a correlation between smoothness and effectiveness. Smoothness is defined as moment to moment continuity, whereas effectiveness is defined as the ability the music to affect the listener in a way that is perceived to be the intention of the composer.

4. Obtain opinions as to whether the morphing techniques would be applicable to games or live electronic music performance.

The approach was to research, compose and recreate sets of MEM loops, create LEMorpheus morphs between them and pay for a professional music producer to compose and produce morphs manually. The questionnaire was developed according to the objectives, widely publicised and completed by nine participants with musical backgrounds. Australian respondents were paid and feedback was collated and analysed.

In terms of results, LEMorpheus morphs were characterised by progressive musical variations, while the human composed morphs utilised layering and larger structural changes. While the majority of respondents favoured the human composed music, there was a great deal of

240 controversy in opinion, due to participants attending more or less to different elements in the music and differences in musical background.

A number of techniques for morphing were gathered, including the blending of similar elements, removal of dissimilar elements, trading of loops (as with ‘beat juggling’), incremental substitution of phrases, gauging of source and target dissimilarity to determine whether to blend or switch directly, generation of a separate bridge section or breakdown and synchronisation of rhythm and evolution of other elements.

There was found to be very little correlation between smoothness and effectiveness. For example, many human composed morphs were found to be coherent but not smooth, while many LEMorpheus morphs were smooth but not coherent. Some terminological ambiguity with ‘smoothness’ was apparent in some responses, however, there was sufficient additional explanation for this not to impact significantly on the validity of the data.

Overall, LEMorpheus was rated, near-unanimously, as being applicable to both computer games and live electronic music contexts. The few instances of a negative result was balanced by similar reactions to the human composed morphs.

7.6.1 Method: music creation

I composed the four sets of source and target loops, generated morphs for them using LEMorpheus and hired a composer/producer to compose “benchmark” morphs. The source and target music were not based on any particular track, but were creatively composed, so that a high level of dissimilarity could be ‘designed’ into the examples. The morphs were generated by iteratively adjusting LEMorpheus parameters and auditioning the results. The benchmark morphs were composed through a reflexive process of listening and editing.

All of the musical examples used in the questionnaire are included:

241 Example One

Source (~7.10). Target (~7.11). LEMorpheus morph (~7.12). Composer morph (~7.13).

Example Two

Source (~7.14). Target (~7.15). LEMorpheus morph (~7.16). Composer morph (~7.17).

Example Three

Source (~7.18). Target (~7.19). LEMorpheus morph (~7.20). Composer morph (~7.21).

Example Four

Source (~7.22). Target (~7.23). LEMorpheus morph (~7.24). Composer morph (~7.25).

To compose each example, I applied my own composition skills and familiarity with the genre of MEM. The loops were designed to contrast drastically in ways that included: timbre, metre, rhythmic pattern, note density, key, scale, dynamics and phrasing. The parts were limited to drums, bass, lead, chords and auxiliary sounds/percussion. Occasionally, parts that were too long were split into two different layers (as if they had been clustered into two different phrases) to enable less frames to be generated. The music was also subjected to informal criticism by peers. I used the LEMorpheus sequencer which drove a custom developed patch in the ReasonTM synthesiser. The first three examples were composed with contrasting timbres to reflect real-world application contexts. The fourth example had no timbrel difference and was included to compare the influence of timbre in the analysis of the note sequence morphing techniques of TraSe.

The LEMorpheus morphs were sixteen bars long. There were different parameter settings for each part and these deviated to some extent from morph to morph. For drums, harmonic transformations such as pitch stretch, harmonise and octave shift were bypassed. For other tonal parts, one or two transformations were bypassed when they caused over-generation of frames. The number of cycles of add/remove was generally kept low, at two or three, although, occasionally this had to be increased up to seven. There were usually around ten frames, which is an adequate fit considering sixteen bars, however, some parts had more, for example, the drums of the second morph with twenty-five frames. The morph examples usually had three to four key/scale frames which served to divide the morph into distinct sections. Fewer than this appeared to break the continuity, while more than this lost the sense of tonality. TraSe was used almost exclusively, however, a ‘straight switch’ was used for some of the cymbals/auxiliary parts,

242 and the Markov Morph was used for one layer of the lead in the second example.

Ande Foster (2007) was hired to produce the benchmark examples. The task was to morph smoothly from source to target while maintaining coherence. The limit was also sixteen bars, however, in example two, some flexibility in the length was allowed for the creative metrical integration of and . Foster was provided with MIDI files of each source and target and a ReasonTM patch for each part. No other musical materials, for example loops from a sample library, were allowed. The composer was encouraged to avoid recognisable sequences and compose creatively with the resources given. From the video documentation “talk through” of his compositional process, some observations can be made regarding Foster’s approach:

o Establishing rhythmic coherence is usually the primary task, followed by designing a coherent structure and integrating melody and harmony.

o If there is a way to play the music together with little modification, this is preferred.

o Ambiguous tonality can be employed effectively during transitions, especially in difficult key changes.

o Different timbres in source and target afford more drastic changes in the music, such as breakdown. As well as this, parts are ended (either faded out or stopped dead) earlier than in the current morphing algorithms and at musically significant points.

o When timbres are the same, ‘smoother’ processes that are similar to the techniques of morphing are preferred.

In summary, the music creation consisted of three stages: composition of source and target music, generation of morphs using LEMorpheus, composition of benchmark morphs. The source and target music was composed by myself, designing the sequences to be dissimilar. The LEMorpheus morphs were generated through adjusting parameters. The benchmark morphs were commissioned by a professional composer/producer.

7.6.2 Method: questionnaire

The questionnaire was designed to discover more about the positive and negative aspects of the morphs, new techniques, any correlation between smoothness and effectiveness and the real- world applicability of the morphs. For most of these, qualitative musicological analysis was chosen as the most fitting approach to obtaining such complex information.

243 The questionnaire, as implemented in flash, can be viewed (~7.26), the ‘.swf’ file can be run from a flash enabled browser. Source files and the XML with raw results of the questionnaire are available in the accompanying Rom, in the folder ‘7. digital Appendix’.

The questionnaire had four sections for each of the four musical examples and a page at the end to ascertain the musical background of the respondent, record the completion time and elicit feedback. Each of first four sections used the same eight pages but with different musical examples. The first two pages related to the source and target material and the following six for the two morph examples – three pages each of the same questions. Each musical example could be played, paused and shifted to any point within the music, with the time displayed in seconds. The bias from multiple listening that this would introduce was not considered a problem, as it is much easier to perform musicological analysis with repeated listening. As well as this, in many applications, the music would be played multiple times (computer games, in particular).

Questions on source and target

The first page elicited a ‘subjective response’ to the source and target. A list explaining various attributes that might be used was provided:

o Intensity (strong VS weak, or in-between).

o Quality (positive VS negative, for example, happy, uplifting or feel-good VS sad, dark or depressing, or in-between).

o Recognisable parts, phrase groups within parts, and relationships between them.

o Connections between what you hear and music you have already heard: musical references, stylistic traits and their implications.

o Whether it sounds ‘right’, like every aspect of the music had a relevant purpose or role within the style, or ‘wrong’, like the composer/producer made a mistake and didn't bother to fix it up.

Figure 34 List of stimuli for subjective response.

The focus of the second page was on morphing techniques. Respondents were asked what their approach and techniques would be and how the elements of the source and target music would affect the task. A list of musical elements, adapted from Pratt (1998), was provided:

244 o Metre-underlying pulse; cycle lengths, regular or varying.

o Rhythms-uniform or non-uniform.

o Pitch tonality-pentatonic (or fewer notes); seven notes, major, minor or modal; tonal or atonal.

o Texture - monophonic, independent lines of counterpoint or homophonic chords.

o Timbre - different sounding instruments may serve a similar musical function in source and target.

o Range - the range of pitches covered by the different parts.

o Density - the number and distribution of notes.

o Dynamics - fluctuations in loudness and softness of the instruments.

o Articulation - the degree of accent/attack and separation/release of notes.

o Pace - how frequently any or all of the elements above occur and change.

o Structure - what kind of phrasing or logical structure is behind the organisation of musical events.

Figure 35 List of musical elements to assist musicological analysis.

Questions on morphing examples

On the first page for the morph examples, a subjective response from the listener was required, with the list in Figure 34 being repeated for stimulus. The respondent was asked to rate the morph according to how smooth and continuous it was, with values from one to seven. They were then asked what affect the smoothness had on their subjective response. Lastly, they were asked whether the changes had been effectively applied, even if the morph was discontinuous; or, if the morph was smooth, whether the smoothness made it effective.

For the second morph-related page, the respondents were asked to listen to the music and identify four important changes, both musically acceptable and not. They were asked to record the time (in seconds) of the changes, their subjective reaction to it and a musicological justification of the subjective reaction. The list in Figure 35 was included for stimulus.

245 The third morph-related page was focused on real-world applications. The first question asked whether the morph would be acceptable in a real-world context such as a computer game or live electronic music show. The second question asked the respondent for small adjustments they would make to the morph if they were a composer/producer and their task was to fix the morph in five minutes. The time limit was imposed so that efficiency was considered in combination with the ideal result. They were then asked for changes they would implement if more time was available.

Background

The final page of the questionnaire asked the participants about their musical background, including music they have been exposed to, educated about and compose/perform. This was placed at the end of the questionnaire so the answers given were not augmented by the categories the respondents placed themselves under. They were also asked to comment on the questionnaire and estimate how much time it took them to complete it.

7.6.3 Results and analysis

The questionnaire was completed by nine participants5, each taking two hours on average. There was a consistent qualitative response to most of the source and target material which highlighted the dissimilarities of source and target examples. The approach to morphing proposed by most respondents consisted of similarity/difference analysis, thinning of dissimilar elements and rapid switching/cutting. A range of specific compositional techniques were also proposed. The source and target music had little influence on the techniques.

Subjective responses to the morphs varied considerably, depending on which aspect of the music attention was paid to. The typical result was a moderately negative response to the LEMorpheus morph, and a moderately positive response to the transition which was hand- composed by the professional; however there were some notable exceptions.

Smoothness was not correlated to effectiveness, in fact, if the source and target music was very dissimilar, most participants advocated that a smooth transition should not be attempted. Some respondents were confused by the notion of smoothness; although, the rich qualitative

5 More than sixty people began the questionnaire but did not complete it

246 information from various questions enabled a sufficient level of triangulation to interpret their response.

Obvious changes were usually noted as being important to all respondents, but would sometimes invoke different aesthetic responses. Other changes were highlighted by some respondents and not others. Changes noticed in the LEMorpheus morphs were usually musical variations, while changes noticed in the human composed morphs were usually layering or structural changes. This is due to the fact that the LEMorpheus algorithms do not utilise any techniques of structural form other than continuity.

On the whole, both the human composed and LEMorpheus generated morphs were judged to be applicable to real-world contexts, but a few respondents judged particular examples otherwise, both for LEMorpheus and the benchmark. All but one participant was male, and their backgrounds were mostly all in music composition, production or performance over a range of genres. One participant was an avid music listener but did not otherwise have a formal or professional musical background. The amount of experience ranged from three years to thirty- five.

The background of each participant is summarised below:

Participant Background One 30 years. Classical to pop, art, electronic. Researching/teaching. Male. Two 30 years. Classical to pop. Researching/teaching. Male. Three 15 years. Classical to pop, indie, electronic. Producing. Male. Four 40 years. Electronic, art, pop. Teaching. Male. Five 15 years. Alternative pop to electronic. Listening. Male Six 15 years. Alternative pop to electronic. Researching/Teaching. Male. Seven 4 years. Electronic, pop. Producing. Male. Eight 20 years. Classical to art/electronic. Researching. Female. Nine 10 years. Alternative pop to classical to electronic. Researching. Male.

Figure 36 Summary of participants’ backgrounds.

Subjective responses to source and target

The subjective responses to source and target music from most respondents have been summarised below (Figure 37). Because a range of different terms were used, a degree of interpretation was involved:

247

Example 1 Example 2 Example 3 Example 4 Source Dark and intense. Slow, jazzy. Ambiguous, sunny. Upbeat Latin/African. Target Happy, party. Upbeat funky. Horror, intense. Straight disco.

Figure 37 Summary of perceptions of the source and target material.

Despite this general trend, there were a number of anomalies. For example, participants two and seven perceived the source of example one to be “confused” and “jarring” in an unintentional way, partly because of the lack of a surrounding context within which to situate the loop. Despite this, it was still clear that the ‘dark/intense’ mood had still been perceived. Contrastingly, participant five perceived it to be “nice” and “trancey” and participant eight said “disco”, “good sound” and “dance”, without providing a mood indicator. Participant eight perceived the target of example two to be “too fast, and high pitched”. Participant seven identified the mood of example two source, but perceived it to be “not a good look… too slow to be successful”, although this was qualified by “not really my kind of thing”. A similar response was elicited from participant three. The most disliked loop of all was the source of example three. Despite this, only participants six and seven doubted the credibility of the music. Recall that the source and target of example four had contrasting musical elements, but with the same timbre. Despite the mostly mild responses, some minor criticisms of the example four source music were raised, with comments on note density, repetition and timbre selection.

Composition Techniques

A diverse range of compositional approaches were evident. Five participants suggested blending compatible elements and removing incompatible elements. Three participants advocated that the dissimilarity between the source and target should be gauged before deciding whether to attempt blending (if similar) or rapid switching (if dissimilar).

Rhythm was often mentioned before other elements and two participants explicitly stated that synchronisation of rhythm should occur first, followed by evolution of timbre and harmony. The various approaches to the task and the participants that proposed them have been summarised in the table below. Some degree of interpretation was required to condense the responses into this summary.

248

Participant (s) Overall approach Break into phrases and tracks. Do substitutions, one at a time, while 1,9 maintaining a consistent thread. Remove incompatible elements trade loops, like juggling tracks on a 3 turntable.

3, 4, 9, 7 Gauge distance, either blend or switch depending on distance.

4 Write bridge with similar materials. Blend into and out of the bridge.

Blend particular elements that are easy to blend, remove elements that are 5, 7, 2, 9, 6 difficult.

8 Evolve rhythm and timbre while cross-fading.

2, 6 Strip back, or use a breakdown in middle

5, 6 Synchronise rhythm first, then evolve other elements.

Figure 38 Summary of overall approaches used by participants.

The most striking feature of the responses to this section of the questionnaire was the diversity of specific techniques considered. The range of responses has been collated and are presented in Appendix E, Figure E-1 and E-2.

Smoothness and effectiveness of the morphs

The judgements of smoothness and effectiveness were mixed, however, most participants did not correlate the two. The human composed morphs were mostly considered to be the more effective, but also the least smooth. The LEMorpheus morphs were typically the opposite, however, there were notable exceptions to this, particularly in examples two and four, where the comments regarding effectiveness were evenly divided.

Some confusion regarding the definition of smoothness was evident for some participants and two different notions of smoothness emerged:

249 1. Continuity in mood/feel, so that subtle changes occur regularly throughout the morph, each one being a mild augmentation that maintains some aspect of mood or feel. (the expected definition)

2. Continuity in the level of satisfaction within the listener, as brought about by expectation and fulfilment. In this case, a dramatic change can be ‘smooth’ if it is somewhat predictable, well-timed and framed appropriately. (an unexpected definition which confounds appreciation with smoothness).

Most participants adopted (1), for example, participant seven on the human morph of the first example:

“I have only rated this morph as 5 on smoothness- but I would give it 7 for effectiveness- the drop into the new beat is not really smooth- but it's perfect. So smoothness is only required when appropriate.”

Other participants adopted (2), for example, participant two on the human composed morph of the first example:

“… more conscious focus on establishment than too many changes… works better for me, seemed smoother.”

Sometimes the understanding of smoothness was blurred. For example, when participant five described a change he would make to the human morph of example one:

“I would drop the sequence from approx 18 seconds through to 21 seconds. This bar does not add enough transitional changes for an aesthetically smooth transition. Although the transition is smooth musically, it feels overly cautious.”

From the musical context, it is clear that “aesthetically smooth transition”, refers to (1), while the second use of the term, “smooth musically”, refers to (2). The assumption appears to be that (2) is good and both (2) and (1) is better.

Occasionally, an insightful comment was made regarding the nature of the two forms of smoothness, for example, participant one on the second example:

250 “It seemed like a more human composed transition because each section was internally coherent (in the main) and morphing "attention" was directed to different tracks at different times, rather than to all tracks at the same pace.”

This explains an important compositional mechanism behind the two different approaches to smoothness.

In the majority, (2) was necessary for the morphs to be judged as effective, while (1) was enjoyed more as added interest, however, some participants preferred (1) despite absence of (2). The quantitative evaluations of smoothness cannot be used, due to the statistically insignificant sample size and the above terminological confusion. Tables in Appendix E (Figures E-3 and E-4) condense all judgements of smoothness and effectiveness. In these tables, the original definition (1) is default and where the respondent appears to have meant meta- smoothness (2), it has been annotated as such.

Perceived changes and responses

Participants were asked for subjective responses to important changes, with musicological justifications. I classify the changes as either “variation” or “layering”, where variation involves note adjustments and layering involves the introduction or removal of parts. There were a variety of reactions, with most preferring human composed layering techniques rather than LEMorpheus variations. Despite this, particular instances were popular and there was a great mix of opinion. Condensed results are available from Appendix E (Figures E-5, E-6, E-7 and E-8).

In the LEMorpheus morph for the first example, the most notable variations were in the rhythmic changes from four to seventeen seconds. For some, the variation was interesting foreshadowing or – in combination with phrasing, texture and timbre – acted as a bridge. For others, it was unexpected, abrupt, or the result of incompatible beats. One view was that it was too predictable; however, it is unclear if this referred to the key changes or the rhythm. Most participants had positive reactions to variations in tonal parts, in particular, the bass.

In terms of layering, the entrance of new material induced a negative initial reaction in most people that became more positive over time. This included the cowbell sound early on at three seconds, dissonance between bass and pads at six second and others. Occasionally, entrances were appreciated, in particular, the bass note that enters at seven seconds. Shifts in focus were also observed, for example, from sound effects to the accompaniment at eleven seconds.

251 In the human composed morph for the first example, ‘layering’ changes were predominant, for example, the clear cut at fourteen seconds was noted by almost all participants. Most thought it worked well, either because it highlighted the organ, brought the music back to a clear tonal centre or had clear rhythm. Others disliked the organ. However, by twenty-one seconds, many felt the break was too long. Some disagreed, describing the timing as “perfect”. In terms of variation, the dissipation of high frequencies early-on at three seconds foreshadowed change for some. Another comment was that the change in bass at eight and ten seconds, without the change in lead, made the tonal centre too ambiguous.

The perceived changes in the other examples, with the exception of example four, highlight similar differences between the TraSe and human composed morphs and it is unnecessary to reproduce them here (see Appendix E, Figures E-5 to E-8).

For the LEMorpheus morph of example four, a notable variation was the upwards to the marimba at six seconds. Some viewed this as a worthwhile and interesting key change, hinting of future variations. Others viewed it as a jumbly, conflicting key change. Another important change was the simplification of marimba and percussion at twenty-three seconds. Most people felt the simplification effectively relieved the metric tension and resolved the target bassline, while one participant stated that the harmony was too simple.

For the middle section, those who focused on the rhythmic section appreciated the fills that were used to punctuate changes, or conversely, felt that more punctuation was needed. Changes in density were also noted, for example, the thinning of rhythmic elements at seventeen was perceived as a logical progression, while the dense drums at thirteen appeared as a signal to change. Comments regarding the marimba melody were that it was an effective bridge, natural, effective combined with the bass.

For the human composed morph of example four, an important variation was the marimba at eleven seconds. One participant viewed the pitches to be random and others observed the change to be pleasing and natural, despite being sudden. Another noted the effective synchronisation of rhythm. The marimba and bass at seventeen to eighteen seconds was viewed by some as being clunky or too late. For others it was a pleasing riff that revealed the beat and highlighted the new pattern. The stripping back of rhythm and convergence of marimba and bass at twenty-two seconds was also noted by many, who felt this was a positive change and release of tension. One participant felt that this point was too repetitive and boring. Other observations of changes included the auxiliary percussion fills at seven seconds, which two respondents felt destabilised the rhythmic focus or was jarring, and the shift in focus brought by the off beat synth hocketing with the drums at fifteen to sixteen seconds.

252 To summarise, there appears to be a diversity of opinion, with positive and negative views on a particular change often being held by an even mix of participants. This often seems due to the fact that different participants are listening to different parts in the music or that they have different levels of tolerance to change. Despite this, differences of composition style between LEMorpheus and the human composer were evident: variation was a feature of LEMorpheus morphs, while skilful layering was demonstrated by the human composer. However, this distinction did not exist for the fourth example where the source and target timbres were the same.

Real-world applicability of morphs

Almost all participants considered the LEMorpheus morphs to be applicable to both computer game and electronic dance music context. The worst result was from participant seven who was unsure in the case of the first two examples and thought the third example would be applicable to computer games but did not think it would be applicable to the dance music. Participants eight and nine showed a greater tendency to be critical, however, they appeared to be more critical of the human composed morphs than the LEMorpheus morphs. These results are reproduced in the following table:

Applicability of each morph to computer game / dance music Participant 1L 1H 2L 2H 3L 3H 4L 4H one y/n y/y y/m y/y y/y y/y y/y NA two y/y y/y y/y y/y y/y y/y y/y y/y three y/y y/y y/y y/y y/y y/y y/y y/y four y/y y/y y/y y/y y/y y/y y/y y/y five y/m y/y y/y y/y y/y y/y y/y y/y six y/n y/y y/y n/n m/m y/y y/y y/y seven m/m y/y m/m m/y y/n y/y y/y y/y eight y/y n/n y/y m/m y/y n/n n/n y/y nine y/y n/n y/y y/y n/n y/y y/y n/n

Figure 39 Judgements of whether the morphs are applicable to the real-world contexts of computer games or dance music. Responses were originally in natural language, but have been condensed for display within this table. They are in the form: computer game/dance and can be y-yes, n-no or m-maybe.

Modifications to the morphs

253 Participants were asked to describe how they would improve the examples they had listened to, so as to provide compositional techniques that could be incorporated into future algorithmic developments. While some trends were clear, opinions varied wildly, with some advocating complete overhauls and others suggesting that no improvements should be made – both to the LEMorpheus morph and the human composed morph.

With LEMorpheus morphs, most participants suggested that, rather than attempting a continuous morph, the changes should progress in discrete segments. As well as this, it was often suggested that the morphs should be “thinned” out.

With the human composed morph, feedback included advice on the timing of changes, the addition of extra layers, removal of specific segments within layers. More often than with the LEMorpheus morph, it suggested that no change was needed.

For both the LEMorpheus morph and the human composed morph it was occasionally suggested that the morph length should be increased. A detailed collation of these results is in Appendix E (Figure E-9, for the LEMorpheus morphs, and E-10, for the human morphs).

7.6.4 Discussion: musical controversy

While some trends have emerged and are discussed above, the most striking feature of the questionnaire was the levels of disagreement, with the most polarised responses to the LEMorpheus morph occurring in example two. The complex key modulation and non-standard integration of rhythm from the target seems to have been the cause for most of the controversy. Some relevant quotes are included below:

“Excellent - because of the complexity of the tracks, the highly skilful melding of the source and target was a joy to hear. It was a very deliberate transition in moods, and almost evoked a story in itself as the listener was transported from a relaxed environment with the potential for disaster directly into a schizophrenic sound-scape. Very worthwhile.”

- Participant five, responding to the LEMorpheus morph for the second example.

254 “The key changes is REALLY interesting with the bass. Intense change, uplifting, then dark then resolute into the new key.”

- Participant nine, responding to the LEMorpheus morph for the second example.

“Blechh!! Intense negative. Not effective, an apparently random substitution of elements.”

- Participant four responding to the LEMorpheus morph for the second example.

Conversely, the human composed morph for the same example seems to have been generally regarded as being effective or at least adequate by all participants. Particular criticisms were targeted at specific aspects of the morph, as contrasted with the less specific negative responses to the LEMorpheus morph:

“o.k. but not brilliant, the source and target loops were out of sequence with each other - i.e. if you imagine each loop as having groups of four bars, logically bar one of the source would overlap with bar one of the target - this didn't happen, throwing my sense of keeping in place, and making me notice the changeover in a negative manner…. It felt like a mistake that the creator failed to recognise and so didn't fix.”

- Participant three, human transition for the second example.

One explanation for such differences in opinion is that participants were listening to different layers and elements and thus based their opinions on specific features, which are more or less pleasing in general. This is supported by the data gathered from the ‘perceived changes and responses’ section of the questionnaire, where, in certain cases, it becomes clear that the listener is tuning into a particular part, for example, the bass. Another explanation is the varied stylistic preferences of the participants.

7.6.5 Conclusion of formal qualitative evaluation

The formal qualitative evaluation investigated morphing between electronic music loops of various styles. The method to create the music was to manually compose four source and target examples, generate morphs between them using LEMorpheus, and hire a composer/producer to

255 create benchmark morphs. The questionnaire asked for: subjective responses regarding the source and target and morph examples; assessments of smoothness and effectiveness of the morphs; subjective responses and analysis to particular changes in the morphs; opinions as to the real-world applicability of the morphs; and possible modifications to the morphs. The background of the participants and additional feedback was recorded at the end.

The results and analysis from the questionnaire fulfilled the objectives:

1. Elements of LEMorpheus morphs that were perceived mostly to have a positive impact were mostly the tonal variations, while the note clutter and lack of structure had a negative impact. In contrast, the human composed morph had clear, structured changes which were more widely appreciated.

2. Techniques for morphing were gathered, and are summarised above (Figure 38). Popular approaches were to blend similar elements and remove dissimilar elements and gauge the dissimilarity of source or target before deciding to blend or switch.

3. There was found to be very little correlation between smoothness and effectiveness. For most participants, effectiveness was a priority and smoothness was of secondary consideration, although, for some, the interest generated by smooth morphing appeared to eclipse any deficiencies.

4. There was a near consensus of opinion stating that the morphing techniques would be applicable to games or live electronic music performance.

Future improvements to the web questionnaire

A number of ideas for improvements were generated and they are summarised here. These would make the process more economic, increase the possible scope of the topic and would be fairly easy to implement:

o Use four smaller questionnaires that can be completed independently and count how many people have completed each. For new participants, the questionnaire with least completions is provided. This would have enabled the sixty or more participants who only partially completed the survey to contribute.

o Use a preliminary focus group to ensure the source and target material is mostly well-liked and credible. In the questionnaire, if they happen not to like the source and target material, ask them to put themselves in the position of someone who does like

256 it, so as to ensure some kind of useful analytical data is still generated.

o The example with source and target of the same timbres should be shifted to be prior to the other examples with different timbres, to minimise the possible exaggeration of perceived similarity for this example.

o Include a continuous online forum, focusing on the changes section of the questionnaire.

o Incorporation of qualitative research software such as NVIVO (Richards 2007) to analyse large amounts of qualitative data through semantic association networks.

Ideas for related questionnaires

The results also beg a number of questions which could be addressed in various divergent questionnaires:

o Develop a questionnaire that exclusively examines the creation of hybrid music, rather than hybrid transitions.

o Morphs with the same timbre only. As part of this, ask the participants to rate how different the source and target music is, in order to find the limits of our ability to perceive dissimilarity when restricted to the same timbre.

o It would be sensible in subsequent studies to include a more even balance of female and male responses, rather than a single female.

o Investigation into what kinds of sequences and sounds can work as structural cues, so that an algorithm could be used to generate new cues according to the fundamental principals of what is a good cue.

o A study focused exclusively on rhythmic integration techniques, due to this important element currently being relatively deficient within LEMorpheus.

7.7 Extensions

The ‘holy-grail’ for TraSe is to define a comprehensive set of transformations that are able to turn any particular source into any particular target in any specified number of steps, without incorporating data explicitly from the target. While this is a difficult and possibly unreachable long

257 term goal, some clear possibilities for improvement have become apparent nonetheless.

Possible extensions to the TraSe morph include: note thinning, note clustering, changes to the transformation chain, structural morphing, automation of parameter settings and optimisation of the TraSe algorithm. Note thinning is required to reduce clashes. Clustering notes into phrases would assist morphing of long and dense note sequences. New transformations and strategies for the ordering of transformations could increase the musical possibilities and reduce the number of frames generated. High-level musical structure could improve coherence, particularly in examples that do not afford smooth, continuous morphs. Automatic adjustment of parameter settings would allow the user to find a greater range of usable morphs with less effort. Finally, optimisation techniques could reduce the time complexity of TraSe from , potentially , which would enable greater realtime interactivity.

7.7.1 Note thinning

Note thinning would involve the analysis of note sequences before they are played, and removal of notes such that ‘clashes’ and ‘muddiness’ is reduced. Such problems contributed to negative responses in the formal evaluation.

Note thinning is not a trivial problem, however, some basic approaches that may alleviate the most noticeable clashes include:

o Removing notes in response to above average note densities.

o Removing or reducing the loudness of notes that overlap or are very close to other notes, even if they are in different parts.

o Constraining the notes within an abstract metre or tonality that is derived from the source and target, weighted on the morph index.

These suggestions may increase the coherence of the morphs to some extent, however, more sophisticated approaches may ultimately be needed, such as automatic mixing. In regards to mixing, human producers have a clear advantage over the morphing algorithm.

7.7.2 Note clustering

Note clustering would involve the automatic segmentation of the note sequence into phrases or ‘clusters’ of notes. Clusters in the source and target could be paired up and the TraSe algorithm would generate a separate list of frames for each phrase pair. This would overcome the

258 difficulties in applying large-scale transformations, such as rate, to note sequences that are long and dense. However, additional care would have to be taken to ensure the phrases are transformed in such a way that they do not clash with other phrases.

Phrase boundaries could be established through temporal proximity, metric position, tonality and the contours for pitch, dynamic and duration; leveraging previous research (Desain, et al. 2005; Lerdahl and Jackendoff 1983). Simultaneous streams of pitches may also be grouped into separate phrases (Bregman 1990).

7.7.3 Transformation chain

A number of transformations have been conceived that may be added to the chain and thus increase the effectiveness and capability of TraSe. Some ideas for new transformations that could be developed:

o Differently (perhaps dynamically) shaped contours for rate, phase, scale pitch and the others, so that changes occur in different amounts to different parts of the sequence, rather than uniformly across the whole sequence.

o A rate transformation that retains the original pitch contour.

o A polyphonic mode for add/remove that deals with vertical groupings rather than individual notes.

o Max Reger’s techniques for modulation (1903).

o Retrograde (reverse order of notes).

o Inter-splicing of notes.

o Modification of the note onsets using groove quantisation (Chokalis 1999).

o A transformation to change the tonal strength through ‘in-scale’ and ‘out-of-scale’ pitch information and harmonic substitution rules.

Additionally, using all possible orderings of the transformations rather than a single linear chain, would increase the range of possible outcomes and results. However, this may impact the speed of the algorithm. If is the number of transformations, there are different possible orderings. With the current eight transformations, this would mean a total of . This is quite large and certain to have a noticeable impact on the speed of the algorithm, however, it is constant and

259 thus of much less significance (in terms of time complexity) with respect to the number of notes in source and target, which is variable.

Lastly, the current approach to transform speed works well when there is only one ‘near perfect’ transformation (for example, key/scale morph), however, there are currently many, ‘far from perfect’ transformations in the chain. It should be possible to further augment the calculation of the target dissimilarity to account for the other transformations in the chain, or somehow combine all transformations into one. This may also naturally reduce the extent of dramatic changes in the beginning of the morph.

7.7.4 Structural morphing

As is evident from the formal evaluation (7.6), the current techniques for structuring and layering the music could be improved. This would include components that:

o Determine if the source or target are dissimilar enough to warrant a structural approach, or similar enough for continuous variation.

o Determine the most appropriate points for a layer addition, removal or switch.

o Add cues that foreshadow changes, build expectation for change and reinforce the validity of the change when it occurs.

7.7.5 Automatic adjustment of parameters

An algorithm that automatically adjusts the parameters of the TraSe morph to find configurations that produce a minimum of frames is needed. This will increase the musical possibilities of the algorithm, as well as making the process more efficient for the user. The search method would be ‘iterative-deepening’. That is, searching for a parameter configuration that converges in one frame and, if no convergence is found, searching for convergence in two frames and so on.

Another useful automatic adjustment would be to change the behaviour of add/remove in response to notes densities. For example, with high note densities, increasing the number of add/remove cycles or enabling add/remove to operate on larger phrases rather than individual notes would reduce the problem of over generation of frames from dense source and target sequences.

Beyond this, there is potential for a machine learning algorithm to analyse a database of human composed transitions or variations and extract compositional transformations, parameter

260 configurations and parameter weights.

7.7.6 Optimisation

At , the current system is inefficient, however, it could be reduced to and possibly to . This would enable TraSe to compute morphs in realtime, which would enhance the level of interactivity.

The bottleneck is the add/remove transformation (7.3.8) and the corresponding Nearest Neighbour (NN) dissimilarity measure (7.3). Add/remove is , because it computes the addition of each note in the target and removal of each note in the input. Because each of the addition and removals is being compared with using the NN dissimilarity measure, which is , we obtain .

Reducing add/remove complexity

Add/remove could be reduced to , using dynamic programming. A matrix, let it be , holds the distance measurements between each note of and during calculation of NN dissimilarity, :

fwd T bwd T

T[1] T[2] T[3] T[4] T[1] T[2] T[3] T[4]

I[1] d(1,1) d(1,2) … … I[1] d(1,1) d(1,2) … …

I[2] d(2,1) … … … I[2] d(2,1) … … … I I

I[3] … … … … I[3] … … … …

I[4] … … … … I[4] … … … …

I[5] … … … d(5,4) I[5] … … … d(5,4)

Figure 40 A table for NN distance between , which has notes, and , which has . The iteration for the forward (left) and backward (right) calculations are shown by the red arrows. Example NNs are highlighted in yellow.

This initial operation is , however, by using the matrix , calculating the change in NN distance when adding a note to from or removing a note from can be reduced to .

261 (the average NN dissimilarity between the original input and the target) has already been calculated, all we need to do is subtract the distances of NNs that are no longer nearest (for a removal) and add the distances of the new NNs (for an addition).

Reducing the complexity of add

Let be an index to of the range (where indicates the length or cardinality of the sequence) and let be a function that adds a note to a note sequence. Let be the dissimilarity matrix between the notes in the target, , and the input sequence with added to it. Therefore, is used for the calculation: .

fwd T bwd Target

_ T[1] T[2] T[3] T[4] _ T[1] T[2] T[3] T[4]

A_2[1] d(1,1) d(1,2) … … A_2[1] d(1,1) d(1,2) … …

A_2[2] d(2,1) … … … A_2[2] d(2,1) … … … A_2 A_2 (I+ 0 (I+ 0 T[2]) A_2[3] … … … T[2]) A_2[3] … … …

A_2[4] … … … … A_2[4] … … … …

A_2[5] … … … … A_2[5] … … … …

A_2[6] … … … d(6,4) A_2[6] … … … d(6,4)

Figure 41 Finding the NN distance between and with a note added from (the note in this example) requires one operation for forward (left), and for backward

(right). A_2 is an abbreviation for “ with the note from added”. Grey squares are distances that are subtracted, while squares with a red outline are distances that are added. The number of squares with red arrows going through them indicates the number of operations required.

The forward calculation for with a note from added, , is one operation. has already been calculated and the NN will be itself, thus having a distance of . However, a renormalisation is required, due to the increased length. We have:

Equation 6 Forward term simplified for addition of a single note, , from .

262 For the backward calculation, recall that is a matrix that holds the dissimilarities between each note in (rows) and (columns). Let be an index to and the columns of the matrix , ranging over . For the backward term, we can compare the distance of the NN, , with the new distance. If the new distance is larger, the old NN remains; if it is smaller, the new NN is taken. We have:

Equation 7 Backward term simplified for addition of a single note from .

Therefore, the time complexity of the NN dissimilarity measurement for add can be reduced from to .

Reducing the complexity of remove

The complexity of calculating remove is also reducible to . Let be an index to the input sequence, , that ranges from . Let be a function that removes a note from a sequence and let be the matrix that holds the dissimilarity measurements between and .

263 fwd Target bwd Target

_ T[1] T[2] T[3] T[4] _ T[1] T[2] T[3] T[4]

R_2[1] d(1,1) d(1,2) … … R_2[1] d(1,1) d(1,2) … …

R_2 R_2 I[2] d(2,1) … … … I[2] d(2,1) … … … (I- (I- I[2]) I[2]) R_2[2] … … … … R_2[2] … … … …

R_2[3] … … … … R_2[3] … … … …

R_2[4] … … … d(5,4) R_2[4] … … … d(5,4)

Figure 42 Recalculating the NN distance for with a note removed (the note in this

example), forward (left) and backward(right). R_2 is an abbreviation for ‘ with the note removed’. The row that is crossed out is the row in that has been removed to make . The number of squares with red arrows through them indicate the number of operations needed. Yellow squares are examples of new NNs, and the grey squares are the previous NNs (continuing the example from Figure 40) that must be subtracted.

The forward recalculation involves subtracting the already known NN distance for the note being removed, , and re-normalising by the reduced length.

Equation 8 Forward term simplified for removal of from .

The backward term is difficult to simplify, yet a more efficient implementation has been conceived. Firstly, the NNs for each note in (the yellow squares in the right of Figure 40) are checked to see if they are were removed (if they were ). If so, the distances between the note in and every other note in , are searched in order to find the new NN. Because each note will eventually be removed, all of the NNs will eventually need to be re-estimated and will always occur, but only once in during add/remove. This is best described in pseudocode (Figure 43):

264

// Inputs: T an array that holds the target sequence of notes k the index of the note being removed from the input, I bwdNN an array that stores the NN information previously computed for each note in the target, T. The index of the note in I that is the NN to the ith note in T is “bwdNN[i].index”. The distance of this NN is “bwdNN[i].distance” Rm is I with the kth note removed (I without I[k]) // Outputs: avd the re-computed average distance of NNs in Rm to the notes in T NND-BACKWARD-REMOVE(bwdNN, k, Rm, T) returns avd FOR(i = 0; i < LENGTH(bwdNN); i++) { IF(bwdNN[i].index == k) { // if it was removed min // stores the new minimum FOR(j = 0; j < LENGTH(Rm); j++) { IF(min > DISTANCE(Rm[j], T[i])) min = DISTANCE(Rm[j], T[i]) } avd += min } else { //if the NN is not being removed, use the old one avd += bwdNN[i] } } avd = avd/LENGTH(T) //normalise } Figure 43 Pseudocode for the backward NN distance, optimised for a remove operation.

The ideas above demonstrate how the current complexity of could be reduced to , however this is still probably not efficient enough to allow realtime operation. However, the complexity may be reduced further still, to , through pruning.

Pruning the NN search space to allow realtime operation

Firstly, the notes would be ‘quick-sorted’ according to onset which is known to be . When searching for the NN of a particular note, notes with the inter-onset distance larger than the Euclidean distance of current nearest neighbour could be pruned:

265

Pitch Pitch

Onset Onset 1. 2.

Figure 44 Reducing the search-space when finding the NN of the blue note amongst the red notes. In this example, the blue note is the first note in the input, , and the red notes are from the target, . This shows the two steps (labelled 1 and 2) needed to find the NN. The note with a grey outline is the note currently being considered. Notes that are shaded out have been pruned because we know that the distance of their onsets alone will be larger than the Euclidean distance ( ) to the current NN (highlighted), because is sorted. The dashed circle shows the area, within which notes will be closer than the currently considered note.

Figure 44 shows two steps of a search for the NN to the first note of , amongst . To start, the distance between and (purely vertical in this case) is used to determine the radius of a circle around . Any note outside this circle is not the NN. Since has been sorted, all the notes ahead of will have a greater start time than it (of course, notes with the same start time would also be considered). All notes with the inter-onset greater than , can be pruned. This is depicted in step one of Figure 44 where the for the first note prunes , and . (the lowest note), is not pruned, even though we can see it is clearly outside the circle. In step two, is pruned because the distance of to the next note, , produces an smaller than the inter-onset to .

The example shown above demonstrates best case scenario. However, if the note we are looking for is at the end of the sequence, rather than the beginning, no significant save on complexity would occur. However, the binary search is and it is likely that this could be combined with the pruning technique.

Other worst case scenarios include extremely vertical music:

266

Pitch Pitch

Onset Onset A. B.

Figure 45 Worst-case scenarios: A. If sorting by start-time, there would be no pruning in ‘vertical’ music. B. If sorting by pitch is also considered, the worst case scenario is a sequence of diagonal notes.

In this case, each note needs to be examined because their start-times are within . This could be overcome by assessing the ‘verticality’ or ‘horizontality’ of the sequence and sorting according to either pitch or onset, depending on which dimension has the most variation. This technique is promising in that, depending on the music, it could reduce the complexity to . However, a more detailed examination is needed before any solid efficiency claims can be made.

Finally, it is also plausible that even more significant gains could be obtained through note clustering, however, this technique, may interfere with the ability of the algorithm to converge, due to it being non-optimal. Despite this, exact convergence may not be necessary to musically achieve a suitable morph.

7.8 Summary of evolutionary morphing

To summarise, research into an evolutionary approach to compositional morphing has been presented. This included an explanation of the algorithm, the evaluation of it and possible extensions. The evolutionary morphing algorithm, TraSe, was of core relevance to this investigation of automated and interactive compositional morphing of MEM. The TraSe algorithm was partially a formalisation of existing theories, for example, it incorporated known compositional transformations such as rate, phase, inversions and harmonies; it used known music representations such as scales, keys and notes; and it modelled the trial approach to composition. However, by combining these into a unique morphing algorithm, TraSe is also an exploration of new aesthetic possibilities. Thus TraSe satisfies the two key elements of the

267 research objective: formalisation and exploration.

To summarise the TraSe process: the source music is put through a chain of compositional transformations repeatedly until it matches the target. The result from each iteration of the chain becomes a frame, which is a discrete musical state that could be played during the morph. The series of frames that are generated constitute the morph, progressing from the source and eventually becoming the target. To guide the parameters used in each transformation, dissimilarity measures are used to compare various potential outputs with the target and, depending on user parameters, the candidate which is judged to be most similar to the target is selected as the output for that particular transformation.

Informal evaluation of the TraSe algorithm covered musical analysis and automatic testing. The analysis provided some sense of how the various parameters in TraSe can be used to influence the musical outcomes. Small differences in the first few frames of the morph can have dramatic effects on the frames towards the end, highlighting the emergent nature of the algorithm. On the level of key/scale morphing, the TraSe parameters have a more direct influence on the results and common key modulations can be synthesised by searching for the appropriate parameter combination. The key/scale morphing algorithm can also be used to find unobvious but logical progressions from the key and scale of the source to that of the target.

TraSe was automatically tested using a set of randomly generated source-target patterns, with the goal of converging in the minimum number of frames. The “minimum frames” heuristic is a desirable goal, because when the morph must be of a particular length, a series of frames which exceeds that length is difficult to map coherently into that length. From automatic testing, it was found that while sometimes the TraSe morph is more effective than straight-forward addition and removal of notes, it was generally not and would sometimes fail to converge even after one hundred or more frames. Despite this, the test could be improved by using a database of more realistic source and targets, rather than randomly generated ones.

In the formal web-questionnaire evaluation, human composed source and target material was used, human adjustment of TraSe parameters allowed and the evaluation involved empirical qualitative feedback from a group of people with musical listening skills. Particular morphs were judged by some participants to be extremely novel and musically innovative. As well as this, the music generated by TraSe was considered overall to be applicable to real-world applications such as computer games and electronic dance music. Despite this, the human composed morphs that were created from the same source and target material and used as a benchmark, were regarded as more appealing overall. Participants often differed in opinion because of different foci of attention at particular points in time and differing musical expectations.

268 Researching evolutionary approaches to compositional morphing of MEM has only just begun and there are many possibilities for future work that would yield substantial gain. This includes note thinning, to avoid muddiness and clashes; note clustering, to allow musical phrasing; new transformations, to extend the range of compositional capabilities; automated layering, to allow higher level structural features such as break-downs; and automatic adjustment of parameters, which would uncover less obvious – but workable – settings and reduce the time spent by the user. As well as these, some ideas for optimisation of the algorithm have been detailed, which could allow truer realtime operation.

Overall, a novel compositional morphing algorithm has been developed that is applicable to electronic dance and computer game music contexts, despite being considered less effective than a human composer. Having explored parametric, probabilistic and evolutionary approaches to compositional morphing with increasing detail and musical success, the research will now be brought to a conclusion.

269 8 Conclusion

In conclusion, the primary research objectives have been fulfilled. Interactive compositional morphing has been investigated as a new approach to enhance adaptivity. New aesthetic possibilities enabled by compositional morphing have been explored. I will end with a summary of the research that has been completed (8.1), some demonstrations of potential applications of the work (8.2), directions for future research (8.3) and some final remarks (8.4).

8.1 Summary

The first chapter introduced the concept of compositional morphing and explained the motivating factors behind its use. The goals and methods were clarified and the key knowledge contributions were summarised, followed by a description of thesis structure.

The second chapter situated the thesis within the musical context of Mainstream Electronic Music (MEM) and reviewed musical practice for instances of morphing, both within MEM and within other musical contexts. MEM is the genre of choice for this research, that is, the techniques I have developed are designed to produce MEM.

In chapter three I introduced a new framework for discussing algorithmic music systems and reviewed various fields of algorithmic music in order to gain a sense of what would constitute novelty and also to inspire the research. The framework distinguishes between algorithmic music systems and simple music algorithms. Algorithmic music systems were conceived as composer agents which could take one of three approaches: trial, which involved trial and error; heuristic which involved rule estimation; and abstract, which involved mapping. Simple musical algorithms were described according to their musical function and the contextual breadth. The function continuum ranges from analytic, transformational through to generative. The contextual breadth is the amount of surrounding information that influences the computation of the algorithm.

The broad review of algorithmic composition covered composer agents, Computer Assisted Algorithmic Composition (CAAC), sonifications and DJ agents. In a review of interactive music, I examined meta-instruments, jamming agents, adaptive music and interactive installations. Following this, I reviewed note level morphing itself, covering four major works in detail and providing a brief overview of a number of other smaller projects. The potential for growth in the

271 area of note level morphing was highlighted, particularly as only one of the four major works was currently in active development. Most of the major examples of morphing were exploratory in nature and based on the abstract compositional philosophy. This demonstrated some additional room for development of morphing algorithms that embodied the heuristic and trial approaches to composition. In terms of research method, none of the previous morphing research projects utilised formal empirical testing procedures.

Chapter four described the system architecture of LEMorpheus, the software application that emerged from the thesis. The system allows the user to specify note-sequences and meta- parameters for each layer. Various morphing algorithms can be selected and parameters that influence the morph can be adjusted. The morph index is a high level parameter which controls how far the morph has progressed. Other parameters can be used to control structural aspects for each individual layer of the morph. Pitches are represented by a combination of scale, key, scale degree and passing note combined with octave. Aside from note onset, rhythm is dealt with via transformations – quantise, shuffle and loop. The software infrastructure is designed on the principle of extensibility and the system can be adapted for a range of applications. MIDI output is achieved through periodic polling of the morphing algorithm to generate the next segment of the note sequence, typically at quarter beat intervals. In a single cycle, the current beat is updated, meta-parameters are interpolated and morphed note sequence data for that cycle period is generated. Transformations to the generated note sequence are applied and the tonal representation is converted to MIDI pitch before the sequence is sent out to the synthesiser.

Chapter five explained the parametric morphing algorithm, which was the first note level morphing algorithm that was developed. It is parametric in the sense that abstract parameter envelopes are the primary representation for musical data. The morph is created by weighting the envelopes of the source and target according to the morph-index and combining them. The parametric morphing algorithm is only really effective when presented with examples of the source and target that are fairly similar.

Because the output of the parametric morphing algorithm was found to be so clearly lacking, formal testing did not seem necessary. There were a number of ideas for improvements to the parametric morphing algorithm, including re-implementation of Mathews and Rosler’s self- synchronising function (1969), higher-level musical representations and phase offset detection. However, particularly as parametric morphing had been examined in the past, it was considered better to pursue more novel approaches.

272 Chapter six detailed a probabilistic approach to morphing, called the Markov Morph. Each play cycle, the Markov Morph randomly selects either the source or target, weighted on the morph index, so that there is more probability of selecting the source during the start of the morph and the target towards the end. A segment of the recent note output history is then compared to each segment of the same size in the selected source or target to create a similarity matrix. The similarity matrix is used as a probability distribution to predict the next note. A number of parameters can be changed to influence which notes are more likely to be selected, for example, increasing the weight of Circle of Fifth space in similarity judgements means that pitches separated by a fifth will be more similar. I demonstrated the influence of various parameters through some informal tests of the algorithm.

The Markov Morph was examined with a formal focus group, primarily to gain feedback on the questionnaire and research method but also to examine how people responded to different morphs, such as weighted selection and an early version of the Markov Morph. In these trials, morphing was benchmarked against simple cross-fading. The results were fairly unreliable, however, many ideas for improvements and feedback were collated and incorporated into subsequent formal tests.

Part of the evaluation was a ‘focus concert’ where the morphing algorithms were tested in a controlled (but at least semi-realistic) context. The Markov Morph algorithm was benchmarked against a human DJ who mixed the specified source and target material. The results indicated that the Markov Morph was at least competitive and, in some cases, the preferred choice, particularly with extended morphs. Obviously, a limitation of note level morphing is that it would require access to the MIDI sequences and synthesisers that were used to create the original tracks. The data generated from the focus concert was, on the whole, methodologically sound, however, some useful ideas for improvements were given by participants, for example, benchmarking against a human composer/producer rather than a DJ.

Chapter seven detailed a third and final morphing algorithm called TraSe which used evolutionary processes to generate the morph. Before playback, TraSe iteratively transforms the source until it becomes the target and the result of each iteration is a single frame in a whole series from source to target. During playback, the value of the morph index determines which of these frames are being played. This system design allows for a number of different compositional transformations to be explicitly incorporated into the algorithm and given more or less weighting by the user, thus allowing a substantial degree of musical control.

273 There are currently eight transformations through which, at each iteration, the note sequence transformed by the previous iteration is passed through as a chain: divide/merge, rate, phase, harmonise, scale pitch, inversion, octave and add/remove. Each of these have a number of different parameter configurations that produce different transformed results. The results are all compared to the target and ranked according to their measured dissimilarity with the target. A target dissimilarity is established through a user-defined parameter which controls how fast the morph will converge and the transformed result that matches the target level of dissimilarity is selected as the final output for that transformation.

The same process is executed on the key and scale to determine a logical progression of key modulations or chord sequences. Because the whole key/scale space is covered, the TraSe process for the key/scale morph is more controllable than the TraSe process for note sequences.

Informal testing of TraSe highlighted the emergent properties of the algorithm by generating morphs that differed quite markedly, despite only small changes to the morph parameters and no changes to the source and target. Playing the frames that were generated by TraSe in reverse order was informally evaluated. Forward morphing appeared to exhibit a more natural musical structure. Playing the morph in reverse order appeared less natural, as the dramatic change occurred at the end of the morph, a time which should mark some return to stability. The range of control for key/scale morphing was demonstrated through a number of examples that explored various flavours of morphing between C Major and the distant key of F# Major. The ability for key/scale morphing to generate Jazz-like chord progressions was also demonstrated through modulation from C Major to A# Major, with a harmonic function change and ii-V-I ‘turnaround’ in the new key.

The time complexity for TraSe was confirmed empirically by testing source and targets samples with increasing numbers of notes. A comparison between TraSe morphing with the add/remove transformation only and TraSe morphing with all eight transformations on default settings was executed with fifty samples at different numbers of notes. It was found that all eight transformations can sometimes lead to convergence in less frames than is possible with add/remove alone. Despite this, the typical result of applying all eight transformations on default settings is many more frames than add/remove alone. The results may be improved with tuning of parameters and the use of ‘musical’ rather than ‘random’ source and target material.

Formal evaluation of TraSe was conducted through a qualitative online questionnaire, where participants could play through musical examples and answer a range of questions pertaining to

274 them. All participants were musically active, except for one who was an avid listener. The morphs were benchmarked against a human composer/producer who was asked to create a hybrid transition between the same sets of source and targets that were used by TraSe to generate the morph.

Participants were asked to verbalise a subjective response to the source and target, and list ideas regarding how they would approach the task of morphing between them. For each morph example, including those composed by the human, the participants were asked to provide a subjective response and discuss the smoothness and coherence of the morph. They also commented on important changes that occurred, noting the exact time and describing the effect that the change had on their listening experience. Lastly, the participants were asked whether or not the morph they heard would be acceptable to most people if it were applied to real-world contexts such as computer games or electronic dance music.

The morphs generated by TraSe were considered competent and sometimes perceived as innovative, however they were more often criticised for deficiencies in structural clarity and coherence. In contrast, the human composed morphs were more acceptable overall, although were never viewed as particularly innovative and sometimes were considered to be too plain.

Potential extensions to the TraSe morph algorithm include: note thinning, note clustering, new transformations, new combinations of the transformations, automatic adjustment of parameters, layering and higher level structural control of the music and optimization of current process. New combinations of transformations would provide many new possibilities, rather than the single linear chain of transformations that is currently employed. Automatic adjustment of parameters would allow all the morphs that converge in a practical number of frames to be found with little effort. An algorithm which can determine when and how to perform dramatic changes, such as breakdowns, would add some realism and a clearer sense of purpose or coherence to the music generated. Optimisation of TraSe would afford greater realtime interactivity, as the morph would not need to be precomputed.

275 8.2 Demonstrations of potential applications

The morphing software has been trialled in three different contexts: concerts, installations and a computer game. These demonstrate how the morphing techniques that were developed can be applied within real-world contexts. The concerts show how the morphing software can be used as a meta-instrument, with the high-level control of the morph index facilitating a comfortable performance. The interactive installations show how the morphing algorithms can be the basis of an engaging social musical interface. The computer game application demonstrates how morphing can automatically adapt a musical score to suit changes in the game and generate new material.

8.2.1 Concerts

During stage three, some morphs were created for a lunch time concert at the Queensland University of Technology and were also used later for the onsite concert at the Australasian Computer Music Conference (ACMC) 2005. This is included here (~8.1, ~8.2).

At this stage in the development of TraSe, key/scale was not a separate representation that could be morphed independently of the note data. Instead, two compositional transformations, transpose and mode-lock were included within the chain of compositional transformations so as to enable changes in key and scale. This is noticeable in some of the more drastic changes that occur within the music.

At Earpoke, the offsite event for ACMC 2006, a more improvised use of the morphing software occurred using some of the material developed for the online questionnaire. For this event, the separate key/scale morphing was used. The software was used alongside live saxophone, boxo, and clapping.

8.2.2 Interactive Table Installation

A tabletop interface has been developed that demonstrates the morphing algorithms being applied to the context of interactive installation/toy/instrument. It has been showcased at four events: Sound Polaroids at The Brisbane Powerhouse, Bowerbird at Substation Number Four, as part of Richie’s Electronic Playground at the Peats Ridge Festival and the opening of the Computational Arts Research Group (CARG) at QUT. The gigs at the substation and the Peats Ridge Festival were almost completely free of technical problems and were accordingly the best

276 demonstrations of the technology. Overall, the morph table performs well, particularly in terms of levels of engagement, musical acceptability, meaningful interaction and social participation.

Figure 1 Pictures from the Morphing table installation

The table is a flat transparent surface upon which four different glowing cubes can be placed. Each of the cubes relates to one of four different parts: drums, bass, lead, pads. The cubes are tracked by a webcam and the locations of the cubes are sent to LEMorpheus. When a cube is on the total left-hand side of the table, the morph index for that part is zero and moving the cube across to the right will increase the morph index for that part continuously until it reaches one at the right-hand edge. Moving the cube away (up, on the screen) will increase the sound effects that are applied to that part in the synthesiser (ReasonTM) – for example reverb, delay or distortion depending on the patch – and moving it closer will reduce the sound effects.

Four sides to each cube each have a different fiducial that represents a different morph for that part that has been loaded up before the start of the event. The two other faces of the cube house the part labels for the cube, for example “Bass”. Each time the cube is flipped, fiducial down, the music for that part is switched to the morph represented by that fiducial. Each of the four morphs has a totally different set of instruments and the source and target patterns for each morph are also on different instruments, which means thirty-two channels, each with a different patch and sound effects are running simultaneously in the synthesiser. To achieve this, LEMorpheus sends out on two MIDI ports.

277 Each cube has a carefully designed cardboard ‘inner-cube’, within which are embedded thirty-six bright Light Emitting Diodes (LEDs), nine for each of the four sides of the cube. The inner-cube has supporting flukes on each edge to hold it in the middle. All of the LEDs run off two AAA batteries, which maintains sufficient power levels for at least two hours.

The table utilises the reacTIVision toolkit developed at the University of Pompeau Fabra (Bencina, et al. 2006) to track the fiducials and send changes to LEMorpheus via Open Sound Control (OSC). A Mac Mini running boot-camp and reacTIVision utilises a wide angle (seventy- two degrees) webcam underneath the table to track the cubes. The locations of the fiducials are sent to through Ethernet cable to an Acer TravelMate 8000 running Windows, LEMorpheus and ReasonTM. The locations are picked up by LEMorpheus, translated into morph indexes for various parts and the resulting MIDI output is sent to ReasonTM via two Midishare (Grame 2004) ports and MIDI Yoke (O'Connell 2003). The sounds are the rendered by ReasonTM and outputted to the speakers which are under the Morph Table.

The table itself was co-designed by Brendan Wright and myself and constructed by Brendan Wright. The light emitting cubes were co-designed and built by Kate Thomas and myself, with electronic engineering guidance from Matt Petoe. Andrew Brown supervised the morph table project.

A short documentary about the Morph Table was made by Conan Fitzpatrick (~8.3).

8.2.3 Computer Game

As a preliminary investigation into the application of morphing for a computer game context, a morph was adapted for the open-source adventure game Beige. In this game, a character walks between various scenes on a map, killing creatures and collecting gems. The classic adventure game genre was chosen due to its topological affordances and small Central Processing Unit (CPU) load. Beige in particular was selected from many other games due to its being open source, fully developed, and able to be easily modified. The mapping that was chosen was the x- axis position of the character’s avatar in the world map. The world consisted of a 6x6 grid of scenes and the morph index was quantised to remain constant across each scene. That is, when the character moves into a new scene to the right, the morph index decreases by , and when the character moves into a new scene to the left, the morph index increases by . Moving up or down has no effect at all. Video documentation of this development, which includes a view of the morph index above the game screen, is included (~8.4).

278 From this demonstration, it is clear that morphing can be used to automatically compose interesting transitional material that relates to an aspect of the computer game. The fact that the music changes subtly in each scene is a new aesthetic that is quite uncommon in most computer-games. The standard is for the same music to be repeated constantly within one whole region (multiple scenes) and then cross-fade to a pre-composed transition when entering the next region – for a composer to create the huge amount of material that would be needed otherwise would be very uneconomic. In the example (~8.4), only two pieces of music were composed (source and target) while the four intervening pieces were generate by the morphing algorithm. Whether or not this is a significant or welcome change and whether or not the particular music generated is suitable is another question that can only be answered in further formal tests or market research.

While this simple study is sufficient for a preliminary investigation, there are a number of deficiencies that would need to be addressed for application to a real game:

o Musical structure vs game-play: when the character shifts into a new scene, the changes need to occur on the beat, not exactly when the shift happens. o Utilising the Y-Axis: currently the morph is only between a source and a target, not multiple sources. o Temporary cues such as the killing of a creature or the collection of a gem are overlooked in the music. o Timing glitches: Java garbage collection occasionally causes a timing glitch. o The high CPU load of synthesis: during this demo, the synthesiser was hogging between 20-40% of the CPU which is a totally unacceptable ‘CPU budget’ for audio in the commercial computer game industry (Brown 2006; Edlund 2006; Sanger 2003).

279 8.3 Future research

For future research into automatic and interactive note level morphing of mainstream electronic music, a number of ideas have been conceived that would be likely to contribute to substantial improvements. I have catalogued these ideas into four primary directions of research: algorithmic composition techniques, production techniques, evaluation methods and applications.

Potential algorithmic compositional techniques include:

o Note thinning. For the TraSe morph this could involve a combination of reducing the number of notes if it is greater than the average in source and target and removing notes that overlap, are very close together or outside of the expected metre or tonality.

o Note clustering/phrase detection. TraSe morph could use a pre-analysis of the material and determine which phrases from the source and target to morph together. In the Markov Morph, phrases could be played through to completion, rather than inter-splicing them randomly.

o New transformations. TraSe morph depends on hard-coded transformations for the range of the compositional decisions that generate the result and therefore new transformations would add greater musical possibilities. For the parametric morphing algorithm, the continuous parameter envelopes which represent the music can each be thought of as a transformation of a fundamental pattern, for example, the inter-onset envelope controls the value for a ‘rate’ transformation, the pitch envelope controls the value for a ‘transpose’ transformation and the phase envelope controls the value for a ‘phase shift’ transformation. Given this, any other transformation that is governed by a continuous parameter could be adapted to the parametric morphing algorithm and potentially increase the appearance of more human-like compositional thought in the morph.

o New or extended note similarity measures. TraSe morph has a number of similarity measures, each of which complements a transformation. The similarity measures could be adjusted to better suit the transformation to which they apply. The similarity measures used in the Markov Morph could deal better with polyphony by using the

280 Nearest Neighbour measure. The Markov Morph should also incorporate other musical dimensions such as dynamic and inter-onset into the similarity measurements. o Combinations of transformations. In the TraSe morph, new methods of combining the transformations could be developed that are more effective. For example, using all possible combinations of the transformations would create a greater diversity in the potential output. This may be computationally quite intensive, so perhaps it could be tested and only a few different combinations which are the most useful are used. o Layering and structural changes. All morphing algorithms developed thus far could benefit with a more natural approach to layering and musical structure. Rather than always attempting a smooth morph on each layer simultaneously, the algorithm should be able to, for circumstances where the source and target material are particularly ‘different’, cut some layers out and add them back, in a way adheres to a musical structure. o Automatic adjustment of parameters. For the TraSe morph, all of the weightings for each of the transformations could be searched automatically, so as to find the various different solutions to the morph that have an acceptable number of frames. For the Markov Morph the ‘contrast’ parameter in particular should be automatically adjusted so that the probability distribution always has a target level of variance. For example, it would be decreased automatically by a certain amount if the note sequence in the seed was foreign to the selected source or target. o Constraints. Within the Markov Morph and the parametric morphing algorithm, rhythmic constraints would add a degree of stability. For example, with parametric morphing, the inter-onset intervals and the onsets could be quantised to those that are present in the source and target. With the Markov Morph, a metric template with weightings for each possible onset could be extracted from the source and target, and the probability of the notes playing on the onsets could be factored by the meter. Another constraint that would be useful within the Markov Morph is to restrict the note generation process to each time a note is played, rather than every frame, so as to avoid stream-loss. o Data-mining. The Markov Morph could incorporate a database of musical material to inform the probability matrices. For the TraSe morph, there is a possibility that compositional transformations could be automatically extracted from a database.

281 o Higher-level musical constructs. As an extreme example, parametric morphing could interpolate between the high level psychological dimensions of intensity and valence (Wundt 1896), which would drive a hierarchy of other musical dimensions such as rate of tonal change, tonal stability, number of independent streams, level of polyrhythm, additive rhythm, syncopation and many others discussed in Chapter Two.

o Automatic mixing via MIDI volumes and control parameters to suit acceptable production values, for example, balancing the parts and reducing muddiness. This would be applied to the system infrastructure rather than a specific morphing algorithm.

Potential new data gathering methods include:

o An online forum style questionnaire, where the aesthetics of the morphing examples are debated continuously and in more detail. This would also better suit the rapid iterative development methodology that is applied to the software. For example, if a particular deficiency emerges from the debate, it could be quickly addressed, until the various aesthetic protagonists are satisfied as much as is practical.

o A music competition where the morphs composed via LEMorpheus are covertly entered alongside a range of human composers’ morphs and judged by an unknowing expert listening panel. All of the pieces would be ranked and data could be gathered from the assessment criteria and comments filled out by the panel.

o Playing the LEMorpheus morphs alongside human composed morphs on a radio station and inviting listener feedback. This style of evaluation has also been employed by Pachet (2005). The advantage is that it covers a wide range of audiences in a realistic context, although, as it is a single event, there is less control and the responses are less able to be directed.

Potential developments relating to applications of morphing include:

o Optimisation. Increasing the time efficiency of the TraSe morph would enable changes to morph parameters and the source and target sequences to become the subjects of interactivity, rather than only the morph index.

o N-source morphing. Morphing between many sources would better suit applications such as computer games where the non-linear narrative to which the music adapts

282 contains multiple themes that should be integrated. This may also better suit Cartesian interfaces into the music, such as the Morph Table.

8.4 Concluding remarks

At a fundamental level, the research utilised known musical processes and concepts from standard music theory and algorithmic music, which was necessary to enable some degree of musical stylistic coherence. These were applied in a novel way to produce successful musical morphing between two distinct note sequences. However, considering the lack of theoretical texts on note level morphing, substantial effort was devoted to the exploration of new aesthetic possibilities. This included new processes that would be near intractable without the aid of a computer, for example, the evolutionary TraSe algorithm.

Perhaps the most interesting findings to emerge from the research were the differences in approach between the human composer/producer and the automated morphing algorithms I created. When confronted with a source and target that seemed incompatible, the human composer used dramatic changes but integrated them in a way that appeared natural. This highlighted the fact that smoothness is not always perceived in terms of moment to moment continuity of the musical surface, but sometimes in terms of how closely the moment to moment expectations are fulfilled – if dramatic changes are expected, they will be perceived as ‘smooth’ changes. This finding is interesting because it implies an effective new form of morphing at the level of musical structure, which has, until now, been largely overlooked within previous research, which tended to focus on continuity of the musical surface.

Another interesting outcome was the polarised response to particular automatically generated morphs – some people perceived them to be novel, innovative and skilful, while others expressed immense dislike. This is extremely rewarding as much algorithmic music so often achieves the exact opposite – a middle-ground that is inoffensive but not exceptional. The fact that at least some people had such an intensely positive response to some of the examples is also rewarding and shows promise that the techniques could be refined to be very successful in the future.

It is also pleasing that the current note level morphing techniques are applicable to real world contexts, as evidenced by the overwhelming agreement from the questionnaire respondents. The occasional remarks to the contrary were balanced by similar remarks pertaining to the human composed transitions. To further demonstrate this, the software was employed in real world contexts at multiple events throughout the duration of the study, including live

283 performances and interactive installations and was well received. Despite this, it should be noted that many technical hurdles must be overcome before more widely accessible commercial applications can be realised.

284 A Glossary of terms

Abstract (compositional approach): involves Circle of Chroma (CC): the circle of all pitch- defining relationships, implicit or explicit, classes, in chromatic order. between different forms of data, both musical and non-musical, in order to explore Circle of Fifths (CF): the circle of all pitch- musical possibilities. classes, in order of fifth intervals.

Algorithmic Music System (AMS): music Coherence (musical): the degree to which algorithm that consists of a large network of the music sounds as though the composer smaller musical algorithms. intended it to sound the way that it sounds.

Analytic (music algorithm): an algorithm Contextual breadth (of music algorithm): the which extracts data with a low musical amount of data available and typically used predisposition from data with a high musical the music algorithm. predisposition. Contrast (Markov Morph parameter): Agent (AI): an automated process, designed parameter which exaggerates the level of for a particular task. discretisation in the Markov Morph similarity matrix. Atonicity: Lack of tonal centre. Degree and Passing notes (DePa): A Capella: Without backing. A Cappella Hip- representation of pitch which includes Hop is voice only rapping and is usually scales, scale degrees and passing notes. intended to be remixed. Depth: see Markov order. Backbeat: the ‘two’ and ‘four’ of 4/4 metre. Earmark: an element of the music which Backwards (in Nearest Neighbour serves as a structural indicator. dissimilarity): the average dissimilarity of each note in the target and its nearest Electronic Dance Music (EDM) neighbour in the source. Effectiveness (musical): the degree to which Beat-class: the class of all beats with the the music is able to affect listeners in a way same onset within a meter or cycle. that was intended by the composer.

Breakbeat (BB): rhythmic style based on Evolutionary (computing): techniques that drum fills with emphasis on variation. involve mutation and selection. Bridge: intermediate section. Fill (drum): rhythmic deviation in the Broad context (see contextual breadth): percussion, often at the end of a cycle.

Cento: a piece (poem or music) composed Fitness function: evaluates which candidates from many segments of other pieces. will be selected from a pool.

Centonization: composition through Foreshadowing: hinting at a future event. recombination of segments. Forward calculation (in Nearest Neighbour Chick-A-Chick (CAC): a rhythmic event dissimilarity): the average dissimilarity of consisting of two strongly emphasised each note in the source and its nearest syncopated beats, each one directly on each neighbour in the target. side of a weakly emphasised beat. Four-on-the-Floor (FF): rhythmic style, characterised by a kick on each beat. 285 generation of notes in realtime from the Frames (in TraSe): a list of note sequences similarity measurements. leading from source to target. Markov order: when predicting the value of a Frame-limit (TraSe parameter): number of future variable from a sequence of past frames at which TraSe aborts. variables, the order is the number of past variables which are used. Function (music algorithm): how the algorithm relates to musical data, on a Markov depth: see Markov order. continuum from analytic through transformational to generative. Mashup: a style of electronic music involving the juxtapositioning of existing tracks. Garbage collection (Java): a system within the Java Virtual Machine which Meta-smoothness (musical): continuity of automatically disposes of unused memory. musical expectations for the listener.

Generative (music algorithm): music Metric Modulation: see temporal modulation. algorithm which generates data with a high level of music predisposition from data with Musical Instrument Digital Interface (MIDI): a a low level of music predisposition. protocol for musical sound synthesis data.

Heuristic (compositional approach): involves Modulus: mathematical function for implicit estimation and application of rules to wrapping numbers within a range. While fulfil a musical intention. input may be negative, output is always positive, as with a clock face. Heuristic: a “rule of thumb”, that is, a possibly sub-optimal, but effective rule. Modulation: see key modulation.

Hypermeter (music): a cyclic pattern of Morph: a section of music that is a hybrid emphasis spanning multiple bars. transition between a source and a target.

Interpolation (morphing technique): see Morph index: parameter controlling the parametric morph. influence of the source or target during the morph. Key modulation: the art of changing from one key to another. Mutation rate (TraSe parameter): specifies how many transformations may occur on Key profile: the set of pitch classes within a each iteration of the transformation chain. particular key, along with weightings which indicate the different strengths of each pitch Musical neutrality: for a music software class. environment to have no stylistic influence.

Macroperiod: from the start of a Narrow context: see contextual breadth. polyrhythmic cycle to the point where the differently sized rhythmic periods converge. No-change variable (TraSe parameter): a weighting for the “bypass” or “no-change” Mainstream Electronic Music (MEM): parameter configuration of each popular, mostly instrumental electronic transformation in the chain. music, characterised by looped layers of rhythmic, tonal and sonic parts. Note-level (music algorithm): dealing primarily with note events and data such as Markov Morph: a morphing algorithm I pitch, onset, duration and dynamic rather developed involving weighted selection of than sound waveforms. source or target, extraction of similarity with the recent output and probabilistic Note sequence: a list of notes, ordered in sequence according to their onsets. 286 higher than 160 BPM) to produce notes in Note group: a vertical grouping of notes. All realtime. Other functions are called to notes with the same onset will occupy the produce the notes. same note group. Polyrhythm: supposition of contrasting N-source morphing: morphing between rhythms, often with different periods. more than two note sequences (multiple sources, rather than a source and target). Predisposition (musical): the ease with which a data representation may be Null prediction (in Markov Morph): when it is converted into audible music. impossible to predict a note because all elements in the probability distribution equal Recombinant (music algorithm): an . algorithm which generates music through combining of segments of existing music in Open Sound Control (OSC): a customisable new ways. protocol for sound synthesis data. Remix: producing a new piece of music from Order: see Markov order the audio of a source track (often the multi- track masters), with techniques including: Octavised scale degree: the scale degree, rearranging, adjusting effects, adding plus the number of octaves multiplied by the material, mixing and . number of steps per octave. Simple Musical Algorithm (SMA): music Parametric morphing: morphing through algorithm which exists as a single, simple conversion of source and target note component or function. sequences into multidimensional parameter envelopes; combination of source and target Smoothness (musical): the level of envelopes, weighted on the morph index; perceived moment to moment continuity and conversion of the combined envelopes within a piece of music. back into note sequences. Syncretism (music): when new styles or Perfect transformation : In TraSe, a perfect cultures of music are created from a blend of transformation is a theoretical transformation existing cultures or music. that has the capacity to produce an infinite number of patterns in the selection pool, Textural (music): music in which the with an even spread of dissimilarity to the continuous sonic texture, spectrum or timbre target throughout. In a perfect more noticeable than discrete events. Such transformation, when the transform speed textures are often created from a complex is there will be cycles before combination of many elements. convergence. Tetrachord: the four pitches spanned by the Pitch to Noise Ratio (PNR): the degree to interval of a perfect fourth. For example, the which tuned pitches are apparent within the first, second, third and fourth scale degrees. music, relative to untuned sounds or noise. Temporal Modulation: proportional changes Pivot chords: chords which are constructed between different tempi. A technique from pitches that occur in both the source pioneered by Elliot Carter. and target keys. Tonal Change (TC): the rate at which the Pivot notes: notes with pitches that occur in tonality changes, from constant drone both source and target keys. through to unpredictable solo.

Play cycle: the play cycle is a function that is Tonal Stability (TS): the strength of the constantly iterated, usually at intervals of a sense of tonality, primarily with reference to quarter beat (an eighth beat if the tempo is the tonic, fourth and fifth, but also to lower than 60 BPM, or a half beat, when enculturated pitch schemas. 287 conversation with a human judge, who Transformational (music algorithm): music decides which of them is the machine. If the algorithm where the content of the input is judge is unable to distinguish them, the transformed to generate the output, machine has passed the test. however, the data representations used for input and output have the same level of Turntablism: the practice of using the musical predisposition. turntables and mixer as a musical instrument, rather than simply a playback Transform speed (TraSe parameter): device. influences the number of frames that will occur in TraSe, by defining the level of Urlinie: in Schenkerian analysis, the dissimilarity with target that will be aimed for ‘fundamental line’, reduced from analysis of by the transformation to which the transform the pitch content and representing the speed refers. underlying tonal changes in the music.

Transform-Select (TraSe): a morphing Veridical (music psychology): a form of algorithm I developed involving an iterative expectation that arises from repeated process of transformation and selection. The listening to a particular piece of music. source is transformed into a pool of potential candidates from which a single candidate is Weighted-selection: a morphing technique selected according to similarity with target. developed by Daniel Oppenheim in which The transformation is repeated with each individual notes from either the source or new candidate until it becomes the target. target are selected for playback, weighted on the morph index. Trial (compositional approach): involves generating many potential patterns and Xenochrony: the juxtapositioning of different searching to find the ones that best fit an layers from different recordings. Term explicit criteria or goal. coined by Frank Zappa.

Triangle inequality: property of a metric measure, whereby the distance between

Turing test: a test whereby a machine and a human engage in a natural language

288 B Pseudocode of methods for combining envelopes and generating notes // GET-MORPHED-ENVELOPE-NOTE generates notes for the current play-cycle by combining source and target envelopes // Inputs: ES,ET hold envelopes for each dimension. ES is from source, ET from target. For example, ES.pitch is the pitch envelope from the source. t the time, in beats, since system playback started s the loop length, in beats (smallest common multiple of source and target lengths). res the play-cycle resolution, in beats. Usually 0.25. mi the morph index, ranging from 0 to 1. v the time, in beats, since the last note was played rec user switch, to recalculate the area under the onset envelope since the last note was played (onset area tracker) each frame or not. resco user switch, to reset the onset area tracker to zero on the first beat or not. pc user switch, to constrain the result of the pitch interpolation or not. firstFrame parameter indicating wether or not this is the first frame of the morph. // global parameters: a ‘inter-onset area tracker’, the current area under the onset envelope since a note was last played. Is used to decide whether to create a note for this play cycle. inc user parameter that influences the incidence of note generation rem user switch, whether to leave the remainder when updating the onset area tracker // functions: GET-VALUE Takes an envelope and a time, returning the envelope value at that time. SIMULATE-PLAY-CYCLES Calculates ‘a’, for a given mi and t. Described in detail below. QUANTIZE snaps the first input the nearest multiple of the second input // output: N The note that is to be played at the current time. N = null is a rest GET-MORPHED-ENVELOPE-NOTE(envelope ES, envelope ET, double t, double s, double res, double mi, double v, oolean rec, oolean firstFrame) {

IF(firstFrame == true || rec == true) { a = SIMULATE-PLAY-CYCLES(ES, ET, t, mi, v, res, a) // described below }

phase = mi * GET-VALUE(ES.phase, MOD(t,s)) + // morphed phase offset (1-mi) * GET-VALUE(ET.phase, MOD(t,s)) QUANTIZE(phase, res) ct = MOD(t-phase + s, s) // current position in envelopes, given the phase offset

IF(resco == true && ct == 0) { a = v*v }

IF( v – inc >= a/v) { // check to create a note N.pitch = mi * GET-VALUE(ES.pitch, ct) + (1 – mi)* GET-VALUE(ET.pitch, ct) IF(pc == true) { N.pitch = LOCK-TO-KNOWN-PITCHES(ES, ET, N) } N.duration = mi * GET-VALUE(ES.duration, ct) + (1 – mi) * GET-VALUE(ET.duration, ct) N.dynamic = mi * GET-VALUE(ES.dynamic, ct) + (1 – mi) * GET-VALUE(ET.dynamic, ct) IF(rem == false) { // if not leaving a remainder a = 0 // reset the area to 0 } ELSE { a = MOD(a, v*v) } //leave a remainder

} ELSE { N = null } // don’t make a note, make a rest

v = mi * GET-VALUE(ES.onset, ct) + // update the inter-onset value (1 – mi) * GET-VALUE(ET.onset, ct) a = a + v*res // update the onset area tracker return N }

Figure B-1 Pseudocode for GET-MORPHED-ENVELOPE-NOTE, which generates notes by morphing the source and target note envelopes together.

289

// SIMULATE-PLAY-CYCLES simulates the effect of play cycles on the onset area tracker // Inputs: ES,ET hold envelopes for each dimension. ES is from source, ET from target. For example, ES.pitch is the pitch envelope from the source. t the target beat that the simulation will proceed up to. mi the morph index. v the time, in beats, since the last note was played. res the frame resolution in beats. a ‘inter-onset area tracker’. the current area under the onset envelope since a note was last played. // Global parameters: inc user parameter that influences the incidence of note generation. rem user switch, to leave a remainder when updating a or not. SIMULATE-PLAY-CYCLES(envelopes ES, envelopes ET, double t, double mi, double v, double res, double a) { tr = 0 // initialise a tracker, starting from the beginning v = mi * GET-VALUE(ES.onset, tr) + // initialise inter-onset value (1 – mi) * GET-VALUE(ET.onset, tr) a = v*v // initialise the area tracker to simulate first frame note

WHILE(tr < t) { // update current value of the inter-onset and the area v = mi * GET-VALUE(ES.onset, tr) + (1-mi) * GET-VALUE(ET.onset, tr)

//simulate the creation of a note IF( v*v – inc <= a) { // update the onset area tracker:

IF(rem == false) {// if not leaving a remainder a = 0 //reset the area to 0 } ELSE { a = MOD(a, v*v) // leave a remainder } }

a = a + v*res // increment the area tracker since last note

tr = tr + res // increment the current simulated position

} }

Figure B-2 Pseudocode for SIMULATE-PLAY-CYCLES, which simulates a number of play cycles, in order to calculate the value of the inter-onset area tracker for the current point.

290 C Printed output from the Markov Morph algorithm

Notes on print out symbols: The “<<<<” indicates the beat at which a note was generated. The integer beneath the double array indicates which note was selected as a match for the seed by the random process during that quarter-beat. The position under the array visually indicates which slot in the array was selected. Where this integer is followed directly by “L”, stream loss has occurred and the double directly following “L” is the position of the segment within the 16 beat loop that is used as note data. beat 40.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.3, 0.9, 1.0, 0.5, 0.0, 0.0, } total = 3.26 m_avg = 2.16 sel note 12<<<< beat 40.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 0.7, 0.1, 0.0, } total = 2.35 m_avg = 2.16 sel note 15<<<< beat 40.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 0.8, 0.4, } total = 2.45 m_avg = 2.16 sel note 15<<<< beat 40.75 = { 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 0.7, } total = 2.19 m_avg = 2.16 sel note 1<<<< beat 41.0 = { 1.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.9, } total = 3.16 m_avg = 2.16 sel note 2<<<< beat 41.25 = { 0.2, 0.5, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.05 m_avg = 2.16 sel note 5<<<< beat 41.5 = { 0.0, 0.0, 0.1, 1.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.88 m_avg = 2.16 sel note 4<<<< beat 41.75 = { 0.0, 0.0, 0.0, 0.1, 1.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.66 m_avg = 2.16 sel note 5<<<< beat 42.0 = { 0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.61 m_avg = 2.16 sel note 6<<<< beat 42.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 1.0, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.70 m_avg = 2.16 sel note 7<<<<

Figure C-1 Print out of the similarity matrix, the total sum of similarity values, the moving average of this sum and the selected note for each quarter beat of a variation on a short chromatic run from tonic to dominant and back. Circle of Chroma pitch similarity is the only similarity measure used. Markov order is 5, contrast is moderately low. Only the last portion of the entire print out has been presented here, the final value of the moving average being the important piece of information. beat 48.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.02 m_avg = 1.14 sel note 9<<<< beat 49.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.11 m_avg = 1.14 sel note 10<<<< beat 49.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.11 m_avg = 1.14 sel note 11<<<< beat 49.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } total = 1.02 m_avg = 1.14 sel note 12<<<< beat 49.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, } total = 1.04 m_avg = 1.14 sel note 13<<<< beat 50.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, } total = 1.05 m_avg = 1.14 sel note 14<<<< beat 50.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } total = 1.05 m_avg = 1.14 sel note 15<<<< beat 50.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, } total = 1.04 m_avg = 1.14 sel note 16<<<< beat 50.75 = { 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.02 m_avg = 1.14 sel note 1<<<<

Figure C-2 Print out of the similarity matrix, the total sum of similarity values, the moving average of this sum and the selected note for each quarter beat of a variation on a short chromatic run from tonic to dominant and back. Circle of Fifths pitch similarity is the only similarity measure used. Markov order is 5, contrast is moderately low. This is the last portion of the complete print out, the final value of the moving average being the key piece of information.

291 beat 22.0 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.34 m_avg = 1.84 sel note 4<<<< beat 22.25 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.54 m_avg = 1.84 sel note 5<<<< beat 22.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.54 m_avg = 1.84 sel note 6<<<< beat 22.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.36 m_avg = 1.84 sel note 7<<<< beat 23.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 1.25 m_avg = 1.84 sel note 13<<<< beat 23.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.9, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.32 m_avg = 1.84 sel note 14<<<< beat 23.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, } total = 1.53 m_avg = 1.84 sel note 22<<<< Figure C-3 Print out of the similarity matrix, the total sum of similarity values, the moving average of this sum and the selected note for each quarter beat of a variation on the complete chromatic scale. Circle of Fifths pitch similarity is the only similarity measure used. Markov order is 5, contrast is moderately low. This is the last portion of the complete print out, the final value of the moving average being the key piece of information. beat 11.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.03 m_avg = 1.93 sel note 7<<<< stream loss this bar : 0.0 rate: 0.0 beat 12.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.05 m_avg = 1.93 sel note 9<<<< beat 12.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.02 m_avg = 1.93 sel note 10<<<< beat 12.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.02 m_avg = 1.93 sel note 9<<<< beat 12.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 1.0, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } total = 2.04 m_avg = 1.93 Figure C-4 Print out of the similarity matrix, the total sum of similarity values, the moving average of this sum and the selected note for each quarter beat of a variation on the complete chromatic scale. Circle of Chroma pitch similarity is the only similarity measure used. Markov order is 5, contrast is moderately low. This is the last portion of the complete print out, the final value of the moving average being the key piece of information.

292

beat 169.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 169.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 170.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 170.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 170.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 170.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 171.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 171.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 171.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 171.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 172.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 172.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 172.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 172.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 173.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 173.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } stream loss this bar : 0.0 rate: 0.0 beat 173.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 173.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } beat 174.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } beat 174.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, } beat 174.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, } beat 174.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, } beat 175.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, } beat 175.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } beat 175.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } beat 175.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, } beat 176.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, } beat 176.25 = { 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 176.5 = { 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 176.75 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 177.0 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 177.25 = { 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } stream loss this bar : 0.0 rate: 0.0 beat 177.5 = { 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 177.75 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 178.0 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 178.25 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 178.5 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 178.75 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 179.0 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 179.25 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 179.5 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 179.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 180.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 180.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 180.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 180.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 181.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } beat 181.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, }

Figure C-5 The similarity matrix generated at each 0.25 beat during the rendering of the Take On Me variation. The section in blue is when the additional similarity measure of modulus 3 beat space was incorporated to increase the accuracy; while the section following this uses only modulus 8 beat space.

293 beat 12.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 18 beat 12.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 18 beat 12.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 18<<<< beat 12.75 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 2 beat 13.0 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 19 beat 13.25 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 19 beat 13.5 = { 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, } seL note 19<<<< beat 13.75 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 20 beat 14.0 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 4 beat 14.25 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 20 beat 14.5 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 20 beat 14.75 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 4L14.75 beat 15.0 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, } seL note 4L15.0 beat 15.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } seL note 23 beat 15.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } seL note 23<<<< beat 15.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, } seL note 6

Figure C-6 Example print out of the similarity matrix for each quarter beat during the generation of the variation of the Take On Me loop using modulus 3 and 4 space as the similarity measure, Markov depth of 1 and maximal contrast. beat 16.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 14<<<< beat 16.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 15 beat 16.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 7 beat 16.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 6 beat 17.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 22 beat 17.25 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 15l1.25 beat 17.5 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, } sel note 15l1.5 beat 17.75 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 18.0 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 18.25 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 18.5 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4<<<< beat 18.75 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 19.0 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 19.25 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 19.5 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4<<<< beat 19.75 = { 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5

Figure C-7 Example print out of the similarity matrix for each quarter beat during the generation of the variation of the Take On Me loop using linear pitch as the similarity measure, Markov order of 1 and maximal contrast.

294 beat 16.0 = { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, } sel note 22l0.0 beat 16.25 = { 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } sel note 23 beat 16.5 = { 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, } sel note 23<<<< beat 16.75 = { 0.0, 0.9, 0.1, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.2, 0.0, 0.0, 1.0, } sel note 2 beat 17.0 = { 0.0, 0.9, 0.1, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.2, 0.0, 0.0, 1.0, } sel note 2<<<< beat 17.25 = { 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 3 beat 17.5 = { 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 3<<<< beat 17.75 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 18.0 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 18.25 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4 beat 18.5 = { 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 4<<<< beat 18.75 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 19.0 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 19.25 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5 beat 19.5 = { 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, } sel note 5<<<< beat 19.75 = { 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, } sel note 6

Figure C-8 Print out of the similarity matrix for each quarter beat during the generation of the variation of the Take On Me loop weighting pitch at 100, circle of fifths space at maximum, start time at 66, Markov depth of 1 and moderate contrast. It should be noted that some similarity values that are smaller than 0.1 could not be represented easily within this space and were rounded to 0 for the print out.

295 D TraSe algorithm

Figure D-1 Degrees 1-4 of polynomial fittings to the curve generated by the add/remove computation time. The curves generated by degrees 3 and 4 are overlaid on the original, demonstrating the closeness of their fit.

296 Note: In the print out below, the notes in the note sequences are either described in the form onset:pitch (DePa pitch, including octave), or, for more detail, with s for start time (onset), p for pitch and D for duration. difnn is the result of the Nearest Neighbour dissimilarity measure. step is the current iteration of the transform chain. The text immediately after transearch is the name of the transformation in the transformation chain that is being applied. bypass specifies whether or not that transformation is being bypassed. original part transearch Rate selected = 6 . 0.05624999999999 0.0:70, selected = 3 . mid = 6 result 9994, s 0.75 0.75:76, mid = 3 result = p:82 D: 0.22 1.5:70, 2.0:72, = 5, s 1.5 p:70 2.75:68, D: 0.225, s 2.0 3.5:72, s 0.0 p:70 D: p:74 D: 0.225, s 0.0 p:70 D: 0.16875, s 0.0 s 2.75 p:66 D: target part = 0.16875, s 0.0 p:70 D: 0.225, s 3.5 0.0:70, p:70 D: 0.05624999999999 p:7 0.5:72, 1.5:78, 0.05624999999999 9994, s 0.75 4 D: 0.225, 2.25:74, 9994, s 0.75 p:76 D: 0.22 transforming 2.5:84, p:76 D: 0.22 5, s 1.5 p:70 chain 6 Octave 3.25:82, 5, s 1.5 p:70 D: 0.225, s 2.0 find... bypass = difnn = D: 0.225, s 2.0 p:72 D: 0.225, false 0.20754280833333 p:72 D: 0.225, s 2.75 p:68 D: 333 s 2.75 p:68 D: 0.225, s 3.5 0.225, s 3.5 p:7 transearch transform step 1 p:7 2 D: 0.225, Octave 2 D: 0.225, transforming selected = 3 . transforming chain 4 Pitch mid = 3 result chain 2 Phase Stretch = at step : 1 find... bypass = find... bypass = false false part s 0.0 p:70 D: 0.0:70, 0.05624999999999 0.75:76, transearch Phase transearch Pitch 9994, s 0.0 1.5:70, 2.0:72, selected = 16 . Stretch p:70 D: 0.16875, 2.75:68, mid = 16 result selected = 14 . s 0.75 p:82 D: 3.5:72, = mid = 7 result 0.22 transforming = 5, s 1.5 p:70 chain 0 Divide D: 0.225, s 2.0 and Merge s 0.0 p:70 D: p:74 D: 0.225, find... bypass = 0.05624999999999 s 0.0 p:70 D: s 2.75 p:66 D: false 9994, s 0.0 0.05624999999999 0.225, s 3.5 p:70 D: 0.16875, 9994, s 0.0 p:7 s 0.75 p:76 D: p:70 D: 0.16875, 4 D: 0.225, transearch 0.22 s 0.75 p:82 D: Trans Divide and Merge 5, s 1.5 p:70 0.22 forming chain 7 selected = 4 . D: 0.225, s 2.0 5, s 1.5 p:70 add remove mid = 2 result p:72 D: 0.225, D: 0.225, s 2.0 = s 2.75 p:68 D: p:74 D: 0.225, 0.225, s 3.5 s 2.75 p:66 D: transearch add p:7 0.225, s 3.5 remove s 0.0 p:70 D: 2 D: 0.225, p:7 selected = 12 . 0.05624999999999 transforming 4 D: 0.225, mid = 7 result 9994, s 0.75 chain 3 transforming = p:76 D: 0.225, harmonise chain 5 s 1.5 p:70 D: find... bypass = inversions 0.225, false find... bypass = s 0.0 p:70 D: s 2.0 p:72 D: harda = 0[,0 false 0.16875, s 0.0 0.225, s 2.75 ,0 ], p:70 D: p:68 D: 0.225, 1[,0 ], 0.05624999999999 s 3.5 p:72 D: 2[,0 ], transearch 9994, s 0.75 0.225, s 0.0 3[,0 ], inversions p:82 D: 0.22 p:70 4[,0 ], selected = 6 . 5, s 1.5 p:70 D: 0.16875, 5[,0 ], mid = 6 result D: 0.225, s 2.0 transforming 6 null = p:74 D: 0.225, chain 1 Rate } s 2.75 p:66 D: find... bypass = 0.225, s 3.5 false transearch s 0.0 p:70 D: p:7 harmonise 0.16875, s 0.0 4 D: 0.225, s p:70 D: 2.5 p:84 D:

297 1.38186274509803 selected = 0 . } 87, mid = 2 result = transearch transearch harmonise Octave transearch add selected = 6 . selected = 3 . remove s 0.0 p:70 D: mid = 6 result mid = 3 result selected = 10 . 0.16875, s 0.5 = = mid = 8 result p:72 D: 0.475, = s 1.5 p:70 D: 0.225, s 2.0 s 0.0 p:70 D: s 0.0 p:70 D: p:74 D 0.16875, s 0.5 0.16875, s 0.5 s 0.0 p:70 D: : 0.225, s 2.5 p:72 D: 0.475, p:74 D: 0.475, 0.16875, s 0.0 p:84 D: 0.475, s 1.5 p:70 D: s 1.5 p:70 D: p:70 D: s 3.5 p:74 D: 0.225, s 2.0 0.225, s 2.0 0.05624999999999 0.225, p:74 D p:76 D 9994, s 0.75 transforming : 0.225, s 2.5 : 0.225, s 2.5 p:82 D: 0.22 chain 1 Rate p:84 D: 0.475, p:94 D: 0.475, 5, s 1.5 p:70 find... bypass = s 3.5 p:74 D: s 3.5 p:76 D: D: 0.225, s 2.0 false 0.225, 0.225, p:74 D: 0.225, transforming transforming s 2.5 p:84 D: chain 4 Pitch chain 7 add 1.38186274509803 transearch Rate Stretch remove 87 selected = 3 . find... bypass = , s 2.75 p:66 mid = 3 result false D: 0.225, s 3.5 = transearch add p:74 D: 0.225, remove s 0.5 p:72 D: transearch Pitch selected = 12 . 1.32303921568627 s 0.0 p:70 D: Stretch mid = 6 result 36 0.16875, s 0.5 selected = 12 . = , p:72 D: 0.475, mid = 7 result chaco = 3 s 1.5 p:70 D: = 0.225, s 2.0 s 0.0 p:70 D: part 1 = p:74 D 0.16875, s 0.5 0.0:70, : 0.225, s 2.5 s 0.0 p:70 D: p:74 D: 0.475, 0.5:72, p:84 D: 0.475, 0.16875, s 0.5 s 1.5 p:70 D: 0.75:82, s 3.5 p:74 D: p:74 D: 0.475, 0.225, s 2.0 1.5:70, 2.0:74, 0.225, s 1.5 p:70 D: p:76 D 2.5:84, transforming 0.225, s 2.0 : 0.225, s 2.5 2.75:66, chain 2 Phase p:76 D p:94 D: 0.475, 3.5:74, find... bypass = : 0.225, s 2.5 s 3.5 p:76 D: false p:94 D: 0.475, 0.225, s 3.25 target = s 3.5 p:76 D: p:82 D: 0.0:70, 0.225, 1.440686274 0.5:72, 1.5:78, transearch Phase transforming 5098032, 2.25:74, selected = 16 . chain 5 2.5:84, mid = 16 result inversions 3.25:82, = find... bypass = transearch add false remove transform step 2 selected = 12 . s 0.0 p:70 D: mid = 7 result 0.16875, s 0.5 transearch = p:72 D: 0.475, inversions at step : 2 s 1.5 p:70 D: selected = 6 . 0.225, s 2.0 mid = 6 result s 0.0 p:70 D: part p:74 D = 0.16875, s 0.5 0.0:70, : 0.225, s 2.5 p:74 D: 0.475, 0.5:72, p:84 D: 0.475, s 1.5 p:70 D: 0.75:82, s 3.5 p:74 D: s 0.0 p:70 D: 0.225, s 2.0 1.5:70, 2.0:74, 0.225, 0.16875, s 0.5 p:76 D 2.5:84, transforming p:74 D: 0.475, : 0.225, s 2.5 2.75:66, chain 3 s 1.5 p:70 D: p:84 D: 3.5:74, harmonise 0.225, s 2.0 1.13700980392156 transforming find... bypass = p:76 D 81, s 3.25 p:82 chain 0 Divide false : 0.225, s 2.5 D: and Merge harda = 0[,0 p:94 D: 0.475, 1.44068627450980 find... bypass = ], s 3.5 p:76 D: 32, false 1[,0 ], 0.225, s 3.5 p:76 D: 2[,0 ], transforming 0.225, 3[,0 ], chain 6 Octave chaco = 3 transearch 4[,0 ], find... bypass = Divide and Merge 5[,0 ], false part 2 =

298 0.0:70, 0.225, s 2.0 1.13700980392156 0.5:74, 1.5:70, p:76 D 81, s 3.25 p:82 s 0.0 p:70 D: 2.0:76, 2.5:84, : 0.225, s 2.5 D: 0.3601715, s 0.16875, s 0.5 3.25:82, p:84 D: 3.5 p: p:74 D: 0.475, 3.5:76, 1.13700980392156 76 D: 0.225, s 1.5 p:70 D: 81, s 3.25 p:82 transforming 0.225, s 2.0 target = D: 0.3601715, s chain 4 Pitch p:76 D 0.0:70, 3.5 p: Stretch : 0.225, s 2.5 0.5:72, 1.5:78, 82 D: find... bypass = p:86 D: 2.25:74, 1.08051470588235 false 1.13700980392156 2.5:84, 25, s 3.5 p:76 81, s 3.25 p:84 3.25:82, D: 0.225, D: 0.3601715, s transforming transearch Pitch 3.5 p: transform step 3 chain 2 Phase Stretch 76 D: 0.225, find... bypass = selected = 8 . transforming false mid = 7 result chain 7 add = remove at step : 3 transearch Phase part selected = 16 . s 0.0 p:70 D: transearch add 0.0:70, mid = 16 result 0.16875, s 0.5 remove 0.5:74, 1.5:70, = p:74 D: 0.475, selected = 10 . 2.0:76, 2.5:84, s 1.5 p:70 D: mid = 7 result 3.25:82, 0.225, s 2.0 = 3.5:76, s 0.0 p:70 D: p:76 D transforming 0.16875, s 0.5 : 0.225, s 2.5 chain 0 Divide p:74 D: 0.475, p:86 D: s 0.0 p:70 D: and Merge s 1.5 p:70 D: 1.13700980392156 0.16875, s 0.5 find... bypass = 0.225, s 2.0 81, s 3.25 p:84 p:74 D: 0.475, false p:76 D D: 0.3601715, s s 1.5 p:78 D: : 0.225, s 2.5 3.5 p: 1.19833333333333 p:84 D: 76 D: 0.225, 33, transearch 1.13700980392156 transforming s 2.0 p:76 D: Divide and Merge 81, s 3.25 p:82 chain 5 0.225, s 2.5 selected = 4 . D: 0.3601715, s inversions p:86 D: mid = 2 result 3.5 p: find... bypass = 1.13700980392156 = 76 D: 0.225, s false 81, s 3.25 p:84 3.5 p:82 D: D: 0.36017 1.08051470588235 15, s 3.5 p:76 s 0.0 p:70 D: 25, transearch D: 0.225, 0.16875, s 0.5 transforming inversions p:74 D: 0.475, chain 3 selected = 6 . s 1.5 p:70 D: harmonise mid = 6 result transearch add 0.225, s 2.0 find... bypass = = remove p:76 D false selected = 11 . : 0.225, s 2.5 harda = 0[,0 mid = 7 result p:84 D: ], s 0.0 p:70 D: = 1.13700980392156 1[,0 ], 0.16875, s 0.5 81, s 3.25 p:82 2[,0 ], p:74 D: 0.475, D: 0.3601715, s 3[,0 ], s 1.5 p:70 D: s 0.0 p:70 D: 3.5 p: 4[,0 ], 0.225, s 2.0 0.16875, s 0.5 76 D: 0.225, s 5[,0 ], p:76 D p:74 D: 0.475, 3.5 p:82 D: 6[,6 ,0 ], : 0.225, s 2.5 s 1.5 p:78 D: 1.08051470588235 7 null p:86 D: 1.19833333333333 25, } 1.13700980392156 33, transforming 81, s 3.25 p:84 s 2.0 p:76 D: chain 1 Rate transearch D: 0.3601715, s 0.225, s 2.5 find... bypass = harmonise 3.5 p: p:86 D: false selected = 4 . 76 D: 0.225, 1.13700980392156 mid = 6 result transforming 81, s 3.25 p:84 = chain 6 Octave D: 0.36017 transearch Rate find... bypass = 15, s 3.5 p:76 selected = 3 . false D: 0.225, s mid = 3 result s 0.0 p:70 D: 2.25 p:74 D: = 0.16875, s 0.5 1.42107843137254 p:74 D: 0.475, transearch 86, s 1.5 p:70 D: Octave chaco = 4 s 0.0 p:70 D: 0.225, s 2.0 selected = 3 . 0.16875, s 0.5 p:76 D mid = 3 result part 3 = p:74 D: 0.475, : 0.225, s 2.5 = 0.0:70, s 1.5 p:70 D: p:84 D: 0.5:74, 1.5:78,

299 2.0:76, 81, s 3.25 p:84 s 1.5 p:80 D: 0.725, s 2.25 2.25:74, D: 0.475, 0.725, s 2.25 p:74 2.5:86, transforming p:74 D: 3.25:84, chain 2 Phase D: 1.38700980392156 3.5:76, find... bypass = 1.38700980392156 81, s 3.25 p:82 false 81, s 3.25 p:86 D: target = D: 0.475, 1.17995098039215 0.0:70, transforming 64, 0.5:72, 1.5:78, transearch Phase chain 5 2.25:74, selected = 16 . inversions 2.5:84, mid = 16 result find... bypass = transearch add 3.25:82, = false remove selected = 10 . transform step 4 mid = 5 result s 0.0 p:70 D: transearch = 0.16875, s 0.5 inversions p:74 D: 0.475, selected = 6 . at step : 4 s 1.5 p:78 D: mid = 6 result s 0.0 p:70 D: 0.725, s 2.25 = 0.16875, s 0.5 part p:74 p:74 D: 0.475, 0.0:70, D: s 1.5 p:80 D: 0.5:74, 1.5:78, 1.38700980392156 s 0.0 p:70 D: 0.725, s 2.25 2.0:76, 81, s 3.25 p:84 0.16875, s 0.5 p:74 2.25:74, D: 0.475, p:74 D: 0.475, D: 2.5:86, transforming s 1.5 p:80 D: 1.38700980392156 3.25:84, chain 3 0.725, s 2.25 81, s 3.25 p:82 3.5:76, harmonise p:74 D: transforming find... bypass = D: 1.17995098039215 chain 0 Divide false 1.38700980392156 64, s 2.5 p:84 and Merge harda = 0[,0 81, s 3.25 p:86 D: 1.3818 find... bypass = ], D: 0.475, 627450980387, false 1[,0 ], transforming chaco = 3 2[,0 ], chain 6 Octave 3[,0 ], find... bypass = part 4 = transearch 4[,0 ], false 0.0:70, Divide and Merge } 0.5:74, 1.5:80, selected = 0 . 2.25:74, mid = 2 result transearch transearch 2.5:84, = harmonise Octave 3.25:82, selected = 6 . selected = 3 . mid = 6 result mid = 3 result target = s 0.0 p:70 D: = = 0.0:70, 0.16875, s 0.5 0.5:72, 1.5:78, p:74 D: 0.475, 2.25:74, s 1.5 p:78 D: s 0.0 p:70 D: s 0.0 p:70 D: 2.5:84, 0.725, s 2.25 0.16875, s 0.5 0.16875, s 0.5 3.25:82, p:74 p:74 D: 0.475, p:74 D: 0.475, D: s 1.5 p:78 D: s 1.5 p:80 D: transform step 5 1.38700980392156 0.725, s 2.25 0.725, s 2.25 81, s 3.25 p:84 p:74 p:74 D: 0.475, D: D: transforming 1.38700980392156 1.38700980392156 at step : 5 chain 1 Rate 81, s 3.25 p:84 81, s 3.25 p:86 find... bypass = D: 0.475, D: 0.475, part false transforming transforming 0.0:70, chain 4 Pitch chain 7 add 0.5:74, 1.5:80, Stretch remove 2.25:74, transearch Rate find... bypass = 2.5:84, selected = 3 . false 3.25:82, mid = 3 result transearch add transforming = remove chain 0 Divide transearch Pitch selected = 11 . and Merge Stretch mid = 5 result find... bypass = s 0.0 p:70 D: selected = 8 . = false 0.16875, s 0.5 mid = 7 result p:74 D: 0.475, = s 1.5 p:78 D: s 0.0 p:70 D: transearch 0.725, s 2.25 0.16875, s 0.5 Divide and Merge p:74 s 0.0 p:70 D: p:74 D: 0.475, selected = 2 . D: 0.16875, s 0.5 s 1.5 p:80 D: mid = 2 result 1.38700980392156 p:74 D: 0.475, =

300 find... bypass = selected = 6 . 509803921564, false mid = 6 result s 0.0 p:70 D: harda = 0[,0 = 0.16875, s 0.5 ], transearch add p:74 D: 0.475, 1[,0 ], remove s 1.5 p:80 D: 2[,0 ], s 0.0 p:70 D: selected = 8 . 0.725, s 2.25 3[,0 ], 0.16875, s 0.5 mid = 6 result p:74 4[,0 ], p:74 D: 0.475, = D: 5[,0 ], s 1.5 p:78 D: 1.38700980392156 } 0.725, s 2.25 81, s 2.5 p:84 p:74 s 0.0 p:70 D: D: transearch D: 0.16875, s 0.5 1.38186274509803 harmonise 1.38700980392156 p:72 D: 87, s 3.25 p:82 selected = 6 . 81, s 2.5 p:82 1.09406862745097 D: 1.1799 mid = 6 result D: 97, s 1.5 p:78 509803921564, = 1.38186274509803 D: 0.725, transforming 87, s 3.25 p:80 s 2.25 p:74 D: chain 1 Rate D: 1.1799 1.38700980392156 find... bypass = s 0.0 p:70 D: 509803921564, 81, s 2.5 p:84 false 0.16875, s 0.5 transforming D: p:74 D: 0.475, chain 6 Octave 1.38186274509803 s 1.5 p:80 D: find... bypass = 87, s 3.25 p transearch Rate 0.725, s 2.25 false :80 D: selected = 3 . p:74 1.17995098039215 mid = 3 result D: 64, = 1.38700980392156 transearch chaco = 2 81, s 2.5 p:84 Octave D: selected = 3 . part 5 = s 0.0 p:70 D: 1.38186274509803 mid = 3 result 0.0:70, 0.16875, s 0.5 87, s 3.25 p:82 = 0.5:72, 1.5:78, p:74 D: 0.475, D: 1.1799 2.25:74, s 1.5 p:80 D: 509803921564, 2.5:84, 0.725, s 2.25 transforming s 0.0 p:70 D: 3.25:80, p:74 chain 4 Pitch 0.16875, s 0.5 D: Stretch p:74 D: 0.475, target = 1.38700980392156 find... bypass = s 1.5 p:78 D: 0.0:70, 81, s 2.5 p:84 false 0.725, s 2.25 0.5:72, 1.5:78, D: p:74 2.25:74, 1.38186274509803 D: 2.5:84, 87, s 3.25 p:82 transearch Pitch 1.38700980392156 3.25:82, D: 1.1799 Stretch 81, s 2.5 p:82 509803921564, selected = 6 . D: transform step 6 transforming mid = 7 result 1.38186274509803 chain 2 Phase = 87, s 3.25 p:80 find... bypass = D: 1.1799 false 509803921564, at step : 6 s 0.0 p:70 D: transforming 0.16875, s 0.5 chain 7 add part transearch Phase p:74 D: 0.475, remove 0.0:70, selected = 16 . s 1.5 p:78 D: 0.5:72, 1.5:78, mid = 16 result 0.725, s 2.25 2.25:74, = p:74 transearch add 2.5:84, D: remove 3.25:80, 1.38700980392156 selected = 11 . transforming s 0.0 p:70 D: 81, s 2.5 p:82 mid = 6 result chain 0 Divide 0.16875, s 0.5 D: = and Merge p:74 D: 0.475, 1.38186274509803 find... bypass = s 1.5 p:80 D: 87, s 3.25 p:80 false 0.725, s 2.25 D: 1.1799 s 0.0 p:70 D: p:74 509803921564, 0.16875, s 0.5 D: transforming p:74 D: 0.475, transearch 1.38700980392156 chain 5 s 1.5 p:78 D: Divide and Merge 81, s 2.5 p:84 inversions 0.725, s 2.25 selected = 2 . D: find... bypass = p:74 mid = 2 result 1.38186274509803 false D: = 87, s 3.25 p:82 1.38700980392156 D: 1.1799 81, s 2.5 p:84 509803921564, transearch D: s 0.0 p:70 D: transforming inversions 1.38186274509803 0.16875, s 0.5 chain 3 87, s 3.25 p:80 p:72 D: harmonise D: 1.1799 1.09406862745097

301 97, s 1.5 p:78 transforming transforming D: 0.725, chain 3 chain 5 s 2.25 p:74 D: harmonise inversions s 0.0 p:70 D: 1.38700980392156 find... bypass = find... bypass = 0.16875, s 0.5 81, s 2.5 p:84 false false p:72 D: D: harda = 0[,0 1.09406862745097 1.38186274509803 ], 97, s 1.5 p:78 87, s 3.25 p 1[,0 ], transearch D: 0.725, :80 D: 2[,0 ], inversions s 2.25 p:74 D: 1.17995098039215 3[,0 ], selected = 6 . 1.38700980392156 64, 4[,0 ], mid = 6 result 81, s 2.5 p:84 transforming 5[,0 ], = D: chain 1 Rate } 1.38186274509803 find... bypass = 87, s 3.25 p false transearch s 0.0 p:70 D: :82 D: harmonise 0.16875, s 0.5 1.37028774509803 selected = 6 . p:72 D: 86, transearch Rate mid = 6 result 1.09406862745097 selected = 3 . = 97, s 1.5 p:78 mid = 3 result D: 0.725, transearch add = s 2.25 p:74 D: remove s 0.0 p:70 D: 1.38700980392156 selected = 6 . 0.16875, s 0.5 81, s 2.5 p:84 mid = 6 result s 0.0 p:70 D: p:72 D: D: = 0.16875, s 0.5 1.09406862745097 1.38186274509803 p:72 D: 97, s 1.5 p:78 87, s 3.25 p 1.09406862745097 D: 0.725, :80 D: s 0.0 p:70 D: 97, s 1.5 p:78 s 2.25 p:74 D: 1.17995098039215 0.16875, s 0.5 D: 0.725, 1.38700980392156 64, p:72 D: s 2.25 p:74 D: 81, s 2.5 p:84 transforming 1.09406862745097 1.38700980392156 D: chain 6 Octave 97, s 1.5 p:78 81, s 2.5 p:84 1.38186274509803 find... bypass = D: 0.725, D: 87, s 3.25 p false s 2.25 p:74 D: 1.38186274509803 :80 D: 1.38700980392156 87, s 3.25 p 1.17995098039215 81, s 2.5 p:84 :80 D: 64, transearch D: 1.17995098039215 transforming Octave 1.38186274509803 64, chain 4 Pitch selected = 3 . 87, s 3.25 p transforming Stretch mid = 3 result :82 D: chain 2 Phase find... bypass = = 1.37028774509803 find... bypass = false 86, false chaco = 0 s 0.0 p:70 D: transearch Pitch 0.16875, s 0.5 part 6 = transearch Phase Stretch p:72 D: 0.0:70, selected = 16 . selected = 7 . 1.09406862745097 0.5:72, 1.5:78, mid = 16 result mid = 7 result 97, s 1.5 p:78 2.25:74, = = D: 0.725, 2.5:84, s 2.25 p:74 D: 3.25:82, 1.38700980392156 s 0.0 p:70 D: s 0.0 p:70 D: 81, s 2.5 p:84 target = 0.16875, s 0.5 0.16875, s 0.5 D: 0.0:70, p:72 D: p:72 D: 1.38186274509803 0.5:72, 1.5:78, 1.09406862745097 1.09406862745097 87, s 3.25 p 2.25:74, 97, s 1.5 p:78 97, s 1.5 p:78 :80 D: 2.5:84, D: 0.725, D: 0.725, 1.17995098039215 3.25:82, s 2.25 p:74 D: s 2.25 p:74 D: 64, 1.38700980392156 1.38700980392156 transforming target part = 81, s 2.5 p:84 81, s 2.5 p:84 chain 7 add 0.0:70, D: D: remove 0.5:72, 1.5:78, 1.38186274509803 1.38186274509803 2.25:74, 87, s 3.25 p 87, s 3.25 p 2.5:84, :80 D: :80 D: transearch add 3.25:82, 1.17995098039215 1.17995098039215 remove 64, 64, selected = 12 . mid = 6 result =

Figure D-2 An example of printed output from the TraSe algorithm.

302 E Results from the online morphing questionnaire Participants Æ One Two Thee Four Example one break into tracks and Pull out the hats and bass of the riffs, look at source, because these elements ways to do are producing the swing. Pull substitution. breakdown. out the hats of the target, which Gauge distance. tempo Techniques Consistent pulse. is producing the 'straightness', Because distance is change, Harmonic steps. crossfading the whole pattern large, cut in chunks maybe back and forth a few times, then associated introducing the full target with a time sounds. signature or key change. Rhythm density, texture and pace tempo, timbre, different and tempo, style and feel are groove are the Factors require reduction. rhythm different, requiring a main Timbre, pulse simple cut problems common and can be maintained. Example two Slow down. Write bridge with similar materials. Thicken texture with Resolving note that lingers, intro reverb. Key modulation. Techniques \ \ to target (roll), then kick in 2-4 bar intro to target. completely. Rapid switch to full target. Emphasise groove (*bass and drums) in target - dry. vibes inspired modal none, with the proposed Factors \ all elements oscillation and technique thickening of texture. Example three dump source quickly. drop guitar (cause it sucks) and Techniques \ \ Focus on the irregular bring in dark blips. target. the whole source makes it harder.."It sounds like it was written by the rhythms are close. Synth pads owner's 10 year old kid are close (sound-wise) and Factors \ \ whose world experience could sound dark in a different is MacDonald's. The context. target was composed by someone who has visited other planets." Example four straight cut with a roll to pre- drop melody. Focus on Techniques \ \ empt it. rhythm. rhythmic stylistic Factors \ \ none differences

Figure E-1 Compositional techniques and specific factors for each example from participants one to four. “\” indicates that the participant used the same techniques and perceived the same factors as in the previous example. “(* … )” contains interpretative remarks from the author. Most of the comments have been condensed in order to be presented easily here. Continued below.

303

Participants Æ five Six Seven eight Nine Example one Merge snare in src and Integrate beat and metre experiment with organ in trg on same first, then evolve tempo shift or beat. Layer the harmony. slow tempo complete rhythm changes one at a time. change. No worries with cut. Analyse Tempo switch. Timbre metre. Find instruments which elements evolve switch. Swap with similar timbre to are compatible. rhythms patterns/timbre of lead. sync the cross with. Sustained break-down. and use honk sound from Techniques rhythm. Pull notes and long loop of Thinning. timbres. target. circle of fifths sounds out. keys would be Change tempo, Fade in, modulation. shorteded to match fade in new beat. fade out. Breakdown source. target. Aux sounds of Merge stripped High-pass on bass. first could be shortened elements, let Introduce target riff. into a honk, and timbre source elements Build drums. Loop half could be changed at go and add of target riff in a build same time. target. up. tempo. Also strong sounds are most identify the metre 4/4 to noticeable. The shared regular beat swung (*4/4 dominance of the in the target triplets to 4/4 bassline could be rhythms, Factors and just straight). Pitch is mood makes it hard. exploited. Chord of timbres use that different, but less source and odd sounds during the an issue if of target could be transition. removed integrated interestingly. completely. Example two speed up tempo. Evolve try for double foreshadow excepts timbre. Mod key. time. thin to transpose from target at slow Exchange instruments basic elements. source up. tempo. transposition. one at a time, (*layering Drop beat out of Techniques Discordise \ use vibes in source to the new sounds into the source. Bring in it. Speed it bridge to target. Have a current patterns) change beat of target, up slowly. bridging sound. Switch the bass and drums replace other to target. slowest of all. elements. minor to major will be timbre and tempo Factors Na Na \ difficult make it difficult Example three subvert the connect drums. Guitar bright down an octave into sounds. evolve string into bass. loose melodic bass of target. Turn source Increase bass density content. Would Techniques \ introduce cymbal from piano into and staccato. Guitar be too difficult to target early on. Build the evolve into strings. merge. drums. Evolve strings precussion to target. line. mood and tempo and mode density difficult. Density and Factors melody difficult. \ makes it articulation different (* harder. but not difficult) Example four Melodies could be continuated strip quickly, then jump transition. Maybe into a sudden merge bass change timbre and a few overlaps. They Techniques and immediate \ lines. melody for slow are similar. short build change. Rhythm evolution up and then straight cut be replaced gradually. easy. Similar no problems with Factors Easy \ tempo differences

Figure E-2 Continuation of the previous table, participants five to nine.

304 Paritipant: Music 1 Lemu Music 1 Human Music 2 Lemu Music 2 Human Smooth. diffuse. low odd, OK. Smooth good interpart focus one: response not morph. transition. integrity. attempted. shifts. Effect smoothness negative. NA smoothness ineffective NA smooth/effective y/n NA (R: n/y) y/n NA two: response OK. Too dense. Intense. Better movement perceived. better and smoother No smth, too busy. Fade Effect NA smooth and not smooth. NA smth smooth/effective n/n ?/y ?/? y/? good. Sounds blend structure not smooth. three: response bad. Blend very forced. good. well. Bad not smth. Natrl. commn start smooth, good. Mid Effect not smooth. Forced. Smoothness good elmts. not. smooth/effective n/n (R: almost y/n) n/y y/y n/n four: response boring - predictable OK. good detail, contour. "intense negative" nice. Too long no smooth. Rnd stable metre more Effect smoothness dull surprise more important. substitution important n/n for meta. (R: y/n for smooth/effective y/n n/y n/y local) gd meta, foreshdw, beat, exel. less natrl, sbtle, five: response middle not coherent excellent. Natural. brk. contins unsmooth, chaotic, smooth, beat gd. Too most smooth (R: meta), Effect smoothness positive. inhuman. empty. good. smooth/effective n/n y/y y/y y/y functional. hard to dance nice. Brk smooth, good. Used quirky lead bad, spoils whole six: response to. coherent. target. morph smooth, gd. bits not no smoothness, bad Effect smoothness good. meta smoothness good smooth effect y/y (R: meta- smooth/effective y/y y/y n/n smoothness) evil chaos.Pitch, rhythm stonger. Drop in OK. Not solid with seven: resp. weak - no highlights. clash excellent. melody not smooth can be lead smth, bad. Rhythm Effect partly smooth, effective. smooth, not pleasing. effective. switches not smh, gd. y/n, n/n for different y/n for lead & n/y for smooth/effective n/y y/n elements. rhythm ugly then too high. eight: response Too long. mystical. Ugly at the end good happy music Spacious. smthnes stoppd fst smooth switches were smoothness was Effect smoothness was good. transition good effective y/y (R: meta smooth/effective y/n disliked long opening. y/y y/y Small changes good smoothness) lost direction. dislike gd brk, mode shift. call & Surprisingly good. gd drum ovrlay, nine: response break, honk, resp. Rhythm bad. Interesting. foreshdw. dissonance. not smooth, bad break. Jarring in bass and kick long length helped Effect From break to target smthness gd, bizarre. bd. smoothness good. n/n in 1st half, n/y in 2nd smooth/effective n/n y/y y/y half

Figure E-3 Condensed summary of participants’ responses to both LEMorpheus and human- composed morphs for the musical examples 1 and 2. (R: … ) is an interpretative comment that clarifies an ambiguous response. Abbreviations are: gd- good, smth – smooth, natrl – natural, commn – common, elmts – elements, rnd – random, brk – break, foreshdw - foreshadow, contins – continuous, excel – excellent, resp – response, stoppd – stopped, fst – fast, ovrlay – overlay. Responses to examples three and four are continued below.

305

Paritipant: Music 3 LEMorpheus Music 3 Human Music 4 LEMorpheus Music 4 Human slowly pulled back in one: response very effective subtle but effective. NA energy Smth envelopment of medium smthnss, dull Effect NA NA mood. morph. smooth/effective NA NA NA NA gd, shifting timbres, v. smth, dvlpmnt not two: response abrupt rhythms, bad nice feel. harmony switch no smth, cause rhythm Effect positive effect smoothness positive smoothness positive dense y/y - cmmn elmnts smooth/effective y/y n/n y/y maintained strong. rhythm switch bad cross-fade. Guitar better, but too much three: response too early Sound x-over is not like. sucks. blend. Don't blend - cut. good. y/y (R: meta).Could be smoothness bad. Should Effect smthnss bad. Gtr too dom smoothness bad, weak smther cut smooth/effective (R: y/n) (R: n/y) y/n pitch issues y/n gd. sim of src and trg four: response "churns my innards" primitive chaotic helped predictable, less Effect NA smoothness was good smthnss bad. Malformd annoying smooth/effective (NA) / n n/y y/y y/n smth but haphhazard unpleasant, tense, excellent. Maintained excel. Strngs subvrtd clean. Bass draws five: response forced. Obvious tempo feel of source. Smooth cleverly attention increase. cross-over. no smthnss smoothness was Effect smoothness was positive. v transparent. bit jarring discomforting. positive y/y effectively turned y/y - cause of track y/y smth chnges smooth/effective n/n - discontinuous. bad mood styles effective. bad. Middle bettr thn not right, OK . Frshdw OK. Smoothness OK. Happy mood six: response expectd. gd. expected throughout no smoothness = smoothness was Effect Meta-smoothness good good. Drums are smooth negative positive y/y smthness helped a smooth/effective n/n altered flow, density. y/y y/y little weak, chaos, sad, mel weak beat drp gd. gd. Subtle. Needs great. improvement on seven: resp. Dischord Strong pnctuation prev. Smth mel gd, abrupt smth, needs drops, Effect too smooth smth, works. beat gd enrgy. y/OK for melody, n/y for smooth/effective y/n smoothness negative y/y y/y beat. morph better than unsettling cross- eight: response disjointed smoothness was good. originals. rhythms. smthnss had a good lack of smthnss was Effect smoothness was good effect. ugly. y/y smthnss was smooth/effective n/n y/y y/y effective. bad. No clever OK. Odd bits. part shift good. Happy. Distinct gd. Happy. Start out of nine: response integration. good. parts. sync. Smth gd. Start weak. smthnss gd, series of out of sync was not Effect smoothness was bad end gd loops smooth. y/n end smooth, start y/y chord modulation y/y(R: mybe n/y, unrel smooth/effective n/n and then y/y weak. smooth. loops)

Figure E-4 Condensed summary of participants’ responses to both LEMorpheus and human- composed morphs for the musical examples 3 and 4. Additional abbreviations are: Gtr – guitar, dom – dominant, strngs – strings, subvrtd – subverted, bttr – better, thn – than, expectd – expected, discord – dischordant,

306

Participant Change in example 1 LEMorpheus Changes in example 1 Human one: chng1 :5 var/lay: bad. Abrupt. (R: general shift) :9 sound: Odd/bad. Too obvious (donkey) var/lay: bad. lack enrgy. (R: muddiness in change 2 :12 the mix ) :14 layer: sudden tempo and shuffle change change 3 :21 var/layer: target reached :21 layer: drums sound - distraction from changes (R: and two: chng1 :4 rhythmic changes act as forshadowing :11 forshdow) change 2 :7 sound :14 layers: bringing out the organ is good :20- lay: drum brk gd: leavs spce, fulfills rhythm change 3 :11 focus from sound to keys 1 chnge sound: distraction from chngs, completes change 4 :22 auxilliary perc hits :23 premption var: dissipation of high end sound pre- three: ch1 :3 aux: cowbell. Not enough space. :3 empts change. lay/key var: pitch clash (R: bass and the change 2 :6 pads) :7 lay/var: organ. Foreshadow. change 3 :14 var: bass transition is nice :14 lay: rhythm cut to kick, tempo gained. var/lay: new rhythm emerges. Stops lay: new rhythm. not forced as it is at orignl change 4 :17 awkwardness :21 tempo. four: ch1 :4 var/lay: unexpected change overall :8 var: good step. (R: harmonic change?) change 2 :08 var: simple substitution :14 layer: good but organ should go. var: predictable (R: either the chords or the change 3 :14 sound) :19 layer: took too long change 4 :19 var: predictable (R: probably the sound?) :21 layer: good entry var: ovrall chnge forshdwd. Imprtnt bass snd: bckgrnd snds bit messy. Lay: orgn five: ch1 :7 note enters :8 preemptive layer: cut is full of pleasant expectation, but change 2 :10 var: incompatible beats were mixed :14 too long change 3 :14 var: less incomparible but still too busy :18 var: no variation. too empty. Redundant. lay: good kick to new track. Could have change 4 :20 var: complete :21 been ealier var: bass (R: he obviously means the six: chng1 :0 chords/lead ) :0 over all changes change 2 :7 var: bass; aux: the descending sound NA change 3 :14 var: bass NA change 4 :21 bass NA lay: gd - basic elements that warp well only seven: ch1 :7 sound: nightmareish :14 remain lay: great! New beat drop in, just at the right change 2 :24 var/lay: good. beat completes it. :21 time. eight: ch1 :9 var: bad rhythm. :10 sound: honk good change 2 :17 var: keys not work with other sounds :17 lay/struct: bad. Took too long change 3 :20 sound: good. Hooter :23 lay: bad too silent. change 4 :23 lay: latin riff sounds out of place. :25 lay: too much all at once. :8- var: bd. bass chng w/o lead, tonl cntre nine: ch1 :7 var: interest. Range, metre. (R: and key) 10 ambiguous. var: bad. Bass too high in mix. Dynamics too change 2 :18 high :14 lay: good. Back to tonal centre var: gd. Rhythmic bridge. Phrasing, texture, change 3 :13 timbre. :21 lay: good. Cut to target var/lay:+ new drm:snd pace, mtre signl change 4 :3 mode chnge NA

Figure E-5 (cont) Important perceived changes in LEMorpheus and human morphs of the first example. Var means variation, lay means layer. Other abbreviations continued from previous tables. Rows with NA responses deleted to help fit.

307

Participant Change in example 2 LEMorpheus Changes in example 2 Human layer: keys unexpectd, but forshdw one: chng1 :5 Var: odd notes. Beginning of key mod. :2 change. layer: drums set up strong change in change 2 :14 Lay/var: odd perc. Hits. Target coming in. :18 momentum layer gap: drums drop. Unexpected. change 3 :22 Var: weird. Combination doesn't fit. Too busy. :23 new enrgy entrs change 4 NA layers: drums added partially. two: chng1 :4 focus to bass ® :17 Preempts change change 2 :18 layers: fade in of piano good transition intro :22 layers: synth timbre shifts the focus var: the var on the vibes gives nice change 3 :27 focus: "farty" synth highlights chaos :28 continuity layer(intrapart): add complemntry change 4 :32 layers: flutey lead confirms new pattern :33 rhodes finalises Lay/var: hints of double time entering the lay/var: forshad/intro of target lead three: ch1 :13 drums :08 successful lay/var:gd w/o cliche dble-tme, clvr rhythm change 2 :23 intgrtion :17 lay: double-time clichéd var/lay: gd bass snd of trg emrges. Fits either change 3 :27 rhythm :26 lay: not matching up var/lay: gd, unexpected rhythm integration, like change 4 :30 ch2. :30 focus: source is gone four: ch1 :6 var: interesting triplets in vibes :2 lay: subtle intro of Rhodes change 2 :14 var: loss of rhythm in keys :8, :13 lay/var: well formed. lay: good. Metric shift in place befre change 3 :16 var: too random :17 harmonic chnge. change 4 :32 var: unexpected endpoint, given transition. :23 lay: bas well set (groove works) var: harmonic and pattern changes. Signl five: ch1 :9 chnge. :8 lay/var: lead foreshadow. Bit jarring var: tempo inc. Ovrll effect signals middle of change 2 :19 transitn ;19 lay: drums too quick lay: great break. Puts focus on the change 3 :28 var: integration. Experimental but cool. :23 bass. lay/var: high-pitched sample really completes lay/var: good bridge - elements from change 4 :32 it. :26 both six: chng1 :12 var: increase in density :0 layer: rhodes distrated change 2 :20 var: timbre change a bit rough :17 lay: good drums change 3 :30 var: tempo up. :22 lay: bass timbre change 4 NA seven: ch1 :18 sound/lay: bad. synthy squelch doesn't work :17 good. Beat drops in well. good. Second drop in solidifies it's change 2 :16 var: OK. bits of drums enter. :22 presence. change 3 change 4 NA eight: ch1 :7 var: odd. Key change :3 lay: Ok. Synth intro. change 2 :15 var/lay/focus: bad. Bass sustain. :24 sound: odd. squelchy bass lay: piano drops out (R: aft- change 3 :20 var/lay: bad. unusual timbre/rhythm. Lead :45 shadowed) change 4 :30 var/lay: cymbals highlight drums nine: ch1 :12 var: good. Key change, bass. :8 lay: good. Foreshadow rhodes. var/lay: good. Counter melody from change 2 :16 var/lay: interesting. Rhodes/vibes :11 target lead var/lay: dnse tonl texture and key chnge. change 3 :23 tempo up :17 lay: good. Drums interesting change 4 :30 var/lay: tempo, overall intensity increase. :22 lay/var: good. Break.

Figure E-6 Important perceived changes in LEMorpheus and human morphs of the second example. Abbreviations continued from previous figures.

308

Participant Change in example 3 LEMorpheus Changes in example 3 Human layr/var: melody doubling timbre is subtly one: chng1 :10 layers:new timbre, more aggressive :8 introduced change 2 :18 layers: stabilises due to bass. :17 layers: sudden alteration of drum part layers: bass clearly announces new change 3 NA :26 section change 4 NA var: rhythmc chnge (R: shuffle to two: chng1 :13 layers:high lead with same pattern :9 straight) change 2 :17 layers: bass – also shifts focus to the rhythm :12 layers/variation: intro of lead change 3 :19 lay/var: upwd transp of stretched pads nice. :17 layers: rhtyhm drops in variation: lead melody converges and is change 4 :24 appealing :24 layers: full groove (R: bass added) three: ch1 :12 layer: target comes in :5 layer: lead layer: gtr ends. Logical and impactful change 2 :15 fade: source guitar fades weakly. :9 place to end it var: pads transition well, darkening change 3 :18 target reached (R: actually, fair bit to go!) :15 whole piece change 4 NA :17 layer: drums enter too early lay/var: good that the guitar sound has four: ch1 :11 changed :7 layer: foreshadow lay/var: harmny reslvd. Tempo shift. Textre change 2 :17 open :10 var: glitch is annoying lay:var: reduction of src was gd. Thinning. change 3 :25 Good. :15 var: harmonic shift change 4 NA :18 layer: full drums fixes metre five: ch1 :9 var: shrtr loops in pads take focus from guitar :6 lay: lead fits natually var/focus: drum pat of targ shift focus from change 2 :12 guitar :11 var: shuffle to straight was too abrupt lay: good intro, but tempo increase too change 3 :17 var: more bass pulls towards the target :17 overt var/lay: staccato (R: lead) finalises target var: lead pattern splicing is excellent and change 4 :26 pattern :27 creative six: chng1 :11 lay: new timbre :6 layer: lead overlay. Unexpected. OK. lay: liked this bit (R: bass kicking in was change 2 :18 good) :14 var: confusing overall. var: bass notes lengths (R: subtle tempo layer: reduction and addition of drums change 3 :28 change) :18 adds stability. change 4 NA :24 layer: bass added. OK. seven: ch1 :10 lay: bad. target lead. Dischordant (R:?) :9 good. (R:swing to straight) var/lay: good. lead of srce ends, conflict change 2 :20 released :16 good. New beat drops in propely bad: melodic elements don't have to be change 3 :26 var/lay: good. Convergence. :14 here. change 4 NA eight: ch1 :12 var: sour key change in bass (R: muddiness) :9 var: bad glitch change 2 :18 var: good. Lead :13 lay/var: bad. lead drops int. change 3 :21 var: good. High perc. :15 var: bad. tempo increase . change 4 :30 var: tempo. Out of control :20 var: too much repetition nine: ch1 :7 lay/var: odd bad. Harmony :6 lay: bass conflicts change 2 :11 lay/var: odd bad. Harmony, density :13 var: harmony key modulation good lay: good. Synth intro through cross-over change 3 :16 lay/var: OK. Overall clearing up :7 good. lay/var: gd. Bass riff stdy rhythm. ovrall var: gd. Bass intrdced. Solid cause of change 4 :17 coherence :14 space for it

Figure E-7 Important perceived changes in LEMorpheus and human morphs of the third example.

309

Participant Change in example 4 LEMorpheus Changes in example 4 Human one: chng1 :7 var/lay: marimba "enters", but does not fit NA change 2 :15 var: continuous change in many parts is unsettling NA change 3 :22 new feel is finally settled on. NA change 4 NA NA aux: fills destabilise rhythmic two: chng1 :5 var/lay: high frequency introdcs new marmba pattrn :7 focus var/focus: offbeatsynth (R: change 2 :11 aux: fill smooths rhythmic transition (R:at :11 ??) :15 hocketing with drums) var/focus: marimba (higher change 3 :15 var/lay: higher frequency shifts focus to new bass :21 pitches) change 4 :25 var: offbeat creates new rhythmic focus :26 var/layer: marimba solidifies var: rhythm is stripped to make three: ch1 :8 var/lay: two conflicting keys (R: not like key shift) :12 room for target. change 2 :15 var: switch to 4/4 needs more pnctuatn, eg kick roll :16 var: kick resolves var: bass changes key and change 3 :22 var: 3/4 element have gone :18 rhythm change 4 NA :22 var: source rhythm dissappear four: ch1 :7 var: marimba. First hint :11 var: pitches are random change 2 :11 var: harmony bumpy :18 var: overall change was too late var: simplification/reduction is change 3 :17 var: rhythm thinning, had to be done :22 good change 4 :24 var: clarification of rhythm/meter NA five: ch1 :8 var: extension of source. Highlights change :6 non-existent jarring? var: good change in marimba change 2 :13 var: denser drums signal change :11 riff change 3 :15 var: effectve bridge via creation of new mrmba mel :18 var: clunky change in melody var: mrimba ovr bass effectve change 4 :23 var: bassline completes the morph nicely :22 (R: trg converged) six: chng1 :8 var: key/melody change. Bit jumbly :12 var: melody. OK change 2 :16 var: key/mel change. OK :14 var: drums. OK change 3 :23 var: reduction. Good :22 var: melody change smoother change 4 NA var/lay: gd. Change suddn. seven: ch1 :16 var: good. Melodic elements sit well :11 Natural, not forcd blend. change 2 :23 var: good. Target beat settles in :18 var/lay: good. Beat revealed var/lay: good. Stripped back. change 3 NA :22 Beat emphasised. change 4 NA eight: ch1 :11 var: gloomy. key change :0 nochange: bad. Polyrhythms var: gd. upwd transposition change 2 :14 var: thin. Melody drops out (R: shiftd off first beat) :17 brings out new pattern var: boring. Melody too change 3 :26 var: good. Marimba drops out. :20 repetitive var: boring. Melody too change 4 NA :25 repetitive nine: ch1 :8 var: good. interesting key change :0 nochange: bad. Rhythm out. var: rhythm lining up. Overall change 2 :16 var: natural :11 changes good. change 3 :22 var: bad. Harmony too simple. :16 lay: cool riff (R: from target?) change 4 :4bars change evry four bars :21 var/lay: drums highlight melody.

Figure E-8 Important perceived changes in LEMorpheus and human morphs of the fourth example.

310

Example 1 LEMorpheus one keep drums steady. changes in chunks. Write new material two remove dense activity in transition. Make a contrapuntal movement three chunks (series of hard cuts), not gradual changes, and pull out sounds. four No substitution - cut to original bridge, increase complexity to make trg less predictable. Cut to target five thin it. Fade out the saw lead. Fade in organ later six thin out notes in transition. Make transition longer seven thin the synth. They don't warp well and aren't necessary. Longer. Marry rhythm better eight change timbre early. Use new timbres to develop rhythm (R: unclear wether "as" rhythm, or to "punctuate" it) nine attenuate bass (already too loud in target). Explore the interesting key changes for longer. Example 2 LEMorpheus one use a number of more distict change points (chunking) rather than a continual smooth transition. two consciously establish patterns rather than change too much. Reduce texture and activity of both sections three chunkify fade of source material so it cuts out at 20 seconds. four extend the source to make the transition shorter. Too meandering during transition. Rewrite whole thing. five No improvements. six make it slower seven (R: chunkify) more succinct. Less fade, more drops. eight leave some of the old timbres going until end, speed them up to new tempo. Not much else. nine bring lead in earlier, to reference the final melody Example 3 LEMorpheus one drum fills to punctuate change in feel. two Same as before three Kill the guitar at 0:4 and let cross-fade the rest. four Cut source early. Long fade up of target. five No changes six solo bass & drums. extends melody based on existing middle section. seven Chunkify eight fade in, fade out. Change tempo gradually throughout, rather than right at the end nine tidy up the start. Remove clutter in the bottom end. Example 4 LEMorpheus one less subtle movements, more stepped changes. two NA three straight cut with a roll just before hand. four Thin the source melody/harmony sooner to work with the shifting groove five None six remove marimba. Longer, work with less instruments in parallel. seven make it longer. Strip it back to only beat and one other element, to make it phat. eight NA nine De-chunkify. "put in a few overlapping sections to move away from the group of loops put together feel"

Figure E-9 Proposed modifications to each of the four LEMorpheus morphs from each of the nine participants.

311

Example 1 Human morph one smooth out brkdwn w/ src & trg material to reduce shock. not need to be extreme - can be more parts in it. two maintain common elements. Decrease the intensity of their activity. Smooth out texture and harmony three not much. Small sounds from source to appear over the target. four faster cut. Addition of new sound just for bridge. five shift the entry of target forward. Strip back some of the overlapping sounds early on. six stretch out the length seven No change eight take out the break. Keep honk. Changes too dramatic, too short (nothing to hang on to). nine thin back in first half. Cut honk until target. Build it up more to the breakdown. Example 2 Human morph one maintain momentum of drums. Fix tonality of early snippet of trg to be that of src so there is more stability. two approach is not particularly dependent on the specific music - maintain continuous elements, thin it out three move target back half a bar. four possibly add new element that will disappear. Make source fatter, make target leaner and meaner. five remove second half of the lead loop (first half is OK). Or, reduce the dynamics of the second half. six pull out the rhodes. Total rewrite seven No change eight (R: introduce target sooner) put more techno into it, because it doesn't change sufficiently nine remove notes from target vibes, post breakdown for clearer harmony Example 3 Human morph one NA two not as acceptable as previous (Lemu) three hold off target rhythm just for a little bit. four change cheezy organ. Smooth the source. Rewrite the ET theme. five Fix the suffle to straight problem. Stretch the tempo increase so it isn't as extreme six introduce drums earlier, reduce instruments. Different source material. seven loose the melodic elements in the middle. Thin it. eight speed up slower. Introduce timbre more gradually nine tidy up bas conflictions Example 4 Human morph one NA two NA three No segueing four delete elements, not exchange them. Simplify for clarity. Reduction then addition, rather than substitution. five would not change the marimba at 12. thin it out six reduce the hats. Thin out data sooner. seven No changes eight add more melody at the end. Change cross-rhythms and use them as a feature. nine tidy up the snare (R: an artifact from the high EQ of the composer’s mixing)

Figure E-10 Proposed modifications to each of the four human morphs from each of the nine participants.

312 Bibliography

Abrams, S., Oppenheim, D., Pazel, D. and Wright, J. 1999. "Higher-level Composition Control in Music Sketcher". Proceedings of the International Computer Music Conference 1999. Bejing, ICMA: pp. 13-16. Adam, S. 2002. "Control and Mapping Strategies for HybriD". Proceedings of the Australasian Computer Music Conference 2002. Melbourne, ACMC: pp. 1-5. Alpern, A. 1995. “Techniqes for Algorithmic Composition”. Hampshire, Hampshire University. accessed on 21st of February, 2004. site: alum.hampshire.edu/~adaF92/algocomp/algocomp95.html#Notes_1 Álvarez, D. A. 2005. “Pymprovisator project is suspended”, Source Forge. accessed on 10th of February, 2005. site: pymprovisator.sourceforge.net/ Amitani, S. and Hori, K. 2002. "Supporting Musical Composition by Externalizing the Composer's Mental Space". Fourth Conference on Creativity and Cognition. Loughborough, ACM Press: pp. 165-172. Andersen, T. H. 2005. "In the Mixxx: Novel Digital DJ Interfaces". Conference on Human Factors in Computing Systems CHI '05. Portland, USA, ACM Press: pp. 1136-1137. Anonymous 2007. “bjork remix website”. accessed on 31st of May, 2007. site: sunday-in-the-park.com/bjork/ Apple 2006. “Clint Bajakian: Adaptive Musical Architectures”. accessed on 8th of May, 2006. site: www.apple.com/pro/audio/bajakian/index3.html Ariza, C. 2005. "Navigating the landscape of computer aided algorithmic composition systems". Proceedings of the International Computer Music Conference 2005. Barcelona, ICMC: p. 765. Arlin, M. I. 2000. "Metric Mutation and Modulation: The Nineteenth-Century Speculations of F. J. Fétis". Journal of Music Theory. 44 (2): pp. 261-322. Arom 1991. "African polyphony and polyrhythm". Cambridge, Cambridge University Press. Bachmann, T. 2001. "Autogam". France. accessed on the 8th of March, 2004. site: autogam.free.fr Baddeley, A. 1994. "The Magical Number Seven: Still Magic After All These Years?" Psychological Review. 101 (2) : pp. 353-356. Bartók, B. 1950. "The Influence of Peasant Music on Modern Music". Tempo. 14 (2): pp. 19-24. Battier, M. 2003. "A Constructivist approach to the analysis of electronic music and audio art - between instruments and faktura". Organised Sound. 8 (3): pp. 249-255. Beard, D. and Gloag, K. 2005. "Musicology: the key concepts". Abingdon, Oxon, Routledge. Beatnik 2002. “eXtensible Music Format”, Beatnik. accessed on the 5th of March 2004. site: www.beatnik.com/pdf_files/xmf_datasheet.pdf

307 Bel, B. and Kippen, J. 1992. "Bol Processor grammars". Understanding Music With AI, M. Balaban, K. Ebcioglu and O. Laske. Menlo Park, AAAI Press: pp. 366-400. Benadon, F. 2004. "Towards a theory of tempo modulation". Proceedings of the 8th International Conference on Music Perception and Cognition: ICMPC8. Evanston, IL, ICMPC: pp. 563-567. Bencina, R. 2004. "AudioMulch". Melbourne, AudioMulch. accessed: 1st of August, 2007. site: www.audiomulch.com Bencina, R. 2005. "The Metasurface - Applying Natural Neighbour Interpolation to Two-to-Many Mapping". New Interfaces for Musical Expression NIME05. Vancouver, BC, Canada, NIME: pp. 101-104. Bencina, R., Kaltenbrunner, M. and Costanza, E. 2006. “reacTIVision software”, Pompeu Fabra University. accessed on 15th of August, 2006. site: www.iua.upf.es/mtg/reacTable/?software Berg, P. 1992. “AC Toolbox”, The Hague, . accessed on 9th of July, 2007. site: www.koncon.nl/ACToolbox/ Berger, J. 1995. “Morphing Music at a high-level of structure”, CCRMA. accessed on 6th of September, 2005. site: www.ccrma.stanford.edu/~brg/research/morph/morph.html Berry, R., Makino, M., Hikawa, N. and Suzuki, M. 2003. "The Augmented Composer Project: The Music Table". International Symposium on Mixed and Augmented Reality ISMAR 2003. Tokyo, Japan: pp. 338-339. Berry, R., Makino, M., Hikawa, N., Suzuki, M. and Inoue, N. 2006. "Tunes on the table". Multimedia Systems. 11 (3): pp. 280-289. Berry, R., Rungarityotin, W., Dorin, A., Dahlstedt, P. and Haw, C. 2001. "Unfinished Symphonies - songs of 3 1/2 worlds". Artificial Life Models for Musical Applications ECAL2001. Prague, Editoriale Bios: pp. 51-64. Biles, A. 2002. "GenJam: Evolutionary Computation Gets a Gig". Third Conference on Information Technology Curriculum CITC3. Rochester, New York, SIGITE. accessed on 2nd of August, 2007. site: www.it.rit.edu/~jab/CITC3/GenJamPaper.pdf Biles, A. 2004. “GenJam”. accessed on 20th of April, 2004. site: www.it.rit.edu/~jab/GenJam.html Biles, A. 2008. “Evolutionary Music Bibliography”. accessed on: 15th of February, 2008. site: www.it.rit.edu/~jab/EvoMusic/EvoMusBib.html Brandon, A. 2003. "Adaptive Audio: a report". in The fat man on game audio: tasty morsels of sonic goodness, G. Sanger. Indianapolis, Indiana, New Riders Publishing: pp. 194-209. Brandon, A. 2006. “Engine Roundup”, MIDI Manufacturers Association. accessed on 14th of February, 2006. site: www.iasig.org/pubs/industry/engines.shtml Bregman, A. S. 1990. "Auditory scene analysis: the perceptual organization of sound". Cambridge Mass., MIT Press. Brewster, B. and Broughton, F. 1999. "Last night a DJ saved my life". London, Headline Book Publishing. Brown, A., Sorenson, A. and Dillon, S. 2004. "Jam2Jam". Exploding Art. accessed on 2nd of August, 2007. site: www.explodingart.com

308 Brown, A. R. 2007. "Software development as music education research". International journal of Education and the Arts. 8(6): pp. 1-13. accessed on 20th of August, 2007. site: ijea.asu.edu/v8n6/ Burke, P., Polansky, L. and Rosenboom, D. 2005. “HMSL”, SoftSynth. accessed on 7th of March, 2006. site: www.softsynth.com/hmsl/ Burton, A., Piché, J., Philips, D., Bourgeois, S., Champlain, Y. d., Beck, B., Sobolik, P. and Steiner, H. 2004. "Cecilia 2.0.5", Sourceforge. accessed: 22nd of June 2004. site: cecilia.sourceforge.net/index.html Butler, M. 2006. "Unlocking the groove". Bloomington, Ind., Indianna University Press. Cage, J. 1938. "Metamorphosis fof piano I-V". in John Cage: Early Piano Music [Compact Disc], Henk, H. 2002., ECM Records. Cakewalk 2004. “Writing CAL Programs”. Massachusetts, Twelve Tone Systems. accessed on 22nd of June, 2004. site: www.cakewalk.com/devxchange/writingcal.asp Callender, C. 2004. "Continuous Transformations". Music Theory Online, Society for Music Theory. 10 (3): pp. 2-44. Canazza, S., Poli, G. D., Drioli, C., Roda, A. and Vidolin, A. 2001. "Expressive Morphing for Interactive Performance of Musical Scores". First International Conference on WEB Delivering of Music (WEDELMUSIC`01), Florence, Italy, IEEE Computer Society: pp. 116- 122. Carter, E. 1960. "Shop talk by an american composer". The Musical Quarterly. XLVI: pp. 189-201. Chamagne, M. and Ninh, L. Q. 2005. Max Object Database, MaxObjects.com. accessed on 9th of February, 2005. site: www.maxobjects.com/ Chapman, J. 2005. "The creative practice of composition in a cross-cultural musical setting". Speculation and Innovation: applying practice led research in the Creative Industries. Queensland University of Technology. accessed on: 2nd of August. site: www.speculation2005.net Chapman, J. 2006. “African Composition Vocabulary”. accessed on 24th of May, 20067. site: www.jimrhythm.com/learning/phd/CompVocab.htm Chen, C. C. J. and Miikkulainen, R. 2001. "Creating Melodies with Evolving Recurrent Neural Networks". International Joint Conference on Neural Networks. Washington, IEEE. accessed: 2nd of August, 2007. site: nn.cs.utexas.edu/downloads/papers/chen.ijcnn01.pdf Chew, G. and McKinnon, J. 2007. “Centonization”, Grove Music Online. accessed on 23rd of March, 2007. site: www.grovemusic.com Chokalis, E. 1999. “What are DNA groove templates”. Toronto, Numerical Sound. accessed on 1st April, 2007. site: www.numericalsound.com Chomsky, N. 1956. "Three models for the description of language". IRE Transations on Information Theory. 2: pp. 113-124. Chomsky, N. 1957. "Syntactic structures". The Hague, the Netherlands, Mouton. Clark, A. 2007. “Defining Adative Music”, Gamasutra. accessed on 18th of April, 2007. site: www.gamasutra.com/features/20070417/clark_01.shtml

309 Clarke, E. 2004. "Empirical methods in the study of performance". Empirical musicology: aims, methods, prospects, N. Cook and E. Clarke. Oxford, Oxford University Press: pp. 77-102. Clarke, E. and Cook, N. 2004. "Empirical Musicology". Oxford, Oxford University Press. Cliff, D. 2000. "Hang the DJ: Automatic Sequencing and Seamless Mixing of Dance Music", Hewlett Packard. accessed on: 2nd of August, 2007. site: www.hpl.hp.com/techreports/2000/HPL-2000-104.html Cohn, R. 1992. "Transpositional Combination of Beat-Class Sets in Steve Reich's Phase-Shifting Music". Perspectives of New Music. 30 (2): pp.146-177. Collins, N. 2001. "Algorithmic composition methods for breakbeat science". Proceedings of Music Without Walls. De Montfort University. accessed on: 2nd of November, 2007. site: www.cus.cam.ac.uk/~nc272/papers/pdfs/acmethodsforbbsci.pdf Collins, N. 2006. [Pers. Comm.] “Interpolation vs Morphing”. Brisbane. Cook, P. 1998. "Toward the perfect audio morph? singing voice synthesis and processing". Proceedings of the effects workshop (DAFX). Barcelona. accessed on: 2nd of November, 2007. site: www.iua.upf.es/dafx98/papers/ Cope, D. 1991. "Computers and musical style". Computer music and digital audio series ; v. 6. Madison, Wis., A-R Editions. Cope, D. 1996. "Experiments in Musical Intelligence". Computer Music and Digital Audio Series; v. 12. Madison, Wis., A-R Editions. Cope, D. 1997. “Techniques of the contemporary composer”. New York, London Schirmer Books; Prentice Hall International. Cope, D. 2004. “David Cope”, University of Santa Cruz. accessed on 20th of October, 2004. site: arts.ucsc.edu/faculty/cope/ Cope, D. 2005. "Computer Models of Musical Creativity". Mass., MIT. Cope, D. and Hofstadter, D. R. 2001. "Virtual music : computer synthesis of musical style". Cambridge, Mass., MIT Press. Cost-287 2007. “ConGAS Gesture Controlled Audio Systems”. accessed on 17th of April, 2007. site: www.cost287.org/ Courson, H. d. and Akendengue, P. 1996. [Compact Disc] "Lambarena: Bach to Africa: an hommage to Albert Schweitzer". Paris, Sony Classical. Cycling'74 2004. "MAX/MSP". San Francisco, Cycling'74. accessed on: 2nd of November, 2007. site: www.cycling74.com/products/maxmsp Damerau, F. J. 1964. "A technique for computer detection and correction of spelling errors." Communications of the ACM 7(3): pp. 171-176. Dannenberg, R. 2002. "Nyquist", Carnegie Mellon University. accessed on: 21st of February, 2005. site: www-2.cs.cmu.edu/~music/music.software.html Dannenburg, R. 1984. "An online algorithm for realtime accompaniment". International Computer Music Conference ICMC1984. Paris, ICMA: pp. 193-198. Dechelle, F., Schwarz, D., Schnell, N. and Tisserand, P. 2004. "jMax", Source Forge/IRCAM. accessed on: 2nd of August, 2007. site: sourceforge.net/projects/jmax/

310 Desain, P. and Honing, H. 1989. "Quantization of Musical Time: A Connectionist Approach". Computer Music Journal. 13 (3): pp. 56-66. Desain, P. and Honing, H. 1992. "Music, mind and machine : studies in computer music, music cognition and artificial intelligence". Amsterdam, Thesis Publishers. Desain, P., Honing, H., Michon, J. A., Sadakata, M., Schouten, Y. and Trilsbeek, P. 2005. “Music Mind Machine Publications”, NICI. accessed on 14th of February, 2005. site: www.nici.kun.nl/mmm/publications.html Deutch, D. (ed) 1982. “The psychology of music”. (1st Edition). New York, Academic Press (now owned by Elsevier). Didkovsky, N. 1997. “Shubert Impromptu Morpher”. accessed on 16th of February, 2006. site: www.punosmusic.com/pages/schubert/schubertapplet.html Didkovsky, N. and Burke, P. 2001. "Java Music Specification Language, an introduction and overview". International Computer Music Conference ICMC2001. La Habana, Cuba. accessed on: 2nd of August, 2007. site: www.doctornerve.org/nerve/pages/articles.htm Didkovsky, N. and Burke, P. 2004. “JMSL Tutorial: JScore Implementing your own Binary Copy Buffer Transform, part two”. accessed on: 16th of February, 2006. site: www.algomusic.com/jmsl/tutorial/jscoretoot07.html Dodge, C. and Jerse, T. A. 1997. "Computer Music". New York, Schirmer Books. Eck, D. 2003. “Composing Music with LSTM -- Blues Improvisation”. accessed on: 20th of August, 2004. site: www.iro.umontreal.ca/~eckdoug/blues/ Eck, D. and Schmidhuber, J. 2002. "A First Look at Music Composition using LSTM Recurrent Neural Networks". Manno, Switzerland, Instituto Dalle Molle di studi sull' intelligenza artificiale. Edlund, J. 2004. “The Virtues of the Musifier: A Matter of View”, Interamus. accessed on: 16th of February, 2006. site: www.interamus.com/techTalk/musificationAndView.html Edlund, J. 2006. “Morphing Demo Server”, InterAmus Music Systems. accessed on: 13th of April, 2006. site: www.interamus.com Electronic Arts 1998. [PC Computer Game] "Need for Speed III: Hot Pursuit", Electronic Arts. Emagic 2004. “Logic Pro 6 - Technical data”, Emagic. accessed on: 22nd of June, 2004. site: www.emagic.de/ Experia 2006. “Midicretor+”, Experia. accessed on: 17th of April, 2007. site: www.experia-innovations.co.uk/ Factor Five 2000. "MusyX", Factor Five. accessed on: 8th of March, 2003. site: www.factor5.com Fanshawe, D. 2007. “The official African sanctus website”. accessed on: 1st of June, 2007. site: www.africansanctus.com Farbood, M. 2000. “Hyperscore: A New Approach to Interactive, Computer- Generated Music”. Massachusetts, MIT. accessed on 27th of April 2004. site: web.media.mit.edu/~mary/thesisprop/ Fay, T. M., Selfon, S. and Fay, T. J. 2003. "DirectX 9 Audio Exposed". Plano, Texas, Wordware.

311 Ferris, D. 2000. "C. P. E. Bach and the art of strange modulation". Music Theory Spectrum. 22 (1): pp. 60-88. Foster, A. 2007. “Soundhive audio production for multimedia”. Widgee, Australia, Soundhive. accessed on: 4th of June, 2007. site: www.soundhive.com/ Fraietta, A. 2001. "Algorithmic Composer". Sydney Australia. accessed on: 16th of April, 2004. site: www.users.bigpond.com/angelo_f/AlgorithmicComposer/ Fraser, M. 2002. [Masters Thesis] "An Automatic Software DJ". Software Engineering. Brown, G. (advisor).Sheffield, University of Sheffield. Fujio, T. and Shiizuka, H. 2003. "A system of mixing songs for automatic DJ performance using genetic programming". 6th Asian Design International Conference. Tsukuba. accessed on: 2nd of November, 2007. site: www.idemployee.id.tue.nl/g.w.m.rauterberg/conferences/CD_doNotOpen/ ADC/final_paper/496.pdf Gannon, P. 2004. "Band in a Box". Victoria, Canada, PG Music Inc. accessed on: 22nd of June, 2004. site: www.pgmusic.com. Gartland-Jones, A. 2002. "Can a Genetic Algorithm Think Like a Composer?" Proceedings of the 5th International Conference on Generative Art, C. Soddu. Politecnico di Milano University, Milan: pp14.11-14.12. Georges, A. and Flohr, F. 2002. "System and methods for creating, modifying, interacting with and playing musical compositions". US Patent No: 7015389. Medialab Solutions LLC. Washington, DC, US Patent and Trademark Office. Gillies, M. 2007. “Bela Bartok”, Grove Music Online. accessed on: 27th of March, 2007. site: www.grovemusic.com Gitlin, L. 2004. “DJ Makes Jay-z Meet Beatles”, Rolling Stone Magazine. accessed on: 24th of March, 2007. site: www.rollingstone.com/news/story/5937152/dj_makes_jayz_meet_beatles Glass, P. 2000. "Metamorphosis I-V". in Glass Cage, [Compact Disc] Arabesque Recordings. Grant, C. P. 1977. "The real relationship between Kirnberger's and Rameau's ‘Concept of the Fundamental Bass’." Journal of Music Theory 21(2): pp. 324-338. Green, D. 1979. "Form in tonal music: an introduction to analysis". New York; Holt, Rinehart and Winston. Grove 2007. “Medley”, Grove Music Online. accessed on: 31st of May, 2007. site: www.grovemusic.com Hannon, E., Snyder, J., Eerola, T. and Krumhansl, C. 2004. "The role of melodic and temporal cues in perceiving musical meter". Journal of Experimental Psychology / Human Perception and Performance. 30 (5): pp. 956-974. Hawkins, S., Scott, D. and Sweeney-Turner, S. 2007. “Pop musicology online”. accessed on 29th of May, 2007. site: www.popular-musicology- online.com/ Hild, H., Feulner, J. and Menzel, D. 1992. "HARMONET: a neural net for harmonizing chorals in the style of J.S. Bach". in Advances in Neural

312 Information Processing v.4, R. P. Lippmann, J. E. Moody and D. S. Touretzky (eds). San Francisco, CA, Morgan Kaufmann. Hill, B. 2005. "Breaking down the breakdown: the use of timbres in contemporary dance music sub-genres". Proceedings of the Australasian Computer Music Conference ACMC2005: Generate and Test. Brisbane, QUT, ACMA: pp. 75-84. Hiller, L. and Isaacson, L. 1958. "Musical Composition with a High-Speed Digital Computer". in Machine Models of Music. Schwanauer, S. and Levitt, D. 1993. Cambridge, Mass. The MIT Press. Hindemith, P. 1989. [Compact Disc] "Symphonic metamorphosis of themes by Weber". Hindemith. Cleveland, OH, Telarc. Hirata, K. and Aoyagi, T. 2003. "Computational Music Representation Based on the Generative Theory of Tonal Music and the Deductive Object-Oriented Database". Computer Music Journal. 27 (3): pp. 73-89. Hirst, D. 2003. "Developing a cognitive framework for the interpretation of acousmatic music". Australasian Computer Music Association Conference ACMC2003: Converging Technologies:, Perth, ECU, ACMA: pp. 43-57. Horner, A. and Goldberg, D. E. 1991. “Genetic Algorithms and Computer Assisted Composition”. Proceedings of the International Computer Music Conference. pp. 479-482. Hunt, A., Kirk, R. and Neighbour, M. 2004. "Multiple Media Interfaces for Music Therapy". IEEE Multimedia. 11 (3): pp. 50-58. Huron, D. 2006. "Sweet Anticipation". Cambridge, Mass., MIT Press. IBM Computer Music Center 1999a. [Computer Program] "Music Sketcher". in Composing Music with Computers. Miranda, E. 2001. Oxford, Focal Press. IBM Computer Music Center 1999b. [Help file] “Music Sketcher Help: What is smart harmony?”, IBM. in [Computer Program] "Music Sketcher". in Composing Music with Computers. Miranda, E. 2001. Oxford, Focal Press. Ingham, S. 1996. "MIDI Morphing: interpolation algorithms for Standard MIDI Files in the Patchwork environment". Australasian Computer Music Conference ACMC1996. Brisbane, QUT, ACMA. pp.1-3. Jacob, B. 1996. "Algorithmic composition as a model of creativity". Organised Sound. 1 (3): pp. 157-165. James, R. (AKA Aphex Twin) 2003. [Compact Disc] "Twenty-six mixes for cash", Warp Records. Jehan, T. 2005. [PhD Thesis] "Creating Music by Listening". Program in Media Arts and Sciences. Machover, T. (advisor) Massachusetts, MIT. accessed on: 2nd of August, 2007. site: web.media.mit.edu/~tristan/phd/ Jordà, S., Kaltenbrunner, M., Geiger, G. and Bencina, R. 2005. "The ReacTable*". International Computer Music Conference ICMC2005. Barcelona, Spain, UPF, ICMA: pp. 579-582. Kaltenbrunner, M. 2006. “Tangible musical interfaces - music tables”, UPF. accessed on: 6th of December, 2006. site: mtg.upf.es/reactable/?related

313 Kato, H. 2006. “ARToolkit”, Washinton University. accessed on: 6th of December, 2006. site: www.hitl.washington.edu/artoolkit/ Kay, S. 2004. “What is KARMA?”, Karma Lab. accessed on: 15th of April 2004, 2004. site: www.karma-lab.com/KARMA/What_Is_KARMA.html Keller, R. 2003. [Masters Thesis] "Mapping the soundscape: rhythm and formal structure in electronic dance music". School of Music. Clendinning, J.P. (advisor) Tallahassee, Florida State University. Kerman, J. and Tomlinson, G. 2003. "Listen". 5th ed. New York, NY, Bedford/St. Martins. Kernfield, B. 2007. “Beats in Jazz”, Grove Music Online. accessed on: 9th of March, 2007. site: www.grovemusic.com Kirchmeyer, H. 1968. "On the historical constitution of a rationalistic music". Die Reihe. 8: pp. 11-24. Klippel, C. 2005. “Karma - A new, upcoming Dataflow-Environment ala pd/jmax/max”. accessed on 14th of February, 2005. site: karma.mamalala.de/ Krueger, M. 1991. "Artificial reality II". Reading, Mass, Addison-Wesley. Krumhansl, C. 1979. "The psychological representation of musical pitch in a tonal context". Cognitive Psychology. 11: pp. 346-374. Krumhansl, C. 1983. "Perceptual structures for tonal music". Music Perception. 1 (1): pp. 28-62. Krumhansl, C. L. 2000. "Rhythm and pitch in music cognition". Psychological Bulletin. 126 (1): pp. 159-179. Laurson, M. 1996. "Patchwork". Helsinki, Sibelius Academy. Lerdahl, F. and Jackendoff, R. 1983. "A generative theory of tonal music". Cambridge, Mass, MIT Press. Letz, S. 2004. “Midishare”, Grame. accessed on 14th of April, 2004. site: www.grame.fr/MidiShare/ Lewin, D. 1981. "On harmony and meter in Brahm's Op. 76, No.8". 19th-Century Music. 4 (3): pp. 261-265. Lewis, G. 1999. "Interacting with Latter-Day Musical Automata". Contemporary Music Review. 18 (3): pp. 99-112. Lewis, G. 2007. “Voyager”, EMF. accessed on 27th of April, 2007. site: emfinstitute.emf.org/exhibits/voyager.html London, J. 2007. “Rhythm: complex rhythms and complex metres”, Grove Music Online. accessed on: 9th of March, 2007. site: www.grovemusic.com Loy, G. and Abbott, C. 1985. "Programming languages for computer music synthesis, performance and composition". ACM Computing Surveys (CSUR). New York, ACM Press. 17 (2): pp. 235-265. M2 Communications 1997. "SSEYO's interactive Koan Music Control to change Internet music forever". M2 Presswire. MadWaves 2004. [Device] "MadPlayer". London, MadWaves. Maniates, M. R., Branscombe, P. and Freedman, R. 2007. “Quodlibet”, Grove Music Online. accessed on 23rd of March, 2007. site: www.groveonline.com Marcus, S. L. 1992. "Modulation in arab music: documenting oral concepts, performance rules and strategies". Ethnomusicology. 36 (2): pp. 171-195.

314 Marques, M., Oliveira, V., Vieira, S. and Rosa, A. C. 2000. "Music composition using genetic evolutionary algorithms". Proceedings of the 2000 congress on evolutionary computing, IEEE. 1: pp. 714-719. Mathews, M. V. and Rosler, L. 1969. "Graphical Language for the Scores of Computer-generated Sounds". in Music by Computers, H. V. Foerster and J. W. Beauchamp. New York, John Wiley and Sons, Inc.: pp. 84-114. Mazzola, G. 2002. "The topos of music: geometric logic of concepts, theory and performance". Basel, Birkhäuser Verlag. McConnell, S. 1998. "Software Project Survival Guide". Washington, Microsoft Press. McCormack, J. 2003. "Evolving sonic ecosystems". Kybernetes. 32 (1): p. 184. McCormack, J. 2006. [Pers. Comm.] “Interpolation vs Morphing”. Brisbane. McNamara, C. 2007. “Basics of Conducting Focus Groups”, Management Help. accessed on: 17th of May, 2007. site: www.managementhelp.org/evaluatn/focusgrp.htm Merriam, A. P. 1964. "The anthopology of music". Illinois, Northwestern University Press. Meyer, L. 1956. "Emotion and Meaning in Music". Chicago, University of Chicago Press. Michie, C. 2003. “Frank Zappa Recording History”, Penton Media. accessed on: 23rd of March, 2007. site: mixonline.com Microsoft 2004. “DirectMusic Producer Download”. accessed on: 12th of October, 2004. site: www.microsoft.com/downloads Miranda, E. R. 2001. "Compsing Music with Computers". Music Technology v. 12, F. Rumsey (ed.). Oxford, Focal Press. Miranda, E. and Biles, A. (eds) 2007. "Evolutionary computer Music". London: Springer Verlag. MMA 2003. “Adaptive Audio”. accessed on: 23rd of January, 2005. site: www.adaptiveaudio.org/ Momeni, A. and Wessel, D. 2003. "Characterizing and Controlling Musical Material Intuitively with Geometric Models". Proceedings of the 2003 Conference on New Interfaces for Musical Expression NIME2003. Montreal, Canada: pp. 54-62. Moroni, A., Manzolli, J., Van Zuben, F. and Gudwin, R. 1990. "Vox Populi: An Interactive Evolutionary System for Algorithmic Music Composition". Leonardo Music Journal, 10: pp. 49-54. Mozer, M. 1994. "Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multi-scale processing". Connection Science. 6: pp. 247-280. accessed on: 2nd of November, 2007. site: www.cs.colorado.edu/~mozer/papers/reprints/music.pdf Muzzulini, D. 1995. "Musical Modulation by Symmetries". Journal of Music Theory. 39 (2): pp. 311-327. Native Instruments 2007. “Reaktor”, Native Instruments. accessed on 24th of April, 2007. site: www.nativeinstruments.de Nattiez, J. J. 1990. "Music and discourse: towards a semiology of music". Princeton, NJ, Princeton University Press.

315 Neil, C. 2005. "Adaptive audio in games". Australasian Computer Music Conference ACMC2005: Generated and Test. Brisbane, QUT, ACMA: p. 6. Neill, B. 2002. "Pleasure beats: rhythm and the aesthetics of current electronic music". Leonardo Music Journal. 12: pp. 3-6. Noll, T. 2001. “Geometry of Chords”, Technical University of Berlin. accessed on 31st of May, 2007. site: flp.cs.tu-berlin.de/~noll/GeometryOfChords.pdf Olofson, D. 2005. "Audiality", Olofson.net. accessed on: 8th of February, 2005. site: audiality.org. Oppenheim, D. 1993. "Slappability: A New Metaphor for Human Computer Interaction". Music Education: An Artificial Intelligence Approach AI- ED93, Edinburgh, Scotland: pp. 92-107. Oppenheim, D. 1997. "Interactive system for compositional morphing of music in realtime". US Patent No: 533636, IBM, Washington, DC, U.S. Patent and Trademark Office. Oppenheim, D. 2007. [Pers. Comm.] “Email discussion on timbre integration” Oppenheim, D. 1995. "Demonstrating MMorph: A System for Morphing Music in Real-time". International Computer Music Conference ICMC1995. Banff Canada, ICMA: pp. 479-480. Orpen, K. S. and Huron, D. 1992. "Measurement of Similarity in Music: A Quantitative Approach for Non-parametric Representations." Computers in Music Research 4: pp. 1-44. Pachet, F. 1997. "Computer analysis of jazz chord sequences: is solar a blues?" in Readings in music and artificial intelligence, E. R. Miranda (ed.), Amsterdam, Harwood Academic Publishers. Pachet, F. 1999. "Surprising Harmonies". International journal on computing anticipatory systems. 4. accessed on: 2nd of August. site: www.csl.sony.fr/downloads/papers/1999/pachet-casys1999.pdf Pachet, F. 2002. "Playing with Virtual Musicians: the Continuator in Practice". IEEE Multimedia. 9: pp. 77-82. Pachet, F. 2002. "The Continuator: Musical Interaction with Style". International Computer Music Conference ICMC2002. Göteborg, Sweden, ICMA: pp. 211 - 218. Pachet, F. 2004. "Beyond the cybernetic jam fanasty: the continuator". IEEE Computers Graphics and Applications. 24: p. 31-35. Pachet, F. 2004. “Continuator 1: Raise the machine”, Sony Computer Science Laboratory. accessed on: 20th of October, 2004. site: www.csl.sony.fr/~pachet/Continuator/index.html Pachet, F. 2005. [Pers. Comm.] “Discussion at Live Algorithmic Music workshop”. Goldsmith College, London. Paine, G. 2004. "Gesture and Musical Interaction: Interactive Engagement Through Dynamic Morphology". New Interfaces to Musical Expression NIME04. Shizouka, Japan, NIME: pp. 80-76. accessed: 2nd of August, 2007. site: www.nime.org/2004/NIME04/paper/NIME04_2A03.pdf Paine, G. 2007. “Installation Works”. accessed on: 27th of April, 2007. site: www.activatedspace.com/Installations/Installations.html

316 Papadopoulos, G. and Wiggins, G. 1999. "AI Methods for Algorithmic Composition: A Survey, A Critical View, and Future Prospects". Symposium on Musical Creativity AISB'99. Edinburgh, Scotland, UK. accessed on: 2nd of August, 2007. site: citeseer.ist.psu.edu Paul, L. J. 2003. “Audio Prototyping with Pure Data”, Gamasutra. accessed on 9th of February 2005. site: www.gamasutra.com/resource_guide/20030528/paul_pfv.htm Pearce, M. T. and Wiggins, G. A. 2001. "Towards a framework for the evaluation of machine compositions". Proceedings of the Symposium on Artificial Intelligence and Creativity in the Arts and Sciences AISB'01. Brighton, UK, SSAISB: pp. 22-32. Pearce, M. T., Meredith, D., Wiggins, G. A. (2002). "Motivations and methodologies for automation of the compositional process." Musicae Scientae 6(2): pp. 119-147. Perry, D. 2003. “Amplitude”, IGN Entertainment. accessed on: 22nd of June, 2004. site: ps2.ign.com/articles/390/390620p1.html PGMusic 2004. “Band In A Box 2004”. accessed on: 20th of October, 2004. site: www.pgmusic.com/products_bb.htm Phon-Amnuaisuk, S., Tuson, A. and Wiggins, G. 1999. "Evolving musical harmonisation". International Conference on Adaptive and Natural Computing Algorithms ICANNGA'99. Slovenia. accessed on: 2nd of August, 2007. site: www.doc.gold.ac.uk/~mas02gw/papers/ICANNGA99.pdf Plogue 2004. "Bidule". accessed on: 2nd of August, 2007. site: www.plogue.com Poel, B. v. d. 2005. “Musical MIDI Accompaniment”. accessed on: 10th of February, 2005. site: mypage.uniserve.com/~bvdp/mma/index.html Polansky, L. 1987. "Distance Music I-VI for any number of programmer/performers and live, programmable computer music systems". Perspectives of New Music. 25 (1): pp. 537-544. Polansky, L. 1991. [Compact Disc] "51 Melodies", Artifact Recordings/Frog Peak Music. Polansky, L. 1992. "More on Morphological Mutations:Recent Techniques and Developments". International Computer Music Conference ICMC1992. San Jose, ICMA: pp. 57-60. Polansky, L. 1996. "Bedhaya Guthrie/Bedhaya Sadra for Voices, Kemanak, Melody Instruments, and Accompanimental Javanese Gamelan". Perspectives of New Music. 34 (1): pp.28-55. Polansky, L. 1996. "Morphological Metrics". Journal of New Music Research. 25 (4): pp. 289-368. Polansky, L. 2006. “midifiles.demo.front”, Dartmouth college. accessed on: 8th of March, 2006. site: eamusic.dartmouth.edu/~larry/midifiles.demo.front.html Polansky, L. 2006. [Pers. Comm.] Email discussions. Polansky, L. and Erbe, T. 1996. "Spectral Mutation in Soundhack". Computer Music Journal. 20 (1): pp. 92-101. Polansky, L. and McKinney, M. 1991. "Morphological Mutation Functions: Applications to Motivic Transformations and to a New Class of Cross-

317 Synthesis Techniques". International Computer Music Conference ICMC1991. Montreal, ICMA: pp. 234-241. Pope, S. T. 1991. "The well tempered object". Cambridge, Mass., The MIT Press. Pratt, G. 1998. "Aural synthesis". Aural awareness: principles and practice. New York, Oxford University: pp. 31-45. Pressing, J. 1992. "Synthesizer performance and real-time techniques". Oxford, Oxford University Press. Puckette, M. 2005. “Pure Data”, Source Forge. accessed on 9th of February, 2005. site: sourceforge.net/projects/pure-data/ Puckette, M. and Apel, T. 1998. "Real-time audio analysis tools for Pd and MSP". Proceedings of the International Computer Music Conference. San Francisco, International Computer Music Association: pp. 109-112. Puckette, M. and Zicarelli, D. 1990. [computer program] "MAX", Opcode. RAD 2004. “Miles SDK Features”, RAD Game Tools. accessed on: 20th of April 2004. site: www.radgametools.com/msssdk.htm#midi Raphael, C. 2003. "Orchestra in a Box: A System for Real-Time Musical Accompaniment". International Joint Conference on Artificial Intelligence IJCAI2003, Rencon Workshop. accessed on: 2nd of August, 2007. site: xavier.informatics.indiana.edu/~craphael/papers/ijcai03.pdf Rare 1998. [computer game] "Banjo Kazooi". Japan, Nintendo. Reger, M. 1903. "On the theory of modulation". Munich, Kalmus. (undated reprint edition by Van Nuys, CA, Alfred). Renz, C. 2005. “MIDI-Morph-0.02”. accessed on: 15th of December, 2006. site: search.cpan.org/~crenz/MIDI-Morph-0.02/ Reti, R. 1958. "Tonality, Atonality, Pantonality: A Study of Some Trends in Twentieth Century Music". Westport, CT, Greenwood Press. Richards, T. 2007. “NVIVO”, QSR International. accessed on: 30th of July, 2007. site: www.qsrinternational.com Roads, C. (ed.) 1985. "Composers and the computer". The computer music and digital audio series. Los Altos, Calif., William Kaufmann. Roads, C. 1996. "The Computer Music Tutorial". Cambridge, Massachusetts, MIT Press: pp. 823. Rodet, X. and Lefevre, A. 1997. "The Diphone program: new features, new synthesis engines and experience of musical use". Proceedings of International Computer Music Conference ICMC1997. Thessaloniki, Greece, ICMA: pp. 418-421. Roeder, J. 2003. "Beat-Class Modulation in Steve Reich's Music". Music Theory Spectrum. 25 (2): pp. 275-304. Rosenboom, D. 1982. "The Qualities of Change: ‘ON BEING INVISIBLE’: Steps Towards Transitional Topologies of Musical Form". Oakland, CA, Mills College. accessed on: 2nd of August, 2007. site: music.calarts.edu Rowe, R. 1993. "Interactive Music Systems". Cambridge, Massachusetts, MIT Press. Rowe, R. 2001. "Machine musicianship". Cambridge, Mass., MIT Press. Rowe, R. 2007. “Video of concert using Cypher”, New York University. accessed on: 27th of April, 2007. site: homepages.nyu.edu/~rr6/CigarLocation.mov

318 Royce, W. 1970. "Managing the development of large software systems". Proceedings of IEEE WESCON: pp. 1-9. Russell, S. and Norvig, P. 2004. "Artificial Intelligence: A Modern Approach". Prentice Hall Series in Artificial Intelligence. New Jersey, Prentice Hall. Russo, W. 1968. "Jazz composition and ". Chicago, University of Chicago Press. Sachs, C. 1953. "Rhythm and tempo". New York, Norton. Sanger, G. A. 2003. "The fat man on game audio: tasty morsels of sonic goodness". Indianapolis, Indiana, New Riders Publishing. Saslaw, J. 2007. “Modulation”, Grove. accessed on: 24th of March, 2007. site: www.grovemusic.com Schenker, H. 1935. "Der Freie Satz (Free Composition)". New York, Longman, 1979. Schnittke, A. 1971. “Polystylistic tendencies in modern music”. A Schnittke reader. J. D. Goodliffe. Bloomington, Indiana, Indiana University Press. Schoenberg, A. 1978. "Theory of Harmony". Berkeley, University of California Press. Schoenberg, A. 2006. "Technique of joining". in The musical idea and the logic, technique and art of its presentation, P. Carpenter and S. Neff (translators), Indianna University Press: p. 174. Schüler, N. 2002. "On Classifying Computer-Assisted Music Analysis". Computer Applications in Music Research, N. Schüler (ed.). Frankfurt, Peter Lang. Schwanauer, S. M. and Levitt, D. A. 1993. "Machine Models of Music". Cambridge, Mass., The MIT Press. Schwarz, D. 2004. [PhD Thesis] “Data driven concatenative sound synthesis”. Université Paris 6, Paris. accessed on: 19th of February. site: recherche.ircam.fr/equipes/analyse-synthese/schwarz/ Seargent, W. 1933. "Bernard Ziehn; Precursor". The Musical Quarterly. 19 (2): pp. 169-177. Shapiro, P. 2000. "Modulations: A History of Electornic Music - Throbbing Words on Sound". New York, Distributed Art Publishers. Sharp, D. 2006. LMUSe. accessed on 6th of December, 2006. site: www.geocities.com/Athens/Academy/8764/lmuse/lmusej.html Shepard, R. N. 1964. "Circularity in judgements of relative pitch". The Journal of the Acoustical Society of America. 36 (12): pp. 2346-2353. Shepard, R. N. 1982. "Geometical approximations to the structure of musical pitch". Psychological Review. 89: pp. 305-333. Sides, A. 1996. [compact disc] "Disney's Music from the Park", The Walt Disney Company. Smith, T. 1996. "Anatomy of a Fugue." Accessed on: 14th of February, 2008, site: jan.ucc.nau.edu/~tas3/fugueanatomy.html. Sommerer, C. and Mignonneau, L. 1998. "Art@Science". Vienna/New York, Springer. Sorenson, A. and Brown, A. 2004. “jMusic”. Brisbane, QUT. accessed on: 13th April 2004. site: jmusic.ci.qut.edu.au

319 Sorenson, A. and Brown, A. 2000. “An introduction to jMusic”. Australasian Computer Music Association Conference ACMC2000: interFACES. Brisbane, ACMA. SoundTrek 2005. "Jammer Professional 5". Suwanee, USA, SoundTrek. accessed on: 10th of February, 2005. site: www.soundtrek.com Spiegel, L. 1987. “A Short History of Intelligent Instruments”, The Computer Music Journal. 11 (3). accessed on: 22nd of June, 2004. site: retiary.org/ls/writings/cmj_intelligt_instr_hist.html SSEYO 2004. "KOAN", TAO. accessed on: 8th of March, 2004. site: www.sseyo.com SSEYO 2004. “Koan Interactive Audio Platform - Vector Audio / Generative Music”, SSEYO. accessed on: 15th of April 2004. site: www.sseyo.com/koan/koanVectorAudio_GenerativeMusic.html Sturm, B. 2004. "MATCONCAT: An application for exploring concatenative sound synthesis in MATLAB". Proceedings of the Digital Audio Effects Workshop DAFx04. Naples, Italy: pp. 323-326. Sushi 2002. “Review of FreQuency”, GameReviewers. accessed on: 22nd of June, 2004. site: gamereviewers.com/sushi/playstation2/music/frequency.html Synthtopia 2004. “Controversial Grey Album Mixes Beatles with Rap”, Synthtopia. accessed on 23rd of March, 2007. site: www.synthtopia.com/content/2004/02/16/controversial-grey-album-mixes- beatles-with-rap/ Tagg, P. 1982. "Analysing popular music: theory, method and practice". Popular Music. 2 (Theory and Method issue): pp. 37-67. Taube, H. 2004. "Notes from the metalevel: introduction to algorithmic music composition". London, Taylor & Francis Group. Temperley, D. 2001. "The cognition of basic musical structures". Mass., MIT Press. Temperley, D. 2007. "Music and probability". Cambridge, Mass., MIT. Tenney, J. and Polansky, L. 1979. "Temporal Gestalt Perception in Music". Journal of Music Theory. 24 (2): pp. 205-241. Thaut, M. 2005. "Rhythm, music and the brain: scientific foundations and clinical applications". Studies on new music research v. 7. London, Routledge. Thom, B. 2000. "BoB: an interactive improvisational companion". Fourth international conference on autonomous agents Agents2000. Barcelona, Spain, ACM Press: pp. 309-316. Thompson, T. 2007. “KeyKit”, AT&T. accessed on: 24th of April, 2007. site: nosuch.com/keykit/ Thomson, W. 1999. "Tonality in music". San Marino, Everett Books. Tokita, A. M. 1996. "Mode and scale, modulation and tuning in Japanese shamisen music: the case of Kiyomoto narrative". Ethnomusicology. 40 (1): pp. 1-33. Towsey, M., Brown, A., Wright, S. and Diederic, J. 2001. "Towards melodic extension using genetic algorithms". Educational Technology and Society. 4: pp. 54-65.

320 Travis, J. 1938. "Irish National Music". The Musical Quarterly. 24 (4): pp. 451- 480. Truax, B. 1976. "A Communicational Approach to Computer Sound Programs". Journal of Music Theory. 20 (2): pp. 227-300. Truchet, C., Assayag, G. and Codognet, P. 2001. "Visual and Adaptive Constraint Programming in Music"., International Computer Music Conference ICMC2001. La Havana, Cuba, IMA Press. accessed on: 2nd of August, 2007. site: www.ircam.fr UI Software 2005. "MetaSynth 4", UI-Software. accessed on: 22nd of February, 2005. site: www.uisoftware.com/MetaSynth/ Vercoe, B. 1984. "The synthetic performer in the context of live performance". International Computer Music Conference ICMC1984. Paris, France: pp. 199-200. Vercoe, B. 2000. "Forward". in The Csound book, R. Boulanger (ed.). Cambridge, Mass, MIT Press. Viñao, A. 1996. “Masago's Confession”, Viñao. accessed on 23rd of March, 2007. site: www.vinao.com Warburton, D. 1988. "A Working Terminology for Minimal Music". Integral. 2: pp. 135-159. Waterman, R. 1948. ""Hot" rhythm in negro music". Journal of the American Musicological Society. 1 (1): pp. 24-37. Weinberger, N. M. 1999. "Music and the Auditory System". in The Psychology of Music 2nd Ed, D. Deutsch. San Diego, California, Academic Press: pp. 47-83. Whitmore, G. 2003. “Design With Music In Mind: A Guide to Adaptive Audio for Game Designers”, Gamasutra. accessed on 27th of April, 2004. site: www.gamasutra.com/resource_guide/20030528/whitmore_02.shtml Whittall, A. 2007. “Form”, Grove Music Online. accessed on: 30th of May, 2007. site: www.grovemusic.com Wiggins, G., Miranda, E., Smaill, A. and Harris, M. 1993. "A Framework for the evaluation of music representation systems". Computer Music Journal. 17 (3): pp. 31-42. Wikipedia 2006. “Classical music era”, Wikipedia: the Free Encyclopaedia. accessed on 15th of February, 2006. site: en.wikipedia.org Wikipedia 2007. “Potpourri”. Wikipedia: the Free Encyclopaedia. accessed on 31st of May, 2007. site: en.wikipedia.org Windsor, W. L. 1995. [PhD thesis] "A perceptual approach to the description and analysis of acousmatic music". Music Department. Clarke, E. (advisor) London, City University. Windsor, W. L. 2004. "Data collection, experimental design and statistics in musical research". in Empirical musicology: aims, methods, prospects, Cook, N. and Clarke, E. (eds.) Oxford, Oxford University Press: pp. 197- 222. Winkler, T. 1998. "Composing Interactive Music". Cambridge, Mass., MIT Press. Winkler, T. 2007. “Video of concert using followplay”, Brown University. accessed on: 27th of April, 2007. site: www.brown.edu/Departments/Music/sites/winkler//music/int_concert/

321 int_concert_poster.mov Winter, R. 2005. [Honours report]"Interactive Music: Compositional Techniques for Communicating Different Emotional Qualities". Electronic Engineering. Freiburg, A. (advisor) York, University of York. Wishart, T. 1996. "On sonic art". Contemporary music studies v. 12, S. Emmerson (ed.). Amsterdam, Harwood Academic Publishers. (revised from 1985 orig.) Wishart, T. 2000. [compact disc] "Red Bird/Anticredos", EMF. Wooller, R. 2003. "A Brief Analysis of Club Drum and Bass: Compositional Structures and Sonic Forms". Australasian Computer Music Association Conference ACMC2003: Converging Technologies. Perth, ACMA. Wooller, R. 2004. “LEMu Feature Guide”. LEMu. accessed on: 15th of April 2004, 2004. site: www.lemu.org/files/featureGuide.html Wooller, R. 2006. "Review of compositional morphing: works, techniques, applications and possibilities". Australasian Computer Music Conference ACMC2006:Trans. Adelaide, ACMA. Wundt, W. 1896. “Grundriss der Psychologie”. Leipzig, Wilhelm Engelmann. translated as “Outlines of Psychology”. 1897. C.H. Judd (translator). Leipzig, Wilhelm Engelmann (Reprinted 1999. ,Thoemmes.). Xenakis, I. 1992, "Formalized music : thought and mathematics in composition". Harmonologia series v. 6. Stuyvesant, NY, Pendragon Press. Zicarelli, D., Offenhartz, J., Widoff, A. and Chadabe, J. 1986. "M", Intelligent Music. accessed on: 6th of June, 2004. site: www.cycling74.com/support/questionsm.html

322