University of Calgary PRISM: University of Calgary's Digital Repository
Graduate Studies The Vault: Electronic Theses and Dissertations
2013-12-23 An Application of Calculated Consonance in Computer-Assisted Microtonal Music
Burleigh, Ian George
Burleigh, I. G. (2013). An Application of Calculated Consonance in Computer-Assisted Microtonal Music (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/24833
http://hdl.handle.net/11023/1225
doctoral thesis
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY
An Application of Calculated Consonance in Computer-Assisted Microtonal Music
by
Ian George Burleigh
A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE INTERDISCIPLINARY DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE and SCHOOL OF CREATIVE AND PERFORMING ARTS — MUSIC CALGARY, ALBERTA DECEMBER, 2013
© Ian George Burleigh 2013

Abstract
Harmony (the audible result of varied combinations of simultaneously sounding tones) ought to, for the most part, sound pleasing to the ear. The result depends, among other factors, on a proper choice of the pitches for the tones that form harmonious chords, and on their correct intonation during musical performance.
This thesis proposes a computational method for calculation of relative consonance among groups of tones, and its possible practical applications in machine-assisted arrangement of tones, namely the choice of tone pitches and their microtonal adjustment. The consonance of tone groups is calculated using a model that is based on the physiological theory of tone consonance that was published by Hermann Helmholtz in the middle of the 19th century.
Given a group of tones that have fixed pitches, changes in the aggregate dissonance caused by adding another “probe” tone of a variable pitch can be represented as a “dissonance landscape”. Local minima in the “height” of the landscape correspond to local minima of the aggregate dissonance as a function of the pitch of the probe tone. Finding a local dissonance minimum simulates the actions of a musician who is “tuning by ear”. The set of all local minima within a given pitch range is a collection of potentially good pitch choices from which a composer (a human, or an algorithmic process) can fashion melodies that sound in harmony with the fixed tones.
Several practical examples, realized in experimental software, demonstrate applications of the method for: 1) computer-assisted microtonal tone arrangement (music composition), 2) algorithmic (machine-generated) music, and 3) musical interplay between a human and a machine.
The just intonation aspect of the tuning method naturally leads to more than twelve, potentially to many, pitches in an octave. Without some restrictions that limit the complexity of the process, handling of so many possibilities by a human composer and their precise rendition as sound by a performing musician would be very difficult. Restricting the continuum of possible pitches to the discrete 53-division of the octave, and employing machine-assistance in their arrangement and in sound synthesis make applications of the method feasible.
Preface
There was Eru, the One, who in Arda is called Ilúvatar; and he made first the Ainur, the Holy Ones, that were the offspring of his thought, and they were with him before aught else was made. And he spoke to them, propounding to them themes of music.
And it came to pass that he declared to them a mighty theme, unfolding to them things greater and more wonderful than he had yet revealed: “Of the theme that I have declared to you, I will now that ye make in harmony together a Great Music. Ye shall show forth your powers in adorning this theme, each with his own thoughts and devices.”
And a sound arose of endless interchanging melodies woven in a harmony that passed beyond hearing into the depths and into the heights, and the music and the echo of the music went out into the Void, and it was not void.
J.R.R. Tolkien, Ainulindalë (The Music of the Ainur), The Silmarillion, 1977.
Music is made from melodious strands of tones (voices) that are “woven in a harmony” into greater patterns and structures. The weaving and interplay of melodic strands (counterpoint) bring into existence chords (combinations of simultaneously sounding tones) and thus harmony. The composer may deal with all the voices as equally important, or choose one voice as the primary one that carries the main melody and make the rest function as an accompaniment. In any case, the proper combination of tones that sound together is a most essential factor in making music.1
Weaving of musical patterns is not unlike solving a combinatorial puzzle: a task to organize a large number of items so that they fit well together. There are many possible combinations of the items; most of the combinations have to be outright rejected since they do not lead to any potential solution. Somewhere among the remaining combinations may lie possible answers; there could be several or perhaps many of those. Some may be but acceptable, some may be good, and occasionally there are some that are exquisite.
1“Harmony”. Oxford Music Online.
A common musical puzzle is the problem of creating a melody that fits a given harmonic progression. (This text proposes a computational method that can be used to assist in solving such puzzles.) A famous example is Charles Gounod’s “Ave Maria”,2 superposed over J.S. Bach’s Prelude in C.3 The beautifully clear structure of the Prelude with a mobile, yet firmly grounded harmonic progression makes it a great foundational material for composition of musical variations. In Chapter 4 we shall discuss several melodies and other musical structures that were fitted over the chord progression of the Prelude, to demonstrate the feasibility of our method and the practicability of the experimental software in which the method has been implemented.
Weaving musical patterns is a craft and an art, practised mainly by music composers and performing musicians. Music composers create musical compositions (that include solutions of various musical puzzles) and write them down using musical notation in the form of musical scores. Performing musicians interpret the scores, in a live performance or during a studio recording, turning them into sound. It takes talent and many years of training for both music composers and performing musicians to acquire necessary skills and insight. The composer and the musician can be the same person: some top performers of classical music and many songwriters of popular music write and perform their own compositions and songs. Improvising musicians, in particular the players of Renaissance, Baroque, Eastern, and jazz music, invent musical themes (or create them from prepared and practised melodic fragments) at the time of their performance.
Fitting numerous composed or improvised (that is, invented on the spot in “real” time) melodies over an existing harmonic progression is one of the essential aspects of jazz music. Jazz musicians often borrow chord progressions from other sources, but almost always create their own original melodic themes. Some of these newly created themes have even become part of the “jazz songbook”, a corpus of “standard” jazz songs. For example, Miles Davis’s song “Donna Lee”4 is a bebop melody written over the more traditional harmonic progression of the song “Back Home Again in Indiana”.5 The chord progression of George Gershwin’s song “I Got Rhythm”6 became known as “rhythm changes”7 and was used in
2Méditation sur le premier prélude de S. Bach, 1853.
3Prelude and Fugue No.1 in C major, BWV 846.
4“Donna Lee (1947)”. Wilson et al., JazzStandards.com — the first and only centralized information source for the songs and instrumentals jazz musicians play most frequently.
5“Indiana (Back Home Again in Indiana) (1917)”. ibid.
6“I Got Rhythm (1930)”. ibid.
7“Jazz Theory: Rhythm Changes”. ibid.
so many jazz songs that it is now one of the most common accompaniments for improvisation at jam sessions.8 Chapter 5 describes an experimental real-time system for a musical interplay of a human and a machine, as if they were two improvising jazz musicians.
Music theorists analyze existing compositions, attempt to discover inner principles and rules that guided their weaving, and then construct theories that attempt to explain how the compositions were built and why their tone patterns do (or do not) sound in harmony. Many musical works can then be created following the rules of a given music theory, within the limits of the musical genre for which the theory was constructed. Great composers, however, often stretch or even break the boundaries of established theories in their work and thus advance musical practice.
And then there are engineers. Engineers may not have the creative talent or feelings of artists, nor the insight of music theorists; but engineers recognize structured patterns and with that ability they build machines. Computers are the ultimately flexible and adaptable machines brought to action by computer programs, machines that can with remarkable success assist composers and musicians to solve musical puzzles, write musical scores, and render them as sound. This text describes one such attempt to build a suite of software applications that assist composers in creating music with a higher level of tuning complexity and precision than otherwise possible. The approach comes not from emulating established musical practices, but from following natural (acoustical) principles of musical sound. Particular attention is given to presenting an argument for the choice of just intonation, and to the musical consequences that result from it.
8Gatherings where jazz musicians play “jazz standards” and take long improvised “solos”.
To the fellowship of the upper room and my favourite Martians
Table of Contents

Abstract
Preface
Dedication
Table of Contents
List of Tables
List of Figures
Definition of terms
Online materials

1 Introduction and background
    Music and machines
    Computer-assisted music
    Working definitions
    Western music
    Music theory rudiments
    Music notation
    Division of the octave
    The twelve-tone division of the octave
    Just intonation
    Computer-assisted microtonal music
    Experiments and experimental tools
    The outline of the work

2 Tone sensations
    The pitch spiral
    Sonograms
    Reassigned sonograms
    The harmonic series
    Harmonics of the singing voice
    The intervals of the harmonic series
    Consonance and dissonance, redefined
    Helmholtz’s Tonempfindungen
    Experiments

3 Octave divisions
    Octave equivalence
    Division of the octave by cycles of fifths
    Temperaments — how do they sound?
    Pythagorean (the fifth) tuning
    Just tuning
    The 53-tone microtonal division of the octave
    Microtonal notation of the 53-tet system

4 Calculated consonance and its applications
    Dissonance curves and landscapes
    Dissonance landscape displayed on the pitch spiral
    Sequencing music in 53-tone division
    The C53 Suite: Variations on Prelude in C, BWV 846

5 Related work: interactive music
    Qvin Musical Processes

6 Conclusions and future work
    Future work

A C53 Suite
B The pitch spiral
C Included media and software

Glossary of terms
Bibliography
List of Tables

1.1 The eight musical intervals
1.2 Perfect intervals
1.3 Imperfect intervals
1.4 Intervals and their inversions
2.1 Intervals of the harmonic series
3.1 Equal divisions of the octave
3.2 Just tuning and 12-tet approximated in 53-tet
3.3 The 53-tone equal division of the octave
3.4 The 53-tone notation
List of Figures

1.1 Piano keyboard
1.2 C major scale
1.3 Enharmonically equivalent altered notes.
1.4 Prelude in C, opening measures
2.1 The pitch spiral
2.2 Source and difference tones on the pitch spiral
2.3 Various grids on the pitch spiral
2.4 Sonograms of a whole-tone scale
2.5 The whole-tone scale in Qvorum
2.6 Harmonics of a complex tone, shown on the pitch spiral
2.7 Tužba (Desire)
2.8 A sonogram of the singing voice, linear frequency scale
2.9 A sonogram of the singing voice, logarithmic frequency scale
2.10 Time-frequency reassigned sonogram of Example 2.2
2.11 Example 2.2, strong harmonics only
2.12 Example 2.2, 12-tet grid
2.13 Example 2.2, fundamental harmonics
2.14 Example 2.2, the first phrase
2.15 Jabluňka (Apple Tree)
2.16 A sonogram of a violin/voice duet.
2.17 Two shades of B4
2.18 Jabluňka — microtonal notation
2.19 Example 2.6A
2.20 Example 2.6B
3.1 Octave equivalence: C2, C3, C4, and C5
3.2 Octave equivalence with difference tones
3.3 The cycle of fifths with moments of symmetry
3.4 The Pythagorean comma
3.5 Tempering out the Pythagorean comma
3.6 Just and equally tempered fifths
3.7 Pythagorean, just and equally tempered thirds
3.8 Tempering out the syntonic comma
3.9 The meantone temperament (unequal semitones)
3.10 Just C chromatic scale on a tuning lattice
3.11 Continued tuning lattice
3.12 Comma drift
3.13 Comma shift
3.14 The 53-tone system
3.15 Just, 12-tet, and 53-tet fifths
3.16 Just, 12-tet, and 53-tet thirds
3.17 12-tet and 53-tet major triads
3.18 The 53-tet and 12-tet notations compared
4.1 The dissonance curve
4.2 Helmholtz’s construction of the dissonance landscape
4.3 The dissonance landscape of simple tones
4.4 The dissonance landscape of complex tones
4.5 The dissonance landscapes on the pitch spiral
4.6 The original sequencer Qvinta written in Java
4.7 The current C++/Qt version of Qvinta
4.8 Rusty Reflections, the opening
4.9 Farmer’s Fuguetta, the opening
4.10 Farmer’s Fuguetta, the ending
5.1 The Qvin IDE
5.2 The Qvin Musical Processes plugin
5.3 ‘qmp’ user interface example
B.1 The pitch spiral
Definition of terms
This is an interdisciplinary text. We use a number of terms that are, no doubt, familiar to many of the readers, but whose meaning often differs depending on the context in which they are used. In the back of this text you will find a more complete glossary of many such terms, but there are a few that need to be defined before we proceed further, since they are used frequently from the very beginning. Their meaning could otherwise be vague or confounding. This is how they are used within this text:
Sound: (noun) A physical entity; a mechanical disturbance created by vibrating mechanical “sonorous” bodies that propagates as a sound wave through elastic media, typically through air.9 In this text we restrict the meaning of ‘sound’ to “audible sound”, i.e. such sound as can be sensed by the human ear.

Sound: (verb) To create such a sound wave (see above).

Sound element: A coherent sound with a clear beginning and ending. For example, a tone played on a musical instrument and notated by a single note in the musical score is a sound element.
Amplitude: The maximum displacement of vibrations or of a point on a (sound) wave. In this text, the term is interchangeably used to refer to the amplitude of a sound, to sound intensity, or sound loudness. If used in a strict sense, ‘amplitude’ refers to the maximum displacement within a complete cycle of the vibrations, and ‘instantaneous amplitude’ refers to the current amplitude at a given time instant.10
Amplitude envelope: A curve (a function of time) that limits the amplitude of a sound element. A proper amplitude envelope must be a smooth non-negative curve that begins and ends with zero.
Sensation: A physical experience (related to the human body) that results from stimulation of a sense organ, sensory nerve, or sensory area in the brain. Sensations can be in many
9“Sound”. Encyclopædia Britannica Online. 10“Amplitude (physics)”. ibid.
cases objectively observed and studied.11
Sound sensation: An experience that results from stimulation of the ear (as a sense organ), carried by the auditory nerve and processed by a respective sensory area of the brain. Periodic changes in air pressure cause the sound sensation of a tone.
Perception: A process in which sensations are transformed into a mental experience (related to the human mind), making inferences about the sources of the sensory stimuli, and forming mental concepts that reflect the outside reality. Perception draws not only on the current sensation, but also on past experience of relevant sensations. Perception can be observed only by the perceiver and therefore remains subjective, although it can be studied indirectly.12
Sound perception: A mental experience that is the outcome of a sound sensation. Sound perception also draws on past experience of sound sensations, such as the exposure to various kinds of musical sound, and other types of conditioning. For example, “ear training” in the course of which musicians learn to recognize musical intervals develops listening attention and “musical memory”.
Musical sound: Sound with certain qualities that make it usable in music. Although there are many kinds of musical sound, the only kind of musical sound specifically referred to in this text is the tone.
Tone: An elementary musical sound. A tone is a sound with a periodic waveform of a given stable frequency. It can be either a sustained sound, or a sound element. (‘Tone’ can also refer to a basic musical interval that has a size of two semitones.)
Note: A graphical symbol that represents a tone (as a sound element) in musical notation; also a name that designates the pitch of a tone (see ‘Pitch’ below).
Frequency: The number of cycles of a periodic wave in one second.
Pitch: A perceived quality of a tone that is related to its frequency. Pitch is a “position” on an imagined scale between “low” and “high” ends of the audible sound spectrum.
11“Sensation”. Encyclopædia Britannica Online. 12“Perception”. ibid.
A lower frequency of vibration corresponds to a lower pitch, a higher frequency to a higher pitch. The terms ‘low’ and ‘high’, with respect to the pitch of a tone, are probably based on a physical analogy: higher tension of a string, naturally associated with movement upward, causes the string to vibrate faster.13 In the usual Western musical context, pitch is specified by a note name: a letter from A to G with a subscript that indicates the octave range to which the pitch belongs and an optional accidental sign (♯, ♭, etc.) that indicates a “chromatic” alteration of the pitch, e.g. C4 or G5♯.
Interval: Used strictly to mean musical interval: a difference between the pitches of two tones.
Comma: A very small musical interval, significantly less than a semitone.
Simple tone: A tone with a sinusoidal waveform; causes a sensation of a plain musical timbre (tone “colour”). Among traditional (non-electronic) musical instruments, the clean sound of the flute is close to that of a simple tone.14
Complex tone: A tone with an arbitrary periodic waveform that is not sinusoidal. By Fourier’s theorem, a complex tone is a sum of simple tones whose frequencies form a harmonic series. A complex tone causes a sensation of a richer musical timbre. Most musical instruments, notably bowed string instruments of the violin family, reed woodwind instruments, and brass instruments, produce complex tones.15
Harmonic: (noun) One of the simple tones whose sum forms a complex tone.
Partial: (noun) ‘Partial’, short for “partial tone”, is used in musical contexts as a synonym for ‘harmonic’. In this text, we use the term ‘partial’ only to refer to data items produced by the sound analysis library “Loris”.
Fundamental: (adjective) Applied to ‘frequency’ or ‘harmonic’. The “fundamental frequency” is the basic frequency of a complex waveform. The “fundamental harmonic”
13Goetschius, The structure of music, p.16. 14The flute sound has most energy in the fundamental (first) harmonic, and relatively little energy above the second harmonic. Owing to the plain character of the flute sound, it is easy to experience strong sensations of difference tones when two flutes play together. 15The instrument players do control, among other aspects of sound, the harmonic content of the produced complex tones: by choosing the place and style of bowing, by controlling the vibrations of the reed with their embouchure, etc.
is the basic component of a complex waveform: a simple tone with the fundamental frequency.
12-tet: Twelve-tone equal temperament.
53-tet: Fifty-three-tone equal temperament.
Key: Except when discussing the piano keyboard, where ‘key’ means the lever that is pushed down in order to strike a string and produce a sound, ‘key’ refers throughout the text to a musical key: a tonal system derived from a diatonic (heptatonic) scale and named after its central note, the tonic, e.g. C major, D minor, etc.
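As a rough numerical illustration of the two temperaments defined above (12-tet and 53-tet), the following sketch compares how closely each approximates two just intervals: the perfect fifth (frequency ratio 3:2) and the major third (5:4). The cent (one 1200th of an octave) is a standard unit that is not defined in this section, and the function names are hypothetical helpers introduced only for this example; they are not part of the software described in this text.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)


def nearest_step(ratio, divisions):
    """Closest step of an equal division of the octave to a given ratio.

    Returns (steps, error), where error is the tempered interval minus
    the just interval, in cents.
    """
    step = 1200.0 / divisions              # size of one step in cents
    steps = round(cents(ratio) / step)     # nearest whole number of steps
    return steps, steps * step - cents(ratio)


if __name__ == "__main__":
    intervals = [("just fifth 3:2", 3 / 2), ("just major third 5:4", 5 / 4)]
    for name, ratio in intervals:
        for divisions in (12, 53):
            steps, error = nearest_step(ratio, divisions)
            print(f"{name}: {divisions}-tet, {steps} steps, "
                  f"error {error:+.2f} cents")
```

Under these assumptions the 53-tet fifth (31 steps) lies within a tenth of a cent of the just fifth, while the 12-tet major third (4 steps) is nearly 14 cents wider than the just third.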
Online materials
The work presented in this text is interdisciplinary. It includes a theoretical discourse, pre-recorded sound examples, software tools where a user by graphical means controls produced sound, and a dynamic computer system for musical interplay between a human and a machine. Due to the restrictions of the traditional written thesis format,16 some of those will be discussed only verbally in the text. The reader should, however, listen to the provided sound examples and experiment with the “pitch spiral”; both activities are most essential for appreciation of the discussed aspects of musical sound. The “pitch spiral” applet and sound examples are provided on web pages that accompany this text:
Main web page: http://qvin.in/CC
Interactive “pitch spiral” applet: http://qvin.in/CC/PitchSpiral
Sound experiments: http://qvin.in/CC/Experiments
The C53 Suite of musical etudes: http://qvin.in/CC/C53
16A multimedia-friendly format, such as a set of dynamic, hyperlinked web pages, would be more suitable for this type of work.
Chapter 1: Introduction and background
Principia essentialia rerum sunt nobis ignota.1
A violinist bows the strings of a violin. Their vibrations pass through the bridge onto the soundboard, excite acoustic pressure waves that propagate through air, eventually strike the eardrum and through the mechanical action of the ossicles in the middle ear enter the cochlea. Up to this point, the physical principles of sound creation and propagation are well understood and can be directly observed and measured. The processes that take place within and past the inner ear, processes that are essential in the sensation and perception of sound, are much less apparent and measurable, and can be directly investigated only with difficulty. We are therefore compelled to study this final part of the sound creation–transmission–sensation–perception chain as a “black box”, by observing the causality between the character of sound waves and their subjective perception. This limitation, of course, by no means invalidates the conclusions that can be drawn from such observations.
Interdisciplinary text: This text is written for a mixed audience of readers: for computing scientists2 who are interested in music, and for musicians (theorists, composers, performers) who realize and acknowledge the importance of physics and mathematics in arts. Many concepts that are well-known to one group of readers may be new to the other; we shall attempt to approach the material in such a way that it can be easily understood by both groups. It will therefore occasionally be necessary to review introductory terms and ideas as they are encountered. That itself is a useful exercise: reviewing foundational concepts in a course of study is certainly worth the time, since they are closely related to understanding of fundamental principles. We hope that the gentle reader finds even the familiar narrative interesting.
1Sed quia principia essentialia rerum sunt nobis ignota, ideo oportet quod utamur differentiis accidentalibus in designatione essentialium. “Since the essential principles of things are hidden from us we are compelled to make use of accidental differences as indications of what is essential.” Thomas Aquinas, Commentary on Aristotle’s De Anima, Book One, I/1 §15. http://dhspriory.org/thomas/DeAnima.htm
2The usual terms are “computer science” and “computer scientist”. Yet computer science studies computation (the principles, methods, and applications of computing), and not computers. Computers are tools of computer science, but not necessarily objects of the study. For this reason it may be preferable to use the terms “computing science” and “computing scientists”, or even the terms (of European origin) “informatics” and “informatician”.
Music and machines

Douglas Hofstadter in his seminal book Gödel, Escher, Bach explored the potential and also the limits of formal symbolic systems that are the very foundation of computing, and also made many references to visual arts (M.C. Escher’s drawings) and music (J.S. Bach’s canons and fugues). Although there is a strong connection between mathematics, computation, and music, Hofstadter did not suggest that computation would or could take on the creative role of an artist anytime soon:
Music is a language of emotions, and until programs have emotions as complex as ours, there is no way a program will write anything beautiful. There can be [...] shallow imitations of the syntax of earlier music, but [...] there is more to musical expression than can be captured in syntactical rules. There will be no new kinds of beauty turned up for a long time by computer music-composing programs.3
Hofstadter’s view has been, by his own account, shaken by compositions written by “Emmy” (EMI, Experiments in Musical Intelligence), a computer program written by David Cope.4 Emmy writes so-called recombinant music: it analyzes a given collection of musical works of a particular composer, breaks the works down into fragments, extracts semantic rules that supposedly guided how the fragments were originally combined, and reassembles them into new “templagiarised” compositions that “sound like” Bach, Chopin, Beethoven, Gershwin, or Joplin. In Staring Emmy straight in the eye — and doing my best not to flinch,5 Hofstadter recounts how, during his lectures on EMI, audiences were often unable to distinguish between a “genuine Bach invention or a Chopin mazurka” and Emmy-created compositions in the style of Bach or Chopin.
From his experience with Emmy, Hofstadter drew a pessimistic conclusion that:
• Either Chopin (for example) is shallower than Hofstadter had ever thought (since an Emmy-Chopin piece had spoken to Hofstadter the same as the genuine Chopin had), and
• music is shallower than he had ever thought (since what applies to Chopin may apply to other composers as well), or perhaps even that
3Hofstadter, G¨odel,Escher, Bach: An Eternal Golden Braid, “Ten Questions and Speculations”. 4Cope, Experiments in musical intelligence. 5Hofstadter, “Creativity, Cognition, and Knowledge: An Interaction”.
• the human soul/mind is a lot shallower than he had ever thought (since music created by a relatively simple computer and about twenty thousand lines of LISP code can be deeply moving).
[...] the day when music is finally and irrevocably reduced to syntactic pattern and pattern alone will be, to my old-fashioned way of looking at things, a very dark day indeed.6
While Hofstadter was deeply worried that the creation of beauty could be delegated to machines, the audience of his lectures apparently were not. We should not be worried, either: Hofstadter’s experience (that music created by a computer “can be deeply moving”) confirms that music does not ultimately exist on its own, but that it depends on its reflection within a human mind: beautiful minds find beauty in (certain) music. Minds that are not sensitive to this kind of beauty do not perceive it, but those that do find, for example, serenity in choral music, magnificence in symphonic music, stimulation and excitement in jazz music, visceral motion in rock music, eroticism in soul music, etc. Even though most music is nowadays marketed, distributed, and consumed as a commodity, its full potential can still be reached only through participation: by creating music as a music composer, playing music as a performing musician, by dancing to the sound of music, whistling while you work, humming a merry tune, and first and above all: by attentive listening.
Hofstadter’s “dark day” may surely come, but it will not be the day when music is reduced to a syntactical pattern and when machines therefore compose beautiful music, but the day when we no longer listen to it.
Computer-assisted music

Having satisfied ourselves that making music with the help of machines is not necessarily a bad thing, we may now proceed to narrow the scope of our investigation: since participation is essential in making music, we shall not consider replacing human composers, performers, and listeners by machines; we shall use the computer only to assist with making music: as a laboratory for experiments with sound, as a tool for music composition (and real-time improvisation), and as a musical instrument.
6Cope later responded to Hofstadter’s and others’ comments in: Cope, Virtual Music: Computer Synthesis of Musical Style.
In this text we shall always keep in mind a distinction between ‘computer-assisted music’ and ‘algorithmic music’: in computer-assisted music, the machine helps a human composer and/or a human performer, who remains in charge of the creative process. That is in contrast with algorithmic music where once the machine is set up, the process takes its course with only a limited interactive control. In both cases, however, the algorithms are always set up by a human who thus participates in the music creation.
It may be tempting to approach computer-assisted or algorithmic music by constructing a program that simulates or imitates a human composer; for example, one that combines the twelve tones of the Western musical scale, according to a set of rules captured in some music theory, into a musical composition in a familiar style.7 Such an approach can yield quite satisfactory results.
Another frequently used method is mapping (projecting) of some computed mathematical structure, a system of simulated physics, or spatial gestures and other input data made with human–computer interfaces, onto a domain of tones and other musical sounds. Success of a mapping method depends on whether a good mapping function is found. Some of these systems do produce impressive and interesting results. For example, in “an experiment in a new kind of music,” the output data of twelve-cell-wide Wolfram “elementary cellular automata” are mapped to the twelve tones of the chromatic scale.8
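The mapping idea can be illustrated with a small sketch: a twelve-cell elementary cellular automaton whose live cells are read as pitch classes of the chromatic scale. The rule number (30), the wrap-around boundary, the single-cell seed, and the cell-to-note assignment are all assumptions chosen for illustration; the sketch does not claim to reproduce the mapping used by WolframTones.

```python
# Pitch-class names for the twelve cells (an assumed assignment: cell 0 = C).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def step(cells, rule=30):
    """Advance an elementary cellular automaton by one generation.

    `cells` is a list of 0/1 values; neighbours wrap around at the edges.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as 0..7
        nxt.append((rule >> index) & 1)              # look up the rule bit
    return nxt


def to_notes(cells):
    """Read the live cells of one generation as a set of pitch classes."""
    return [NOTE_NAMES[i] for i, cell in enumerate(cells) if cell]


def generate(generations, rule=30):
    """Return one list of note names per generation, from a single seed cell."""
    cells = [0] * 12
    cells[6] = 1  # hypothetical seed: only the F# cell is live
    chords = []
    for _ in range(generations):
        chords.append(to_notes(cells))
        cells = step(cells, rule)
    return chords


if __name__ == "__main__":
    for chord in generate(8):
        print(" ".join(chord))
```

Whether such output sounds musical depends entirely on the chosen mapping function, which is the point made in the paragraph above.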
Mapping of abstract mathematical structures, dynamic physical systems, or spatial “gestures” works very well for visual arts. For example, simple iterative mathematical operations performed on complex numbers yield the well-known Mandelbrot9 or Julia10 fractals. That is, however, not a sufficient reason to assume that a similar approach should necessarily work for music. There is a fundamental difference between the two: music is based mostly on immediate sensation of audible sound, its precise change in time, and associated complex abstract mental concepts; while visual art is based more on real images, their shapes and
7Emmy’s recombinant music may be an example of such method. It is difficult to say surely, because internal structure and function of EMI has not been published by its author, to my knowledge. An example of an “open source” system that does work by enforcing rules of counterpoint on a tone structure is COMPOzE (Henz, Lauer, and Zimmermann, “COMPOzE — Intention-based Music Composition through Constraint Programming”). 8WolframTones™ . 9“Mandelbrot Set”. Wolfram MathWorld™ . 10“Julia Set”. ibid.
colours.11
The problem of finding an appropriate mapping of various systems to music is quite elusive.12 In the area of gestural control of musical instruments, contemporary, easily accessible electronic technology allows intensive research and experimentation with “new interfaces for musical expression”.13 The innovative designs of various interfaces for human-computer interaction are stunning and offer many means of expression to electronic music composers and performers. The challenge is how to find and develop ways of using the new tools to their full potential, and how to construct different mappings for various styles of music, namely those that require musical sounds with a definite pitch.14 The evolution of technology and its possibilities takes place at a breathtaking speed. Musical genres, however, need time for their establishment.
Working definitions

Before we further discuss computer-assisted music, we must address the question of what music is, even if we may be unable to answer it completely convincingly. It is also prudent, in order to focus on the objectives of the work presented here, to narrow the working definitions considerably, almost to the point of oversimplification.
To properly define ‘music’ (that which is a-musing)15 “would require discussion from many vantage points, including the linguistic, biological, psychological, philosophical, historical, anthropological, theological and even legal and medical, along with the musical in the widest sense.”16 Music has been described as “art concerned with combining vocal or instrumental sounds for beauty of form or emotional expression, usually according to cultural standards of
11“Music stands in a much closer connection with pure sensation than any of the other arts. The latter rather deal [...] with the images of outward objects [...] The plastic arts [...] their main purpose is to excite in us the image of an external object of determinate form and colour. [...] in music, the sensations of tone are the material of the art [...] we do not create out of them images of external objects or actions.” (Helmholtz, On the sensations of tone as a physiological basis for the theory of music, pp. 3–4) 12Interesting possibilities of mapping in the opposite direction, of music to topological structures, were explored, for example, in the work of Guerino Mazzola (Mazzola, Göller, and Müller, The topos of music: geometric logic of concepts, theory, and performance) or Dmitri Tymoczko (Tymoczko, A Geometry of Music: Harmony and Counterpoint in the Extended Common Practice). 13NIME: International Conference on New Interfaces for Musical Expression. 14Using non-pitched sounds favours rhythm and sound “texture” over melody and harmony. 15From Greek mousikē (tekhnē), “(art) of the Muses” (“Music”. Oxford Dictionaries). 16“Music”. Oxford Music Online, (Nettl, Bruno).
rhythm, melody, and, in most Western music, harmony”,17 or “the art of arranging tones in aesthetically satisfying form, one after another (melody) and one against other (harmony), rhythmically subdivided, and put together in a completed work.”18
Musical sound
Music is an art form that uses sound as its material. Not all sound is music, but all music is realized through sound. To become music, sound must be somehow organized. Not all organized sound is music, but all music is made of organized sound.
Sound is the energy of mechanical vibrations that radiates as a wave from its source (a vibrating “sonorous” body) through an elastic medium (air) and reaches a listener’s ear. In the ear the mechanical energy of air pressure waves is captured by the eardrum and passed by the small bones of the middle ear into the cochlea, where it causes sympathetic vibration of the basilar membrane. The displacement of the basilar membrane is then converted by hair cells of the organ of Corti into neural signals that ultimately lead to the perception of the sound.19 Sound generation and transmission are thus to a great extent mechanical in nature and as such subject to the laws of wave propagation and interference. Sound sensation is then a psychoacoustic phenomenon, i.e. the combined result of physical functioning of the ear and processing of the auditory stimuli by the mind.
Within this work, the smallest entities of musical sound are tones. A tone is a sound with a periodic waveform, sustained for the minimal length of time that is required for its perception as such by the sense of hearing. The sensation and perception of tones are the results of the periodic nature of the vibrations that generate tones. A periodic vibration, and the resulting wave, are characterized by their frequency, amplitude, shape (waveform), and phase.
The frequency of the wave is perceived as pitch, its amplitude as loudness, and its shape is the main contributing factor of the tone timbre. The phase of a sound wave is not detected by hearing, and plays a role only if two or more waves with comparable frequencies interfere.
17“Music”. Encyclopædia Britannica Online. 18Die Kunst, Töne in ästhetisch befriedigender Form nacheinander (Melodie) und nebeneinander (Harmonie) zu ordnen, rhythmisch zu gliedern, und zu einem geschlossenen Werk zusammenzufügen. Brockhaus-Wallring deutsches Wörterbuch, Wiesbaden, 1982. In: “Music”, Oxford Music Online. 19Some of this pathway is a “black box”, see p.1.
‘Timbre’ is the characteristic perceived “colour” of the sound of various musical instruments (sources of musical sound). Timbre is the result of a frequency (pitch) spectrum, an amplitude envelope (loudness varying in time), and other factors, such as transient noises at the onset of the sound, or resonant frequencies of the instrument.
The terms ‘frequency’ and ‘amplitude’ are generally preferred in a scientific or engineering context, where quantitative measurements are important. In a casual or artistic context, with main focus on human perception, one would instead refer to pitch and loudness.
Organized sound
All music is made of “organized sound”. The term was introduced in the early twentieth century by Edgard Varèse, and although it is now mostly used in the context of “music technology”,20 the concept applies to a variety of musical genres, beginning with traditional instrumental music. Varèse gave an account of how the term was coined, and also expressed some of his other views on art and music, including machine-assisted music, in a conversation with Alfred L. Copley:21
But how I envy you painters! You are in immediate communication with your audience. A canvas [is] a finished work. A score is only a blueprint at the mercy of performers and their ‘interpretations’. [...]
Musical freedom is of recent date. Thanks to electronics, composers are no longer deaf to the beauty of what once would have been called unpleasant noise. Music has at last decided to use its ears. [...] Electronics has freed music from the tempered system.
Electronics is an additive, and not a destructive element in the art and science of music. Western music has such a rich and varied patrimony, because new instruments have been added to the old ones. [...]
In the twenties [...] I got sick of the stupid phrase ‘Interesting, but is it music?’ After all, what is music but organised sound, all music! So, I said that my music was organised sound and that I was not a musician, but a worker in frequencies
20See Organised Sound, An International Journal of Music and Technology, ed. Leigh Landy, Cambridge University Press. Published since 1996. 21Varèse and Alcopley, “Edgard Varèse on Music and Art: A Conversation between Varèse and Alcopley”.
and intensities.
A machine can only give back what is put into it. A bad musician with instruments will be a bad musician with electronics. The composer should, in building his sonorous constructions, have thorough knowledge of the laws governing the vibratory system.
It was Helmholtz who first started me thinking of music as masses of sound evolving in space, rather than, as I was being taught, notes in some prescribed order.22
Let us introduce a quasi-technical quasi-definition of ‘music’ that we shall keep in mind during the course of our argument:
Music: A combination of sound elements, organized in a structured form. Music has aspects of science, craft, and art. The science of music investigates how sound elements (in particular, tones of a definite pitch) may be combined successfully for the desired effect; the discoveries of musical science make the basis of music theories. The purpose of music theories is to formulate rules for the craft of music composition, i.e. arranging tones in time into musically valid relationships. Finally, the art of music strives that the tone arrangements are meaningful structures with certain aesthetic qualities.
Sound element: A coherent (whole, unit) sound with a distinct beginning (onset) and ending, that itself is not an apparent combination of lesser (smaller, shorter etc.) sound elements.23 A sound element is constrained by an amplitude envelope.
Although the meaning of the term “sound element” is quite restricted in this context, it is related to the well-known concept of “sound object”. Curtis Roads defined nine levels of “time scales of music”, from “infinite” to “infinitesimal”. The fifth, middle level is the scale of “sound object”:24
A basic unit of musical structure, generalizing the traditional concept of note to include complex and mutating sound events [...] ranging from a fraction of a second to several seconds. 22Emphasis mine. 23The term ‘sound element’ is introduced to avoid confusion with ‘sound’ as a general physical entity. 24Roads, Microsound, p.3.
Pierre Schaeffer’s concept of “sound object” is:25,26
[...] every sound phenomenon and event perceived as a whole, a coherent entity, and heard by means of reduced listening27 [...] independently of its origin or its meaning.
In the remainder of this text, tones are the only type of sound elements that we shall discuss. Other types of sound elements (such as various percussive sounds) are less essential for the presented work.
Western music

Taste in music varies from person to person. It depends on the current and past cultural standards, on a listener’s upbringing and schooling, acuteness of hearing and sound perception, exposure to various styles of music as a listener, participation in musical activities as a performer, personal preferences, etc.
We assume that the reader of this text has been exposed to and is reasonably familiar with common styles of Western music, i.e. the music that evolved from its roots in the music of ancient Greece and Rome, through the one-voice chant and two-voice organum of the Middle Ages and the Renaissance polyphony of many voices, through the Baroque sonata, concerto and fugue, classical symphony, and romantic chromaticism, and finally into the musical maelstrom of the twentieth and twenty-first centuries.28
Western music relies on:
• Rhythm: The placement of sound elements in time.29
• Melody: A coherent sequence of tones, characterized by orderly change from pitch to pitch (by steps and leaps) and forming an orderly rhythmic structure.30,31
25Chion, Guide to Sound Objects, p.32. 26“Sound Object”. Landy et al., The ElectroAcoustic Resource Site (EARS). 27Listening to sound for its own sake. 28“Western music”. Encyclopædia Britannica Online. 29“Rhythm (music)”. ibid. 30“Melody (music)”. ibid. 31Melody is often referred to as the “horizontal dimension” of music, since it is related to the flow of time that is notated horizontally, left-to-right, in musical scores.
• Harmony: Two or more tones with different pitches that sound simultaneously.32,33 Harmony can result, in particular, from the combination of several melodies (voices) with different rhythmical structures (polyphony), of several voices that move rhythmically together (homophony), or of simultaneous variations of the same melodic line (heterophony). Groups of (usually three or more) tones sounding together (vertically simultaneous tones) are called chords.
The set of pitches used in most of Western music is based on the division of the octave into twelve parts (called semitones). As we shall see, while the twelve-tone division of the octave is natural34 and also quite practical,35 it contains inherent contradictions that lead to tuning problems that cannot be solved to perfect satisfaction; all we can do is deal with them as well as we can.
Music theory rudiments

There are several ways in which the term music theory is understood: as “rudiments” of music, as theoretical writings about all possible aspects of music (the nature of sound, acoustics, aesthetics, instrument building, and so on), and as studies of the structure and fundamental principles of music.36 The latter vary from collections of practical rules that were acquired mostly empirically from analyses of music created by notable composers of the past, through various esoteric schemes, such as the “telluric” concepts of Levy,37 to systems that attempt to explain music from natural principles, such as the works of Rameau,38 Vogel,39 Parncutt40 and, most importantly for this text, the work of Helmholtz.41
Teaching of rudiments of music is a part of basic musicianship training. It includes the elements of musical notation, time and rhythm, intervals, scales, modes, keys and chords. We shall now briefly review, as a necessary preparation for the argument that follows, some
32“Harmony (music)”. Encyclopædia Britannica Online. 33Also referred to as the “vertical dimension” of music, notated by vertical superposition of notes. 34“Existing in or derived from nature; not made or caused by humankind.”(“Natural” Oxford Dictionaries). 35The number twelve can be divided into two, three, four, or six equal, whole parts, which allows the construction of a number of symmetric (or not) musical pitch structures. 36“Theory”. Oxford Music Online, (Fallows, David). 37Levy, A theory of harmony. 38Rameau, Treatise on Harmony. 39Vogel, On the relations of tone. 40Parncutt, Harmony: A Psychoacoustical Approach. 41Helmholtz, On the sensations of tone as a physiological basis for the theory of music.
aspects of the rudiments; namely the basics of the system of intervals and scales.
The diatonic and chromatic scales
The usual approach to teaching music theory rudiments begins with the Western diatonic scale in the key of C major, as represented by the seven white keys on the piano keyboard (Fig. 1.1), and notated in the standard Western notation by the seven “natural” (not raised or lowered) notes (Fig. 1.2). The white keys and the corresponding notes are named using the first seven letters of the Latin alphabet: CDEFGAB.42
The white keys are either immediately adjacent (E–F, B–C) and form the interval of a semitone, or have one black key in between them and form the interval of a tone (two semitones).
Figure 1.1: Piano keyboard
CDEFGAB
Figure 1.2: C major scale
The black keys are not assigned letter names of their own. The name of each black key is a combination of the letter assigned to either of the two neighbouring white keys and an “accidental” sign that indicates the “altering” of the pitch by the interval of a semitone: either raising (“sharpening”, by the “sharp” sign ♯) or lowering (“flattening”, by the “flat” sign ♭).
Each black key can thus be named in two different ways: for example, the key between C and D is both C♯ and D♭. Two such names are called enharmonically equivalent, i.e. “harmonically the same” (Fig. 1.3). On the piano keyboard, they correspond to the same black key and therefore their pitches are exactly the same, but as we shall see, that does not
42Which is the straight sequence ABCDEFG, rotated to constitute the “major” (Ionian) mode.
necessarily have to be so. If precise tuning is of importance, enharmonic notes do differ by a small interval (a comma).
C♯/D♭ D♯/E♭ F♯/G♭ G♯/A♭ A♯/B♭
Figure 1.3: Enharmonically equivalent altered notes.
The seven notes of the diatonic scale, together with the five altered notes, constitute a twelve-tone, symmetric chromatic scale with one size of the step (the semitone). This system of 7 + 5 = 12 tones in the chromatic scale is periodic and repeats every octave (the span of eight white keys).
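As a small illustration of enharmonic equivalence on the twelve-tone keyboard, the following sketch maps note names to key positions within the octave. The function name `pitch_class` and the numbering of keys from C = 0 are assumptions made for this example only; they are not part of the notation discussed in the text.

```python
# Semitone offsets of the seven natural notes (white keys), counted from C = 0.
NATURAL = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Each accidental alters the pitch by one semitone (ASCII spellings accepted too).
ACCIDENTAL = {"♯": +1, "#": +1, "♭": -1, "b": -1}


def pitch_class(name: str) -> int:
    """Return the key position (0..11) of a note name such as 'C♯' or 'Db'.

    Hypothetical helper: it assumes a twelve-tone keyboard on which
    enharmonically equivalent names share one key.
    """
    letter, accidentals = name[0], name[1:]
    offset = NATURAL[letter] + sum(ACCIDENTAL[a] for a in accidentals)
    return offset % 12  # the system repeats every octave


if __name__ == "__main__":
    pairs = [("C♯", "D♭"), ("D♯", "E♭"), ("F♯", "G♭"), ("G♯", "A♭"), ("A♯", "B♭")]
    for sharp, flat in pairs:
        print(sharp, flat, pitch_class(sharp), pitch_class(flat))
```

On such a keyboard the two names of each black key coincide. A tuning system that distinguishes them by a comma would need a richer representation than a single integer per key.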
Intervals
Within the octave there are eight possible intervals between two notes, named after the number of diatonic steps (white keys) they span. Table 1.1 lists the intervals, each with an example rooted (starting) on C (intervals can be rooted on any note).
key span   interval name   example
1          unison          C–C
2          (the) second    C–D
3          (the) third     C–E
4          (the) fourth    C–F
5          (the) fifth     C–G
6          (the) sixth     C–A
7          (the) seventh   C–B
8          octave          C4–C5
Table 1.1: The eight musical intervals
There are two kinds of intervals: perfect and imperfect. The perfect intervals are the unison, the fourth, the fifth, and the octave. The perfect intervals can be rooted on any of the white keys and retain their size in semitones, as shown in Table 1.2, with one notable exception for the fourth and one for the fifth. The two exceptional intervals, the augmented fourth F–B and the diminished fifth B–F, are equal to the tritone (TT), the “devil in the music”, three tones or six semitones in size.
interval name      abbrev.   semitones   on white keys
(perfect) unison   P1        0           C–C, D–D, etc.
(perfect) fourth   P4        5           C–F, D–G, etc., except F–B
(perfect) fifth    P5        7           C–G, D–A, etc., except B–F
(perfect) octave   P8        12          C4–C5, D4–D5, etc.
Table 1.2: Perfect intervals
The octave is an interval that serves as an equivalence relation (see p.47). That means that all Cs are “equivalent”, or “similar” to each other, and so are the Ds, Es, etc. This is why the keyboard and the system of note names are periodic and repeat every octave.
An interval can be inverted by moving its lower note an octave higher or by moving its higher note an octave lower.
Since the octave is an equivalence relation, there are in reality only two kinds of perfect intervals with respect to the octave, the octave itself and the fifth: the octave and the unison are in a way interchangeable (since the unison is an inversion of the octave), and so are the fifth and the fourth. The singular interval, the tritone, is its own inversion.
The imperfect intervals are the second, the third, the sixth and the seventh. Each has two species: major and minor, that differ in size by one semitone, as listed in Table 1.3.
interval name   abbrev.   semitones   on white keys
major second    M2        2           C–D, D–E, F–G, G–A, A–B
minor second    m2        1           E–F, B–C
major third     M3        4           C–E, F–A, G–B
minor third     m3        3           D–F, E–G, A–C, B–D
major sixth     M6        9           C–A, D–B, F–D, G–E
minor sixth     m6        8           E–C, A–F, B–G
major seventh   M7        11          C–B, F–E
minor seventh   m7        10          D–C, E–D, G–F, A–G, B–A
Table 1.3: Imperfect intervals
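The entries of Tables 1.2 and 1.3 can be checked mechanically. The following is a minimal sketch, assuming the white keys are numbered by their semitone offset from C; the names `interval` and `sizes_by_span` are hypothetical and serve only this illustration. Because it works within a single octave, it covers key spans 1 to 7 and leaves out the octave itself.

```python
# Semitone offsets of the white keys within one octave (C = 0).
WHITE_KEYS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
NAMES = list(WHITE_KEYS)  # C D E F G A B, in keyboard order


def interval(lower: str, upper: str) -> tuple[int, int]:
    """Return (key span, size in semitones) of the ascending interval
    from `lower` to the nearest `upper` at or above it."""
    steps = (NAMES.index(upper) - NAMES.index(lower)) % 7
    semitones = (WHITE_KEYS[upper] - WHITE_KEYS[lower]) % 12
    return steps + 1, semitones


def sizes_by_span() -> dict[int, dict[int, list[str]]]:
    """Group every white-key interval by key span, then by semitone size."""
    table: dict[int, dict[int, list[str]]] = {}
    for lower in NAMES:
        for upper in NAMES:
            span, semitones = interval(lower, upper)
            table.setdefault(span, {}).setdefault(semitones, []).append(f"{lower}-{upper}")
    return table


if __name__ == "__main__":
    for span, sizes in sorted(sizes_by_span().items()):
        print(span, sizes)
```

Under these assumptions the fourths and fifths each come in a single size apart from the one exception (F–B and B–F, both six semitones), while the seconds, thirds, sixths, and sevenths each come in two sizes that differ by one semitone.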
Again, with respect to the octave, there are only two kinds of imperfect intervals: the third and the second, each in the major or minor variety. The sixths are inversions of the thirds
and the sevenths are inversions of the seconds; inverting an imperfect interval changes its character from major to minor and vice versa. Table 1.4 lists all intervals paired side-by-side with their inversions.
Note that the three major triads are rooted on C, F, and G, the three notes connected by two fifths: F–C–G. After C (the tonic), F and G are the most important notes, the subdominant and the dominant, in the key of C major. This fact (that only the three thirds rooted on the tonic, subdominant, and dominant, namely C–E, F–A, and G–B in the key of C major, and no other are major thirds) is perhaps the reason why the major mode is the one most frequently used in Western tonal music.43
interval   semitones   interval   semitones
P1         0           P8         12
m2         1           M7         11
M2         2           m7         10
m3         3           M6         9
M3         4           m6         8
P4         5           P5         7
TT         6           TT         6
Table 1.4: Intervals and their inversions
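The pairing in Table 1.4 follows from one rule: an interval and its inversion together span the octave, so their sizes add up to twelve semitones. A minimal sketch of that rule follows; the dictionary and the function name `invert` are assumptions introduced for this example.

```python
# Interval abbreviations from Tables 1.2 to 1.4 with their sizes in semitones.
INTERVALS = {
    "P1": 0, "m2": 1, "M2": 2, "m3": 3, "M3": 4, "P4": 5, "TT": 6,
    "P5": 7, "m6": 8, "M6": 9, "m7": 10, "M7": 11, "P8": 12,
}
BY_SIZE = {size: name for name, size in INTERVALS.items()}


def invert(name: str) -> str:
    """Return the inversion of an interval: the interval that
    complements it to a full octave of twelve semitones."""
    return BY_SIZE[12 - INTERVALS[name]]


if __name__ == "__main__":
    for name in INTERVALS:
        print(name, "->", invert(name))
```

Inverting twice returns the original interval, and the tritone maps to itself, as stated above.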
Music notation

Western music uses the “staff notation”, which is a result of centuries of evolution of Western music theory and practice based on the system of twelve tones in an octave.
Friedemann Sallis wrote about the relation of Western music and its system of musical notation:44
[...] change in the status and function and concept of Western music has usually been accompanied by changes in the way this music is notated. [...] in our musical
43The longest chain of alternating major and minor thirds in the diatonic scale is D–F–A–C–E–G–B–D. (The two adjacent minor thirds B–D–F do not form a perfect fifth.) Note that there are two possibilities for three interlinked triads (subdominant, tonic, dominant) within this chain: F–A–C, C–E–G, G–B–D (three major triads: the major mode), and D–F–A, A–C–E, E–G–B (three minor triads: the minor mode). 44Sallis and Burleigh, “Seizing the Ephemeral: Recording Luigi Nono’s “A Pierre dell’azzurro silenzio, inquietum. A più cori” (1985) at the Banff Centre”.
culture notation and theory are two inseparable sides of the same coin. [...] it [tends] to be more than a system of signs indicating how to produce a given piece of music [...] our system of notation has always been a visual representation of the music-theoretical concepts [...] as well as a set of pragmatic instructions for the performance of a specific work.
The staff is a grid of horizontal lines with spaces between them. Notes are graphical representations of tones; their vertical placement in the grid, on the lines or in the spaces between the lines, indicates the “height” of their pitch. The horizontal position of notes corresponds to the order of the tone sequence in time; the exact time of onsets of the tones and their duration are encoded through a variety of means: measure lines, various kinds of note heads that indicate the length of the tones, etc., but in principle the staff approximates a two-dimensional pitch×time coordinate system: the vertical coordinate represents tone pitch, and the horizontal coordinate is related to the time dimension.
The staff notation thus suggests the two-dimensional organization of sound elements (p.8) that form a musical piece. Notation systems that are more “machine-oriented” would indicate the pitch more precisely by the vertical position of the corresponding graphical symbol, and the onset and duration by its horizontal position and extent. A typical case of such machine-oriented notation is the “piano roll” that is ubiquitous in MIDI sequencers (historically preceded by player pianos45). Use of a discrete graphical system of notation was also advocated by Joseph Schillinger,46 and two-dimensional coordinate graphical systems are commonly used to display sonograms (see p.30).
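To make the pitch×time idea concrete, here is a minimal sketch of a piano-roll style representation. The class `Note`, the function `sounding_at`, the use of MIDI note numbers (60 = middle C) for the vertical coordinate, and beats as the time unit are all assumptions chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int       # vertical coordinate: MIDI note number (60 = middle C)
    onset: float     # horizontal position: start time in beats
    duration: float  # horizontal extent: length in beats


def sounding_at(notes: list[Note], time: float) -> list[int]:
    """Return the pitches (ascending) of all notes that sound at `time`."""
    return sorted(n.pitch for n in notes
                  if n.onset <= time < n.onset + n.duration)


if __name__ == "__main__":
    # Illustrative data only: a C major triad, with a higher C entering later.
    roll = [Note(60, 0.0, 4.0), Note(64, 0.0, 4.0),
            Note(67, 0.0, 4.0), Note(72, 2.0, 2.0)]
    print(sounding_at(roll, 1.0))
    print(sounding_at(roll, 3.0))
```

A vertical slice through such a structure yields the tones that sound together at a given moment, which is the "vertical dimension" of harmony described earlier.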
The Western staff notation is widely accepted as a “common music language”47 and as such serves its purpose well. It is, however, biased toward the diatonic tonal system. Notes on the staff directly represent the “natural” diatonic scale (as played on the white keys of the piano). The five “accidental” notes (black keys) that complete the chromatic scale are notated, using the accidental signs, as deviations from (alterations of) the “natural” notes. If pitches that are outside the twelve-tone system are to be included, they have to be notated, using some extended set of signs (such as quarter-tone accidental signs), as further deviations from either the diatonic or the chromatic scale.
45Pianos that were played by pneumatic actuators whose action was controlled by a perforated paper roll. Player pianos were common in the late nineteenth and early twentieth centuries. 46Schillinger, The Schillinger System of Musical Composition. 47“Musical notation”. Encyclopædia Britannica Online.
Western musicians are trained almost exclusively with the use of the Western staff notation, and its role is so ingrained in music practice that one is strongly compelled to employ it to convey all musical ideas, even those that are outside the bounds of the Western twelve-tone system. The usual solution in such cases is to use some extensions of the staff notation to indicate deviations of pitches from those in the chromatic scale.
Fig. 1.4 shows the staff notation of several opening measures of J.S. Bach’s Prelude in C. The chord symbols placed above the top staff are a modern way to show the harmonic structure of the work.48 The Prelude is used throughout this text as a basis for musical examples that demonstrate the viability of the proposed tone arrangement method.
[Score excerpt: Prelude in C, BWV 846a, Johann Sebastian Bach. Chord symbols above the staff: C, Dm7/C, G7/B, C, Am/C, D7/C.]
Figure 1.4: Prelude in C, opening measures
48For practical reasons, the harmonic structure is shown using the type of chord symbols used in contem- porary jazz and popular music, rather than the Roman numeral system for classical music, or the figured bass system of the Baroque era.
Division of the octave

Most works of Western music have been composed, notated, and performed using a twelve-tone division of the octave. The choice of twelve distinct pitches is not arbitrary; it is a natural consequence of acoustical properties of tones (p.62).
The problem of how to divide the fourth and the octave into smaller steps and the consequences of the chosen division have been intensively studied throughout the history of music until about the end of the nineteenth century:49 first in ancient Greece50 and mediaeval Persia,51 and then in the Western world primarily since the Renaissance in the works of musicians and mathematicians such as Vincenzo Galilei,52 René Descartes,53 Marin Mersenne,54 Jean-Philippe Rameau,55 Leonhard Euler,56 Jean Lerond d’Alembert,57 Giuseppe Tartini,58 and a number of others.
The problem of the division of musical intervals translates to the arithmetical problem of expressing a ratio of small whole numbers as a product of several such ratios that represent the intervals that the original interval is divided into. For example, the octave (2:1) can be divided into a fifth (3:2) and a fourth (4:3): $\frac{2}{1} = \frac{3}{2} \times \frac{4}{3}$. The fourth (4:3) can be divided into a major third (5:4) and a diatonic semitone (16:15): $\frac{4}{3} = \frac{5}{4} \times \frac{16}{15}$, or a minor third (6:5) and a minor tone (10:9): $\frac{4}{3} = \frac{6}{5} \times \frac{10}{9}$. (An introduction to the intervals and their numerical ratios of frequencies starts on p.34; more in-depth discussion of this topic is beyond the scope of this text, and can be found in a large number of existing treatises.)
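The interval arithmetic above can be verified with exact rational numbers. The following sketch uses Python's standard `fractions` module; the constant names and the helper `combine` are assumptions introduced for this example.

```python
from fractions import Fraction
from math import prod

# Frequency ratios of the intervals named in the text.
OCTAVE = Fraction(2, 1)
FIFTH = Fraction(3, 2)
FOURTH = Fraction(4, 3)
MAJOR_THIRD = Fraction(5, 4)
MINOR_THIRD = Fraction(6, 5)
DIATONIC_SEMITONE = Fraction(16, 15)
MINOR_TONE = Fraction(10, 9)


def combine(*ratios: Fraction) -> Fraction:
    """Stack intervals on top of each other: their frequency ratios multiply."""
    return prod(ratios, start=Fraction(1))


if __name__ == "__main__":
    print(combine(FIFTH, FOURTH) == OCTAVE)
    print(combine(MAJOR_THIRD, DIATONIC_SEMITONE) == FOURTH)
    print(combine(MINOR_THIRD, MINOR_TONE) == FOURTH)
```

Exact fractions are used rather than floating-point numbers so that the equalities hold without rounding error.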
Music is number made audible.
Boëthius, Anicius Manlius Severinus (480–524).59
49With the rise of chromaticism in music and with more frequent use of modulation to different keys, it became expedient to disregard the finer points of tuning. 50Barker, Greek Musical Writings. 51Shiloah, The Theory of Music in Arabic Writings c. 900–1900. 52Galilei, Dialogue on Ancient and Modern Music (English translation 2004). 53Descartes, Compendium Musicae. 54Mersenne, Harmonie universelle. 55Rameau, Treatise on Harmony. 56Euler, Tentamen novae theoriae musicae. 57d’Alembert, Élémens de musique, théorique et pratique, suivant les principes de M. Rameau. 58Tartini, Trattato di musica secondo la vera scienza dell’armonia. 59In: Grattan-Guinness, The Fontana History of Mathematical Sciences.
Boëthius might have been alluding to the fact that proportions of small numbers (ratios) lead to harmonious consonances.60 At first, ratios of vibrating string lengths (provided that the strings were otherwise the same in material, thickness, and tension) were used to reason about musical intervals. Ratios of string lengths are inversely proportional to the frequencies of their vibrations; since about the middle of the nineteenth century, when it became technically possible to measure frequencies (called then “vibration numbers”), it has been preferable to use ratios of frequencies, keeping in mind that the ratios need to be inverted when reading older texts.
Although there is not much known about the actual music of ancient Greece, there are many accounts of the tuning systems that were used. Greek music was based on tetrachords, scales of four notes bounded by the perfect fourth.61,62 The original three Greek genera of tetrachords were:63
• Diatonic:64 Two descending whole tones, and the remaining semitone.
• Chromatic:65 A descending minor third and two semitones; a “colourful” genus.
• Enharmonic:66 A descending major third, with the remaining semitone divided into two halves; a “harmonious” genus.
The above descriptions of the three genera are simplified and use the modern terms ‘whole tone’, ‘semitone’, ‘minor third’, and ‘major third’ that are quite imprecise in the context of Greek tetrachords. The Greeks used many alternative divisions of the fourth in each genus, defined by many specific number ratios that offered many subtle audible shades of the intervals.
The modern use of the terms ‘diatonic’, ‘chromatic’, and ‘enharmonic’ is more or less related to the ancient Greek genera:
• Diatonic: The modern diatonic (seven-tone, heptatonic) scale is made of five whole-tone steps and two remaining semitone steps.
60And also to regular rhythmic patterns. 61“Tetrachord”. Encyclopædia Britannica Online. 62Greek scales were descending; a descending fourth is an inversion of an ascending fifth. 63Barbera, The Euclidean Division of the Canon (Greek and Latin Sources). 64Diatonikos, ‘at intervals of a tone’, from dia- ‘through’ + tonos ‘tone’. (“Diatonic.” Oxford Dictionaries) 65Khrōmatikos, from khrōma, ‘colour’. (“Chromatic.” ibid.) 66Enarmonikos, from en- ‘in’ + harmonia ‘harmony’. (“Enharmonic.” ibid.)
• Chromatic: The modern chromatic scale is a scale of twelve semitones.
• Enharmonic: In modern music theory that assumes twelve-tone temperament (12-tet, see p.66), two tones that are named differently but fall on the same key on the modern keyboard (such as F♯ and G♭) are called “enharmonically equivalent” and are considered to be “the same”. However, in other systems of tuning (see p.70), such tones do differ by a small interval that is less than a semitone in size (a comma).
The twelve-tone division of the octave

In Chapter 2 we shall briefly discuss the harmonic series, a natural property of many vibratory systems, such as stretched strings or columns of air that are used to generate sound in musical instruments.
We will now demonstrate how the division of the octave in Western music into twelve tones naturally follows from a sequence of (perfect) fifths; the choice of the sequence of fifths for division of the octave is itself a natural consequence of the harmonic series. Just as the fourth was divided differently in the Greek genera, the octave can also be divided into twelve parts in numerous ways.
Early music used so-called Pythagorean tuning (the “fifth-tuning”) that emphasizes the purity of the fifths. When the thirds became an integral part of harmony, this required the adoption of so-called just tuning, where both the fifths and the thirds are properly tuned. Just tuning, however, does not allow easy modulation through keys and therefore various temperaments (systems of tuning that in a controlled manner compromise the purity of intervals) were introduced. A representative example is the “mean-tone” temperament, that in its several forms allows modulation, even though mostly only through several core keys. Later, various so-called well-tempered temperaments allowed modulation to all twelve keys, while the keys retained their distinct character.
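The sequence of pure fifths mentioned above can be explored numerically. The sketch below stacks fifths of ratio 3:2, folds each result back into a single octave, and measures how far twelve fifths overshoot a whole number of octaves. The function names and the use of cents (1200 per octave) as a unit are assumptions made for this example.

```python
from fractions import Fraction
from math import log2

FIFTH = Fraction(3, 2)  # frequency ratio of a pure (perfect) fifth


def fold_into_octave(ratio: Fraction) -> Fraction:
    """Halve or double a ratio until it lies in the range 1 <= ratio < 2."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def chain_of_fifths(count: int) -> list[Fraction]:
    """Pitches reached by 0, 1, ..., count - 1 successive pure fifths,
    each expressed as a ratio to the starting tone within one octave."""
    return [fold_into_octave(FIFTH ** k) for k in range(count)]


def cents(ratio: Fraction) -> float:
    """Size of an interval in cents (1200 cents per octave)."""
    return 1200 * log2(float(ratio))


if __name__ == "__main__":
    for ratio in sorted(chain_of_fifths(12)):
        print(ratio, round(cents(ratio), 1))
    # The twelfth fifth nearly, but not exactly, returns to the starting tone.
    print(fold_into_octave(FIFTH ** 12))
```

Twelve fifths produce twelve distinct pitches within the octave, and the thirteenth tone misses the starting pitch by the ratio 531441:524288, a small interval of roughly 23.5 cents (the Pythagorean comma).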
Eventually, apparently around the beginning of the twentieth century, the twelve-tone equal temperament, where all the semitones have the same size67 and all keys have identical character (and where also all the subtle differences in interval and chord shades that the Greeks and others had are lost) was widely accepted in music practice and became “the only” standard. Or so many musicians think. In fact, the piano that is thought to have forced the
67The “ratio” of the two frequencies in the equally tempered semitone is $\sqrt[12]{2}$, an irrational number.
19 acceptance of the equal temperament, cannot be easily tuned that way; it is unnatural and therefore difficult for singers and players of string and wind instruments to sing or play in equal temperament; the pipe organ, if tuned in the equal division, does not sound very well. Perhaps the only instruments that can be tuned accurately in the equal temperament are electronic devices, be it digital hardware synthesizers or software instruments.
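The size of the compromise made by equal temperament can be illustrated numerically. The following Python sketch is an illustration under stated assumptions, not part of the thesis software: it measures intervals in cents (a standard unit assumed here, with 1200 cents to the octave), and the names `cents`, `SEMITONE`, and `deviation` are hypothetical. It compares a few just intervals with their nearest equal-tempered counterparts.

```python
import math

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(ratio)

# The equal-tempered semitone: the twelfth root of two.
SEMITONE = 2.0 ** (1.0 / 12.0)

# Just ratio and the number of equal-tempered semitones that approximates it.
JUST_INTERVALS = {
    "fifth (3:2)": (3 / 2, 7),
    "fourth (4:3)": (4 / 3, 5),
    "major third (5:4)": (5 / 4, 4),
    "minor third (6:5)": (6 / 5, 3),
}

def deviation(ratio, semitones):
    """Tempered size minus just size, in cents (positive = tempered is wider)."""
    return cents(SEMITONE ** semitones) - cents(ratio)

for name, (ratio, semitones) in JUST_INTERVALS.items():
    print(f"{name}: just {cents(ratio):7.2f} cents, "
          f"12-tet deviation {deviation(ratio, semitones):+6.2f} cents")
```

Under these assumptions the tempered fifth comes out about 2 cents narrower than the just fifth, while the tempered major third is almost 14 cents wider than the just major third.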
Just intonation
Just intonation means tuning tones precisely, to the maximum of consonance. Harmony played in just intonation sounds good, and a critical ear recognizes that tones, intervals and chords are played “in tune”, rather than “out of tune”. As music gets more complicated, and begins to “change keys” (to modulate between keys), just intonation indeed gets in the way and on keyboard instruments needs to be replaced by some kind of a temperament. Nevertheless, just intonation does not have to be completely compromised.
Harry Partch said in his lecture on intonation:68
[...] just intonation [...] is an ancient truth [...] based on the relationship of small numbers, small numbered parts of a string [...] or [...] small frequency ratios [...] And this was true in the world prior to the advent of the piano, or let’s say the keyboard scale, which has been a monolith for about three hundred years, and in my opinion, a rather tyrannical monolith. Certainly some great things came out of it, yet it has its own truth. It’s just not the truth that I happen to like.
[...] The tempered triad is strangely uneasy, and no wonder. It wants nothing so much as to go away and sit down someplace.
[...] Equal temperament is a current habit as is also the scope for modulation which it allows. Composers can think only in equal temperament for just one reason: because it is all they have got to think in.
[...] I have said many times, and I am by no means the first to say it [...] that 12-tone equal temperament not only slams doors against any further investigation of consonance, but it also slams doors in the entire balance of the temple against any further investigation of dissonance.69
68Partch, “Barbs and Broadsides”.
The practice of the equal division of the octave
Equal temperament is indeed a modern state of mind, rather than a tuning that is in real use. The twelve tones of the chromatic scale are perhaps often considered to be wider “pitch bands” without predetermined precise pitch values. Because music played in mathematically precise equal temperament sounds poor, the adjustment of their intonation is silently left to performing musicians and to listeners.
The musicians apply what can be called “corrective tuning”, that is, they micro-adjust the pitch of played tones “by ear”, if and when possible. The listeners intuitively employ “corrective listening”: they get used to the various imperfections of intonation and consider them to be “normal”. The timbres of instruments with fixed pitch (such as the percussive character of the piano sound or the vibrato, phasing and other effects that are integral parts of the “synthesizer sound”) do “mask” tuning faults.
Hence practically performed music depends on the sensibilities of musicians and the insensitivity of listeners to partially correct and partially mask the artefacts of the twelve-tone tuning compromise. Machines do not have such a sensibility and, if not programmed to do so, will not make the tuning adjustments that a sensitive human musician would; a sensitive listener’s ear might therefore not be satisfied.
However, thinking in terms of a general twelve-tone division of the octave, as twelve (equal?) semitones whose size is not precisely specified, is pragmatic and expedient for these main reasons:
• Music composition is a complex combinatorial task. Dealing with intonation in addition to the possible tone combinations would make the task more difficult to solve. At the least, it would force composers to create much less complex works.
• Playing the pitches precisely on musical instruments with a continuous pitch range (e.g. the violin) would require much more refined skills; on instruments with a fixed set of pitches (e.g. keyboard instruments) it would need more complex instrument design and also more complicated playing techniques.
69Like Varèse (p.8), Partch also credited Helmholtz for showing him the way: “Before I was twenty, I had [...] rejected the intonation system of modern Europe [...] When I was twenty-one I finally found [...] the key for which I had been searching, the Helmholtz-Ellis work, On the Sensations of Tone [...] and I began to take wing [...] I wrote a string quartet in Just Intonation [...]” (Partch, Genesis of a Music: An Account of a Creative Work, Its Roots, and Its Fulfillments, Preface to the Second Edition)
Computer-assisted microtonal music
Microtonal music (used here as a blanket term) is made from tone material other than the standard Western chromatic scale. That includes tones and semitones of different sizes, and potentially more than twelve pitches in the octave.70
Systems with more than twelve pitches pose a difficulty: a higher number of pitches makes composition more complex, and intervals smaller than the semitone are difficult to intone (or even impossible to play on keyboard instruments). However, modern computers:
• can process large amounts of data fast, and therefore handle the complexity of microtonal music (if a suitable algorithm exists), and
• used as electronic instruments, can synthesize musical sounds with precisely set frequencies and intensities.
Computers offer a great opportunity not only to simulate traditional instruments and orchestras, but also to be used for sound and music analysis, as an experimental tool (perhaps the greatest “laboratory” that ever existed) and, first and foremost, as a new kind of musical instrument that is flexible, precise, and can be made “smart”.
Experiments and experimental tools
The ultimate test of music is how it is perceived by the listener; music must be heard to exist. The scientific approach requires that assumptions, theories, and facts be tested by experiments. In the course of our argument we therefore must conduct several fundamental experiments in perception of musical sound.
Music perception is in part innate, but also quite subjective and individual, and is very much influenced by previous conditioning on the part of the listener.71 We recommend that each reader carry out the experiments individually, to confirm that the perception of the test sounds is indeed similar to what is described in this text. As Helmholtz wrote:
It is of course impossible for any one to understand the investigations thoroughly, who does not take the trouble of becoming acquainted by personal observation with at least the fundamental phenomena mentioned. Fortunately with the assistance of common musical instruments it is easy for any one to become acquainted with harmonic upper partial tones, combinational tones, beats, and the like. Personal observation is better than the exactest description, especially when, as here, the subject of investigation is an analysis of sensations themselves, which are always extremely difficult to describe to those who have not experienced them.72
70“Microtonal music”. Encyclopædia Britannica Online.
71According to recent research, music perception (the perception of its “spectral and temporal organization”, or “pitch and rhythmic aspects”) requires specialized processing in the brain. The ability to discriminate tones that is sufficient for perception of pitch is already present in infants; appreciation of harmony seems to fully develop by early teenage years. There have been reports of structural brain differences between adult musicians and non-musicians, detected on MRI scans and in EEG activity. (Trainor and Unrau, “Human Auditory Development”)
The experiments and demonstrations within this text were prepared and carried out with the use of either freely available open-source programs, or programs that I have developed over the past number of years for research purposes.73 The programs used are available for all three major operating systems (Mac OS X, Linux, and Windows):
• Audio files were edited with Audacity, “an open source program for recording and editing sound.”74 Audacity has a simple, intuitive interface; it is very easy to learn and is therefore widely used as a “Swiss Army knife” for work with digital audio files.
• Sonograms were made with Sonic Visualiser,75 “an application for viewing and analysing the contents of music audio files.”76 Sonic Visualiser creates excellent sonograms that make it very easy to “look at music while listening to it.”
• Time and frequency-reassigned (sharpened) sonogram-like scores were prepared with Loris,77 a library for sound morphing, and graphically presented with Qvorum,78 an experimental program for transcription of music.79
• The notation of all musical examples was typeset using ABC,80 a plain text format and a system for notating music.
• Sound for examples used in tone perception and other experiments was synthesized with Csound,81,82 an audio synthesis and processing system that has a long genesis going back to the MUSIC-N programs developed by Max Mathews,83 beginning in the 1950s at the Bell Labs.
• The tone arrangements in the 53-division of the octave were completed and sequenced in Qvinta,84 another experimental program that is under development. Their sound was synthesized using the SuperCollider85 sound synthesis server.
• The real-time interactive music system that is briefly introduced in Chapter 5 has been implemented in Qvin.86 Qvin is a suite of software components that include a custom scripting language, a collection of various libraries, and a plug-in system through which the suite can be extended by new components. (Qvorum and Qvinta may in the future be integrated into Qvin as its plug-ins.)
72Helmholtz, On the sensations of tone as a physiological basis for the theory of music, p.6.
73Development of the custom programs is ongoing, to satisfy the needs of continuing research.
74Mazzoni and Dannenberg, Audacity.
75British spelling.
76Cannam et al., Sonic Visualiser.
77Fitz and Haken, Loris.
78Burleigh, Qvorum.
79Qvorum is being developed as part of an ongoing research project in “mixed” music studies (Zattra, Burleigh, and Sallis, “Studying Luigi Nono’s A Pierre. Dell’azzurro silenzio, inquietum (1985) as a Performance Event”) and was presented at a 2012 IRCAM colloquium “Analyser la musique mixte.” (Sallis, Burleigh, and Zattra, “Seeking Virtual Voices in Luigi Nono’s A Pierre (1985) through a Study of a Performance and the Creative Process”)
80Walshaw et al., The ABC Music project.
Sound files
The sound for experiments in musical tone perception was synthesized using the Csound software and stored as high-quality sound files (48 kHz sampling rate and 32-bit depth samples). Their compressed mp3 versions are available for online listening, along with the source Csound files that were used to synthesize the sound. Whenever they are referred to in this text, the names of the online mp3 files are labelled by an MP3 tag, and the names of the corresponding Csound source files are labelled by a CSD tag.
The reader is encouraged to re-synthesize the sound, possibly changing parameters of the experiments (frequencies, amplitudes, times, etc.), if so required. To play the synthesized sounds in real time, run:
csound -f -m6 -d -odac
The command line flag -f selects the floating-point (32-bit) representation of the samples (replace it with -3 to use 24-bit samples if the computer audio system does not accept 32-bit samples), -m6 and -d are used to suppress unimportant informational messages, and -odac directs the synthesized sound to play through the default audio interface. To store the synthesized sound as an audio file (WAVE format), replace -odac with -o
81Vercoe and Ellis, “Real-Time CSOUND: Software Synthesis with Sensing and Control”. 82Vercoe et al., Csound. 83http://www.csounds.com/mathews/max_bios.html 84Burleigh, Qvinta. 85McCartney et al., SuperCollider. 86Burleigh, Qvin.
Audio equipment
It is necessary to conduct the experiments in an environment and using audio equipment that allow attentive listening to high-quality sound, reasonably free from distortion, noise, hums, and other interferences. The built-in sound chipsets or sound cards of modern laptop or desktop computers are satisfactory, but playing sound through a good quality external audio interface is still preferable.
It is most important to use a good quality pair of headphones. The computer built-in speakers or small external computer speakers are unacceptable: they do not radiate frequencies below about 160 Hz and their sound is often distorted. Various “earphones” or “earbuds” and “in-ear monitors” are not sufficient, either: they are designed to fit into ear canals, tightly seal them and thus to isolate the listener from outside noise. In doing that, they change the natural air pressure conditions within the ear and also exclude the effects of the ear canals and earlobes that are essential for natural sound perception.
The best results should be achieved with a pair of over-the-ear headphones that have a flat frequency response, such as are used for audio monitoring and mixing in a recording studio. Listening to the synthesized sound over a pair of high-quality loudspeakers would be also satisfactory, albeit less than when using headphones: many of the experiments include pure sine waves (tones without higher harmonics) and unless one listens in an anechoic room, natural room resonances would also interfere with the experience.
When listening to computer-generated sounds, especially using headphones, always be careful and protect your hearing! To avoid discomfort and potential hearing damage, begin with the volume of the sound lowered, and keep the headphones slightly off your ears. If all sounds well, then put the headphones over your ears and listen more closely.
The outline of the work
The outline of this work is as follows:
1. First, we will introduce and discuss the sensation and perception of tone pitch, and the phenomenon of the harmonic series as a natural origin and also a constraint of the Western music system.
2. Next, we will show why Western music is based on the division of the octave into twelve steps of the chromatic scale, as a practical compromise between the precision of intonation and the complexity of the tone system. We will point out the fundamental problem of tuning that exists due to incompatibility of intervals, and briefly review and test the psychoacoustic theory of tone consonance developed by Hermann Helmholtz.
3. Next, we will demonstrate the traditional method of resolving the tuning problem: tempering the twelve-tone system. We will compare the Pythagorean tuning, just intonation, a “quarter-comma” mean-tone temperament, and the equal twelve-tone temperament. We will show with a brief example how the character of the same piece of music changes when the music is played in different temperaments.
4. Next, we will make a case for a fifty-three tone system as the domain for machine-assisted microtonal tone arrangement, and propose how to notate the fifty-three pitches in the Western staff notation.
5. Finally, we will show how the method of machine-assisted tone arrangement has been used to compose a suite of short musical etudes, and to create a dynamic system for musical interplay between a man and a machine.
Chapter 2: Tone sensations
The basic building elements of music are tones. Tones are sound elements with periodic waveforms, definite frequency and therefore distinct pitch. Even though the perception of pitch has a subjective component, it is in principle logarithmically related to tone frequency, at least in the middle range of audible frequencies.
The sensitivity and precision of hearing and therefore the ability to perceive and analyze musical sound varies significantly from person to person. It also depends on other factors: on how rested the person is, on the listening environment, on the time of the day, etc.
It is beneficial to complement listening analysis of musical sound by its analysis with the help of machine-based (computer) analytical tools, using visual representation of the sound in question; this allows the listener to “see the sound while listening to it”. Seeing a visual image associated with a sound helps to focus hearing on aspects of the sound that otherwise might be missed. Visual images, however, should be used only for guidance and ought not to replace careful listening to the sound. Listening tests and “judgement of the ear” are most important.
In this work, two such tools are used: the sonograms (of the usual kind) obtained by Fourier analysis that provide a graphical representation of sound intensity in the two-dimensional time×pitch domain, and the “pitch spiral”, a novel tool for visual representation and control of synthesized tones in pitch perception experiments.
The pitch spiral
The “pitch spiral” is a novel tool that has been developed for experiments that investigate perception of tones and their combinations. The tool integrates sound synthesis of tones with a graphical display that indicates the presence and relations of the tone frequencies and, optionally, their differences (that are perceived as difference tones). The visual representation of sounding pitches thus supports and clarifies the perception of the sound. A number of music theory concepts, such as the character of various possible musical intervals and chords, their voicings and inversions, or the nature of scales, tuning systems and temperaments, etc., can also be demonstrated on the spiral.1
1A detailed discussion of those is beyond the scope of this text, and is reserved for another publication.
The current version of the pitch spiral2 is a web-based application, written in JavaScript using the D3.js3 and Vinci4 libraries. The graphical part is realized as SVG5 code embedded in an HTML6 page with standard HTML user interface elements. The pitch spiral works in any modern web browser that supports the HTML5 standard. A running instance of the spiral is available online. Since web technologies are ready for sound file playback, but not yet for efficient sound synthesis, it requires locally installed Csound and a running WebSocket7 bridge (that passes commands from JavaScript running inside the HTML page to Csound) to synthesize sound. Appendix B describes in more detail how to use the pitch spiral.
Figure 2.1: The pitch spiral
The curve of the spiral is a continuous scale of a uniformly stretched continuum of pitches, from C1 to C8. Each turn of the spiral represents one octave. C4 (the “middle C”) serves as the origin of the pitch scale and is indicated by a red circle (Fig. 2.1).
2There were two previous versions of the pitch spiral, prototyped in SuperCollider/Tk and Java/Swing programming environments.
3Bostock, Heer, and Ogievetsky, D3.js.
4Burleigh, Vinci.
5Scalable Vector Graphics, a language to describe two-dimensional vector graphics. http://www.w3.org/Graphics/SVG/
6Hypertext Markup Language, a standard language for creating web pages. http://www.w3.org/html/
7A protocol for communication between web pages and network servers. http://dev.w3.org/html5/websockets/
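The geometry described here (a uniform pitch scale, one turn per octave, C4 at the origin) suggests that the angle along the spiral is proportional to the logarithm of frequency. A minimal sketch of such a mapping follows. It is an assumption made for illustration, not the code of the pitch spiral application; the function name `spiral_position` is hypothetical, and the reference value of 261.63 Hz for C4 is the one used in Example 2.1.

```python
import math

C4_HZ = 261.63  # frequency assumed for C4, as in Example 2.1

def spiral_position(freq_hz, origin_hz=C4_HZ):
    """Return (turn, angle_deg) of a frequency on a one-octave-per-turn spiral.

    turn      -- whole octaves above (positive) or below (negative) the origin
    angle_deg -- position within that turn, 0 <= angle_deg < 360
    """
    octaves = math.log2(freq_hz / origin_hz)
    turn = math.floor(octaves)
    angle_deg = (octaves - turn) * 360.0
    return turn, angle_deg

# The origin, the octave above it, a just fifth above it, and C1 (three octaves below).
for freq in (C4_HZ, 2 * C4_HZ, 1.5 * C4_HZ, C4_HZ / 8):
    turn, angle = spiral_position(freq)
    print(f"{freq:8.2f} Hz -> turn {turn:+d}, angle {angle:6.2f} degrees")
```

In this sketch, tones an octave apart share the same angle on adjacent turns, which is one way to read the radial grids that the spiral can display.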
Figure 2.2: Source and difference tones on the pitch spiral
In the centre of the spiral there is a “repository” of five coloured discs (blue, green, orange, yellow, and white). When the discs are dragged from the centre onto the spiral, they represent respective pitches (fundamental harmonics) of up to five tones. Each tone can have a harmonic spectrum of between one and six harmonics,8 and its synthesized sound reflects the displayed spectrum. The difference tones (see p.52) are displayed as circles with perimeters drawn by a combination of the colours of their two respective source harmonics (Fig. 2.2).
8The range of six harmonics (the “senario”, see p.46) was chosen for the purpose of investigation of 5-limit (see p.35) sonorities. The range of harmonics can be extended beyond six for experiments with higher-limit tuning systems (that are among the planned topics of future work).
Figure 2.3: Various grids on the pitch spiral (panels: just tuning, 12-tet, 53-tet)
The spiral can be overlaid by various radial grids that indicate the positions of pitches in respective tuning systems, namely just tuning and 12-tone and 53-tone equal divisions of the octave (Fig. 2.3), but also a number of other equal divisions, meantone temperaments, tuning systems generated by other intervals than the fifths, etc.
Sonograms
The theoretical foundation for sonograms (“images of sound”), also known as spectrograms (“images of [sound] spectrum”), is the Fourier theorem, which is based on these principles:
• Complex periodic waves are composed from a series of simple (sinusoidal) waves,
• the frequencies of the sinusoidal component waves are whole multiples of the fundamental frequency (the frequency of the complex wave), and
• each component wave is characterized by its frequency, phase, and amplitude.9
The Fourier theorem is the basis for the Fourier transform, a mathematical method that breaks down a complex wave into a series of its component simple waves.
Helmholtz hypothesized that the inner ear performs a type of Fourier transform by mechanical means: the basilar membrane within the cochlea resonates (sympathetically vibrates) with the sound; the resonant frequency of the membrane changes along its length, and therefore sounds of differing frequencies (that is, the component waves of a complex wave) induce vibrations at different places along the membrane where they are detected by the hair cells in the organ of Corti. Although this “place theory” is an explanation that is no longer fully accepted and the mechanism of frequency separation within the ear is apparently much more complicated,10,11 it is a useful simplified analogy that helps to explain how we perceive complex sounds.
9“Fourier theorem”. Encyclopædia Britannica Online.
Sonograms are computed by application of Short-time Fourier transform (STFT) that uses a “sliding window” to isolate short intervals of the sound signal that is being analyzed. For digital signals, STFT is realized as Fast Fourier transform (FFT), an efficient algorithm that allows real-time processing of sound.12,13 The sound signal is converted from the “time- domain” (the instantaneous amplitude of the sound waveform as a function of time) into the “frequency domain” (the sound amplitude as a function of frequency). A sonogram shows the frequency content (the so-called frequency spectrum) of sound in time, and the image is strongly related to what the listener hears.14
The typical sonogram is a two-dimensional graph: time is plotted on the horizontal axis in the left-to-right direction, frequency (or pitch) on the vertical axis in the bottom-up direction, and the sound amplitude is indicated by colour. A sonogram, read left-to-right, shows the “evolution of sound” in time.
FFT operates on digital audio data. Digital audio is sampled, i.e. a continuous waveform is represented by discrete samples (measurements of its instantaneous amplitude) taken at regular time intervals (determined by the sampling rate). The width of the STFT window, as a number of samples,15 determines the number of so-called “frequency bins”: frequency ranges of cumulative frequency domain data. FFT data therefore do not contain individual information for every possible component frequency.
The width of the STFT window (related to its length in time) also determines the compromise between the time and frequency resolution of the sonogram. This is not unlike listening to sound by ear: a longer window (longer lasting sound) leads to better frequency accuracy at the expense of time resolution. A shorter window gives better time resolution but not enough time to determine the frequency accurately. In addition to this unavoidable compromise between the accuracy of time and frequency information, the windowing principle itself introduces additional artefacts into a sonogram. Sonograms therefore must be interpreted with caution.
10“Human ear: Transmission of sound within the inner ear”. Encyclopædia Britannica Online.
11Wightman and Green, “The Perception of Pitch”.
12Benson, Music: a mathematical offering.
13Loy, Musimathics: The Mathematical Foundations of Music, Volume 1 and 2.
14Fourier transform yields two values for each component frequency: its amplitude and phase. Since the phase of vibrations makes no difference for sound sensation, in sonograms the phase information is usually omitted.
15In FFT, the width of the window is a power of two.
This phenomenon can be demonstrated by a simple experiment (Example 2.1):
Example 2.1: MP3 2.1 CSD 2_1_tones.csd
This sound file contains a synthesized sequence of seven one-second tones with frequencies that correspond to a whole-tone scale from C4 to C5 in twelve-tone equal temperament. The tones have a sinusoidal waveform and a constant amplitude envelope with short linear rise and release segments. The frequencies of the tones were calculated as $f = 261.63 \times 2^{n/6}$, $n = 0, \dots, 6$.
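The formula for the tone frequencies can be evaluated directly. The short Python sketch below lists the seven frequencies of the whole-tone scale; the function name `whole_tone_scale` is hypothetical, and the code is an illustration rather than an excerpt from the Csound source file.

```python
C4_HZ = 261.63  # frequency of C4 used in Example 2.1

def whole_tone_scale(base_hz=C4_HZ, steps=6):
    """Frequencies f = base * 2**(n/6) for n = 0..steps (whole tones in 12-tet)."""
    return [base_hz * 2 ** (n / 6) for n in range(steps + 1)]

for n, freq in enumerate(whole_tone_scale()):
    print(f"n = {n}: {freq:7.2f} Hz")
```

Each step multiplies the frequency by the same factor (the sixth root of two), so the last tone is exactly one octave above the first.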
Figure 2.4: Sonograms of a whole-tone scale. Top row, linear frequency scale: (a) w=1024 (21 ms), (b) w=4096 (85 ms), (c) w=16384 (340 ms). Bottom row, logarithmic frequency scale: (d) w=1024 (21 ms), (e) w=4096 (85 ms), (f) w=16384 (340 ms).
Fig. 2.4 shows six different sonograms of the synthesized whole-tone scale, created with three different widths of the STFT window and displayed on either a linear or logarithmic frequency scale. The sonograms in the left column (a, d) were made with the window width of 1024 ($2^{10}$) samples, which corresponds to about 21 milliseconds of the sampled signal. The sonograms in the centre column (b, e) were made with the window width of 4096 ($2^{12}$) samples, about 85 ms, and in the right column (c, f) with the width of 16384 ($2^{14}$) samples, about 340 ms. It can be clearly seen how shorter windows discriminate frequencies less precisely (the frequency bins are wider) but do show the onset and the duration of the tones very well. Longer window lengths, on the other hand, discriminate frequencies well but the onset times and tone lengths are “blurred”.
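The quoted window durations, and the corresponding spacing of the frequency bins, can be checked with a little arithmetic. The sketch below assumes a sampling rate of 48 kHz, the rate stated for the sound files; that the sonograms were computed at this rate is an assumption, although it agrees with the durations given in the text. The function name `window_stats` is hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # assumed: the sampling rate stated for the sound files

def window_stats(width_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (duration in ms, spacing of FFT frequency bins in Hz) for a window."""
    duration_ms = 1000.0 * width_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / width_samples
    return duration_ms, bin_spacing_hz

for width in (1024, 4096, 16384):
    duration_ms, bin_spacing_hz = window_stats(width)
    print(f"w = {width:5d}: {duration_ms:6.1f} ms, bins {bin_spacing_hz:6.2f} Hz apart")
```

With these assumptions, a 1024-sample window gives bins about 47 Hz apart, wider than the roughly 32 Hz that separate the first two tones of the scale (C4 and D4), whereas a 16384-sample window gives bins about 3 Hz apart at the cost of a window that lasts about a third of a second.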
The sonograms in the top row of Fig. 2.4 (a, b, c) display the frequency on a linear scale. Since the computed tone frequencies rise exponentially, the series of displayed tone images curves slightly upward. In the bottom row (d, e, f), the frequency is displayed on a logarithmic scale, and the tone images lie on a straight line. Since the frequency bins of FFT are of the same size throughout the frequency range, on a logarithmic scale the lower bins are wider and the higher frequency bins are narrower.
This observation relates to how we hear musical sound: the perceived pitch of tones is roughly logarithmic in relation to their frequency;16 the pitch of tones in the lower frequency range is perceived less acutely than the pitch of higher tones; and the pitch of long sustained tones can be determined more precisely, while the pitch of short tones in fast musical passages is perceived as being less distinct.
Reassigned sonograms
The technique of time-frequency reassignment can be used to “sharpen” the sonogram image. Since musical sound for the most part consists of tones (distinct frequencies separated by frequency gaps) and silences (time gaps), it is possible, by using the phase information (that is otherwise ignored in sonograms), to move the frequency and time points within the frequency bins and window lengths to more specific positions.17,18
An implementation of the time-frequency reassignment technique for sound analysis is available in the open-source library Loris.19 The Loris library performs time-frequency reassigned analysis of a sound file, and then detects “partials”: higher intensity “ridges” in the time×frequency space of the reassigned sonogram that likely represent continuous pitched sounds (typically the harmonics of complex tones). Data that describe the detected partials are then available to a calling program.
16This is a simplified relation that holds well for the middle range of audible frequencies and for complex tones that contain a number of harmonics. For simple tones of lower frequencies (below ca. 500 Hz) the relationship is roughly linear. Individual results vary and are subject to perceptual biases (Stevens, Volkman, and Newman, “A Scale for the Measurement of the Psychological Magnitude Pitch”).
17Fitz and Haken, “On the Use of Time: Frequency Reassignment in Additive Sound Modeling”.
18Fitz and Fulop, “A Unified Theory of Time-Frequency Reassignment”.
19Fitz and Haken, Loris.
Figure 2.5: The whole-tone scale in Qvorum
The Qvorum20 software is a graphical editor that displays time-frequency reassigned data provided by Loris. Each “partial” is shown as a single glyph.21 Fig. 2.5 shows the sound file from Example 2.1 analyzed by the Loris library and displayed in Qvorum. Notice that the displayed glyphs indicate very closely the pitch and time span of the synthesized sound, and align well with the background 12-tet grid. Small artefacts on each end of each glyph are the result of STFT windows “sliding in and out” and are related to the “blurring” seen on the respective sonogram in Fig. 2.4.
This simple experiment shows that for sparse musical textures reassigned sonograms adequately represent musical sound, and can therefore be used for analysis of its microtonal pitch content.
The harmonic series
Many physical bodies can vibrate in a simple harmonic motion, repeatedly to-and-fro through a point of equilibrium.22 Simple harmonic motion produces waves of a sinusoidal character and therefore the sound of simple tones. Many “sonorous” physical bodies can vibrate simultaneously in several modes of vibration with different frequencies. In particular, stretched strings or enclosed columns of air, such as those that are the basis of string and wind musical instruments, vibrate as their whole lengths, in two halves, three thirds, four fourths, etc. The frequencies of their vibrations are thus related by ratios 1:2:3:4:…, and form the so-called harmonic series. The lowest frequency in the series is called the fundamental frequency, or the first (the lowest) harmonic. The higher frequencies within the series are its higher harmonics.
20Burleigh, Qvorum.
21A glyph is a small graphical symbol, such as a short line or curve.
22“Harmonic motion”. Encyclopædia Britannica Online.
Vibrations related by the harmonic series, in most cases produced by the many modes of vibration of the same sonorous body, occur so frequently that they are perceived by hearing as a single complex tone, rather than as a set of individual simple tones; this effect might be, at least in part, the result of our conditioning by a lifetime experience of listening to such sounds. It is certainly often possible to hear individual harmonics and distinguish them by ear, but in most cases the harmonics of a complex tone do perceptually “fuse”. Most of the time we recognize changes in relative strength of the harmonics only as changes in the timbre of the musical sound.23
Since the frequencies of the harmonic series are equidistant (separated by the same frequency difference) and pitch is logarithmically related to frequency, the musical intervals formed by subsequent pairs of frequencies in the series progressively decrease in size.24
Table 2.1 lists the first twenty-four intervals found in the harmonic series and indicates which of them are part of the twelve-tone system of Western music. Note that those intervals are formed by frequency ratios of numbers which prime number factorization contains only prime numbers 2, 3, and 5. Such intervals and tone systems made with their use are called to be 5- limit. The intervals formed with higher prime numbers, most notably the number 7 (the first prime number after 5) that occurs in the 7:6 ratio (the seventh harmonic), sound as if they
23Reportedly, there are detectable differences in neural activities between non-musicians and musicians, and even non-musicians develop brain networks that are influenced by the pitch structure of the music in their environments. There is also reported evidence suggesting that pitch (as a fusion of individual harmonics) is extracted from the complex tone prior to its processing in the auditory cortex. The detection of a harmonic series might be peripheral in the neural system, which would explain the emergence of sensitivity to consonance and dissonance early in human development, and the apparently innate preference for consonance over dissonance (a mistuned fifth) that has been found in six-month-old infants. (Trainor and Unrau, “Human Auditory Development”; Folland et al., “Processing simultaneous auditory objects: infants’ ability to detect mistuning in harmonic complexes”)
24The size of a musical interval between two tones is related to the ratio of the two frequencies. Adding intervals requires multiplication of frequency ratios; an inversion of an interval requires inversion of the frequency ratio. For example: $\frac{2}{1}$ is the ratio of an ascending octave, $\frac{1}{2}$ of a descending octave. Adding a fourth ($\frac{4}{3}$) to a fifth ($\frac{3}{2}$) gives an octave ($\frac{3}{2} \times \frac{4}{3} = \frac{2}{1}$). Subtracting a minor third ($\frac{6}{5}$) from a fifth ($\frac{3}{2}$) gives a major third ($\frac{3}{2} \times \frac{5}{6} = \frac{5}{4}$), etc.
were “outside” of the twelve-tone system. “Higher-limit” tone systems (systems that contain higher-limit intervals) are of progressively higher complexity than the 5-limit twelve-tone system of Western music.
frequency ratio   interval                  frequency ratio   interval
2:1               octave (P8)               14:13
3:2               fifth (P5)                15:14
4:3               fourth (P4)               16:15             diatonic semitone (m2)
5:4               major third (M3)          17:16
6:5               minor third (m3)          18:17
7:6                                         19:18
8:7                                         20:19
9:8               major tone (M2)           21:20
10:9              minor tone (M2)           22:21
11:10                                       23:22
12:11                                       24:23
13:12                                       25:24             chromatic semitone (m2)
Table 2.1: Intervals of the harmonic series
Tuning by the harmonic series, i.e. using frequency ratios of small numbers, is the so-called natural tuning or just tuning. Any 5-limit interval can be tuned by a sequence of octaves (the prime number 2 in 2:1), of fifths (3 in 3:2), and of major thirds (5 in 5:4). For example, a diatonic semitone can be tuned by this sequence: one octave up, one fifth down, one third down, $\frac{2}{1} \times \frac{2}{3} \times \frac{4}{5} = \frac{16}{15}$ (the difference between a perfect fourth and a major third). A chromatic semitone can be tuned as two thirds up and a fifth down, $\frac{5}{4} \times \frac{5}{4} \times \frac{2}{3} = \frac{25}{24}$ (the difference between an augmented and a perfect fifth), etc.
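The decomposition of a 5-limit interval into octaves, fifths, and major thirds can be checked mechanically. The following is a minimal sketch using exact rational arithmetic; the function names `tune` and `steps_for` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

OCTAVE, FIFTH, THIRD = Fraction(2, 1), Fraction(3, 2), Fraction(5, 4)

def tune(octaves, fifths, thirds):
    """Ratio reached by stacking the given numbers of octaves, fifths and
    major thirds (negative counts mean tuning downwards)."""
    return OCTAVE ** octaves * FIFTH ** fifths * THIRD ** thirds

def steps_for(ratio):
    """Return (octaves, fifths, thirds) that produce a 5-limit ratio.

    Raises ValueError if the ratio contains a prime factor above 5.
    """
    ratio = Fraction(ratio)
    num, den = ratio.numerator, ratio.denominator
    exponent = {}
    for p in (2, 3, 5):
        e = 0
        while num % p == 0:
            num //= p
            e += 1
        while den % p == 0:
            den //= p
            e -= 1
        exponent[p] = e
    if num != 1 or den != 1:
        raise ValueError(f"{ratio} is not a 5-limit ratio")
    a, b, c = exponent[2], exponent[3], exponent[5]
    # Each fifth (3/2) and each third (5/4) carries factors of 2 in its
    # denominator; the octave count compensates for them.
    return (a + b + 2 * c, b, c)

print(steps_for(Fraction(16, 15)))  # diatonic semitone
print(steps_for(Fraction(25, 24)))  # chromatic semitone
```

For 16:15 the result is one octave up, one fifth down, and one third down; for 25:24 it is no octaves, one fifth down, and two thirds up, matching the two sequences given in the text. A ratio such as 7:6 is rejected because it is not 5-limit.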
Note (and this is a clear indication that the twelve-tone system is a compromised simplification of a much more complex system) that in just tuning there are two kinds of the major second (the major and minor whole tones) and two kinds of the minor second (the diatonic and chromatic semitones).
Relations among the harmonics of a complex tone (that is, among the simple tones that form its harmonic series) can be nicely illustrated on the pitch spiral. Fig. 2.6 shows a sequence of six images of the spiral that display one to six harmonics of C4. The spiral is
overlaid by two grids of radial lines: the thicker lines show the angles that correspond to justly tuned intervals, while the thinner lines divide the octave into fifty-three equal small intervals. The coloured discs represent the harmonics of the tones; the position of each disc indicates the pitch of the corresponding harmonic and the size of each disc is related to the strength (loudness) of the harmonic. There is one full clockwise turn of the spiral between the first and second harmonics (an octave), slightly over half a turn between the second and third harmonics (a fifth), less than half a turn between the third and the fourth harmonics (a fourth; added to the fifth between the second and third harmonics they make an octave between the second and the fourth harmonics). The fourth and the fifth harmonics, and the fifth and the sixth harmonics form a major third and a minor third, respectively.
Note that the thicker radial lines that indicate the justly tuned intervals, relative to C4, align quite nicely with the thin lines of the 53-division. That is a fortuitous coincidence that is exploited further in this work.
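How closely the just intervals fall on the lines of the 53-division can be estimated numerically. The sketch below is only an illustration: the function name `nearest_step` is hypothetical, and the error is reported in cents purely as a reference unit.

```python
import math
from fractions import Fraction

def nearest_step(ratio, divisions=53):
    """Return (step, error_cents): the closest step of an equal division of
    the octave to the given frequency ratio, and the signed mistuning of that
    step relative to the just interval."""
    exact = divisions * math.log2(float(ratio))  # position in (fractional) steps
    step = round(exact)
    error_cents = (step - exact) * 1200 / divisions
    return step, error_cents

for name, ratio in [("fifth", Fraction(3, 2)), ("fourth", Fraction(4, 3)),
                    ("major third", Fraction(5, 4)), ("minor third", Fraction(6, 5))]:
    step, err = nearest_step(ratio)
    print(f"{name:12s} {ratio}  step {step:2d} of 53  error {err:+.2f} cents")
```

Under these assumptions the fifth and the fourth land within a tenth of a cent of steps 31 and 22, and both thirds within about a cent and a half of steps 17 and 14.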
The harmonic series is inherent in most musical sounds (made by vibrating strings and pipes) and therefore strongly influences, or even determines, the character of what is music. Even if all musical instruments were made to avoid the harmonic series (for example by allowing only percussion instruments, or instruments with so-called non-harmonic timbres, such as bells, xylophones, steel drums etc.) and thus prevent its impact on what is music, the harmonic series and our conditioning to it cannot be escaped for one simple reason: it is present in the human voice! Therefore anyone who sings or is sung to is inevitably conditioned to perceive the harmonic series of simple tones as a “fused” unity of a complex tone.
It is possible to learn, quite easily for an attentive listener, to hear separately a few of the harmonics sounded by, for example, a vibrating string. Helmholtz claimed to be able to hear up to sixteen to twenty harmonics of thin strings, with the help of his custom-made ear-fitted and sealed resonators.25
Harmonics of the singing voice

Let us now analyze a short recording of a singing voice (Example 2.2). The singer is Iva Bittová, a Moravian natural-born musician.26 Her trademark is solo singing while accompanying herself on a violin; the combination of the voice and the sound of the open violin strings strongly emphasizes the role of the harmonic series in her music. In the example she sings the opening of a Moravian folk song.27 Her intonation is flawless and, except for the folk-style voice modulation at the onsets of sung tones, there is no prominent vibrato that would mask the tuning precision. Music notation of the example is shown in Fig. 2.7.

25Helmholtz, On the sensations of tone as a physiological basis for the theory of music, p.47.
26http://www.bittova.com/

Figure 2.6: Harmonics of a complex tone, shown on the pitch spiral. Panels: (1) the fundamental of C4; (2) add 2nd harmonic (P8); (3) add 3rd harmonic (P5); (4) 4th harmonic (P4); (5) 5th harmonic (M3); (6) 6th harmonic (m3).
Example 2.2: MP3 2.2
Figure 2.7: Tužba (Desire)
Figure 2.8: A sonogram of the singing voice, linear frequency scale
Fig. 2.8 shows a sonogram of the recording on a linear frequency scale. The displayed frequency range (on the left) is from about 100 Hz to about 15 kHz. The horizontal dark lines show the detected harmonics of the voice. The lines are mostly straight, showing steady, sustained pitch. The occasional, limited vibrato can be seen as the orderly undulation of the lines. The vertical darker bars show the fricative and plosive consonants (in the sung lyrics) that are characterized by broadband frequency content.
27Janáček, Tužba (Desire).
On the sonogram image it is clearly possible to identify up to 32 harmonics, from the fundamental frequency in the range of about 300–500 Hz, to the highest harmonic frequency in the range of 9–16 kHz. Note that all harmonics are vertically equally spaced, since they are integer multiples of the fundamental frequency (the harmonic frequencies are equidistant).
Figure 2.9: A sonogram of the singing voice, logarithmic frequency scale
Fig. 2.9 shows the same sonogram on a logarithmic frequency (linear pitch) scale. The harmonics of the voice now clearly form a series of intervals of decreasing size (the octave, the fifth, the fourth, etc.).
The recording has been converted to a time-frequency reassigned sonogram. Fig. 2.10 shows the data directly as produced by the Loris library and displayed by the Qvorum software. The thickness of the glyphs indicates the amplitude of the corresponding harmonics.28 The horizontal grid lines labelled 8–14 indicate the octaves from C4 to C10.
The weak harmonics were then eliminated from the sonogram. Fig. 2.11 shows a sonogram that contains only the stronger harmonics.
28The ordinal number scale for octaves follows the convention used, for example, in Csound. If the number 8 indicates C4 (the “middle C”) that has a frequency of about 262 Hz, then 0 indicates the C eight octaves below with a frequency of about 1.02 Hz, and the full range of notes can be comfortably represented by non-negative numbers.
Fig. 2.12 shows a closer view of the sonogram overlaid by a 12-tet grid. The coincidence of the partials with the grid lines clearly suggests that during the recording session the musicians tuned to the A4 = 440 Hz pitch standard.29 Note that the first five harmonics are approximately 12, 7, 5, 4, and 3 equally tempered semitones (an octave, fifth, fourth, major third, and a minor third) apart.
In Fig. 2.13, only the fundamental harmonics (that correspond to the perceived pitch of the complex tones) remain. The sonogram shows only the melodic range G4–D5. Finally, Fig. 2.14 shows the opening phrase in close view. Note that the pitches of the sustained tones, especially the opening ascending fifth G4–D5 and the C5 in the ascending step-wise motif, coincide very well with the 12-tet grid.
Figure 2.10: Time-frequency reassigned sonogram of Example 2.2
Figure 2.11: Example 2.2, strong harmonics only

Figure 2.12: Example 2.2, 12-tet grid

Figure 2.13: Example 2.2, fundamental harmonics

Figure 2.14: Example 2.2, the first phrase

29The pitch standard historically varied from about 370 to 560 Hz. The current standard of 440 Hz, known as the Stuttgart pitch, was recommended by Johann Heinrich Scheibler and accepted by the German Association of Natural Philosophers (die deutsche Naturforscherversammlung) in 1834. (Ellis, “The History of Musical Pitch”)
30Janáček, Jabluňka (Apple Tree).
31Played by a member of the Škampa String Quartet.

In the second example of a singing voice (Example 2.3a),30 the violin31 plays a G4 drone (apparently as the second harmonic of the open violin G string, a “flageolet” tone) and the voice sings a melody that is very well tuned relative to the drone pitch. Music notation of the example is shown in Fig. 2.15, and the time-frequency reassigned sonogram in Fig. 2.16. Note that in this case not all the glyphs on the sonogram align completely with the 12-tet grid (although the tones do sound very well in tune). Specifically:
• The G4s played by the violin and sung by the voice are perfectly in alignment with the horizontal grid line. That makes them a good reference point.
• The two D5s are slightly above the grid line. A justly tuned fifth (G–D) is apparently sharper than the equally tempered one.
• The first two sung B4s are below the grid line. This demonstrates that the equally tempered major third (G–B) is sharp compared to natural tuning. The second two sung B4s are closer to the grid line (they are intoned slightly higher, since they “lead” to the C5 tones that follow them), which clearly demonstrates that tones notated by notes of the same name can have slightly (“microtonally”) different pitches, and therefore a different character or “shade”.
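The direction of the deviations listed above follows from simple arithmetic. The sketch below assumes that G4 sits exactly on the 12-tet grid derived from A4 = 440 Hz and compares the justly tuned D5 and B4 above it with their equally tempered counterparts; the helper name `just_vs_tempered` is hypothetical, and cents are used only as a reference unit.

```python
import math

A4 = 440.0
G4 = A4 * 2 ** (-2 / 12)  # equally tempered G4, two semitones below A4

def just_vs_tempered(ratio, semitones):
    """Signed difference in cents between a just interval (frequency ratio)
    and the equally tempered interval of the given number of semitones.
    Positive means the just interval is sharper."""
    return 1200 * math.log2(ratio) - 100 * semitones

for name, ratio, semitones in [("D5 (fifth G-D)", 3 / 2, 7),
                               ("B4 (major third G-B)", 5 / 4, 4)]:
    just_hz = G4 * ratio
    tempered_hz = G4 * 2 ** (semitones / 12)
    print(f"{name}: just {just_hz:.2f} Hz, tempered {tempered_hz:.2f} Hz, "
          f"difference {just_vs_tempered(ratio, semitones):+.1f} cents")
```

The just fifth comes out about two cents above the grid line and the just major third about fourteen cents below it, which agrees with the positions of the D5 and the first two B4 glyphs on the sonogram.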
Example 2.3A: MP3 2.3A
Figure 2.15: Jabluňka (Apple Tree)
Figure 2.16: A sonogram of a violin/voice duet.
The different tuning and the consequent difference in the character of the two pairs of B4s can be heard in the next example and seen in Fig. 2.17. The four tones were cut from the original sound file and spliced together.32
32Note that in the spliced excerpt one can hear the pitches G4, G4, D5, A4, tones that were not apparent in the original, as if they were played by plucking the violin strings. This is an artefact caused by reverberation that is present in the recording. What we hear are the reverberation tails of the tones sung just before the selected part. The sung tones to which the reverberation would normally be perceptually attributed are absent from the spliced sound file. The cut parts have very short fade-ins that are reminiscent of percussive sounds and therefore create an illusion of the sound of a plucked violin string.
Example 2.3B: MP3 2.3B
Figure 2.17: Two shades of B4
Microtonal notation
Microtonal variations in tone pitch can be, with some difficulties, notated using the Western staff notation. For example, the standard natural and accidental signs (♮, ♯, ♭) can be adorned by small arrows pointing up or down that indicate that the pitch of the respective tone should be slightly sharpened or flattened relative to its normal value. What is a normal value of the pitch and what should be the size of the notated differences from it is a good question, of course.
Fig. 2.18 illustrates how the last example, as shown on the sonogram in Fig. 2.16, could be notated microtonally. It assumes that the microtonal accidental signs indicate a deviation from the twelve-tone equal temperament. Note that in the microtonal notation as used within this text, the meaning of the natural and accidental signs is not carried all the way to the end of the measure, but alters only the immediately following note.
Figure 2.18: Jabluňka (microtonal notation)
The intervals of the harmonic series

The harmonic series is the foundation of just (natural) tuning that we have seen in the two examples of a singing voice. The series and the intervals found in it have been studied since Antiquity. Until the early Middle Ages, only the unison, the octave, the fifth, and the fourth were considered to be consonant intervals. The thirds and the sixths were deemed dissonant since, in the Pythagorean view, in their justly tuned form they were “incompatible” with the consonant intervals. The polyphony of the early European Renaissance gradually introduced the thirds and the sixths as consonant intervals.33
The first six integer numbers (1 to 6, the “senario”) and the first five intervals (or six, if the unison is included) of the harmonic series formed by them were, and indeed still are, the basis of Western twelve-tone music. At first, numerology was usually used to explain the importance of the number six in music: God created the world in six days, there were six known planets at the time, six faces on the cube, etc. Since the seventeenth century there were numerous advances in music theory based in physics and mathematics, presented in the works of music theorists and scientists: Descartes, Mersenne, Rameau, Euler, d’Alembert and others,34 and the work culminated in the middle of the nineteenth century in Helmholtz’s Die Lehre von den Tonempfindungen.
The modern view of the function of the intervals found in the harmonic series is based on the psychoacoustic theory that Helmholtz proposed and which, at least in my view and for our purposes, still firmly stands. The role of the first several intervals of the harmonic series is summarized below. Keep in mind that:
• In an interval sounded by two complex tones (that may or may not be related by the harmonic series), each of the complex tones is itself a sum of a harmonic series of simple tones.
• In a complex tone that is naturally created, for example by a vibrating string, the amplitude of higher harmonics progressively decreases; the lower harmonics (the lower components of the harmonic series) are stronger and therefore more audible and perceptually more important.
• By the principle of parsimony (Ockham’s razor), simpler explanations are to be preferred.

33Cohen, Quantifying music: the science of music at the first stage of the Scientific Revolution, 1580–1650.
34Barbour, Tuning and Temperament: A Historical Survey.
The unison, 1:1
In the domain of musical intervals, the unison is an identity relation with respect to pitch. Two complex tones of the same timbre (of the same harmonic content) are (do sound) the same, since all their respective harmonics coincide in frequency:
1   2   3   4   5   6   ...  (harmonics of the one tone)
1   2   3   4   5   6   ...  (harmonics of the other tone)
The octave, 2:1
The octave is an equivalence relation35 with respect to pitch. A tone that is transposed by an octave (or several octaves) up or down is “equivalent” to (i.e. it can reasonably well replace) the original tone. The (numbered) harmonics of two tones that are separated by an octave align this way:
1   2   3   4   5   6   7   8   9   10  ...  (the lower tone)
    1       2       3       4       5   ...  (the higher tone)
The higher tone does not contribute any new harmonic content to the aggregate result of the two tones, it merely strengthens every other harmonic of the lower tone. The sensations of the two tones therefore fuse very well.
The octave is the most “simple” (most consonant) interval of two tones with different pitches.
The fifth, 3:2
Between two tones that form a fifth, one half of the harmonics of the higher tone strengthen the harmonics of the lower tone and the other half introduce a new harmonic content: harmonics that were not present in the lower tone.
1   2   3   4   5   6   7   8   9   10  11  12  ...  (the lower tone)
  1     2     3     4     5     6     7     8   ...  (the higher tone)
34“Entities are not to be multiplied beyond necessity.” (“Ockham’s razor”. Encyclopædia Britannica Online) 35A relation that is symmetric, reflexive, and transitive. (“Equivalence relation”. ibid.)
Transposing a tone by a fifth thus introduces a new pitch that is not “equivalent” to the first one; at the same time, two tones at the distance of a fifth still “fuse” well enough that the fifth can be used as a simple harmonic “colouration” (organum36).
The fourth, 4:3
The fourth is an inversion of the fifth with respect to the octave: transposing the lower tone of a fifth or a fourth an octave up, or transposing the higher tone an octave down turns the fourth into a fifth and vice versa; a fourth added to a fifth makes an octave ($\frac{3}{2} \times \frac{4}{3} = \frac{2}{1}$). Therefore the fourth is inherently present in a system constructed from octaves and fifths and is not an interval that would introduce new complexity (a new dimension) to the system.
1     2     3     4     5     6     7     8     9     ...  (the lower tone)
  1       2       3       4       5       6       7   ...  (the higher tone)
The (major) third, 5:4
The third does introduce a new dimension into the system. (We shall see how the essentially one-dimensional tuning system of octaves and fifths becomes two-dimensional with the introduction of the third.) Every fourth harmonic of the higher tone coincides with every fifth harmonic of the lower tone. The two tones are thus reasonably strongly “anchored” to each other,37 yet each has a quite distinctly different harmonic content.
1   2   3   4   5   6   7   8   9   10  ...  (the lower tone)
 1    2    3    4    5    6    7    8   ...  (the higher tone)
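The alignments of harmonics shown for the unison, octave, fifth, fourth, and major third all follow the same rule, which can be sketched in a few lines. The function name `coincidences` is hypothetical; the sketch assumes ideal harmonic spectra and exact just ratios.

```python
from fractions import Fraction

def coincidences(ratio, lower_count):
    """List pairs (n, k) such that harmonic n of the lower tone has the same
    frequency as harmonic k of the higher tone, for n up to lower_count.

    `ratio` is the frequency ratio of the higher to the lower fundamental.
    """
    ratio = Fraction(ratio)
    pairs = []
    for n in range(1, lower_count + 1):
        k = Fraction(n) / ratio  # harmonic number of the higher tone, if whole
        if k.denominator == 1:
            pairs.append((n, k.numerator))
    return pairs

print(coincidences(Fraction(2, 1), 10))  # octave
print(coincidences(Fraction(3, 2), 12))  # fifth
print(coincidences(Fraction(5, 4), 10))  # major third
```

For a ratio p:q in lowest terms, every p-th harmonic of the lower tone meets every q-th harmonic of the higher tone. The larger q is, the greater the share of the higher tone's harmonics that are new content rather than reinforcement.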
The minor third, 6:5
The minor third is simply an inversion of the major third with respect to the fifth, analogous to the fourth being an inversion of the fifth with respect to the octave. The minor third is thus inherently present in a system that contains fifths and major thirds.
36In mediaeval chant, a second voice doubling the principal voice a fourth or a fifth below. (“Organum”. Encyclopædia Britannica Online) 37Provided that their timbres have at least five harmonics. However, if difference tones (p.52) are taken into account, two harmonics suffice to strongly connect the two tones of the major third. E.g. two tones with frequencies of 400 and 500 Hz also contain harmonics of 800 and 1000 Hz. The difference tones of 500−400 = 100, 1000−800 = 200, 800−500 = 300, and 1000−400 = 600 Hz nicely complete the “senario” of 100–200–300–400–500–600 Hz.
Furthermore, the inversion of the major or minor third with respect to the octave makes the minor or major sixth, respectively; the inversion of the major or minor third with respect to the fourth makes the minor or major second, and the inversion of the major or minor second with respect to the octave makes the minor or major seventh. All intervals of the Western chromatic scale are present in a system generated by the octave, the fifth, and the third.
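The inversions named above can be written out as divisions of frequency ratios. This is a worked illustration only; note that the major second obtained this way is the 10:9 minor tone of Table 2.1, not the 9:8 major tone.

```latex
\begin{aligned}
\tfrac{2}{1} \div \tfrac{5}{4} &= \tfrac{8}{5}   && \text{(minor sixth)}   &\qquad
\tfrac{2}{1} \div \tfrac{6}{5} &= \tfrac{5}{3}   && \text{(major sixth)} \\
\tfrac{4}{3} \div \tfrac{5}{4} &= \tfrac{16}{15} && \text{(minor second)}  &\qquad
\tfrac{4}{3} \div \tfrac{6}{5} &= \tfrac{10}{9}  && \text{(major second)} \\
\tfrac{2}{1} \div \tfrac{10}{9} &= \tfrac{9}{5}  && \text{(minor seventh)} &\qquad
\tfrac{2}{1} \div \tfrac{16}{15} &= \tfrac{15}{8} && \text{(major seventh)}
\end{aligned}
```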
The subminor third, 7:6
The next interval of the harmonic series is, in Helmholtz’s terminology, the subminor third. It cannot be derived from the first six intervals and therefore introduces another level of complexity (a third dimension) to the tuning system. The twelve-tone system generated by three of the first five intervals in the harmonic series (that is, the octave, the fifth, and the third) is already sufficiently complex to support a multitude of musical works created over the course of a number of centuries. The complexity of a system that includes the subminor third among the generating intervals would, conceivably, be overwhelming, and therefore this interval and also all other higher-limit intervals (intervals whose frequency ratios contain prime numbers 7 and higher) are not used in Western music.
Consonance and dissonance, redefined

In the usual context of a music discourse, consonance is “a harmonious sounding together of two or more tones [...] with an ‘absence of roughness’.” Dissonance is “then the antonym to consonance with corresponding criterion of ‘roughness’ [...] [that] implies a psychoacoustic judgement.”38 This meaning of the word ‘consonance’ applies to “stable” intervals and chords (the unisons and octaves, fifths and fourths, thirds and sixths, major and minor triads, and such) and ‘dissonance’ to intervals and chords with more “tension” (the seconds and sevenths, augmented and diminished chords, and such). The perception of certain intervals and chords as consonant or dissonant has changed throughout music history and is, to a point, graded subjectively.39 There is, of course, the implied assumption that the intervals and chords in question are played well “in tune” and that they are judged to be consonant or dissonant solely based on the intervallic relations among their tones.
38“Consonance”. Oxford Music Online, (Claude V. Palisca).
39“Consonance and dissonance”. Encyclopædia Britannica Online.

For the remainder of the text we redefine the meaning of ‘consonance’ and ‘dissonance’ in the following way:
Consonance: Two or more tones sounding together are consonant if the perceived “roughness” or “tension” is locally minimized and cannot be improved by microtonal adjustment of their pitches. That is, if any of the consonant pitches is slightly sharpened or flattened (by significantly less than a semitone), the “roughness” or “tension” increases. By “locally” we mean that the roughness cannot be lowered by adjusting the tone pitches within their “neighbourhood”,40 but could be lowered “globally”, i.e. by changing the pitches by larger amounts (more than a fraction of the semitone). Such adjustment would, of course, change the character of the interval or the chord.
Dissonance: The opposite of ‘consonance’.
“Consonant” and “dissonant” thus mean “sounding well in tune” and “out of tune”, respectively. A sensible musician who sings or plays a string or wind instrument in a choir or an orchestra, given a note to sing or play, naturally does what we have just described: makes the tone sound “locally consonant”, i.e. microtonally adjusts its pitch so that the tone sounds as much in tune as possible, however strong the cumulative tension among all tones sung or played by the musical group may be.
Our goal is to do the same thing computationally: find the points of local consonance and use them in computer-assisted microtonal music.
Helmholtz’s Tonempfindungen

Hermann von Helmholtz (1821–1894) was a truly versatile scientist and “the last scholar whose work [...] embraced all the sciences, [...] philosophy and the fine arts,” who made a number of significant contributions to the fields of anatomy, physiology, neurology, optics, thermodynamics, hydrodynamics, electrodynamics, acoustics, geometry, and philosophy.41
40“Neighborhood”. Wolfram MathWorld™.
41Turner, “Dictionary of Scientific Biography”.

Helmholtz carried out rigorous investigations into the nature of musical sound, supported by an extensive set of experiments. In 1863 he published his findings in a seminal work Die Lehre von den Tonempfindungen, translated by Alexander John Ellis as On the Sensations of Tone and first printed in English in 1875. The most consequential result of Helmholtz’s work is his psychoacoustic theory of tone consonance, which explains harmonious relationships of musical tones, intervals and chords, and which also allowed him to make connections between physiological acoustics and the theory and aesthetics of tonal music. Ellis writes in his Translator’s Notice to the second English edition that the book is “a work which all candidates for musical degrees are expected to study.”
In his translation of Helmholtz’s book, Ellis promoted the ‘cent’, a unit for measuring musical intervals that Ellis introduced.42 Cent measurements are clearly marked additions to Helmholtz’s text (Helmholtz himself did not use cents). The cent is widely adopted today; however, in this text we will use it sparingly and only as a reference unit.43
Helmholtz’s theory of tone consonance is beautiful in its simplicity. Recall that complex tones are sums of harmonic series of simple tones with sinusoidal waveforms. Two simultaneously sounding simple tones with sufficiently close frequencies produce audible “beats”. The beats are periodic fluctuations in the amplitude of the combined sound, caused by the interference of the two sound waves. The frequency of the beats is equal to the difference of the two frequencies. Beats that are faster than a few cycles per second cause a rough, unpleasant sensation that leads to a perception of tone dissonance. Two or more tones are consonant when the rough sensation is minimized. That is certainly so for the intervals of the harmonic series (see p.46), whose harmonics are either coincident or sufficiently separated.
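The statement that the beat frequency equals the difference of the two frequencies follows from a standard trigonometric identity. The derivation below is a sketch that assumes two simple tones of equal amplitude with frequencies $f_1$ and $f_2$; the symbols are introduced here for illustration.

```latex
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  \;=\; 2\,\cos\!\bigl(\pi (f_1 - f_2)\, t\bigr)\,
        \sin\!\bigl(\pi (f_1 + f_2)\, t\bigr)
```

The sine factor is a tone at the mean frequency $(f_1 + f_2)/2$, and the cosine factor is a slowly varying amplitude. Because loudness depends on the magnitude $\lvert 2\cos(\pi (f_1 - f_2) t)\rvert$, which repeats every $1/\lvert f_1 - f_2\rvert$ seconds, the beats are heard at the rate $\lvert f_1 - f_2\rvert$.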
42“cent”. Oxford Music Online.
43In my view, the wide acceptance of the cent as a measurement unit, without realizing its full impact, is unfortunate. The cent is 1/100 of the equally tempered semitone. The equally tempered intervals have nice round cent values, and thus may seem somehow more correct or natural than justly tuned intervals whose cent values are fractional numbers. Using the cent unit therefore reinforces the notion of a certain dominance of the equally tempered tuning over just intonation. Paradoxically, Ellis himself was much in favour of just tuning, as evidenced in his report to the Musical Association of London (1875): “At any rate just intonation, even upon a large scale, is immediately possible. And if I long for the time of its adoption, in the interests of the listener, still more do I long for it in the interests of the composer. What he has done of late years with the rough-and-ready tool of equal temperament is a glorious presage of what he will do in the future with the delicate instrument which acoustical science puts into his hands. The temporary necessity for equal temperament is passing away. Its defects have been proved to be ineradicable, because inherent in the nature of sound. An intonation possessing none of these defects has been scientifically demonstrated. It is feasible now on the three noblest sources of musical sound – the quartet of voices, the quartet of bowed instruments, and the quartet of trombones. The issue is in the hands of the composer.” (Vogel, On the relations of tone, p.9)

Helmholtz also explained the known phenomenon of combination tones that can often be heard in addition to simultaneously sounding simple tones. When two simple tones of a sufficient intensity are sounded simultaneously, one can hear a third difference tone with a frequency equal to the difference of the two source frequencies and, under certain conditions, possibly several summation tones with frequencies equal to various sums of the two source frequencies. Demonstrably, combination tones do not physically exist outside the ear, but are created by the nonlinear mechanical response of the eardrum and the ossicles in the middle ear. Once created in the ear, the combination tones are indistinguishable from the source tones, contribute to the composite sensation of tone, and are also an additional source of beats.
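The difference tones of a concrete interval are easy to enumerate. The sketch below revisits the major third of 400 and 500 Hz from the earlier footnote, with two harmonics per tone. It considers only first-order difference tones between partials of the two different tones, a simplifying assumption, and the function names are hypothetical.

```python
def partials(fundamental_hz, count):
    """Frequencies of the first `count` harmonics of a complex tone."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def difference_tones(f_low, f_high, harmonics=2):
    """Sorted first-order difference tones |a - b| between every partial a of
    the lower tone and every partial b of the higher tone."""
    diffs = {abs(a - b)
             for a in partials(f_low, harmonics)
             for b in partials(f_high, harmonics)
             if a != b}
    return sorted(diffs)

tones = difference_tones(400, 500)
everything = sorted(set(partials(400, 2)) | set(partials(500, 2)) | set(tones))
print(tones)        # difference tones only
print(everything)   # source partials together with the difference tones
```

Together with the source partials, the difference tones fill in every multiple of 100 Hz from 100 to 600 Hz, which is the “senario” mentioned in the footnote.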
Several finer points of Helmholtz’s theories have been challenged in relation to various psychoacoustic phenomena, such as the explanation of the “missing fundamental”.44 Nevertheless, it can be verified that in reasonable situations and within reasonable ranges of sound frequencies and intensities Helmholtz’s model correlates very well with the results of experiments in perception of musical sound.
Helmholtz performed his experiments using mechanical devices. For tone generation he used mechanical sirens: rotating metal discs with concentric rings of equidistant holes that modulated jet streams of air. Two simultaneous tones were sounded by a double siren, in which mechanical gears set the ratio of tone frequencies. To analyze complex tones (to detect the presence of their harmonics), Helmholtz used sets of tuned resonators with protruding necks that could be fitted directly into the ear canal and sealed with wax.45
Experiments

Today, most of Helmholtz’s experiments can easily be reproduced with a personal computer, digital audio software, and a pair of quality headphones. The reader is encouraged to conduct the following experiments or, at the least, listen closely to the sound files that are provided on the accompanying web site.
Example 2.4A (audible range): MP3 2.4A CSD 2_4_range_A.csd
44Wightman and Green, “The Perception of Pitch”. 45The mechanical nature (and therefore imperfections) of sound generation by the siren may have been the cause of a rare, interesting mistake in Helmholtz’s text: he thought the upper limit of hearing to be around 40 kHz, twice the actual limit of about 20 kHz. My conjecture is that Helmholtz might have heard a subharmonic frequency of the siren tone.
The purpose of this experiment is to test the frequency range of the audio system used (mainly the headphones) combined with the frequency range of the listener’s hearing. The example also demonstrates that the perception of the pitch of simple tones is not precise.
The audio file contains a series of simple tones, synthesized as sine waves with plain amplitude envelopes. The series begins with a frequency at the low end of the audible range, and progresses by octaves (by doubling the frequency) toward the high limit of the audible range. Each frequency is played as a sequence of three two-second tones: two seconds for the left ear only, two seconds for the right ear, and two seconds for both ears. The frequencies are: 31.25 Hz, 62.5 Hz, 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz, 4 kHz, 8 kHz, and 16 kHz.
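A comparable test file can be produced with nothing more than the Python standard library. The following is a minimal sketch and not the Csound score that accompanies the thesis: the sample rate, the signal level, the 20 ms fade ramps, and all function names are assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # assumed; the rate of the original file is not stated

def octave_series(start_hz=31.25, count=10):
    """Frequencies obtained by repeatedly doubling start_hz."""
    return [start_hz * 2 ** k for k in range(count)]

def tone_frames(freq_hz, seconds, left, right, rate=SAMPLE_RATE, level=0.5):
    """16-bit stereo frames of a sine tone with short linear fades.

    `left` and `right` select which channels carry the tone.
    """
    n = int(seconds * rate)
    fade = max(1, int(0.02 * rate))  # 20 ms ramps (assumed) to avoid clicks
    frames = bytearray()
    for i in range(n):
        envelope = min(1.0, i / fade, (n - 1 - i) / fade)
        sample = int(32767 * level * envelope
                     * math.sin(2 * math.pi * freq_hz * i / rate))
        frames += struct.pack("<hh", sample if left else 0,
                              sample if right else 0)
    return bytes(frames)

def write_range_test(path, seconds=2.0, rate=SAMPLE_RATE):
    """Write the series: each frequency for the left ear, right ear, then both."""
    with wave.open(path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        for freq in octave_series():
            for left, right in ((True, False), (False, True), (True, True)):
                out.writeframes(tone_frames(freq, seconds, left, right, rate))

if __name__ == "__main__":
    write_range_test("range_test.wav")  # hypothetical output file name
```

Because the highest tone is at 16 kHz, any sample rate above 32 kHz represents it without aliasing; 44.1 kHz is chosen here only because it is common.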
With the possible exception of the lowest (31.25 Hz) and the highest (16 kHz) frequencies, you should clearly hear all tones. The lowest frequency might be less audible due to the naturally lower sensitivity of hearing in the low range, and also due to a possible roll-off in the frequency characteristic of the audio interface and headphones used. With a pair of quality studio headphones, however, even the lowest frequency should be heard or at least “sensed”. If you can hear a tone in one ear but not the other, try turning the headphones around; that should show whether the cause is a difference in sensitivity between your ears or between the two sides of the headphones.
Whether you are able to hear the 16 kHz tones depends very likely less on the headphones and more on your hearing. Young, healthy ears hear tones of frequencies up to 20 or 21 kHz; with age (and possible hearing damage mostly due to exposure to high intensity sound) the high limit of the hearing range decreases.46 It is possible that one hears another sound when the 16 kHz tones are being played: sound that apparently is of a lower frequency, or a mix of frequencies. That would be most likely some kind of a distortion or cross-modulation in the audio system. If such sound is present and its intensity is not negligible, it is recommended to use a better quality audio system.
You should be able to hear all the frequencies between 62.5 Hz and 8 kHz clearly and with no apparent distortion. It is a necessary condition for success of the experiments that follow. Note that the lower frequencies and the higher frequencies sound less loud than the middle-range frequencies. This is a normal effect that is due to the varied sensitivity of the ear, described by the Fletcher-Munson curves of equal loudness.47

46As of the time of writing this text my own hearing range seems to end somewhere between 15 and 16 kHz.
Observe that in each group of three tones of the same frequency (left ear, right ear, both ears) the perceived pitch is not necessarily the same. The pitch perceived in one ear may be higher than in the other ear, and the perceived pitch of the tone that is played to both ears is then somewhere between the two individual pitches. Listen closely and note which ear hears each frequency as a higher pitch than the other ear. Turn the headphones around and repeat the experiment: does the difference in perceived pitches at all depend on which ear hears the tone first? Return the headphones to their original position and listen again: has the experience changed? I myself perceive the pitch of tones in the low range (below 500 Hz) to differ by about a semitone, consistently higher in one ear (which one, depends on the frequency). From 500 Hz up, the perceived pitch difference is very small but still noticeable, and which ear perceives the pitches higher changes back and forth with repeated listening and changed loudness of the tones.
Example 2.4B (low range): MP3 2.4B CSD 2_4_range_B.csd
This experiment tests the low limit of the audible range in more detail. A series of simple (sinusoidal) tones with frequencies starting from 250 Hz and descending by one third of an octave (the equally tempered major third) is again played for each ear separately and then for both ears together, starting with the right ear for a change. The frequencies are: 250, 198, 157, 125, 99.2, 78.7, 62.5, 49.6, 39.4, 31.3, 24.8, 19.7, and 15.6 Hz.
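A series of this kind can be computed directly from the starting frequency and the size of the step. The following is a minimal sketch in Python (not the Csound code of the provided files); the function name `tempered_series` and its parameters are illustrative assumptions.

```python
def tempered_series(start_hz, steps_per_octave, count, descending=False):
    """Return `count` frequencies spaced by equal divisions of the octave.

    With steps_per_octave=3 each step is one third of an octave,
    i.e. the equally tempered major third.
    """
    direction = -1 if descending else 1
    return [start_hz * 2 ** (direction * k / steps_per_octave)
            for k in range(count)]


# Thirteen tones descending from 250 Hz by thirds of an octave.
low_range = tempered_series(250.0, 3, 13, descending=True)
for frequency in low_range:
    print(f"{frequency:.1f} Hz")
```

Every third tone of the series is an octave below the tone three steps earlier, which is a quick way to check a printed list of frequencies by hand.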
This time, perhaps because of the melodious relationship among the tones (descending thirds in a regular time pattern) I tend to perceive the first tone (in the right ear) as having the higher pitch in each pair of tones.
Note that there are two competing sensations in the low frequency range: one is that of a pitched tone, the other of “growling”: separate, distinct sound pulses. Helmholtz calls such a sensation “intermittent”, and a source of unpleasant “sensory roughness”. Below about 60 Hz the intermittent “growling” sensation becomes more dominant than the sensation of the pitch. Below about 30 Hz the sensation of the pitch vanishes, and only the feeling of separate pulses remains.
47“Sound: Dynamic range of the ear”. Encyclopædia Britannica Online.
Example 2.4C: MP3 2.4C CSD 2_4_range_C.csd
The high limit of the audible range is tested in this experiment. The series of simple sinusoidal tones starts from 4 kHz and ascends by equally tempered major thirds. The frequencies are: 4.00, 5.04, 6.35, 8.00, 10.1, 12.7, 16.0, and 20.2 kHz. There is no intermittent sensation of any sensory roughness in this range.
Example 2.4D: MP3 2.4D CSD 2_4_range_D.csd
The high limit of hearing is tested in even more detail. A series of simple tones with frequencies ascending from 8 kHz to 18 kHz by 500 Hz steps is played for both ears. Since the hearing sensitivity decreases in the high frequency range, you should notice a progressive drop in loudness from about 12 kHz up to the limit of your hearing.
Example 2.5A: MP3 2.5A CSD 2_5_pitch_A.csd
In Example 2.4B we have encountered a case of what might be called a “corrective perception of pitch” (see p.21). That is, in situations where the pitch perception is uncertain or where it is in conflict with what the listener expects (such as a familiar musical context), the perceived pitch differs from the pitch that would be otherwise perceived as a logarithmic function of the given frequency. As a real-life example: the sound of a musical group that plays poorly in tune feels insufferable at first, but it may come to be perceived as tolerable as the concert proceeds, since the ear attempts to “correct” the out-of-tune pitches.
In this experiment, a sequence of tones of the same pitch (C4) that alternate between the left and right ears is played. At first, the sequence is played in a slow tempo, with longer silences between the tones. The pitch of the tones may be perceived by each ear as being slightly different. As the speed of the sequence increases and the silences shorten, the pitches get perceptually closer. If the sequence is played again, it may seem that the pitches, as perceived by each ear separately, are the same from the very beginning. This effect could be explained by short-term musical memory: the listener remembers the pitches from the first time, and therefore is more likely to identify them as being identical when the sequence is repeated.
On the other hand, if one concentrates on hearing the tones as different in pitch, it is more likely that they may be perceived as such.
Example 2.5B: MP3 2.5B CSD 2_5_pitch_B.csd
In the previous experiment, I perceived the pitch of the tones to be slightly higher in the left ear than in the right ear. I therefore changed the frequency of one of the tones, to make them perceptually of equal pitch. In this example the frequency of the tones played on the left is 261.6 Hz (C4), and on the right 263.2 Hz (the difference is 9/1000 of an octave).
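The size of such a frequency difference can be expressed as a fraction of an octave, or in cents (1200 cents to the octave), by taking the base-2 logarithm of the frequency ratio. The sketch below is only an illustration; the function names are assumptions and are not part of the provided Csound files.

```python
import math


def interval_octaves(f_low, f_high):
    """Interval between two frequencies as a fraction of an octave."""
    return math.log2(f_high / f_low)


def interval_cents(f_low, f_high):
    """The same interval in cents (1200 cents per octave)."""
    return 1200 * interval_octaves(f_low, f_high)


left_hz, right_hz = 261.6, 263.2
print(f"{interval_octaves(left_hz, right_hz):.4f} octave")
print(f"{interval_cents(left_hz, right_hz):.1f} cents")
```

A reader who repeats the experiment with other frequencies can use the same two functions to compare the resulting difference with the one reported here.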
The reader is encouraged to repeat this experiment individually with the help of the provided Csound source file, and compare the results with mine.
Conclusion: The perception of pitch of simple tones is inaccurate; it depends on the frequency range, the loudness of the sound, and other factors. Each ear may perceive the pitch of a tone of the same frequency as being slightly different. Perception of pitch is also influenced by the musical context, the memory of a past experience, or a conscious expectation.
Example 2.6A: MP3 2.6A CSD 2_6_series_A.csd
There are some musical instruments with timbres that are similar to that of a simple tone, with a strong fundamental harmonic and weaker higher harmonics, such as the flute, glass harmonica, or the Fender Rhodes piano (with its amplified sound of metal tines), but most musical instruments produce complex tones that consist of a number of harmonics.
In this example we introduce complex tones with rich harmonic content. The sound example has several parts:
1. Sixteen simple tones whose frequencies are related by the harmonic series are played. The frequency of the lowest tone is 263.2 Hz (the middle C4).
2. A complex tone is built from the same set of simple tones that now function as its harmonics. Note that as the harmonic content changes (new high harmonics are added to it), one can always hear the highest harmonic distinctly (since it is the cause of the most recent difference in the character of the sound), but the lower harmonics do seamlessly blend into the composite sensation of a complex tone. As the number of harmonics increases, the sensation of the complex tone becomes richer.
3. A series of tones that at first are becoming gradually more complex (by more harmonics added to them) and then, in reverse order, less complex (the harmonics being removed) is played. The perceived pitch of all tones remains the same and corresponds to the frequency of the fundamental harmonic, but their timbre changes. The highest harmonics do perceptually stand out because they are the difference between each two consecutive tones, but the rest of the harmonics do blend well together.
Figure 2.19: Example 2.6A
Fig. 2.19 shows the sound file analyzed by the Loris library and displayed by the Qvorum software. The horizontal grid lines indicate the pitches of the 12-tet system. Note that the “5-limit” harmonics (1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16) are aligned reasonably closely with the 12-tet grid lines. The higher-limit harmonics (7, 11, 13, 14) do clearly lie outside of the twelve-tone system.
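The alignment of the harmonics with the 12-tet grid can also be checked numerically. The following sketch (the function name `cents_from_12tet` is an assumption made for this illustration) measures how far each of the first sixteen harmonics lies from the nearest 12-tet pitch above the fundamental.

```python
import math


def cents_from_12tet(harmonic):
    """Deviation (in cents) of a harmonic from the nearest 12-tet pitch,
    measured relative to the fundamental."""
    cents = 1200 * math.log2(harmonic)
    nearest_grid_line = 100 * round(cents / 100)
    return cents - nearest_grid_line


for n in range(1, 17):
    print(f"harmonic {n:2d}: {cents_from_12tet(n):+6.1f} cents")
```

In this calculation the 5-limit harmonics stay within about 14 cents of a grid line, while harmonics 7, 11, 13, and 14 deviate by more than 30 cents.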
Example 2.6B: MP3 2.6B CSD 2_6_series_B.csd
In this example, a complex tone with the same range of harmonics as the one in Example 2.6A is built, but only from the “5-limit” harmonics. Harmonics 7, 11, 13, and 14 are omitted.
This complex tone sounds more harmonious. The included 5-limit harmonics emphasize the intervals of the octave, the fifth, and the third: the intervals that are the building blocks of Western harmony. Including higher-limit harmonics would lead to “extended just intonation”, more complex and unusual tonal systems, such as the one employed by Harry Partch in his music.
Figure 2.20: Example 2.6B
Fig. 2.20 (rotated right to save space) shows in detail how the 5-limit harmonics align with the 12-tet grid.
Example 2.7: MP3 2.7 CSD 2_7_complex.csd
The musical etudes that accompany this text (the “C53 Suite”, see p.94) were rendered using several synthesized timbres that are arguably quite harsh in sound. Most of the timbres were constructed by additive synthesis from a required number of harmonics with simple amplitude envelopes, and their frequency spectrum was limited by a low-pass filter to avoid aliasing of high frequencies. There is no vibrato, chorus effect, detuning of harmonics, phasing, or other effects that are usually used to make synthesized sound smoother and more interesting. The effects were omitted on purpose: our attention ought to be focused on issues of tuning, and for that reason there should be a minimum of sound masking and other distractions.
In this example, five instruments play the same well-known melodic theme. Note that although the pitches of the tones (their frequencies) are exactly the same in each version of the theme, the perceptual results differ depending on the timbre of the instrument:
1. Simple sinusoidal tones. The melody is recognizable, but the exact tuning of the pitches seems uncertain.
2. The “flute” timbre with only two harmonics. The pitch of the tones is more determined.
3. The “harmonium” timbre with four harmonics. The pitch and the contour of the melody are now quite certain.
4. The “organ” timbre contains all 3-limit harmonics up to the 16th (1, 2, 3, 4, 6, 8, 9, 12, 16) that emphasize the octaves and fifths. The pitch is well-defined, but also very sensitive to tuning, and some tones in the melody sound slightly out of place. That is because the excerpt is in just 5-limit tuning while the timbre favours Pythagorean tuning (see p.70).
5. The “full” timbre contains all 5-limit harmonics up to the 16th (1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16). The sound of the instrument is less clear (it has more inner tension) than the “organ” timbre. However, since the timbre contains thirds in addition to octaves and fifths, there is no sensation hinting that any of the tones of the melody would be mistuned as it was in the case of the “organ” timbre.
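The sets of harmonics used by the “organ” and “full” timbres follow from the definition of a prime limit: a harmonic belongs to the p-limit set if none of its prime factors exceeds p. The sketch below illustrates that selection; the helper names are assumptions and do not come from the provided Csound files.

```python
def largest_prime_factor(n):
    """Largest prime factor of n (returns 1 for n == 1)."""
    factor, largest = 2, 1
    while n > 1:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    return largest


def limit_harmonics(prime_limit, highest=16):
    """Harmonic numbers up to `highest` whose prime factors
    do not exceed `prime_limit`."""
    return [n for n in range(1, highest + 1)
            if largest_prime_factor(n) <= prime_limit]


print("3-limit:", limit_harmonics(3))
print("5-limit:", limit_harmonics(5))
```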
Example 2.8A - beats of simple tones: MP3 2.8A CSD 2_8_beats_A.csd
The reader is encouraged to conduct this and the following experiment using the pitch spiral tool. A sound file is provided for illustration, but to appreciate the real nature and extent of this sensation requires interactive control of the tones. Becoming familiar with the sensation of beats is most important for apprehension of tone consonance.
Listen to two simple sinusoidal tones. Begin with both tones at the same pitch (for example, C4), sounding in unison. Keep the pitch of one tone fixed while slowly changing the other. The interference of the waves with close, but unequal frequencies creates beats with frequency that is equal to the difference in frequencies of the two tones. Notice how the beats are slow at first, then increase in speed and eventually morph into an intermittent sensation of sensory roughness that is the strongest around 20–40 Hz. Up to this point of the maximum roughness, the two tones were perceived as a single, perceptually fused pitch. If the frequency difference is increased further, the two pitches become perceptually separated and the strength of the rough sensation begins to decrease. Eventually one begins to hear the difference tone, a tone with frequency equal to the difference of frequencies of the two tones, created by a nonlinear distortion within the middle ear.
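The origin of the beats can be seen from the trigonometric identity sin A + sin B = 2 sin((A+B)/2) cos((A−B)/2): the sum of two sine waves of equal amplitude equals a tone at the mean frequency whose amplitude is modulated by a slow cosine, and the loudness peaks recur at the difference of the two frequencies. The sketch below merely verifies this identity numerically; the function names are assumptions for the illustration.

```python
import math


def beat_frequency(f1, f2):
    """Beats per second between two simple tones."""
    return abs(f1 - f2)


def two_tone_sample(f1, f2, t):
    """Sum of two unit-amplitude sine waves at time t (in seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def envelope_form(f1, f2, t):
    """The same signal written as a tone at the mean frequency
    multiplied by a slowly varying envelope."""
    envelope = 2 * math.cos(math.pi * (f1 - f2) * t)
    carrier = math.sin(math.pi * (f1 + f2) * t)
    return envelope * carrier


f1, f2 = 261.6, 265.6
print("beats per second:", beat_frequency(f1, f2))
print(two_tone_sample(f1, f2, 0.123), envelope_form(f1, f2, 0.123))
```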
Notice that, apart from the unison and the octave, no interval can be clearly perceptually determined. A perfectly tuned unison does not beat at all, and therefore even a slight mistuning is easily recognizable. In a mistuned octave one can perceive beats between the lower pitch and the difference tone. The fifths, the fourths, the thirds and so on, whether equally tempered or justly tuned, do not stand out in any distinct way and can be recognized only on the basis of a listener’s long-term pitch memory and therefore with a certain difficulty and within a margin of error.
Example 2.8B - beats of complex tones: MP3 2.8B CSD 2_8_beats_B.csd
Repeat the experiment 2.8A, now with two complex tones of a timbre with richer harmonic content of four to six harmonics. Even though the simple tones of the harmonic series in each complex tone are perceived together as one “fused” sensation of a complex tone, each pair of harmonics does create its own beats.
Since the fundamental harmonics are the strongest, the sensation caused by the beats between them prevails over the sensations caused by beats between other pairs of harmonics and the overall sensory roughness is similar to that of the two simple tones in the previous example. However, the beats between higher harmonics contribute to the composite sensation and do further increase the overall roughness. Whenever any two harmonics coincide in frequency, the beats between them cease. As a result, the overall roughness has distinct “local minima” that clearly mark justly tuned intervals: the octave (for 2-limit timbres with at least two harmonics), the fifth and the fourth (for 3-limit timbres), the thirds and the sixths (5-limit timbres) and, with attentive listening, also the seconds, the sevenths, etc.
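Which harmonics coincide for a given justly tuned interval can be listed mechanically. The following sketch is an illustration only (it is not the consonance model of this thesis, and `coinciding_harmonics` is a hypothetical helper): for two complex tones whose fundamentals stand in a given frequency ratio, it returns the pairs of harmonic numbers that fall on the same frequency.

```python
from fractions import Fraction


def coinciding_harmonics(ratio, count):
    """Pairs (m, k) such that harmonic m of the lower tone has the same
    frequency as harmonic k of the upper tone, for tones with `count`
    harmonics each and fundamentals in the given frequency ratio."""
    ratio = Fraction(ratio)
    pairs = []
    for m in range(1, count + 1):
        for k in range(1, count + 1):
            if m == k * ratio:
                pairs.append((m, k))
    return pairs


print("octave, 2 harmonics:", coinciding_harmonics(Fraction(2, 1), 2))
print("fifth, 6 harmonics: ", coinciding_harmonics(Fraction(3, 2), 6))
print("third, 4 harmonics: ", coinciding_harmonics(Fraction(5, 4), 4))
print("third, 6 harmonics: ", coinciding_harmonics(Fraction(5, 4), 6))
```

The just major third has no coinciding pair until the timbre includes the fifth harmonic, which agrees with the observation that the thirds are marked only by 5-limit timbres.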
Chapter 3: Octave divisions
We are now familiar with the harmonic series and the sensations of simple and complex tones, and can proceed to investigate how a tone system (a musical scale) can be constructed on the basis of these principles (and as always, the principle of Ockham’s razor).
Octave equivalence
Figure 3.1: Octave equivalence: C2, C3, C4, and C5
The construction of a tonal system begins with an arbitrary choice of a single pitch that serves as the pitch standard and the central point of the system. We will start with C4 (the “middle” C), set to 261.6 Hz (the “Stuttgart”1 pitch).
The octave is an equivalence relation (see p.47). Every given pitch can be replicated by octave transpositions (multiplying and dividing its frequency by two) across the full range of the system. From the original C4 one thus obtains all Cs. Fig. 3.1 shows the pitch spiral with complex tones C2, C3, C4, and C5. Each tone has six harmonics (the “senario”). Note that all harmonics of the tones are perfectly lined up; there are no beats and no sensory roughness caused by them that would not already be inherently present among the six harmonics of any one of the complex tones. Even when difference tones are considered, as shown in Fig. 3.2, the image is one of calmness.

Figure 3.2: Octave equivalence with difference tones

1See the footnote on p.41.
Division of the octave by cycles of fifths

Using the principle of octave equivalence, all pitches with the same name can be generated from a single one from among them. Next, the second most simple and most consonant interval of the harmonic series, the fifth, is used to generate other pitches. The new pitches will then also be replicated by the octave equivalence through the whole pitch range.
Since the fifth is the simplest interval that is not an equivalence, pitches that are related by the interval of the fifth, although distinctly different, are harmonically very closely related (half of the harmonics of the higher tone strengthen those already present in the lower tone, see p.47). The sequence of pitches linked by the interval of the fifth is known as the cycle of fifths.2 Fig. 3.3 shows the sequence of screenshots of the pitch spiral that illustrate the construction of the cycle of fifths and its “moments of symmetry” (MOS).

Figure 3.3: The cycle of fifths with moments of symmetry (panels: C; C–G; C–G–D; C–G–D–A; C–G–D–A–E, MOS: pentatonic scale; C...E–B; C...B–F♯, MOS: heptatonic scale; C...F♯–C♯; C...C♯–G♯; C...G♯–D♯; C...D♯–A♯; C...A♯–E♯, MOS: chromatic scale)
Moments of Symmetry
A moment of symmetry occurs when a tone system has only one or two sizes of steps (intervals between adjacent notes).3,4,5
Among chains of fifths of increasing length, moments of symmetry occur for (see Fig. 3.3):
• The two fifths long chain (three tones, C–G–D): this chain is too short and does not make a usable musical scale.
• Four fifths (five tones, C–G–D–A–E): a usable minimal scale; one of the possible pentatonic scales.
• Six fifths (seven tones, C–G–D–A–E–B–F♯): the most common scale in Western music, the heptatonic (diatonic) scale. This chain of six fifths generates the modern Lydian mode, and not the Ionian (major) mode. This will be corrected shortly.
• Nine fifths: there is no strong significance of this system, since it is practically the twelve-tone system (below), only with two tones missing.
• Eleven fifths (twelve tones, C...E♯): The Western chromatic scale is completely symmetric with only one size of step, from which various combinations of seven tones, i.e. various heptatonic scales, can be drawn.
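The number of step sizes in a chain of fifths can be counted with exact rational arithmetic. The sketch below assumes justly tuned fifths (3:2) and octave reduction, which is a simplification made for this illustration; the function names are hypothetical. With just fifths the twelve-tone chain still has two sizes of semitone, which differ by a small interval; they merge into a single step size only once the fifths are tempered.

```python
from fractions import Fraction


def octave_reduce(ratio):
    """Transpose a frequency ratio by octaves into the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def step_sizes(tone_count, fifth=Fraction(3, 2)):
    """Set of distinct steps between adjacent tones of a scale built
    from a chain of (tone_count - 1) fifths."""
    pitches = sorted(octave_reduce(fifth ** k) for k in range(tone_count))
    pitches.append(Fraction(2))  # close the scale at the octave
    return {upper / lower for lower, upper in zip(pitches, pitches[1:])}


for tones in (3, 5, 6, 7, 12):
    sizes = sorted(step_sizes(tones))
    print(tones, "tones:", ", ".join(str(s) for s in sizes))
```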
Just intervals are not compatible: The Pythagorean comma
The remaining, twelfth fifth that would be expected to “close” the cycle (E♯–B♯, B♯ being enharmonically equivalent to C) does, in fact, generate a new pitch (Fig. 3.4). The small interval between C and B♯ (B♯ is higher in pitch than C) is known as the Pythagorean comma; it is the difference between twelve fifths and seven octaves, or the interval that remains after tuning five fifths up (5×7 semitones) and seven fourths down (7×5 semitones): $(\frac{3}{2})^{5} \times (\frac{3}{4})^{7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.014$. If the cycle is limited to twelve tones (the original C substituted for the derived B♯), the twelfth fifth E♯–C is narrower by the Pythagorean comma and is severely mistuned; such an out-of-tune interval is called a wolf interval, because it “howls” when played.

2In the traditional musical context, the ‘cycle of fifths’ is more often referred to as the ‘circle of fifths’. In this text we prefer the term ‘cycle’ as a “recurring series”, rather than ‘circle’ that might wrongly suggest a geometric shape.
3Schulter, Why are there 12 notes per octave on typical keyboards?
4Wilson, Moments of Symmetry.
5“Moment of symmetry” in this sense does not imply that the scale itself is necessarily symmetric. A musical scale needs at least two sizes of steps to establish a tonal centre. Completely symmetric scales, such as the “augmented” scale of six whole tones (also known as the whole-tone scale) or the “diminished” scale of four semitones alternating with four whole tones (also known as an octatonic scale) are ambivalent in that respect: there is no clear point of origin. On the other hand, a scale with three sizes of steps can be symmetric, but violates the Ockham’s razor principle.
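The arithmetic of the Pythagorean comma can be checked with exact fractions. This is a minimal sketch; the variable names and the `cents` helper are assumptions made for the illustration.

```python
from fractions import Fraction
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


# Twelve fifths up compared with seven octaves up.
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7
pythagorean_comma = twelve_fifths / seven_octaves

# The same interval as five fifths up and seven fourths down.
same_comma = Fraction(3, 2) ** 5 * Fraction(3, 4) ** 7

print(pythagorean_comma, float(pythagorean_comma))
print(f"{cents(pythagorean_comma):.2f} cents")
print(f"wolf fifth: {cents(Fraction(3, 2) / pythagorean_comma):.2f} cents")
```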
Figure 3.4: The Pythagorean comma
Similar situations (other kinds of “commas”) arise when octaves are compared with chains of major thirds (the difference is the “lesser diesis”) or minor thirds (the “greater diesis”), or fifths compared with thirds (the syntonic comma, $\frac{81}{80}$).6 The intervals are, in a Pythagorean view, “incompatible” (cf. p.46).
Possible solutions
The problem of incompatible intervals can be alleviated, if not solved, by these means:
1. Avoid the use of wolf intervals; this choice requires that the structure of the music is kept simple, without the use of chromaticism and modulation to different keys.
2. Make a tuning compromise: intentionally mistune (temper) the justly tuned intervals in order to make the wolf intervals less mistuned.
3. Dynamically adjust the pitch of the twelve tones as required.
4. Use more than twelve pitches in an octave.

6This is related to the fact that every natural number has a unique prime factorization; see the “fundamental theorem of arithmetic”. (“Fundamental theorem of arithmetic”. Wolfram MathWorld™)
We shall briefly demonstrate the principle of solution 2 and then discuss solution 4, which is at the core of our method.
Tempering out the Pythagorean comma
The first eleven fifths in the Pythagorean tuning are justly tuned; the remaining, twelfth fifth is narrower by the Pythagorean comma. If each of the eleven fifths is flattened by one twelfth of the Pythagorean comma, then the wolf fifth that is flat by the full comma will also become flat only by $1 - 11 \times \frac{1}{12} = \frac{1}{12}$ of the comma, and all twelve fifths will be equally mistuned; but not enough to be considered “wolf” intervals. This principle is the foundation of the twelve-tone equal temperament. Fig. 3.5 shows equal tempering on the pitch spiral (left to right).
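That a just fifth flattened by one twelfth of the Pythagorean comma is exactly the fifth of the twelve-tone equal temperament (seven equal semitones, a frequency ratio of 2^(7/12)) can be verified numerically. The sketch below uses floating-point arithmetic and illustrative variable names.

```python
import math

just_fifth = 3 / 2
pythagorean_comma = 3 ** 12 / 2 ** 19

# Flattening by one twelfth of the comma means dividing the frequency
# ratio by the twelfth root of the comma.
tempered_fifth = just_fifth / pythagorean_comma ** (1 / 12)

print(tempered_fifth, 2 ** (7 / 12))
print("twelve tempered fifths:", tempered_fifth ** 12, "seven octaves:", 2 ** 7)
```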
Figure 3.5: Tempering out the Pythagorean comma (left: just fifths; right: equally tempered fifths)
Fig. 3.6 shows a justly tuned fifth on the left with its harmonics and difference tones nicely lined up. On the right it shows an equally tempered fifth. Its harmonics are very slightly misaligned; the sensory roughness due to beats is quite acceptable.
Figure 3.6: Just and equally tempered fifths (left: just fifth; right: equally tempered fifth)

The sound of the third is a more serious problem in equal temperament. In the Pythagorean tuning there are really no thirds; rather, a chain of four fifths (e.g. C–G–D–A–E) generates a ditone C–D–E, an interval that is sharper than the justly tuned third by the syntonic comma: $(\frac{3}{2})^{4} \times (\frac{1}{2})^{2} \times \frac{4}{5} = \frac{81}{80}$. The equally tempered third is narrower than the Pythagorean ditone, but it is still quite sharp. Fig. 3.7 shows the ditone (left), the just third (centre), and the equally tempered third (right).
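The three versions of the major third can be compared in cents. The following sketch is only an illustration; the `cents` helper and the variable names are assumptions.

```python
from fractions import Fraction
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


ditone = Fraction(3, 2) ** 4 / 4      # four fifths up, two octaves down
just_third = Fraction(5, 4)
tempered_third = 2 ** (4 / 12)        # four equal semitones

print("ditone:        ", ditone, f"{cents(ditone):.2f} cents")
print("just third:    ", just_third, f"{cents(just_third):.2f} cents")
print("tempered third:", f"{cents(tempered_third):.2f} cents")
print("syntonic comma:", ditone / just_third)
```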
Figure 3.7: Pythagorean, just and equally tempered thirds (left: Pythagorean ditone; centre: just third; right: equally tempered third)
Tempering out the syntonic comma
To temper out the syntonic comma of the ditone, each fifth must be flattened by one quarter of the syntonic comma (SC). Fig. 3.8 illustrates the process (left to right).
Figure 3.8: Tempering out the syntonic comma (left: equally tempered fifths; right: fifths tempered by SC/4)
The result is the quarter-comma meantone temperament (Fig. 3.9). Its name is derived from the incidental fact that its major second (whole tone) is the mean between the justly tuned minor and major tones.
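Both properties of the quarter-comma meantone fifth can be checked numerically: four such fifths (less two octaves) give a just major third, and two such fifths (less one octave) give a whole tone that is the mean of the major tone 9:8 and the minor tone 10:9. The sketch assumes that “mean” refers to the geometric mean of the frequency ratios, i.e. the midpoint in pitch; the variable names are illustrative.

```python
import math

syntonic_comma = 81 / 80

# Each fifth is flattened by a quarter of the syntonic comma.
meantone_fifth = (3 / 2) / syntonic_comma ** 0.25

meantone_third = meantone_fifth ** 4 / 4   # four fifths up, two octaves down
meantone_tone = meantone_fifth ** 2 / 2    # two fifths up, one octave down
mean_of_tones = math.sqrt((9 / 8) * (10 / 9))

print("fifth:", meantone_fifth, f"({1200 * math.log2(meantone_fifth):.2f} cents)")
print("major third:", meantone_third)
print("whole tone:", meantone_tone, "mean of 9:8 and 10:9:", mean_of_tones)
```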
The meantone temperament has semitones of different sizes, and it also contains wolf intervals. As a result, some keys sound very well in meantone temperament, mainly because their important major triads (tonic, subdominant, dominant) contain properly tuned major thirds and therefore have a “sweet” sound. Other keys sound more rough, and some keys are not usable at all.
Figure 3.9: The meantone temperament (unequal semitones)
Temperaments: how do they sound?

The following set of examples demonstrates the impact of the necessary tuning compromises in the four discussed tuning systems:
• The Pythagorean tuning: eleven pure fifths and one wolf fifth; the quality of the rest of the intervals is incidental. Pythagorean tuning is not adequate for Western tertian harmony, since it contains ditones rather than thirds.7
• Just tuning: all steps in a scale are tuned to frequency ratios of small numbers. In particular, the tonic, subdominant, and dominant major triads are tuned perfectly. Just tuning contains the best tuned intervals and chords, but only in one key.
• The meantone temperament: the eleven fifths are flattened, each by a quarter of the syntonic comma, and provide eight justly tuned major thirds (every third that spans four of the flattened fifths). The twelfth fifth and the four remaining major thirds are too wide. The meantone temperament is very well usable in several, but not all, keys. Each key has some tuning imperfections in different places; the keys are thus not equal and each has a distinct character.
• The equal temperament: all twelve fifths are equally narrowed by one twelfth of the Pythagorean comma and all twelve thirds are equally sharp. The equal temperament does not contain any wolf intervals, but besides the octaves there are also no perfectly tuned intervals. All keys are equally usable in equal temperament, and each key has the identical, slightly mistuned character.
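For the three tunings that are generated by a chain of eleven equal fifths, the size of the remaining twelfth fifth follows directly from the size of the generating fifth. The sketch below is an illustration with hypothetical names; just tuning is left out because it is not generated by a single chain of fifths.

```python
import math


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


def closing_fifth_cents(fifth_ratio):
    """Size of the twelfth fifth that remains when eleven equal fifths
    are fitted into seven octaves."""
    return 7 * 1200 - 11 * cents(fifth_ratio)


tunings = {
    "Pythagorean": 3 / 2,
    "quarter-comma meantone": 5 ** 0.25,
    "equal temperament": 2 ** (7 / 12),
}

for name, fifth in tunings.items():
    print(f"{name}: fifth {cents(fifth):.2f} cents, "
          f"twelfth fifth {closing_fifth_cents(fifth):.2f} cents")
```

Compared with the just fifth of about 702 cents, the remaining fifth comes out too narrow in the Pythagorean tuning, too wide in the meantone temperament, and equal to the other eleven in equal temperament.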
Listen to the following sound examples and compare the tunings of the musical excerpts:
Example 3.1 (chromatic scale): MP3 3.1P/J/M/E CSD 3_1_P/J/M/E.csd
The twelve-tone chromatic scale as tuned in (P)ythagorean tuning, (J)ust intonation, (M)eantone temperament, and (E)qual temperament.
Example 3.2 (major scale and triad): MP3 3.2P/J/M/E CSD 3_2_P/J/M/E.csd
The ascending and descending heptatonic major scale, followed by the tonic major triad. The calmest-sounding is the perfectly tuned triad in just tuning, next is the triad of the meantone temperament (with its just third and a tempered fifth), and then the equal temperament (both the third and the fifth are tempered). The Pythagorean tuning (a just fifth, but a ditone instead of a third) is the least adequate.

7To be precise, Pythagorean tuning does contain four good thirds: those generated by the chains of four fifths among which one is the wolf fifth.
Example 3.3 (Prelude in C opening): MP3 3.3XP/J/M/E CSD 3_2_XP/J/M/E.csd
In this example there are five groups of four sound files. The opening four measures of the Prelude (the harmonic progression C–Dm7–G7–C) are played in the original arpeggiated form that is followed by sustained chords, in the four different tunings. The Prelude is first presented in the original key of C, and then transposed to keys F, G, E♭, and E (while keeping the tunings set for the key of C).
The Pythagorean tuning is, as could have been expected, not suitable for the Prelude in any of the keys. The Prelude sounds the purest in just tuning, but only in its original key of C (for which this particular instance of just tuning has been adapted). In all other keys to which the Prelude has been transposed the just tuning is unacceptable. In the equal temperament, the Prelude sounds reasonably well (or equally badly, one could say) in all keys.
In the meantone temperament the sound of the Prelude is, perhaps, the most interesting. None of the triads are tuned perfectly, but since the purity of major thirds is more vital than the purity of the fifths, the major thirds give most of the triads a sweet, full character. The scale obtained by meantone temperament is irregular and therefore different triads are mistuned differently, which gives them individual character. However, the wolf thirds and the wolf fifth are badly out of tune and therefore the keys in which they would play a fundamental role are unusable. In this example, the key of C is the best in tune, the keys F, G, and E♭ are quite acceptable with an “interesting” sound, and the key of E cannot be used at all.
Pythagorean (the fifth) tuning

We have shown how the Pythagorean tuning generates a twelve-tone scale, by tuning a chain of eleven fifths (p.62). The first six fifths starting from C generated a heptatonic scale:
C - G - D - A - E - B - F#
Sorted with the help of octave equivalence and assuming that the starting tone C is the tonic, that created a diatonic scale in the Lydian mode: C D E F♯ G A B.
The most common major (Ionian) mode can be tuned, with C as the tonic, by tuning one fifth down (the subdominant F) and five fifths up (the dominant G and the sequence of secondary dominants):
F- C-G-D-A-E-B
The C major scale (CDEFGAB) that is thus obtained corresponds to the seven white keys on the piano keyboard. To tune the notes on the five black keys, the chain is to be extended by five more fifths, both down and up. That gives two sets of altered, enharmonic notes:
Gb-Db-Ab-Eb-Bb -F- C-G-D-A-E-B- F#-C#-G#-D#-A#
In Pythagorean tuning, the “enharmonically equivalent” pitches are not the same; they differ by the Pythagorean comma. Each such two pitches, for example, G♭ and F♯, are separated by a chain of twelve fifths. To reach F♯ from G♭ requires tuning by twelve fifths up and seven octaves down: $(\frac{3}{2})^{12} \times (\frac{1}{2})^{7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} > 1$. In Pythagorean tuning, the “sharp” notes are higher in pitch than the enharmonic “flat” notes. To make the available fifths all pure, each black key on the keyboard would have to be split in two and the system would have seventeen, and not just twelve, tones in an octave.
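The relation between an enharmonic pair can be reproduced with exact fractions by placing each pitch on the chain of fifths relative to C. This is a minimal sketch; the function names and the numbering of the chain (C = 0, positive numbers for fifths up) are assumptions made for the illustration.

```python
from fractions import Fraction


def octave_reduce(ratio):
    """Transpose a frequency ratio by octaves into the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def pythagorean_pitch(fifths_from_c):
    """Frequency ratio above C of the pitch reached by tuning the given
    number of just fifths up (positive) or down (negative)."""
    return octave_reduce(Fraction(3, 2) ** fifths_from_c)


f_sharp = pythagorean_pitch(6)    # C-G-D-A-E-B-F#
g_flat = pythagorean_pitch(-6)    # C-F-Bb-Eb-Ab-Db-Gb

print("F#:", f_sharp, " Gb:", g_flat)
print("F# / Gb:", f_sharp / g_flat)
```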
Just tuning

A justly tuned (5-limit) system is created by tuning both fifths and thirds to their just frequency ratios, 3:2 and 5:4. The C major scale is generated, starting from C, by tuning the tonic, subdominant, and dominant major triads, C–E–G, F–A–C, and G–B–D:
F     A     C     E     G     B     D
4:3   5:3   1:1   5:4   3:2   15:8  9:8
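The ratios above can be derived by building the three major triads with exact fractions and reducing every pitch into one octave. The sketch below is an illustration only; the function names are assumptions.

```python
from fractions import Fraction


def octave_reduce(ratio):
    """Transpose a frequency ratio by octaves into the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def major_triad(root):
    """Root, just major third (5:4) and just fifth (3:2) above the root."""
    return [root, root * Fraction(5, 4), root * Fraction(3, 2)]


tonic = Fraction(1)            # C
subdominant = Fraction(2, 3)   # F, a fifth below C
dominant = Fraction(3, 2)      # G, a fifth above C

pitches = set()
for root in (subdominant, tonic, dominant):
    pitches.update(octave_reduce(p) for p in major_triad(root))

scale = sorted(pitches)
for name, ratio in zip("CDEFGAB", scale):
    print(name, f"{ratio.numerator}:{ratio.denominator}")
```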
It is more intuitive to reason about just tuning using the tuning lattice. The tuning lattice is a tool commonly used by microtonal music theorists. It was introduced by Leonhard Euler.8
8Euler, Tentamen novae theoriae musicae, “De genere diatonico-chromatico,” p.147.
Fig. 3.10 shows a modern version of the lattice with justly tuned C major scale, the five “sharp” and five “flat” accidentals, to a total of seventeen pitches.
The horizontal lines indicate tuning by fifths; each horizontal leg corresponds to tuning up by a fifth, in the left-to-right direction. Each leg of the vertical lines that are slanted to the right corresponds to tuning up by a major third, in the bottom-left to top-right direction, and the remaining lines correspond to tuning up by minor thirds, top-left to bottom-right. The lines of the fifths and major thirds are the two coordinate axes of the two-dimensional tuning system.
The equilateral upward and downward pointing triangles on the lattice correspond to major and minor triads, respectively. Fig. 3.10 clearly shows that the C major diatonic scale was constructed by first tuning the tonic, subdominant, and dominant triad, and then the accidental pitches were added by tuning additional fifths and/or major thirds. Enharmonic pairs of notes are connected by chains of three major thirds and differ in pitch by the lesser diesis ($(\frac{5}{4})^{3} \times \frac{1}{2} = \frac{125}{128} < 1$). In just tuning, the “sharp” notes are lower in pitch than the “flat” notes (the opposite of their relation in the Pythagorean tuning).
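A pitch on the two-dimensional lattice can be described by a pair of coordinates: the number of fifths and the number of major thirds from C. The sketch below uses this representation to compare one enharmonic pair. The placement of G♯ (two major thirds above C) and A♭ (one major third below C) is an assumption made for the illustration and is not read from Fig. 3.10; the function names are likewise hypothetical.

```python
from fractions import Fraction


def octave_reduce(ratio):
    """Transpose a frequency ratio by octaves into the range [1, 2)."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio


def lattice_pitch(fifths, thirds):
    """Frequency ratio above C of the lattice point reached by the given
    numbers of just fifths (3:2) and just major thirds (5:4)."""
    return octave_reduce(Fraction(3, 2) ** fifths * Fraction(5, 4) ** thirds)


g_sharp = lattice_pitch(0, 2)    # C-E-G#
a_flat = lattice_pitch(0, -1)    # Ab-C

print("G#:", g_sharp, " Ab:", a_flat)
print("G# / Ab:", g_sharp / a_flat)
```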