A SPECTROGRAPHIC ANALYSIS OF VOCAL TECHNIQUES IN EXTREME METAL FOR MUSICOLOGICAL ANALYSIS

Eric Smialek (1, 2), Philippe Depalle (2, 3), David Brackett (1)
CIRMMT, McGill University, Montréal, QC, Canada¹

ABSTRACT

Extreme metal genres such as death metal and black metal force music analysts to seek alternative methods to Western notation-based analysis, especially when one asks what means of expression their vocalists may draw from in order to seem convincing and powerful to fans. Using spectrograms generated by AudioSculpt, a powerful sound analysis, processing, and re-synthesis program, this paper demonstrates a mixed application of spectrograms and conventional music analysis to vocals in two separate contexts: an a cappella recording in a soundproof laboratory and a commercial recording with a full band. The results support an argument for the utility of spectrograms in revealing articulations and expressive nuances within extreme metal vocals that have thus far passed unnoticed in popular music scholarship.

1. INTRODUCTION

Popular music scholars have long insisted that details of musical sound such as rhythmic and melodic inflections or timbral characteristics are of central importance to popular musicians and their audiences [1, 2, 3, 4]. Accordingly, some of the leading popular music scholars, such as Richard Middleton [1] and Philip Tagg [2], have criticized analyses of popular music that overlook these features, often due to a reliance on Western music notation. One alternative is to represent sound visually through spectrograms, and it has now been nearly thirty years since Robert Cogan made his strong case for their utility to model musical sound in a way that can account for melodic, rhythmic, or timbral nuances [5]. Since then, despite the importance popular music scholars have placed on these features, they have rarely used spectrograms to study them. In the few instances where they have appeared, such as [6], reviewers have argued that spectrograms are superfluous or have voiced a general distrust of analytical technology such as spectrum photography or digital signal processing of spectral imagery [7, 8, 9].

Such a distrust of musical spectrograms has not, of course, extended to fields such as electroacoustic composition and analysis, where there exists an epistemological tradition that has long supported the use of music technology and where the limitations of Western notation are obvious [10]. With this in mind, the study of extreme metal presents something of a disciplinary middle ground in its foundations in popular music studies and its incompatibility with Western notation. Because extreme metal vocals place greater importance on the timbral variations of vowels than on pitch and harmony, methods of analysis that rely on conventional notation do not reveal much information about their expressive screams. Extreme metal vocals thus provide an opportunity to analyze the role and possibilities of spectrographic technology to reveal information about musical expression in a genre of music for which few if any analytical methods have been established. Using real-time spectrographic displays, this paper will demonstrate how spectrograms can be useful research tools for scholars who require or seek alternatives to notation-based analysis.

2. THE EXTREME METAL VOICE

2.1. Basic Aspects of Vocal Production

To produce the vocal sounds characteristic of death metal and black metal (as well as related sub-genres), extreme metal vocalists pass air through the ventricular folds (or “false vocal cords”) located a few millimetres above the vocal folds (see [12] for anatomical details). This allows extreme metal vocalists to achieve the large spectral spread of energy visible in spectrograms.²

Extreme metal screams can be performed by either inhaling or exhaling, resulting in two very distinct styles of screaming. The different directions of air flow can be thought of as akin to the linguistic distinction made between voiced and unvoiced methods of articulating consonants: when performing exhaled vocals, one’s larynx vibrates, indicating that the vocal cords are vibrating rather forcefully, whereas this vibration does not occur with inhaled vocals. This basic difference has a profound effect on the overall sound quality produced, the ease with which different phonetic articulations can be made, the ability for a vocalist to sustain a long scream, and the degree of strain put on the voice.

Figure 1 demonstrates some of the acoustical differences between inhaled and exhaled vocals. Here, a volunteer extreme metal vocalist was asked to freely demonstrate these two types of voice. His inhaled vocals are noticeably more agile, with spectral energy concentrated around a single focal point in the 1500–2000 Hz range (here and throughout this paper, the horizontal grid is set to increments of 500 Hz). In contrast to that highly mobile focal point, the exhaled vocals maintain a more balanced spectral distribution of energy in the lower register of the spectrogram, moving in comparatively small increments from one vowel to the next in an almost mechanical fashion.

Figure 1. Some basic distinctions between extreme metal vocal techniques as demonstrated by a volunteer extreme metal vocalist.

2.2. Vocal Tract Alterations that Imitate Inhuman Sound Sources

Within these two basic categories of voice production, vocalists may alter the length and shape of their vocal tract to modify their voices in a wide variety of expressive ways. To lengthen their vocal tract and produce a lower sounding voice, vocalists may simply lower their chin or, conversely, angle their chin upwards to shorten the vocal tract length for a higher scream. It is also possible to change the vocal tract length by raising or lowering the larynx (or voice box), a process which happens automatically in conjunction with the rounded or spread lip shapes we use to articulate different vowels (see Figure 2) [12].

Figure 2. George “Corpsegrinder” Fisher of Cannibal Corpse rounds his lips to lengthen his vocal tract (left); Dani Filth (Daniel Lloyd Davey) of Cradle of Filth spreads his lips to make his vocal tract shorter (right).

This last point raises what is perhaps the most important yet subtle technique that extreme metal vocalists have at their disposal: the manipulation of vowel formants. These vocalists will frequently sacrifice the intelligibility of lyrical content in order to exaggerate the sense of “heaviness” that can be perceived with especially low vowel formants in death metal vocals. In black metal, a sub-genre where a generally higher, raspier voice predominates, vocalists will tend to raise the frequency of their vowel formants past the levels that would ordinarily correspond to their lyrics when spoken.

One way to think of each of these techniques, especially those that lengthen the vocal tract, is to recognize that the vocalists are physically mimicking the physiological properties of a large beast: the ventricular folds produce a spectral spread of energy that suggests an inhuman sound source, and the lengthening of the vocal tract exaggerates the size of that source.

3. SIGNAL PROCESSING AND RESYNTHESIS USING AUDIOSCULPT

In order to investigate the aforementioned features of the metal voice and their roles in music pieces, we decided not only to perform spectrographic analyses, but also to extract parameters that allow for a partial resynthesis of key sound components in order to validate our assumptions.

To explain, the following is a brief description of the basic principles underlying AudioSculpt functionalities. AudioSculpt generates a short-time Fourier transform (STFT) representation of the sound.³ In order to optimally display the sonogram image, STFT parameters must be chosen, including analysis window size and shape, step size (which determines how the STFT fits over the signal characteristics), and fast Fourier transform (FFT) sampling. For each sound example in this study, we used a Blackman window of size M=2400 with a step size of M/8 and a number of channels equal to 4096. Once a satisfactory image is in view, it is ready to be analyzed.

Qualitative judgments can be made based on the obtained spectrographic image, but more precise measurements of vowel formant quality are also taken by first synthesizing a new sound file based on the original sound’s vowel formants using one of AudioSculpt’s several spectral gain filters (in each case during this study, the pencil filter tool). These filters change the gain of certain frequency regions defined by the user within the resynthesized signal. For this study, the regions to be filtered were defined according to a computer-based analysis of the original sound file’s partials. Specifically, AudioSculpt’s “Partial Tracking Analysis” feature was used to track partials using multiple breakpoint functions for each sinusoidal component (a procedure that works with inharmonic signals as well as harmonic ones). A new sound was then created by amplifying the formant regions of the original signal.

AudioSculpt’s diapason tool was then used to take frequency readings from the synthesized partials. When the tool is pointed at a particular place on the sonogram, its frequency and amplitude are displayed, a corresponding sine tone sounds, and a two-dimensional spectral slice of the synthesized signal appears. This procedure allows for an analysis of the original sound by hypothesizing that certain components of it appear to be important, resynthesizing those components, and closely analyzing them.

4. A RECORDED IMPROVISATION

In order to investigate the features of extreme metal voices presented in section 2 in a more musical context, we asked a volunteer vocalist to perform a vocal improvisation using whatever extreme metal vocal techniques he wished. Given the somewhat artificial performance environment (the room was virtually devoid of reverberation and there were no instruments to accompany the vocalist), the volunteer vocalist’s improvisation shown in Figure A1 (see the online appendix⁴), if not an exact indicator of the performance decisions a vocalist might make in a concert setting or full-band recording, can be considered an accurate reflection of the kinds of performance choices that are possible using inhaled and exhaled voices and the variations in formant frequency available with different vowel combinations.

4.1. A Spectrographically-Informed Music Analysis

In addition to the spectrogram, Figure A1 also contains phonetic and rhythmic transcriptions as well as analytical annotations provided at the bottom of the figure. As indicated by the rhythms given below the spectrogram, the vocals fit neatly into a regular 4/4 meter, indicating that the vocalist kept a regular pulse in marked contrast to his earlier performances discussed above. Partly because of this rhythmic regularity, the improvisation comes across as surprisingly organized, as though it had been deliberately crafted.

4.1.1. An Impression of Tight Control

This impression likely results from several musical features exhibited by the improvisation that are frequently invoked in discussions of basic musical composition: the improvisation shows a clear binary division into two roughly equal sections of music, separated by a sudden change in dynamics (see the dynamic markings at measure 7); each half contains a recurring, slightly varying rhythm (labelled x and y) that generally does not occur in the other half; and, following the contours visible in the spectrogram, both divisions exhibit an arch-like quality of intensification whereby the vocalist’s screams steadily increase in volume and high spectral frequencies (measures 2–4, 9–11) before rapidly calming with quieter, lower vowels (measures 6–8, 12–15). The total result is a sample of improvised extreme metal vocals which demonstrates controlled musical techniques in a regular manner, one that could be heard as loosely narrative or rhetorical in the sense that it creates regularly spaced climaxes, moments of repose, and gradual variations. If there is a clear sense of musicality to be found here, what can be inferred about the vocal techniques used to achieve it?

4.1.2. Inhaled vs. Exhaled Vocals

Inhaled vocals in Figure A1 are shown by boxes around the notation below the spectrogram, indicating that exhaled and inhaled vocals are employed about equally during the improvisation. Though aurally distinguishable from exhales to those familiar with extreme metal vocals, the distinguishing acoustical characteristics of the inhaled vocals are not always immediately apparent from the spectrogram (at least at the resolution given in the example). What can be readily seen is a clear difference in the highest regions of spectral energy reached in measures 4–5 and 10–11. None of the exhaled portions of the improvisation come close to this region reaching upwards of 4000 Hz, a circumstance which indicates that only inhaled vocals allow the vocalist to achieve the very wide spectral spread of energy characteristic of these climactic moments in his improvisation.

The vocalist also appears to have reserved at least one of the vowels with especially high formant frequencies for his inhaled voice. /æ/ (as in “had”), the vowel most consistently present during these moments of intense high spectral energy and a vowel with one of the highest first formant frequencies, only occurs once as an exhale (measure 2). Lastly, it seems that nearly all the consonants with the exception of /ɹ/ (and /p/ in measure 3) were reserved for exhaled vocals. Here, it would seem that the closures of the vocal tract necessary to produce some consonants prove too awkward to execute with the steady “sucking” air flow used in inhaled vocals.

4.2. A Paradigmatic Analysis of the Improvisation

In order to arrive at an especially clear demonstration of the primary role that certain phoneme combinations played in the improvisation, Figure A2 provides a paradigmatic analysis⁵ of the recording sample based primarily on rhythmic identities and secondarily on phoneme content. Although somewhat similar to a musical score, the chart parses the music into segments, usually spanning one measure each (measures are indicated by circled numbers; segments spanning two or more measures are enclosed within square brackets), and distributes them horizontally to show difference and vertically to indicate similarity. Musical time flows downwards on the chart so that as new segments of music occur in the recording, they appear in the chart beneath one another whenever a portion of their rhythms matches other segments (e.g. measures 2–6 and 11–12 have common rhythms for beats 3 and 4). Thus, a continuously repeating piece of music would appear as a single column while one that continuously introduced new material would have its segments listed diagonally.

To add more columns of rhythmic identity and to avoid cluttering them, the same segmented unit sometimes appears more than once. Such is the case with measures 1–2, which are shown as a single segment, as well as measures 5, 9, 11, and 12, where each of these segments reappears in new columns to the right. Inhaled and exhaled vocals are indicated by different shades of grey horizontal boxes, but not in the case of reappearing segments. Consequently, the chart can be followed in real time along with the recording by reading only the shaded segments, proceeding downwards from the top-left to the bottom-right corners. To draw attention to areas of greatest interest, solid boxes outline regions that show both identical rhythms and noticeably repeated phonemes while dashed boxes indicate rhythmic identities with little phonemic similarity.

4.2.1. Some Rhythmic and Phonetic Motives

The far-left column brings into relief how the vocalist created a series of variations during the first half of the improvisation (measures 1–6). Here, he has clearly followed a pattern of varying his rhythms during beats 1 and 2 while treating beats 3 and 4 as a recurring rhythmic motive (shown by the large vertical rectangle around the common rhythms in the far-left column). Indeed, this recurring rhythm coincides with each of the increases in spectral bandwidth that generate the arch-like pattern visible in the first half of Figure A1.

On occasion, the vocalist has punctuated the last beat of the measure with /ʊ/ (as in “hook”, marked to the right of the column with arrows), a gesture that in each case requires an elaborate motion of closing the jaw and quickly rounding the lips when moving to /u/ (as in “who”) from /a/ (as in “father”) or the similarly articulated /æ/ (as in “had”). It makes sense then that such an elaborate physical motion would be reserved for a longer sustained rhythmic value and would punctuate the final accent in a rhythmic motive. By contrast, the quicker sixteenth-note rhythms that occur as variations in measures 3 and 6 are possible because the alveolar closure made by the tongue when forming the consonant /d/ can be executed quickly; similarly, the quick eighth-note alternations between /ɚ/ (the r-coloured vowel in “sure”) and /i/ (as in “heed”) in measure 12 are easily performed because they do not require elaborate jaw motions, only quick shifts between lip and tongue positions.⁶

4.2.2. The Importance of /ɹ/ Sounds

This last point raises one of the most conspicuous consistencies observable throughout the recording sample. The vocalist very frequently alternates between /ɚ/ and /i/ sounds with both inhaled and exhaled vocals. These phonemes occur for nearly all the articulations contained within the solid box given in the left-centre column and, in the case of the one exception, the /iɚ/ combination is merely displaced by a beat, as marked by the arrow in the left-centre column. The versatile rhotic (“r”) of Standard North American English is especially worthy of emphasis here for both the acoustical and physiological advantages it offers the vocalist. This sound does not require a stoppage in air flow, so it can be used as a vowel (e.g. /ɚ/ as in “fir”) or as a consonant (e.g. /ɹ/ as in “rapid”). As a result, it is especially suited to the physiological difficulties of articulating consonants with inhaled vocals (note again that nearly all of the other consonants are exhaled). Because it can be articulated with a continuous air flow, the /ɹ/ can be used to slightly alter vowels. Specifically, when a vowel is rhotacized, i.e. coloured by an /ɹ/, the third formant becomes lowered [15].

Even if this third-formant lowering is not as directly tied to an impression of heaviness as the lowering of the first two formants, it nevertheless provides a way for the vocalist to create variety and, on a social-perceptual note with regard to paralanguage,⁷ it seems more than a coincidence that the /ɹ/ sound is often used to imitate the snarls of wild beasts. Having drawn a number of inferences as to why certain patterns appeared in the improvisation, the strongest and most basic point here is that there exists a consistency to the vocalist’s use of particular phonemes in such a way that they seem fundamental to the most salient musical features of the improvisation.

¹ (1): Musicology Area, (2): Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), (3): Music Technology Area. This research was greatly facilitated through funds from the Social Sciences and Humanities Research Council (SSHRC) and a CIRMMT Student Award.
² A similar process is used in Mongolian throat singing where singing voice formants are used to convey melodic information [13].
³ The technical information offered here is based on [11].
⁴ An online appendix that includes larger images such as Figure A1 can be accessed at http://www.music.mcgill.ca/~depalle/ICMC2012/ICMC2012Smialek.htm.
⁵ One of the clearest introductions to paradigmatic analysis available can be found in [14].
⁶ The alveolar ridge is the sloping region located by the upper jaw’s front teeth. One’s tongue quickly touches this ridge to stop the flow of air through the vocal tract when producing the consonant /d/ [15].
⁷ Paralanguage can roughly be understood to include all forms of non-verbal communication. These can include facial expressions, vocal utterances such as grunts and sighs, as well as prosodic and timbral modifications to ordinary speech that shade meaning [16].
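The STFT settings reported in section 3 (Blackman window, M=2400, step size M/8, 4096 channels) can be illustrated with a minimal NumPy sketch. This is not AudioSculpt's implementation. It assumes that "number of channels" means the FFT size, and it assumes a 44.1 kHz sample rate, which the paper does not state. The helper names `stft_magnitude` and `read_peak` are hypothetical; `read_peak` only loosely imitates the frequency and amplitude readout described for the diapason tool, and the test signal is a synthetic stand-in for a recording.

```python
import numpy as np


def stft_magnitude(signal, sample_rate, window_size=2400, n_fft=4096):
    """Blackman-windowed STFT magnitude with a hop of window_size / 8.

    Returns (times, freqs, magnitude); magnitude has shape
    (n_frames, n_fft // 2 + 1).
    """
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one analysis window")
    hop = window_size // 8                       # M/8 = 300 samples for M = 2400
    window = np.blackman(window_size)
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack([
        signal[i * hop: i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each 2400-sample frame to n_fft points before transforming.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Time stamp of each frame, taken at the centre of its window.
    times = (np.arange(n_frames) * hop + window_size / 2) / sample_rate
    return times, freqs, magnitude


def read_peak(freqs, frame, f_lo=0.0, f_hi=None):
    """Return (frequency, magnitude) of the strongest bin within [f_lo, f_hi]."""
    if f_hi is None:
        f_hi = freqs[-1]
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    k = band[np.argmax(frame[band])]
    return float(freqs[k]), float(frame[k])


if __name__ == "__main__":
    sample_rate = 44100                          # assumed; not given in the paper
    t = np.arange(sample_rate) / sample_rate     # one second of samples
    # Synthetic stand-in: strong energy at 1750 Hz, weaker energy at 400 Hz.
    test_signal = (np.sin(2 * np.pi * 1750 * t)
                   + 0.3 * np.sin(2 * np.pi * 400 * t))
    times, freqs, mag = stft_magnitude(test_signal, sample_rate)
    peak_hz, peak_mag = read_peak(freqs, mag[len(mag) // 2])
    print(f"{mag.shape[0]} frames; strongest bin near {peak_hz:.1f} Hz")
```

Under these assumptions the bin spacing is 44100/4096, roughly 10.8 Hz, so a reading taken this way is only accurate to about one bin; restricting `read_peak` to a band (for example 1500 to 2000 Hz) gives a rough way to follow a single focal region of energy from frame to frame.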

Here, he has clearly with regards to paralanguage, it seems more than a improvisation. followed a pattern of varying his rhythms during beats 1 coincidence that the /ɹ / sound is often used to imitate In addition to the spectrogram, Figure A1 also contains 3 The vocalist also appears to have reserved at least and 2 while treating beats 3 and 4 as a recurring the snarls of wild beasts. Having drawn a number of phonetic and rhythmic transcriptions as well as one of the vowels with especially high formant rhythmic motive (shown by the large vertical rectangle inferences as to why certain patterns appeared in the analytical annotations provided at the bottom of the frequencies for his inhaled voice. /æ/ (as in “had”), the around the common rhythms in the far left column). improvisation, the strongest and most basic point here is figure. As indicated by the rhythms given below the vowel most consistently present during these moments Indeed, this recurring rhythm coincides with each of the that there exists a consistency to the vocalist’s use of spectrogram, the vocals fit neatly into a regular 4/4 of intense high spectral energy and a vowel with one of increases in spectral bandwidth that generate the arch- particular phonemes in such a way that they seem meter, indicating that the vocalist kept a regular pulse in the highest first formant frequencies, only occurs once as like pattern visible in the first half of Figure A1. fundamental to the most salient musical features of the marked contrast to his earlier performances discussed an exhale (measure 2). Lastly, it seems that nearly all the On occasion, the vocalist has punctuated the last beat improvisation. above. Partly because of this rhythmic regularity, the consonants with the exception of / ɹ / (and /p/ in measure of the measure with /ʊ/ (as in “hook”, marked to the improvisation comes across as surprisingly organized, as 3) were reserved for exhaled vocals. 
Here, it would seem right of the column with arrows), a gesture that in each though it had been deliberately crafted. that the closures of the vocal tract necessary to produce case requires an elaborate motion of closing the jaw and some consonants prove too awkward to execute with the quickly rounding the lips when moving to /u/ (as in 4.1.1. An Impression of Tight Control steady “sucking” air flow used in inhaled vocals. “who”) from /a/ (as in “father”) or the similarly 2 The alveolar ridge is the sloping region located by the This impression likely results from several musical articulated /æ/ (as in “had”). It makes sense then that upper jaw’s front teeth. One’s tongue quickly touches features exhibited by the improvisation that are 4.2. A Paradigmatic Analysis of the Improvisation such an elaborate physical motion would be reserved for this plain to stop the flow of air through the vocal tract a longer sustained rhythmic value and would punctuate when producing the consonant /d/ [15]. In order to arrive at an especially clear demonstration of 3 the final accent in a rhythmic motive. By contrast, the Paralanguage can roughly be understood to include all 1 the primary role that certain phoneme combinations An online appendix that includes larger images such as forms of non-verbal communication. These can include played in the improvisation, Figure A2 provides a Figure A1 can be accessed at facial expressions, vocal utterances such as grunts and http://www.music.mcgill.ca/~depalle/ICMC2012/ICMC2 1 One of the clearest introductions to paradigmatic sighs, as well as prosaic and timbral modifications to 012Smialek.htm. analysis available can be found in [14]. ordinary speech that shade meaning [16].
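The layout rule described in section 4.2 for the paradigmatic chart (time flows downwards, segments with matching rhythms share a column) can be illustrated with a minimal sketch. It makes a simplifying assumption that is not in the paper: each segment carries a single rhythmic-identity label, whereas the chart in Figure A2 also lets a segment reappear in further columns for partial matches. The function name `paradigmatic_layout` and the labels are hypothetical.

```python
def paradigmatic_layout(identities):
    """Place segments on a grid for a paradigmatic chart.

    identities: rhythmic-identity labels, one per segment, in order of
    occurrence (hypothetical labels; the paper derives them by ear and
    from the transcription).
    Returns a list of (row, column) cells: the row is the position in
    time, the column is shared by all segments with the same identity.
    """
    column_of = {}  # identity label -> column index, in order of first appearance
    layout = []
    for row, identity in enumerate(identities):
        # A new identity opens a new column to the right;
        # a repeated identity falls beneath its earlier occurrences.
        column = column_of.setdefault(identity, len(column_of))
        layout.append((row, column))
    return layout


# A piece that only repeats one rhythm stays in a single column.
repeating = paradigmatic_layout(["x", "x", "x", "x"])

# A piece that only introduces new material is laid out diagonally.
all_new = paradigmatic_layout(["a", "b", "c", "d"])
```

Under this simplification the two limiting cases named in the text fall out directly: the repeating input occupies column 0 on every row, and the all-new input occupies cell (i, i) on row i.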


5. AN EXCERPT FROM “THE VOWEL SONG”

Framed as a public service promoting literacy, “The Vowel Song” by death metal band Zimmer’s Hole begins with vocalist Chris Valagao reciting the vowels of the alphabet (henceforth “letters” to distinguish from phonetic vowels) in long sustained screams, harmonized by three guitars in homorhythm (see Figure A3). The slow punctuations of each homorhythmic attack, combining voice, low power chords (shown only as roots in the example), and two harmonized lead guitars, not only lend a certain satirical grandiosity to the song, they also help to create the sensation of Valagao’s unpitched screams possessing a kind of melody, drawn by a precise control of formant frequency locations.

5.1.1. Formants in Flux

Although it may not be immediately evident in the spectrograms given here, there is quite a great deal of variation to the formant frequencies used in the example. In order to illustrate this variation, Figure 3 plots the position of each letter that Valagao screams within vowel space, i.e. a graph which plots vowels according to the frequency of the first formant on the y-axis and second formant on the x-axis. Of course, some of the letters Valagao screams are actually diphthongs or triphthongs. Accordingly, the most steady-state vowel within each letter is identified with a dot on the graph and arrows leading up to or away from it depending on how the phonetic transitions occur. To illustrate an example, Valagao’s screamed letter “u,” represented by plot #5, is performed in such a way that it traverses vowel space beginning near /ɪ/ (as in “hit”), reaching its most steady-state point near /ɑ/ (as in “harm”), and finally moving towards the lower formant frequencies in between /ʊ/ and /ɔ/ (as in “hook” and “caught” respectively). It becomes clear how much formant variation is involved in this song introduction from only examining the steady-state plot points, let alone taking into account all the quick diphthong transitions indicated by the arrows.

Figure 3. Formant positions from the opening of “The Vowel Song” plotted in vowel space [17]. Each letter that Valagao screams is assigned a number. Arrows indicate phonetic transitions before and after a vowel is stabilized and sustained.

5.1.2. Interactions between the Voice and Guitar

If these changes in formant frequency are compared with the pitch contour of the guitar parts (which move in parallel motion), a surprising correspondence appears between the guitars’ changes of pitch direction and the movements of the first formant. As the first letter changes to the next, the lower formant decreases in frequency paralleling the descent of the guitars (see the “changes of direction” arrows in Figure A3). With the next letter, the lower formant reverses direction just as the guitar does. This pattern of alternating upwards and downwards directions, shared between the guitars and the voice’s first formant, continues until the guitars and voice break the homorhythm. What’s more, there is an even stronger correspondence between the highest lead guitar part and the voice’s first formant movements. This is shown in Table 1, which compares the high lead guitar melody with the first formant of each vowel at its point of greatest stability and sustain (the dotted points in Figure 3). Both are shown as frequency values in Hz and in terms of pitches that correspond to those frequencies. A comparison of these pitch values (including their octave position) reveals a striking relationship. With a margin of about one semitone above and below the upper guitar part, the voice’s first formant parallels the exact contour of the guitar part.

                      No. 1 – A     No. 2 – E     No. 3 – I     No. 4 – O     No. 5 – U
Upper Melody (Gtr 1)  C6/1050 Hz    G5/780 Hz     Eb6/1240 Hz   C6/1050 Hz    Eb6/1240 Hz
First Formant         C#5/540 Hz    G4/380 Hz     F5/700 Hz     B4/510 Hz     E5/650 Hz

Table 1. A comparison of frequency values between the highest lead guitar’s melody and the first vocal formant in its point of greatest stability.

Taking into consideration that there is usually a frequency range of around 100Hz over which each formant’s energy is significant (the values in Table 1 sample the average frequency for the first formant) and that the voice is in rhythmic unison with the guitars, it does not seem far-fetched for a listener to perceptually connect the guitar melody and the formant movements, thereby imagining a kind of melodic motion assigned to a series of unpitched screams. Even after the homorhythm breaks off, one can discern further frequency-oriented connections between the voice and its surrounding musical contexts. As the vocalist finishes preparing the last letter by screaming the words “and sometimes,” a diving upper harmonic merges with his third formant beginning with the onset of the final letter Y. In this brief intro to “The Vowel Song,” the widely taken for granted division between timbre and pitch appears especially blurred.

6. CONCLUSION

Having observed extreme metal vocal techniques in both a relatively controlled recording session and at work in a commercial studio recording, it should now be clear that the extreme metal voice is far from the simplistic percussive device that it is often assumed to be. If such assumptions are in no small part the result of deeply entrenched habits of describing vocal music primarily in terms of pitched melodies, the extreme metal voice can serve as an invitation to approach the study of musical expression in new ways. Having taken an interdisciplinary approach to the extreme metal voice, the results of this paper support ongoing arguments for the musicological utility of spectrograms in drawing attention to subtle means of musical expression that can easily be overlooked.

7. REFERENCES

[1] Middleton, R., Studying popular music. Open University Press, Philadelphia, 1990.
[2] Tagg, P., Kojak: 50 seconds of television music. Towards the analysis of affect in popular music. Musikvetenskapliga Institutionen, Göteborg, 1979.
[3] Tagg, P. and B. Clarida, Ten title tunes: Towards a musicology of the mass media. Mass Media Music Scholars’ Press, New York and Montreal, 2003.
[4] Chester, A., “Second thoughts on a rock aesthetic: The Band”, New Left Review, 62 (1970): 75–82.
[5] Cogan, R., New images of musical sound. Harvard University Press, Cambridge, 1984.
[6] Brackett, D., Interpreting popular music. 2nd edition. University of California Press, Berkeley and Los Angeles, [1995] 2000.
[7] Dibben, N., Review of [6]. Popular Music, 21, no. 1 (January 2002): 143–45.
[8] Moore, A., Review of [6]. Music & Letters, 77, no. 4 (November 1996): 658–59.
[9] Huron, D., Review of Empirical musicology, Notes: Quarterly Journal of the Music Library Association, 63, no. 1 (September 2006): 94.
[10] Geslin, Y. and A. Lefevre, “Sound and musical representation: the Acousmographe software”, Proceedings of the International Computer Music Conference, Miami, USA, 2004.
[11] Bogaards, N. et al., “Sound analysis and processing with AudioSculpt 2”, Proceedings of the International Computer Music Conference, Miami, USA, 2004.
[12] Sundberg, J., The science of the singing voice. Northern Illinois University Press, Dekalb, 1987.
[13] Lindestad, P.-Å., “Voice Source Characteristics in Mongolian ‘Throat Singing’ Studied with High-Speed Imaging Technique, Acoustic Spectra, and Inverse Filtering”, Journal of Voice, 15, no. 1 (March 2001): 78–85.
[14] Agawu, K., “The challenge of musical semiotics”, in Rethinking music, eds. Nicholas Cook and Mark Everist, 138–160. Oxford University Press, New York, 1999.
[15] Ladefoged, P. and K. Johnson, A course in phonetics. 6th edition. Wadsworth, Cengage, Boston, [1975] 2011.
[16] Poyatos, F., Paralanguage: a linguistic and interdisciplinary approach to interactive speech and sound. J. Benjamins, Philadelphia, 1993.
[17] Beskow, J., Formant synthesis demo, http://www.speech.kth.se/wavesurfer/formant/. Accessed 17 February 2012.
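As a supplementary check on the comparison in Table 1 (section 5.1.2), the following sketch measures the interval in equal-tempered semitones between each guitar frequency and the corresponding first-formant frequency, and tests whether the two series rise and fall together. The frequency values are copied from Table 1; the function names and the use of a simple up/down contour are illustrative assumptions, not the authors' procedure.

```python
import math

# Frequencies in Hz from Table 1, for the letters A, E, I, O, U in order.
GUITAR_HZ = [1050, 780, 1240, 1050, 1240]   # upper lead guitar melody
FORMANT_HZ = [540, 380, 700, 510, 650]      # first formant at its most stable point


def semitones_between(f_low, f_high):
    """Interval from f_low up to f_high in equal-tempered semitones."""
    return 12 * math.log2(f_high / f_low)


def contour(freqs):
    """Direction of each step: +1 for up, -1 for down, 0 for no change."""
    return [(b > a) - (b < a) for a, b in zip(freqs, freqs[1:])]


# Distance of the guitar above the first formant for each letter.
intervals = [semitones_between(f, g) for f, g in zip(FORMANT_HZ, GUITAR_HZ)]

# True when both series change direction at the same letters.
same_contour = contour(GUITAR_HZ) == contour(FORMANT_HZ)

for letter, interval in zip("AEIOU", intervals):
    print(f"{letter}: guitar is {interval:.1f} semitones above the first formant")
print("shared contour:", same_contour)
```

With these values the guitar sits roughly an octave above the first formant for every letter, and the two series share the same down-up-down-up contour, which is the relationship the text describes. Because each formant's energy is spread over a band of around 100Hz, the intervals should be read as approximate.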
