
When the Leading Tone Doesn’t Lead: Musical Qualia in Context

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Claire Arthur, B.Mus., M.A.

Graduate Program in

The Ohio State University

2016

Dissertation Committee:

David Huron, Advisor

David Clampitt

Anna Gawboy

© Copyright by

Claire Arthur

2016

Abstract

An empirical investigation is made of musical qualia in context. Specifically, scale-degree qualia are evaluated in relation to a local harmonic context, and rhythm qualia are evaluated in relation to a metrical context. After reviewing some of the philosophical background on qualia, and briefly reviewing some theories of musical qualia, three studies are presented. The first builds on Huron’s (2006) theory of statistical or implicit learning and melodic probability as significant contributors to musical qualia. Prior statistical models of melodic expectation have focused on the distribution of pitches in melodies, or on their first-order likelihoods as predictors of melodic continuation. Since most Western music is non-monophonic, this first study investigates whether melodic probabilities are altered when the underlying harmonic context is taken into consideration. This project was carried out by building and analyzing a corpus of classical music containing harmonic analyses. Analysis of the data found that harmony was a significant predictor of scale-degree continuation.

In addition, two experiments were carried out to test the perceptual effects of context on musical qualia. In the first experiment participants rated the perceived qualia of individual scale-degrees following various common four-chord progressions that each ended with a different harmony. While scale-degrees were still shown to elicit relatively stable qualia, there was a significant effect for the role of the local chord context. Importantly, this experiment was carried out using participants both with and without music-theoretic training, supporting the notion that the identification of scale-degrees was not responsible for the evoked qualia. This experiment also partially replicated a component of Krumhansl & Kessler’s (1982) study examining the “goodness of fit” of scale-degrees within a key. However, the authors’ claim that scale-degrees 1, 3, and 5 were best “fitting” due to the tonal stability of the tonic triad could not be fully supported here. In fact, the results from the present study found that the “goodness of fit” effect could perhaps be better explained by other factors.

In the second experiment participants rated the perceived qualia of either composed inter-onset patterns or recorded song clips presented in different metrical contexts. Both inter-onset interval pattern and meter were shown to be significant influences on judgments of qualia. In addition, syncopation was found to be a strong predictor for certain components of qualia.

The overall results from these studies show that musical context is an important contributor to musical qualia, and therefore, while isolated musical events may still be capable of creating relatively “stable” qualia, in real musical contexts these may change dramatically.

Acknowledgments

I would first and foremost like to thank my advisor, David Huron, for his continuous guidance and support, and for his infectious curiosity and excitement for all things musical. I would also like to thank my fellow CSML colleagues, both past and present, for their feedback, advice, and camaraderie. In addition, many thanks are due to the other members of my committee, Anna Gawboy and David Clampitt, for their insights and encouragement throughout my time at Ohio State. I would also like to acknowledge the support of the Social Sciences and Humanities Research Council of Canada, whose funding made possible my final year of study. Finally, I especially need to thank my “better half”, Nat Condit-Schultz, for all of his help with statistics and programming, for his unfailing optimism, and above all, for inspiring me to become a better scholar and musician.

Vita

2015–2016 ...... Graduate Research Assistant, Ohio State University School of Music
2012–2014 ...... Graduate Teaching Associate, Ohio State University School of Music
2008–2012 ...... Private Piano and Instructor
2008 ...... M.A., Music Theory, University of British Columbia
2004 ...... B.Mus., Music Theory and History, University of Toronto

Publications

Research Publications

Arthur, C., & Huron, D. (2016). The direct rule: Testing a scene analysis interpretation. Musicae Scientiae. Advance online publication. doi: 10.1177/1029864915623093

Devaney, J., Arthur, C., Condit-Schultz, N., & Nisula, K. (2015). Theme and Variations Encodings with Roman Numerals (TAVERN): A new data set for symbolic music analysis. In M. Muller & F. Wiering (Eds.), Proceedings of the International Society of Music Information Retrieval (ISMIR) Conference. Malaga, Spain: 728–734.

Arthur, C. (2014). Does harmony affect scale-degree qualia?: A corpus study investigating the relation of scale-degree and harmonic support. In M.K. Song (Ed.), Proceedings of the 13th International Conference for Music Perception and Cognition. Seoul, Korea: Yonsei University, 194–196.

Fields of Study

Major Field: Music

Area of Specialization: Music Theory

Table of Contents

Abstract

Acknowledgments

Vita

List of Tables

List of Figures

1. Introduction

2. On the Philosophy and Science of Musical Qualia
   2.1 What Are Qualia?
   2.2 The Problem Music Raises for the Study of Qualia (and Vice-Versa)
   2.3 Conceptual Knowledge and Qualia
   2.4 Qualia as Synthesis of Sensory and Cognitive Processing
   2.5 Introspection, Observation, and Converging Evidence
   2.6 Why Study Musical Qualia?
   2.7 Theories of Musical Qualia
       2.7.1 Scale-degree Qualia and Implicit Learning
       2.7.2 Rhythm Qualia
   2.8 Chapter Summary

3. A Corpus Study
   3.1 The Corpus Analysis
       3.1.1 Overview
       3.1.2 Sampling
       3.1.3 Methodology
   3.2 Evaluating the Models
       3.2.1 Global Hypothesis Test
   3.3 Descriptive Statistics
       3.3.1 Mode Classification
       3.3.2 Zeroth-order Probabilities
       3.3.3 First-order Probabilities
       3.3.4 Change in Melodic Probabilities when Harmony is Considered
   3.4 Conclusions
   3.5 Discussion

4. A Perceptual Study of Scale-degree in Context
   4.1 Introduction
   4.2 Method
       4.2.1 Participants
       4.2.2 Stimuli
       4.2.3 Procedure
   4.3 Results
   4.4 Discussion

5. Rhythm Qualia
   5.1 Introduction
       5.1.1 Background
   5.2 A Perceptual Study: Rhythm in Context
       5.2.1 Introduction
       5.2.2 Method
       5.2.3 Results and Discussion
       5.2.4 Post-hoc Exploration of Rhythm Qualia
   5.3 Chapter Summary

6. General Summary
   6.1 Recapitulation
   6.2 Discussion
   6.3 Implications for Music Pedagogy
   6.4 Areas for Future Research on Musical Qualia

Works Cited

Appendices

A. The Corpus Data

List of Tables

3.1 Illustration of Encoding Key Changes

3.2 Melody and Harmony Encodings at Key Changes

4.1 Result Statistics by Dependent Variable

4.2 Tally of “Opt-outs” by Dependent Variable

4.3 Correlations with the K&K Profile

5.1 Similarity Measures Between Dependent Variables

5.2 Syncopation Scores for Rhythm Stimuli

List of Figures

2.1 Example of Identical Acoustic Information Generating Unique Qualia

2.2 Flow Chart of Scale-degree Probabilities from Huron (2006)

3.1 Zeroth-order Distribution of Scale-degrees in the Corpus

3.2 Zeroth-order Distribution of in the Corpus

3.3 First-order Probabilities for Scale-degrees

3.4 Predicted and Observed Probabilities for 1̂ in Tonic Context

3.5 Effect of Harmony on Scale-degree Probability

4.1 Image of Digital Interface Used in Experiment

4.2 Scale-degree Qualia Ratings

4.3 Consistency Between and Across Participants

4.4 Intra-Subject Correlations

4.5 Key Profiles from Krumhansl & Kessler (1982)

4.6 Present Data Compared to Krumhansl & Kessler’s (1982) Key Profiles

5.1 Three Identical Onset Patterns in Different Metrical Contexts

5.2 Rhythm Stimuli (Composed)

5.3 Rhythm Stimuli (Borrowed)

5.4 List of Rhythmic Descriptor Terms

5.5 Rhythm Qualia Ratings - Composed

5.6 Rhythm Qualia Ratings - Song Clips

5.7 Amount of Syncopation and “Grooviness”

5.8 Additional Examples of Syncopation Predicting Qualia

Chapter 1: Introduction

This dissertation examines whether certain fundamental components of musical structure, such as scale-degree and rhythm, can generate unique, qualitative musical experiences, or “qualia,” and in particular whether the musical context — in this case harmony and meter, respectively — can affect those experiences.

Chapter 2 introduces the topic of qualia, briefly mentions its philosophical origins, and discusses its role in music. In particular, the concepts of scale-degree and rhythm pose a challenge to many traditional accounts of what constitutes “qualia” on account of their being referential. In addition, many philosophers who claim qualia to be ineffable pose a challenge to empirical researchers wishing to study musical qualia.

This chapter also reviews the literature on music and qualia, and summarizes theories about the generation of musical qualia, giving special attention to the role of implicit learning. It then proposes an operational definition of qualia, similar to those used by Zentner (2012) and Dowling (2010), that supports a perspective conducive to the evaluation of qualia as an object for scientific study.

Chapters 3 and 4 are devoted to the interaction of scale-degree and harmony, in particular to the question of how a harmonic context might influence a scale-degree’s qualia. Building on Huron’s (2006) theory of the role of implicit learning in the generation of musical qualia, Chapter 3 begins by examining how melody and harmony interact from a statistical standpoint. Specifically, prior research has examined pitch distributions and/or first-order probabilities for scale-degrees (or pitch) in various corpora (e.g., Krumhansl, 1990; Aarden, 2003; Pearce, 2005; Huron, 2006; Temperley, 2007; Albrecht & Huron, 2014), with some of these researchers finding that a listener’s expectations can be largely predicted by these pitch distributions. However, for music that is typically set with a harmonic accompaniment, it is unclear whether the underlying harmonic context influences the melodic behavior (and therefore, by extension, the melodic expectations) and, if so, to what extent. Therefore, a corpus of classical music with both melodic and harmonic information was created in order to analyze the relative contribution of harmony in influencing the statistical probabilities of melodic continuations.
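The three kinds of probability mentioned above (zeroth-order, first-order, and first-order conditioned on harmony) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the toy sequence is invented and is not drawn from the corpus described in Chapter 3, and conditioning on the chord that sounds under the current note is only one of several possible choices.

```python
from collections import Counter, defaultdict

def zeroth_order(degrees):
    """Relative frequency of each scale degree, ignoring order."""
    counts = Counter(degrees)
    total = sum(counts.values())
    return {degree: n / total for degree, n in counts.items()}

def _normalize(table):
    """Turn a dict of context -> Counter into context -> probabilities."""
    return {
        context: {nxt: n / sum(counter.values()) for nxt, n in counter.items()}
        for context, counter in table.items()
    }

def first_order(degrees):
    """P(next degree | current degree)."""
    table = defaultdict(Counter)
    for current, nxt in zip(degrees, degrees[1:]):
        table[current][nxt] += 1
    return _normalize(table)

def first_order_given_harmony(events):
    """P(next degree | current degree, chord under the current degree).

    `events` is a list of (scale_degree, chord_label) pairs.
    """
    table = defaultdict(Counter)
    for (current, chord), (nxt, _) in zip(events, events[1:]):
        table[(current, chord)][nxt] += 1
    return _normalize(table)

# Invented toy melody: (scale degree, Roman-numeral chord label).
toy_events = [
    (7, "V"), (1, "I"), (7, "V"), (1, "I"), (7, "iii"),
    (6, "IV"), (7, "V"), (1, "I"), (7, "iii"), (6, "vi"),
]
toy_degrees = [degree for degree, _ in toy_events]

print(zeroth_order(toy_degrees))
print(first_order(toy_degrees)[7])
print(first_order_given_harmony(toy_events))
```

In this invented sequence, scale-degree 7 moves to scale-degree 1 in three of its five continuations when harmony is ignored, but in every case when it is supported by V and in no case when it is supported by iii. This is the kind of shift in melodic probability that the corpus study is designed to detect, although real data would of course be far less tidy.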

Chapter 4 reports a perceptual experiment that examines the qualia of scale-degrees set in various harmonic contexts. One question that arises from both philosophical and cognitive literature is whether conceptual knowledge can influence qualia. In the case of scale-degree in particular, trained musicians can not only identify scale-degrees, but have conceptual knowledge about them (e.g., functional properties), and have already been largely exposed to vocabulary commonly applied to them within the field of music theory. Therefore, by using participants with and without musical training, this chapter also investigates the question of how music-theoretic training might influence descriptions of musical qualia, and in fact, questions whether non-musicians are capable of reporting scale-degree qualia at all. In addition, this experiment partially replicates a component of Krumhansl and Kessler’s (1982) “probe-tone” study. As such, the chapter ends with a comparison of the results of their study with the results of the present experiment.

Chapter 5 discusses the related “referential” problem of musical qualia as it applies to rhythm in a metrical context. It then details an experiment on the perception of rhythm qualia in various metrical contexts, which makes use of both experimenter-controlled (i.e., composed) and ecologically valid (i.e., preexisting) stimuli.

In the work as a whole, I attempt to go beyond simply looking for statistically significant results, which merely suggest the existence of some effect, by investigating and analyzing the stimuli themselves (via theoretical analyses and post-hoc exploratory comparisons) to look for features that might be causing or contributing to the effect.

Finally, Chapter 6 summarizes the research on musical qualia. Given that real music consists of a multitude of contexts (meter, duration, melody, rhythm, key, harmony, etc.) happening simultaneously, it is proposed that while certain components in music, such as scale-degree or rhythm, may elicit relatively “stable” qualia in isolation, these may change dramatically when encountered in more complete musical contexts. Nevertheless, it is suggested that in order to build a general picture of how musical qualia are evoked, it is appropriate to study the qualitative effects of musical rudiments independently. The chapter ends with a discussion of the implications of the findings, and suggestions for future areas of study related to musical qualia.

Chapter 2: On the Philosophy and Science of Musical Qualia

Abstract

In this chapter, the topic of musical qualia is reviewed, taking into consideration philosophical, theoretical, and cognitive standpoints. Theories about the origins of musical qualia are discussed, and relevant literature is reviewed. Concerns are raised regarding various stipulations on the traditional definition of qualia, and the role of conceptual knowledge. A working definition of qualia is proposed that would make qualia amenable to scientific study.

2.1 What Are Qualia?

“Qualia” is a term, borrowed from philosophy, that generally refers to what it is like to experience something. The term is often used synonymously with “phenomenal character.” In philosophy of mind, qualia is a hotly debated concept, as it features prominently in issues related to consciousness and the mind-body problem. The most common points of debate are “which mental states have qualia, whether qualia are intrinsic qualities of their bearers, and how qualia relate to the physical world both inside and outside the head” (Tye, 2015). While the full debate over what are or are not qualia will not be expounded on here, some frequently made claims about qualia will be mentioned, since they necessarily bear on the topic of musical qualia.

Before engaging in further discussion, some definitions of terminology with specialized meanings in philosophy will be useful. In philosophy, an intrinsic (or non-relational) property is one that an object or thing has of itself, independently of other things, including its context. An extrinsic (or relational) property, on the other hand, is a property that does depend on the things around it (i.e., context). Ineffable (opposite: effable) refers to a common problem in aesthetics and philosophy of mind where it is argued that certain aspects of experience are incommunicable. The notion of representation in philosophy is a complex one. It comes from the theory of representationalism, which holds that we only come to understand the physical world through our minds and our ideas. (Note that this is not the same as Platonic idealism, where it is held that there is no physical realm but only the realm of ideas.) A simplistic explanation of the term representation is to say that objects (both physical and non-physical) have “isms,” “likenesses,” or features that we come to internalize. These “isms” often can be described using propositional language. That is, we can say “I feel x about y” or “I believe x has/does y.” Some philosophers, however, believe that qualia are made up of more than these representations (or that they exist as something completely independent of them), and this view is described as non-representational (Hansberry, in press). Non-representational accounts of qualia therefore typically hold that qualia cannot be described in propositional language.

While there are many claims made about qualia within the field of philosophy, it is commonly argued that qualia are intrinsic, non-relational properties, and many believe that they are ineffable and non-representational (Tye, 2015). It is these claims that prove particularly problematic when describing musical qualia.

2.2 The Problem Music Raises for the Study of Qualia (and Vice-Versa)

What qualia are elicited by music? There are many instances where a listener might recognize a distinctive feeling, character, or quality associated with some musical moment. Examples include the striving or effortful intensity of a sung high pitch, or the paradoxical feel of both repose and impetus evoked by a deceptive cadence. In these and many other musical situations, listeners can experience seemingly ineffable yet characteristic subjective states that it would be appropriate to call “qualia.” Yet, considering some of these experiences as qualia may raise concern for some philosophers who hold a strict viewpoint on what constitutes qualia, as described in the section above.

There is a problem with the notion of musical qualia as being intrinsic, specifically, that an object’s quale cannot exist in reference to anything else. The claim of qualia as intrinsic is perhaps the single most difficult problem for musical qualia, in particular for the two types of qualia I investigate in this dissertation: scale-degree qualia and rhythm qualia. Our experience of scale-degree and rhythm is not like that of color, or even of timbre, in that scale-degree and rhythm are not “absolute” but in fact only exist in relation to something else. A scale-degree exists only in relation to a tonic, key, or other scale-step; a rhythm typically relies on its relation to a meter (see Figure 5.1 for an example and Section 5.1 for discussion). Of course, that these aspects of music are capable of eliciting qualia does not appear to be under debate.

The problem of ineffability, if true, would certainly pose a problem for anyone wishing to study musical qualia empirically, since experiments typically rely on language as expressed by participants (e.g., “how does this sound?” or “how does this feel?”). Even when examinations of qualia attempt to avoid procedures that “collect” language as a dependent variable (such as descriptive language offered by participants or words rated for applicability), language is nevertheless used in instructions for experiments that indirectly measure qualia (e.g., “adjust the brightness until the object on the left is twice as bright”). It seems, however, that the (typical) philosopher’s objection to the notion of effability is that it is considered impossible to describe a phenomenal experience in words to another individual who has never had that experience. While I agree that it is likely impossible to communicate the qualia of, for example, scale-degree 7 (or anything for that matter) to someone who has never experienced it, I will argue that those of us who have experienced hearing scale-degree 7 are able to communicate with each other some aspect of that shared experience.

The idea of musical qualia (and perhaps all qualia) as being non-representational as defined above, is challenged if one asks what remains after the representational content is removed. Hansberry (in press), for instance, states that qualia ought not to extend to “propositional content, conceptual content, emotions, associated memories, affordances, or any other part of experience.” Tye (2015), on the other hand, makes an opposing argument (specifically in reference to the qualia of linguistic understanding):

The phenomenal aspects of understanding derive largely from linguistic (or verbal) images, which have the phonological and syntactic structure of items in the subject’s native language. These images frequently even come complete with details of stress and intonation. As we read, it is sometimes phenomenally as if we are speaking to ourselves. (Likewise when we consciously think about something without reading). We often “hear” an inner voice. Depending upon the content of the passage, we may also undergo a variety of emotions and feelings. We may feel tense, bored, excited, uneasy, angry. Once all these reactions are removed, together with the images of an inner voice and the visual sensations produced by reading, some would say (myself included) that no phenomenology remains.

Music, like language, is able to evoke emotions and feelings, though arguably in a more abstract way, since music without lyrics lacks semantic meaning.¹ Music also has its own syntax and prosody (Thaut, 2008), and is capable of inducing visual imagery (Gabrielsson, 2011; Osborne, 1980; Quittner & Glueckauf, 1983). Thus, although Tye’s argument refers to language, it might apply equally well to music. These aspects of music, combined with others that I describe below, intuitively seem to make up a rather large contribution to musical qualia, and after removing all propositional content, conceptual content, emotions, memory, affordances, etc., it is difficult to comprehend what would remain.

¹ Michael Thaut, in his book “Rhythm, Music, and the Brain,” asserts that “most likely the most important difference between speech and music lies in the lack of explicit semantic or referential meaning in music” (p. 2). However, he does argue that music fits the definition of communication.

Discussions and debates on qualia in philosophy frequently arise due to unresolved issues in perception and consciousness, and a particular problem arises over the explanation of phenomena such as illusion, memory, and hallucination. While these phenomena bear a relationship to objects of reality, they technically only exist in one’s mind. Common points of argument typically center on whether the qualitative experience of these phenomena is distinct from that of their “original” (or physical) counterparts, and/or whether these phenomena have representational content in the same way that their “originals” do. Scale-degrees and rhythms pose a similar problem for qualia perception, since, in a way, they too only exist in the mind. That is, they by definition rely on interpretation. However, they are unlike memory and hallucinations in that they are typically evoked in direct response to real environmental stimuli, and they are unlike illusions since those are usually thought of as “perceptual errors,” whereas scale-degrees and rhythms are typically not. Indeed, a trained musician can hear a single complex tone (e.g., a single key struck on the piano) and impose upon it the phenomenal experience of any scale-degree entirely through mental “gymnastics.” Importantly, this is not simply an artificial exercise, but occurs during real listening at moments of transposition, presumably not only for trained musicians, but for all listeners enculturated into the Western musical tradition. The same phenomenon of mental “manipulation” occurs with metric reinterpretations of rhythms, or “fake-outs,” as will be discussed in Chapter 5. This problem, where seemingly the same physical stimulus can represent two things simultaneously, poses a serious challenge for philosophical theories that hold qualia to be intrinsic.

2.3 Conceptual Knowledge and Qualia

A famous thought experiment in philosophical literature is that of Mary the color scientist, who lives in a room without color and knows everything there is to know about color, and who one day leaves the room and finally experiences color. The argument goes that the actual experience of seeing color must extend beyond her knowledge of the physical properties of color. This argument intuitively seems to have merit. However, it does not disprove the role of conceptual information in shaping our qualitative experiences. That is, what if the argument is flipped around? For someone who knows nothing about wavelengths and frequencies as properties of light and color, but can experience color, how is their experience of seeing color different from Mary’s (if at all)? Surprisingly, there is not much discussion on the role of conceptual knowledge as contributing to qualia. Many philosophers contend that “the content of each experience and its phenomenal properties are thus two different features that may be present independently from each other” (Schiavio & van der Schyff, 2016), and perhaps this is why there is a reluctance to discuss the role of conceptual knowledge. However, simply because two things might exist independently does not mean that they cannot bear influence on each other.

So does conceptual information affect or influence qualia? Of course, there is a distinction between knowing facts and having beliefs about things, and the two may have very different effects (or lack thereof) on phenomenal experience. Since I do not wish to digress further into philosophical debates, I will assume that “conceptual knowledge” can imply both factual (including semantic) knowledge and held beliefs, that it relies on declarative memory, and typically can be expressed in propositional language. I argue that while conceptual knowledge may not always directly influence qualia, even casual observation suggests knowledge can, at least, indirectly influence them.

Take, for example, what might seem to be a completely random, nonsensical sound. While one person might be able to describe their experience based on the “feel” of the timbre, loudness, and so on, the qualia will almost certainly be different for a second listener for whom the sound is not random, but is an expletive in his/her native language. Take this personal anecdote as another example: when I was a child, around the time of Halloween, I visited an exhibit that consisted of boxes with holes cut out that allowed one to stick a hand inside. Each box was labeled with some human body part, insinuating its contents. However, the contents were simple everyday objects: a human eye was a skinned grape; human intestines were a ball of slick noodles; and so on. The point is that thinking that I was touching body parts was a horrific and disgusting experience in a way that simply touching those common objects normally would not be. A simpler, everyday example of the same scenario unfolds when someone makes a rather poorly timed suggestion or comment in relation to some food you might be eating. How does the experience of seeing or tasting that food change after this concept has been brought to mind? Based on these examples, I argue that knowing something (or thinking/imagining something) about an object has the potential to influence its qualia.

Further support for conceptual information influencing phenomenal experience comes from recent literature on synaesthesia, a rare phenomenon where certain individuals have sensory experiences that are interlaced, either in the same modality or across different ones, such as seeing colors associated with certain letters or numbers, or associating certain tastes with sounds. Research has pointed to regions of the cortex associated with conceptual knowledge as playing an important role in synaesthesia, suggesting that associations form (at least in part) due to conceptual associations, which would somewhat explain the wide differences and subjectivity in reports of the condition (Chiou & Rich, 2014).

Of course, there is a lot of variation in the type of experiences we have, and in the way that one individual’s experience might differ from another’s. However, if conceptual knowledge has the potential to influence musical qualia, then do musicians with a deep theoretical knowledge of music experience some musical passage in a phenomenally different way compared to a group with no theoretical understanding of music? This question is addressed in this dissertation, and a discussion of the issue is carried out in Chapter 6.

2.4 Qualia as Synthesis of Sensory and Cognitive Processing

The concerns raised in this chapter are not unique. Despite differing points of view, Goguen (2004), Zentner (2012), Raffman (1993), and Dowling (2010) all discuss the problem with traditional accounts of qualia for empirical musicologists. Goguen, for instance, holds that qualitative experience is “situation dependent,” and believes that “emotion ... is the essence of qualia.” Zentner argues that “contrary to the claim that musical experiences are ineffable...musical qualia may be amenable to linguistic description and objectification.” Raffman believes that empiricism has much to contribute to aesthetics, and in particular criticizes the view of qualia as “non-representational,” saying “some sensory states have ... legitimate representational contents” (although she also believes these are “consciously accessible but not reportable”). Dowling, in fact, is quite audacious in dismissing the stipulations of ineffability (“I tend to think something truly ineffable would resist being manipulated”), determinacy (“there aren’t any incorrigible facts”), and intrinsic properties (“there is almost no area of human perception which is not context dependent”).

In order to study musical qualia, it seems that one must either take a position on what qualia are or are not, or propose to use the word out of tradition, but define it to mean something other than what it usually means. While I have opposed various stipulations on the traditional definition of qualia, and in so doing have implied a definition other than the usual one, I propose that it remains appropriate to use the term “qualia,” since, as Dowling (2010) states:

This use of the term clearly is not consonant with some of the ways it has been used in the history of philosophy, but nevertheless it is difficult to see what term could better be used to point to the functions described.

In terms of an operational definition of qualia, mine can be taken as largely synonymous with Dowling’s, who proposes qualia as so-called “intervening variables,” or “inferred processes in the causal chain leading from stimulus to response.” Specifically, I propose that while qualia are the resulting subjective experience of something, there are a multitude of factors that can contribute to that phenomenal aspect of musical experience, including sensory information (bottom-up information); conscious and unconscious (implicit) knowledge, memory, and awareness (top-down information); and the resulting inference or interpretation (including possible accompanying bodily changes, such as heart rate) that is a result of the synthesis of this sensory and cognitive processing. Furthermore, I informally define qualia to be aspects of conscious experience, available via introspection, fleeting or temporary, extrinsic (relational), at least partially communicable, and as having the potential to be mediated both by context and conceptual knowledge (explicit or implicit). In addition, while I regard qualia as necessarily subjective, the fact that persons can have similar experiences in response to the same stimulus suggests there may be (theoretically) measurable features of the stimulus that are seemingly “absorbed” by our senses and that may lead to these common, or overlapping, aspects of perception.

2.5 Introspection, Observation, and Converging Evidence

What it is like to undergo some phenomenal experience is available to us by introspection (Tye, 2015), and I argue that we can study qualia by examining common language and observing common reactions to stimuli from multiple persons’ experiences and introspections. Zentner (2012) argues that “although the inner experience of an emotion is a private and subjective one, its expressions are amenable to scientific description, quantification, and analysis”. Chalmers (1996), on the other hand, claims that models of the mind as posed by scientific research are “incapable of explaining the human experience of consciousness in any satisfactory way” (Montague, 2011). I tend to agree, however, with Goguen (2004), who says “Chalmers discusses the hard problem which is to explain qualia in the language of the hard sciences... [however] we argue that cognitive and qualitative aspects of experience are inseparable, even though first and person approaches artificially separate them”.

In particular, as already mentioned, the converging evidence provided by similar descriptions that multiple individuals independently offer in reaction to the same stimulus surely points to some common component of an experience. This point of view is shared by Zentner, who says:

Although we cannot know what people’s inner experience ... might feel like, the assumption is that the similarity in emotion expression is subtended by an interpersonally similar inner experience of the emotion. In other words, the similarity and dissimilarity of inner subjective experiences across individuals, while not directly accessible, can nonetheless be inferred.

In this way, I proceed by assuming that some aspects of musical qualia can be obtained using this “converging evidence” methodology, even if the resulting descriptions might be crude and/or incomplete, and that musical qualia are amenable to scientific analysis.

2.6 Why Study Musical Qualia?

I believe that one goal of music cognition is to attempt to bridge the explanatory gap between musical structure and musical perception. For composers who wish to compose in a way that generates a specific effect in the listener, a better understanding of the relationship between the music’s structure and the qualia it elicits can only be beneficial. Of course, music theory and musical composition have a complementary relationship. Montague (2011) discusses the language used in “formalist interpretations” of musical events, complaining that frequently the language used “does not offer much connection to the experience of attending to this music,” and suggests that a connection of “the totality of musical experience” with “analytical explanations” would make analyses themselves more meaningful. While this connection to musical experience is not always possible in analysis (nor, in my opinion, always relevant), a comprehensive, phenomenally-informed analysis of music remains a formidable goal not only for music theorists but for systematic musicologists as well. Of course, this type of approach to music analysis is certainly not novel, having largely been brought to the foreground of music theory by Meyer (1956), who was perhaps one of the first music theorists to examine music from a cognitive perspective, attempting to interpret the progression from structure to anticipation to emotion.2 In yet other applications of qualia research, if practitioners in the field of can better understand the link from musical stimulus to physiological reaction and phenomenal experience, it could lead to innovations and/or potential increases in the reliability of music therapies. Theories of musical expectation have been used to attempt to explain the existence of some musical qualia (Huron, 2006); however, musical qualia can also be used to generate theories. That is, if musical qualia can generate reliably elicited effects from multiple individuals’ responses to the same stimuli, they may be useful in helping shape theories of music cognition. In sum, if perceptual reactions can be reliably tied to musical features, then this information could be used to inform musical composition, music therapy, theories of music cognition and musical expectation, as well as . Of course, one’s understanding of music often begins in a classroom, and therefore all of the above implications are also relevant for music pedagogy, in particular for music theory and composition pedagogy. In addition, there may be certain aspects of musical qualia that are particularly relevant for aural skills training; however, these will become clearer after a comprehensive investigation of the topic of qualia “in context,” and so this discussion is reserved for Chapter 6.

2 Note that outside of music theory, Berlyne’s Aesthetics and Psychobiology (1971) takes on a similar lofty goal to Meyer in his investigation of the biological flow from perceptual variables in artworks to physiological variables and their associated behavioral states to affective-aesthetic responses.

Our reactions to music, and our experiences of it, are almost as varied as music itself. While certainly not a simple undertaking, understanding the range of musical experiences and the factors that might contribute to them is, I believe, a valuable endeavor not only for musicians and music scholars, but also for philosophers and psychologists.

2.7 Theories of Musical Qualia

Where do musical qualia come from? What gives each note or passage a unique characteristic, or quality? For instance, what makes the scale-degrees in each measure of Figure 2.1 “feel” different?

Figure 2.1: Example of identical acoustic information generating unique qualia. Three intervals are presented that are identical in pitch, but composed of different scale-degree pairs. Despite having the same acoustic information, each generates a distinct quale.

Of course, scale-degrees and intervals can be defined in a mathematical sense as a point on a scale, or a unit of measure, respectively. However, this is not typically how one perceives them. Indeed, in music theory pedagogy, teachers often focus on the overall qualities of the sounds, or what they sound like. In the above examples, the acoustic information in each measure is identical. Only the musical context – in this case, key – has changed. In order to perceive these three pairs of scale-degrees as having different qualia, then, one must be able to imagine them within their appropriate positions within the given key (or scale). But what would cause one note within a scale to sound different from any other in the first place?

2.7.1 Scale-degree Qualia and Implicit Learning

Huron (2006) argues that scale-degree qualia arise – at least in part – from statistical learning. As applied to melody, the theory of implicit learning assumes that, through exposure, we come to internalize the statistical probabilities for where a given scale-degree will go next. In tonal music, of course, the progression of scale-degrees is non-random, with scale-degrees exhibiting a range of probable (or improbable) “behaviors,” or tendencies to proceed in a predictable way. For instance, 7̂ tends to move to 1̂, and 4̂ tends to move to 3̂.

Figure 2.2: Flow chart of diatonic scale-degree probabilities from Huron (2006). Diatonic scale-degree “flow chart” according to first-order probabilities. The strength of a scale-degree’s tendency is marked by the arrow’s width.

Huron argues that because the brain ceaselessly attempts to predict what will happen next, our sense of anticipation and the unconscious confirmation or denial of the arrival of some scale-degree are, in part, what lead to scale-degree qualia. In other words, because of statistical regularities, the qualia of an individual scale-degree become closely tied to feelings of tension or resolution that have become associated with it. Huron, of course, was not the first person to suggest the role of anticipation in leading to phenomenal experience. Dennett (1991) has famously said that “brains are, in essence, anticipation machines,” and Meyer (1956) has proposed that “emotion in music is evoked when ... an expectation is not met.”

Of course, the entire concept of scale-degree requires a connection not only to the other scale-degrees and where they might proceed, but to a particular position within the scale. Browne (1981) theorized about what has come to be known as the “rare interval hypothesis.” Browne proposed that it is the unique intervallic properties of the diatonic collection that allow for “position finding”, or a sense of maintaining one’s bearings with regard to a particular position within the scale or reference to a tonic. Specifically, the set of all intervallic (dyadic) possibilities within the diatonic collection can be tabled into a set of interval-classes, where each interval class appears a unique number of times. Thus, the rare intervals — the and — function as the “position finding” elements. This theory was tested empirically by Butler and Brown (1981), who found that listeners were best able to correctly infer the tonic from three-note subsets when they included the rare interval of the tritone. Shepard (2009) argued that we build an internalized mental representation of the pattern of the scale, which is what leads us not only to understand the music we hear, but also to the generation of qualia. Note that Shepard’s position also implies that simple exposure (implicit learning) is at work in the generation of these internalized representations. While Browne doesn’t explicitly mention an internalized representation of the scale, he does imply that the structural properties of the scale are what generate their qualia:

One might look at a [tonal] usage, even one which is merely a “feeling” long noted, and attempt to provide the structural differentiation which might account for that usage in terms of stateable “facts.” It seems clear that an event when perceived or imagined in context, is somehow enriched by its context.

Here Browne provides a footnote in which a colleague comments on the relation between Browne’s theory and her own observations in the classroom:

But of course! Students have always insisted that the I chord doesn’t “sound like” the IV chord or the V chord — even though they are all obviously major triads.

Thus Browne and his colleague are suggesting that this internalized information about a note’s (or chord’s) position in the scale (here referred to as “context”) contributes to the generation of unique qualia.
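Returning to the intervallic property that Browne's hypothesis rests on, the claim that each interval class occurs a unique number of times in the diatonic collection can be checked by simple enumeration. The following is a minimal sketch rather than anything taken from Browne: the pitch-class encoding of the major scale and the function names are assumptions made here for illustration.

```python
from collections import Counter
from itertools import combinations

# Pitch classes of a major scale, in semitones above the tonic
# (an illustrative encoding; any transposition gives the same counts).
DIATONIC = [0, 2, 4, 5, 7, 9, 11]

def interval_class(a, b):
    """Smallest distance in semitones between two pitch classes (1 to 6)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def interval_class_counts(pitch_classes):
    """Count how often each interval class occurs among all dyads."""
    return Counter(interval_class(a, b) for a, b in combinations(pitch_classes, 2))

counts = interval_class_counts(DIATONIC)
for ic in range(1, 7):
    print(f"interval class {ic}: {counts[ic]} occurrence(s)")
```

Under this encoding the six counts are all different, and the two least frequent classes are interval class 6 (the tritone, occurring once) and interval class 1 (the semitone, occurring twice).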

Raffman (1993) argues there is a “structural musical ineffability” which arises, for example, when an untrained musician is trying to describe some aspect of music but lacks the proper understanding and/or terminology; or when a performer feels compelled to perform a passage in a particular way without knowing why. What Raffman seems to be describing is musical intuition. Since musical understanding and cultural norms are not hard-wired at birth, presumably musical intuition arises from experience. I therefore argue that musical intuition is simply a form of unconscious knowledge, or, in other words, a form of implicit learning. While Raffman takes these scenarios as examples of music’s ineffability, it does not follow that all qualia (or all aspects of some quale) are ineffable, as instances of music’s effability have already been discussed. (Note that musical intuition might best be explained via Daniel Dennett’s line of thinking, which claims that all knowledge could be capable of expression in verbal form if we only possessed the necessary vocabulary and paid close attention to the details of our experience, since many musicians can express the “nuances” of those qualia with proper training.)

Since the notion of implicit or statistical learning features prominently in theories of musical expectation, and expectation has been implicated in the generation of scale-degree qualia, Chapter 3 will continue this inquiry by investigating the statistical probabilities of scale-degrees in a harmonic context. In particular, the notion of musical context itself poses an interesting question for theories of qualia: If musical “objects” such as scale-degrees evoke seemingly unique qualia that are largely brought about through implicit learning, do their qualia remain stable across different contexts, or would the statistical frequency of the context itself bear on the qualia of a scale-degree? For instance, the quintessential scale-degree qualia are perhaps those evoked by scale-degree 7, which typically are characterized with terms such as “leading”, “leaning”, “pulling”, and “restless”. However, scale-degree 7 is unique in that, within Western classical music, it is the only scale-degree that is so strongly associated with chords of dominant function, which inherently carry a strong tendency to resolve to tonic. Interestingly, however, if the so-called “leading tone” (i.e., 7̂) is placed in a context, or, more rarely, in a tonic context, it appears to lose those original leading-type qualia, and in fact in those contexts (iii and I7) scale-degree 7 tends to resolve downwards. Thus, one question which will be addressed in this dissertation is how the role of context might shape certain aspects of musical qualia.

2.7.2 Rhythm Qualia

While most theories of musical qualia have focused on scale-degree, aspects of rhythm, meter, and timing also have much to contribute to the dialogue of musical qualia, although their effects are rarely referred to as “qualia,” but rather are more likely to be described using terms such as “groove,” “nuance,” and “feel.” Roholt (2014), for example, devotes an entire book to the “phenomenology of rhythmic nuance,” and while he avoids the term “qualia,” he nevertheless grapples with the problem of ineffability. Roholt, like Raffman, believes that musical nuance is ineffable, and argues that the “feel of a rhythm” is something that arises only through our “embodied engagement with the music,” or, more specifically: “the feel of a groove is the affective dimension of the relevant motor-intentional movements” (p. 105). London (2016), however, criticizes Roholt’s logic, arguing that ineffability is not necessary to his claim of embodied cognition:

If I am aware of the extent to which a groove is pushing or pulling (it can be pushing a little or a lot, violently and jerkily, or steadily and so forth), then my sense of that groove/nuance can be fairly determinate and hence effable.

Many scholars have written extensively about the nature of embodied cognition and its role in music, especially with regard to our phenomenological experience of rhythm and meter (e.g. Abraham, 1995; Iyer, 2002; Thaut, 2008; Grant, 2010; Janata et al., 2012; Schiavio & van der Schyff, 2016). In addition, many works on rhythmic nuance or groove typically focus on the role of microtiming, meaning tiny deviations from a metronomic, “square” pulse (e.g. Iyer, 2002; Fruhauf et al., 2013; Roholt, 2014). (Roholt, for example, argues that it is precisely these deviations in microtiming that create “groove.”) However, while bodily motion and expressive timing likely both play an important role in our experience of rhythm, neither of these factors is investigated in this dissertation. As such, further review of this literature here would only dilute the aims of this chapter.

Of course, microtiming is only one contributing to our phenomenal experience of rhythm. And knowing that the notion of groove relates to a desire for physical movement (Janata et al., 2012) does not elucidate what it is about the music that provokes movement. Indeed, London (2016) points out that both Roholt (2014) and Janata et al. (2012) “remain agnostic as to what ‘groove’ actually is; they do not, for example, analyze the most and least groovy songs in their survey to determine the structural requirements for grooviness.” In addition, while the study of microtiming and nuance can give us insight into the differences between multiple performances of the same music (with the same rhythms), presumably two pieces composed with completely different rhythms will have a much larger effect, or perceived change, compared with changes in microtiming. That is, the study of microtiming and nuance is necessarily a study of musical “minutiae,” and, while a fascinating topic in and of itself, it is curious that more attention has not been given to the study of the perceived differences between more obvious changes in rhythm; namely, the perceived effects of different rhythms and meters. Aristides Quintilianus, writing as far back as the third century, attempted to describe the feel and affective influence of various rhythms, even prescribing (or warning against) certain rhythms for “ethical” purposes: “a quieting of the heart...useful in war dances...sacred...the most healthful...bring the heart into not a little disorder...pull against the soul...fearful and deadly...orderly and manly...indulgent...lowly and ignoble...stimulating to actions...supine and flabby...” (Matheisen, 1983). It would seem, then, that indeed different rhythms (and meters) give rise to different qualitative experiences. Thus, this dissertation will, in part, investigate the variety of qualia (if any) evoked by rhythm and meter. The topic of rhythm and meter will be given a more complete introduction in Section 5.1.

2.8 Chapter Summary

This chapter has provided a brief explication of qualia as a term with origins in philosophy. The topic of qualia features prominently in current discussions in the philosophy of mind, as it is commonly used as a central argument in claims about the mind-body problem and the nature of consciousness. While those debates are beyond the scope of this dissertation, there are commonly-made claims about qualia that prove problematic when applied to certain aspects of music such as scale-degree and rhythm. In this chapter, those claims were pointed out, and shown to be incompatible with known properties of certain musical qualia (especially scale-degree qualia), which poses a problem for musical qualia as an object of empirical study, as well as posing (yet another) problem for philosophical arguments about qualia, which frequently define qualia as ineffable, non-representational, and intrinsic.

An operational definition of qualia is proposed, following a similar stance taken by both Dowling (2010) and Zentner (2012). In particular, a crucial distinction between my stance on understanding qualia and traditional philosophical approaches is that I do not propose to uncover or “capture” the essence of what constitutes the purely phenomenological component of some experience; rather, I propose that various “top-down” and “bottom-up” factors, including conscious and unconscious memory (implicit knowledge), conceptual knowledge, and emotional and physiological reactions, can all contribute to the resulting perceived experience. In addition, I propose, like Zentner, that language can be useful as a means of communicating common reactions to the same stimulus, and that this can provide converging evidence in support of some component that might either contribute to the qualia, or help describe the overall qualitative experience. In this dissertation I aim to uncover some common ground (or lack thereof) for the perception of both scale-degree qualia and rhythm qualia.

Finally, this chapter has provided a brief background on the work of others who have proposed some theoretical explanation for the origin of musical qualia, in particular, the role of statistical learning (or implicit knowledge) as an important component in the shaping of our musical experiences. In the following chapter, I begin by first investigating the functional and resolutional properties of scale-degrees from a statistical perspective.

Chapter 3: A Corpus Study

Abstract

Probabilistic models have proved remarkably successful in modeling melodic organization (e.g. Pearce, 2005; Huron, 2006; Temperley, 2008). However, the majority of these models rely on pitch information taken from melody alone. Despite the prevalence of homophonic music in Western culture, little attention has been directed at exploring the predictive power of harmonic accompaniment in models of melodic organization. The research presented here uses a combination of three main approaches to empirical — exploratory analysis, modeling, and hypothesis testing — to investigate the influence of the harmonic accompaniment on melodic behavior. In this study a comparison is made between models that use only melodic information and models that consider the melodic information along with the underlying harmonic accompaniment to predict melodic continuations. A test of overall performance shows a significant improvement using a melodic-harmonic model. When individual scale-degrees are examined, the major diatonic scale-degrees are shown to have unique probability distributions for each of their most common harmonic settings. That is, the results suggest a robust effect of harmony on scale-degree tendency. If scale-degree tendencies originate in part from their statistical probabilities, then the finding that these probabilities are mediated by the harmonic context suggests that the qualia they elicit will differ depending on the supporting harmony.

Research has suggested that statistical learning plays a substantial role in forming musical expectations (e.g., Krumhansl, 1990; Huron, 2006; Temperley, 2007). Given that implicit knowledge can arise from probabilistic exposure to sequences and patterns (Saffran et al., 1999; Romberg & Saffran, 2010), it is appropriate to begin an investigation of musical expectations by examining the statistical properties of melody. (For a review of implicit and probabilistic learning see Reber, 1993.)

The majority of these models focus on the evaluation of information taken from the melody alone (e.g., pitch height, interval size, interval direction, etc.). However, research has also established the importance of rhythm and phrasing in contributing to melodic expectation. For example, rhythmic information can help predict the location of phrase endings (Palmer & Krumhansl, 1987; Krumhansl & Jusczyk, 1990; Jusczyk & Krumhansl, 1993; Krumhansl, 2000), and pitches located near phrase endings will have an increased probability to move towards their note of resolution (Aarden, 2003; Pearce, 2005). Given the prevalence of homophonic music (i.e., melody with accompaniment) in Western musical cultures, an obvious avenue for further exploration in models of melodic organization would be that of harmonic context. In other words, in modeling melodic expectancy, it may prove beneficial to examine melody not just as isolated lines, but as lines embedded in a harmonic context. Accordingly, this chapter investigates the role of the supporting harmonic accompaniment in shaping melodic organization. Taking a probabilistic approach, in this study a model is constructed that combines harmonic information from the accompaniment with first-order melodic information from a digital corpus of encoded musical scores from the common practice period. This model is compared to a model that uses solely first-order melodic information in order to investigate the effect of harmony on the predictability of melodic continuations. To anticipate the results, it appears that melodic prediction is significantly improved when harmonic information is taken into account. Following the overall model comparison, an exploratory analysis considers unique interactions of harmony and scale-degree, in order to examine the specific effect of the former on the latter.3

There are many ways to quantify and dissect melodic information. In cognitive models of musical expectation, in particular those based on Narmour’s Implication-Realization theory (1990; 1992), not only pitch information is tallied, but also the specific interval and direction from pitch to pitch. The current approach, however, is based on ideas of statistical learning proposed by Huron (2006) where only first-order scale-degree information is considered. Using scale-degree, like pitch-class, assumes equivalence and removes information about the size and direction of an interval. This means that the model does not distinguish between, for instance, second and . However, one should bear in mind that there are only two possible combinations of interval and direction between two scale-degrees (ignoring compound intervals, which are extremely rare in melodies). Furthermore, it has been shown that small melodic intervals are much more common than large ones (Ortmann, 1926; Merriam et al., 1956; Dowling, 1967; Huron, 2001; Temperley, 2008). Thus, most scale-degree successions will represent the smaller of the two intervallic possibilities.
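To make the two-option point concrete, the following minimal sketch lists the ascending and descending realizations of a scale-degree succession and picks the smaller one. It is an illustration only: the major-scale semitone table, the function names, and the restriction to diatonic major scale-degrees are assumptions made here, not part of the model described in this chapter.

```python
# Semitones above the tonic for the major-scale degrees 1 to 7
# (an illustrative assumption; chromatic and minor-mode degrees are omitted).
MAJOR_SEMITONES = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}

def interval_options(sd_from, sd_to):
    """Return (ascending, descending) sizes in semitones, within one octave."""
    up = (MAJOR_SEMITONES[sd_to] - MAJOR_SEMITONES[sd_from]) % 12
    down = (12 - up) % 12
    return up, down

def smaller_option(sd_from, sd_to):
    """Return the direction and size of the smaller of the two realizations."""
    up, down = interval_options(sd_from, sd_to)
    if up == 0:
        return ("unison", 0)
    # A tie (the tritone, 6 semitones each way) is reported as ascending.
    return ("up", up) if up <= down else ("down", down)

for pair in [(7, 1), (4, 3), (1, 5)]:
    print(pair, interval_options(*pair), smaller_option(*pair))
```

For the succession from scale-degree 7 to scale-degree 1, for example, the two options are an ascending semitone and a descending interval of eleven semitones; a scale-degree encoding records both as the same event.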

As will be discussed in further detail in the following sections, “harmonic information” in this case refers to a reduction of the accompanimental texture to a Roman numeral. This approach has several limitations which should be acknowledged, such as the reliance on a single (subjective) analysis, the marginalization of voice-leading practices, and the insertion of a “presentist” bias which discards relevant stylistic information (Gjerdingen, 2014). In this case, however, using Roman numerals affords a simple method that significantly reduces the number of necessary comparisons.

3 Notice that the causal influence between melody and harmony can go in both directions: That is, melody might be expected to affect harmony as well as harmony affecting melody. For the purposes of this chapter, however, only the probabilistic influence of harmony on melody will be investigated, leaving aside the reverse analysis for another occasion.

It should be emphasized that the ensuing model of melodic probability is not being presented as an optimal model of melodic prediction. Any realistic model will certainly consider more than first-order scale-degree successions and a crude harmonic analysis. Rather, the goal is to learn whether the harmonic information in the accompaniment can improve the predictive power of a model of melodic continuation.

The motivation for this research is to better understand melodic tendency as it functions in a (relatively) ecologically valid context. If scale-degree tendencies originate in part from their statistical probabilities, and the probability of any given melodic event turns out to be largely dependent on the underlying harmonic context, then their qualia would be expected to differ depending on that context. The hypothesis tested here is that harmonic context plays a sizable role in shaping melodic behavior. This hypothesis is tested using a statistical (i.e., probabilistic) approach.

3.1 The Corpus Analysis

3.1.1 Overview

The ultimate goal of the project is to better understand the relationship between harmony and melody; more specifically, to identify possible influences of the supporting harmonic context on melodic organization. The approach taken here is a strictly probabilistic one. That is, the probabilities of melodic continuation are evaluated in two conditions: the first examines melody in isolation, the second examines melody along with the underlying harmony in the accompaniment. In this way, if the probabilities of melodic continuation are significantly altered in the latter condition (where underlying harmony is taken into account) then one can infer that the melodic behavior is influenced by the harmonic context. To this end, a musical corpus was assembled containing melodies set within unambiguous harmonic contexts.

To test the main hypothesis, two models were needed: one with first-order melodic succession probabilities calculated using only melodic information, and another which combined first-order melodic information with the harmonic information taken from the accompaniment of the antecedent scale-degree. In order to have some measure of improvement, a zeroth-order melodic model was also included. Since it is well established that first-order models perform better than zeroth-order models, the zeroth-order model provides a baseline against which to calculate each successive model’s improvement. Thus, probabilities were calculated for melodic antecedent-consequent pairs of scale-degrees, first in isolation, and then together with the harmony supporting the antecedent scale-degree, in order to investigate whether the harmonic support of the antecedent note can help predict the melodic behavior of the consequent. In brief, the results of the study will show that harmonic context appears to have a strong influence on melodic behavior.
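The three models just described can be sketched in a few lines as relative-frequency tables that differ only in what they condition on. The sketch below is illustrative and rests on several assumptions that are not taken from the study: the toy melody is invented, the function names are hypothetical, and the comparison measure (average negative log2 probability per note, computed on the same data used to build the tables) is simply one convenient choice.

```python
from collections import Counter, defaultdict
from math import log2

# Invented toy melody for illustration only (not data from the corpus):
# each slice is (scale_degree, Roman numeral of the supporting harmony).
SLICES = [
    (1, "I"), (2, "V"), (3, "I"), (4, "V7"), (3, "I"), (7, "V"), (1, "I"),
    (7, "iii"), (6, "IV"), (5, "I"), (7, "V"), (1, "I"), (7, "iii"),
    (6, "IV"), (4, "V7"), (3, "I"),
]

def fit(pairs):
    """Map each context to a relative-frequency distribution over consequents."""
    counts = defaultdict(Counter)
    for context, consequent in pairs:
        counts[context][consequent] += 1
    return {ctx: {sd: n / sum(c.values()) for sd, n in c.items()}
            for ctx, c in counts.items()}

def cross_entropy(model, pairs):
    """Average negative log2 probability (bits per note) assigned to the pairs."""
    return -sum(log2(model[ctx][sd]) for ctx, sd in pairs) / len(pairs)

transitions = list(zip(SLICES, SLICES[1:]))
# Zeroth-order: no context; first-order: antecedent scale-degree;
# harmonic: antecedent scale-degree plus the harmony supporting it.
zeroth_pairs = [("any", nxt) for _, (nxt, _) in transitions]
first_pairs = [(prev, nxt) for (prev, _), (nxt, _) in transitions]
harmonic_pairs = [((prev, harm), nxt) for (prev, harm), (nxt, _) in transitions]

zeroth_model = fit(zeroth_pairs)
first_model = fit(first_pairs)
harmonic_model = fit(harmonic_pairs)

for name, model, pairs in [("zeroth-order", zeroth_model, zeroth_pairs),
                           ("first-order", first_model, first_pairs),
                           ("harmonic", harmonic_model, harmonic_pairs)]:
    print(f"{name}: {cross_entropy(model, pairs):.3f} bits per note")
```

In this toy data the first-order table splits scale-degree 7 evenly between two continuations, whereas the harmonic table separates them by supporting chord. Because the measure is computed in-sample, a finer context can never score worse here, so the sketch shows only the mechanics of the comparison; a genuine test of improvement requires held-out data or a significance test.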

3.1.2 Sampling

In light of the research hypothesis, a corpus of music featuring melodies with harmonic accompaniment was needed. While there are many Western forms of music that have rich harmonic traditions, such as rock, pop, and , classical music makes an ideal jumping-off point for this research, as it forms a large body of notated music that is easily accessible. In addition, the use of classical scores provides a convenience sample, since the pitch information for many works is already encoded in digital format. Therefore, this study specifically investigates the statistical properties of classical melodies from the . The corpus thus excluded material typically considered outside of the bounds of the common practice era, such as 20th century and Renaissance music, as well as solo and two-part works (e.g., Bach inventions) where harmonies are incomplete or implied. Although it can be argued that the common practice period represents a wide range of harmonic practices (for instance, some composers included in the corpus were not yet thinking in terms of triads with roots), it is nevertheless commonly thought of (and taught as) one coherent tonal system.

Given the aims of the study, there were several features that were desirable in the corpus: 1) Voice-leading and use of harmony should be representative of the common practice period; 2) The melody could be easily determined and, as much as possible, clearly distinct from the harmonic accompaniment; 3) Several composers should be represented whose works span the time frame of the common practice period; 4) Both vocal and instrumental styles should be included in roughly equal proportions; and 5) Homorhythmic and non-homorhythmic styles should be included in roughly equal proportions. Note that “proportion” is in relation to the total number of melody notes, not the number of pieces, since melody tones provide the unit of measure in the study.

In assembling the desired sample described above, for practical purposes it was preferred to use existing digitally encoded scores. A subset of Bach chorales was chosen as a convenience sample, as they had harmonic analyses which were already encoded by other scholars. This meant that the works only had to be checked for errors, rather than encoded entirely from scratch. Similarly, the Schubert lieder had melodic information already encoded, and only the harmonic analyses had to be added. The remainder of the corpus was supplemented by consulting Burkhart’s Anthology for Musical Analysis (1994), specifically searching for pieces composed between the nominally Baroque and Romantic eras (roughly 1600–1850), preferring those which had clear separation of melody and harmony. In this process, an effort was made to gather works which represented a variety of composers, styles, and instrumentation. The Anthology was an appropriate source since it contains many works featuring harmonic accompaniment by many composers in different styles and instrumentation dating from the common practice period.

The corpus used in this study yielded close to 10,000 melody notes and comprised 68 pieces: 50 Bach chorales, 1 Haydn string quartet, 2 Mozart sonata movements, 3 Beethoven sonata movements, 6 Schubert Lieder, 3 Schumann piano miniatures, 1 Clementi sonatina, and 2 Mendelssohn Songs without Words (see Appendix A for the complete list of works included in the corpus). All musical information in the corpus was encoded in Humdrum format (Huron, 1995). All pieces (aside from the Bach chorales) were harmonically analyzed and encoded by the author. Details of the analytic procedure are outlined in the following section.

It should be mentioned that while the corpus may appear to be somewhat biased due to the overrepresentation of Bach chorales, the chorales are much shorter than the remaining pieces, meaning that the chorales do not in fact constitute the majority of notes or measures in the corpus (e.g., the chorales constitute a total of 726 measures; the remaining pieces make up 1306 measures). This means that, bar-for-bar (and note-for-note), there may in fact be an overrepresentation of non-homorhythmic music, where melodies commonly continue over an unchanging harmony. However, given that the majority of music in the classical genre is not homorhythmic, this slight imbalance may prove more representative of melodic expectations in general.

3.1.3 Methodology

In assembling the corpus, the goal was to be able to compare adjacent “slices” of the musical texture. Since melodic tendency was the primary area of interest, each slice would be one melodic tone (or attack), regardless of duration. Two models were created which both consider a prior musical “state” to predict the subsequent melodic tone. The first model looks only to the prior melodic note to predict its successor; the second model (referred to as the harmonic model) looks to the prior melodic tone and that tone’s harmonic accompaniment to predict the successor.

Of course, most melodic tones in the corpus do not coincide with an onset in the harmonic accompaniment. In order to best represent the harmonic information present in the accompaniment at any given point in a melody, every melodic slice is assigned a Roman numeral based on the most recent sounding harmony present in the accompaniment. As will be explained in the paragraphs below, determining scale-degrees and Roman numerals relies on the determination of a key, which was resolved via human analysis (as opposed to automatic methods such as feature extraction). Furthermore, apart from the chorales, which have clear harmonic onsets (i.e., homorhythmic vocal entrances), the remainder of works in the corpus typically have arpeggiated (or non-homorhythmic) harmonic textures. In these cases harmonic labels were assigned based on the harmonic rhythm of the accompaniment. Currently, the most accurate method for extracting information about harmonic relationships from homophonic musical textures (in classical music) is via manual analysis. While such methods are necessarily subjective, the level of difficulty posed by the analysis of these works was trivial, suggesting that error rates would be minimal. In addition, although works were not analyzed in duplicate, a recent collaborative project by the author involving duplicate Roman numeral analyses on a similar data set suggests that the level of discrepancy between subsequent analyses would have been minor (Devaney et al., 2015).4
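The assignment of the most recent sounding harmony to each melodic slice can be illustrated with a minimal sketch. The corpus itself was encoded in Humdrum format; the Python function below, its name `assign_harmonies`, and the representation of onsets as beat positions are all assumptions made for illustration, not the procedure actually used in the study.

```python
from bisect import bisect_right


def assign_harmonies(melody_onsets, harmony_events):
    """Label each melodic slice with the most recent sounding harmony.

    melody_onsets: sorted onset times (hypothetical unit: beats) of melody notes.
    harmony_events: sorted (onset_time, roman_numeral) pairs for the accompaniment.
    Returns one Roman numeral per melody note, or None if no harmony has sounded yet.
    """
    harmony_times = [time for time, _ in harmony_events]
    labels = []
    for onset in melody_onsets:
        # Index of the last harmony whose onset is at or before this melody note.
        index = bisect_right(harmony_times, onset) - 1
        labels.append(harmony_events[index][1] if index >= 0 else None)
    return labels


# Hypothetical example: five melody notes over three chord changes.
print(assign_harmonies([0, 1, 2, 2.5, 4], [(0, "I"), (2, "V"), (4, "I")]))
```

Under this sketch a melody note that falls between two chord onsets simply inherits the label of the earlier chord, which corresponds to the "most recent sounding harmony" rule described above.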

Since the primary purpose of this study is to investigate the influence of harmony on scale-degree behavior, it is imperative to limit the number of variables in order to have enough data to draw generalizable conclusions. As such, certain decisions were made to simplify the harmonic context. For instance, there may be a difference in melodic behavior for a given scale-degree depending on whether the supporting harmony is in root position or not. However, classifying all harmonies separately based on both Roman numeral and inversion would divide the data too much, reducing statistical power and making it more difficult to discover patterns. Thus, in the analysis all inversion information was collapsed to root position. Although it is possible that the bass tone may be a more accurate predictor than the generic triad, as a preliminary corpus study of homophonic music, a simpler model using fewer parameters was preferred over a complex one to start with. In the same vein, rhythmic information, such as the duration of melodic tones or chords, was removed. Other simplifications included collapsing seventh chords together with triads. For example, instances of V7 and V were all labeled as “V”. It should be noted, however, that since the scale-degrees were tallied and paired with their associated harmonies, it becomes clear when a given scale-degree is functioning as a harmonic seventh. Like any empirical experiment, corpus studies must also operationalize concepts in order to render the hypothesis testable. However, operationalizing the various parameters raises a number of complicated issues, which are described in the following paragraphs.

4 In fact, the texture in the TAVERN data set was more complex than the textures in the present corpus, and discrepancies between analyses were still minor. The most common discrepancies involved questions of inversion (typically during complex left hand passages) and tonicization versus true modulation.

The Problem of Multiple Contexts

In this study, scale-degrees and harmonies are examined in their musical contexts. This means both a vertical context (i.e., which scale-degree sits above a given harmony?) and a horizontal context (i.e., which melodic event precedes another?). This simple scenario is complicated by the presence of rests, since it would not be beneficial to count every instance where a scale-degree proceeds to a rest (i.e., there would be too many). If rests were to be removed, however, care must be taken to preserve the synchronization of the melodic and harmonic information. In order to preserve the original melodic-harmonic alignment, other musical information would need to be deleted, added, or both. Thus, rests were treated in the following way. Rests were removed whenever they were present simultaneously in both melody and accompaniment. For rests comprising less than one measure, the melody note prior to the rest was copied, as if it were present for the remainder of the measure. This procedure of extending harmonic or melodic material was preferred over deleting material, since it was decided that the latter would create discontinuities in the analysis of sequential progressions. Rests of less than one measure make up the majority of rests in the corpus. Thus, two notes with a rest in between are tallied such that the first note is understood to proceed to the second note. Rests (in either the melody or accompaniment) greater than one measure in duration were dealt with on an individual basis. For instance, the majority of longer rest periods were due to breaks in the melodic lines of Lieder, such as during introductions, codas, or pauses between phrases. If this type of break consisted of a complete solo phrase in the accompaniment, and did not involve any transition or change of key, then the entire accompanimental phrase was deleted. Often a break comprised only part of a phrase, where the melodic line was passed to the accompaniment, in which case the material was not deleted and the melody could be traced. Using this procedure, harmonic and melodic alignment was preserved as well as possible.

Another challenge was how to handle repeating sections of music. For every section of music that repeats, should it be encoded (and thus tallied) once or twice? The most common scenario was that of repeat signs written into the score. However, there were some pieces with repeats that were written out. The procedure for handling repeats was as follows: ignore repeat signs but include repeated musical segments that are written out, unless the written-out portion comprises more than eight measures. Given that most phrases are four or eight measures, it seemed that eight measures would be sufficiently long to avoid discarding most written-out repeated segments. The rationale for ignoring repeat signs was that long segments of music that are typically repeated (such as the entire A and B sections of many binary forms, or complete sonata expositions) would occupy a disproportionate volume of the corpus, thereby reducing data independence and potentially biasing the data.

The Problem of Modulation and Tonicization

In the musical analyses for this study, melodies are represented as scale-degrees. Since scale-degrees rely on a reference to a tonic, it is important to reflect, as accurately as possible, the implied key at each moment. Of course, in real musical composition, key determination can be rather complicated. Music theorists typically distinguish three categories of key migration: modulation, tonicization, and temporary application of a chord (or chords) from outside the primary key area (Laitz, 2008). With modulation, there may or may not be an explicitly notated change of key in the score. Furthermore, what sets modulation apart from tonicization is not always clear. Although not all theorists agree on the criteria for classifying key changes under one label or another, for the purposes of this analysis it was decided that in order for the scale-degrees to accurately reflect their most appropriate harmonic context, and to maximize consistency, there should be a systematic procedure in place. Thus, the working definition of key change for the purposes of this analysis was as follows: for any segment with four or more harmonies in a row (regardless of harmonic rhythm) that applied to a secondary key area, a key change was deemed to occur. For example, Table 3.1 shows the following arrangement of harmonies: the section begins in the original key of D minor, but a C7 harmony appears as a repeated applied dominant to F (III), after which it appears to go back to D minor (at A7–Dm), but then continues in F. According to the four-harmonies-in-a-row protocol, the result is given in the right-most column of Table 3.1 below. Note this means that, in the example below, the first appearance of A7–Dm is given a different Roman numeral encoding from its return towards the end of the passage, where the same chords are coded as applied chords in F.

One other problem arises in the process of changing keys: wherever a key change occurs, it will affect the flow of both Roman numerals and scale-degrees. Thus, when pitch classes are represented as scale-degrees, and compared in their horizontal (antecedent-consequent) context, a problem arises at key-change locations.

Table 3.1: Illustration of Encoding Key Changes. An illustration of how key changes were operationalized for the purposes of encoding the harmonic progressions of complete pieces that included applied chords, temporary tonicizations, and modulations to other keys. When four or more harmonies in a row belonged to a secondary key area, they were labeled as local chords in the secondary key. If there were fewer than four chords belonging to a secondary key, they were labeled as applied chords.

Chord    R.N. in key of Dm    R.N. used in analysis
Dm       i                    i
Eø7      iiø7                 iiø7
A7       V7                   V7
Dm       i                    i
Eø7      iiø7                 iiø7 (viiø7) *key of F
C7       V7/III               V7
F        III                  I
C7       V7/III               V7
F        III                  I
A7       V7                   V7/vi
Dm       i                    vi
BbM7     IV7/III              IV7
F        III                  I
C7       V7/III               V7
F        III                  I

Consider a scenario represented in the two illustrations below, with each column representing a measure of music, where the third measure (C7) initiates a key change from D minor to F major.

                  D Minor:                            F Major:
Melody:           A, Bb, A, F, A    G, A, G, E, D     C, Bb, A, G    A
Chords:           Dm                Eø7               C7             F

                  D Minor:                            F Major:
Scale-degrees:    5, b6, 5, b3, 5   4, 5, 4, 2, 1*    *5, 4, 3, 2    3
Roman Numerals:   i                 iiø7*             *V7            I

As indicated by the asterisks (*), and despite the fact that, by chance, the resulting Roman numeral progression is a logical one, this analysis accurately represents neither the melodic nor the harmonic progression, since the asterisked Roman numerals and scale-degrees do not belong to the same key. To remedy this, at the point where the music changes key, the harmony (and scale-degree) immediately prior to the key change is notated as a pivot chord (regardless of whether it would be interpreted as such by a music theorist) and then given a boundary separation, followed by a second representation (as a distinct Roman numeral) in the second key area. In this way, “false progressions” can be ignored (i.e., skipped over) in the process of tallying the first-order melodic and harmonic contexts. Take again the following progression from Table 3.1, where the half-diminished ii chord pivots into a vii chord:

D minor: i iiø7 *F major: viiø7 V7 I

A side-by-side “sliding window” of antecedent-consequent pairs of harmonies and scale-degrees can thus be represented as shown in Table 3.2.

Table 3.2: Melody and Harmony Encodings at Key Changes. Illustration of how antecedent-consequent pairs of Roman numerals or scale-degrees were encoded at key changes. Sliding windows show each consequent as it becomes the antecedent. At a key change, a boundary separation is indicated by a dash, and the pivot chord (or scale-degree) gets reinterpreted in the second key. Rows with dashes are skipped in the tallying process so that illogical progressions (chords or scale-degrees from different keys) are never counted.

Harmony                      Scale-degree
Antecedent    Consequent     Antecedent    Consequent
i             iiø7           5             b6
*iiø7         -              b6            5
viiø7         V7             5             b3
V7            I              b3            5
                             5             4
                             4             5
                             5             4
                             4             2
                             2             1
                             *1            -
                             6             5
                             5             4
                             4             3
                             3             2
                             2             3

Rows showing the chords or scale-degrees to be ignored in the analysis are shown with asterisks. By preceding key changes with a notated pivot chord, and ignoring antecedent-consequent pairs which cross a key boundary, the correct harmonic and melodic contexts are preserved, and spurious progressions are avoided. Although it is recognized that this solution causes the pivot chord to be counted twice (once with each label), the ratio of pivot to non-pivot chords in the corpus is sufficiently small that the chord duplication should not be cause for concern. Furthermore, this method appears to be the best way of accurately capturing the antecedent-consequent harmonic and melodic relationships as they relate to the implied key at any given moment without unnecessarily discarding information. Thus, preservation of immediate key structures was given priority over preservation of higher-order key interpretations.
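The tallying of antecedent-consequent pairs with skipped key boundaries can be sketched as follows. This is only an illustration under stated assumptions: the function name `tally_pairs` and the use of the string "-" as a boundary marker are hypothetical, and the sequences are those of the D minor to F major example in Table 3.2.

```python
from collections import Counter

BOUNDARY = "-"  # hypothetical marker for the boundary separation at a key change


def tally_pairs(sequence):
    """Count antecedent-consequent pairs, skipping any pair that touches a boundary."""
    counts = Counter()
    for antecedent, consequent in zip(sequence, sequence[1:]):
        if BOUNDARY in (antecedent, consequent):
            continue  # a "false progression" across two keys is never counted
        counts[(antecedent, consequent)] += 1
    return counts


# The pivot is written once in each key, on either side of the boundary.
harmonies = ["i", "iiø7", BOUNDARY, "viiø7", "V7", "I"]
scale_degrees = ["5", "b6", "5", "b3", "5", "4", "5", "4", "2", "1",
                 BOUNDARY, "6", "5", "4", "3", "2", "3"]

harmony_counts = tally_pairs(harmonies)
degree_counts = tally_pairs(scale_degrees)
print(sum(harmony_counts.values()), sum(degree_counts.values()))
```

Because the pivot appears on both sides of the boundary, it serves once as a consequent in the first key and once as an antecedent in the second, which is the double counting acknowledged above.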

3.2 Evaluating the Models

3.2.1 Global Hypothesis Test

The primary research question asks: is melodic tendency dependent on harmonic context? In order to test this hypothesis overall, two models predicting melodic continuations were created: the first uses first-order melodic information, and the second uses first-order melodic information plus the harmonic information from the accompaniment of the antecedent melody note. In order to have some measure of the degree of improvement of the harmonic model, a third model was included which uses only zeroth-order melodic information, since it is well established that first-order models tend to perform better than zeroth-order models (e.g., Pearce, 2005; Huron, 2006; Temperley, 2007). In the models tested, all scale-degrees and harmonies were included from the complete corpus. Each model assigns a probability to each melodic note, and the overall strength of the model can be determined by evaluating the product of all the probabilities. Since the resulting products are vanishingly small, the standard procedure is to instead take the log of each probability, and then sum them together to generate a log-likelihood. The log-likelihood values are then divided by the length of the model’s data set and converted to a positive number. The resulting values are labeled as cross-entropy scores to be consistent with similar techniques in recent literature (e.g., Temperley, 2007).5 Lower cross-entropy values indicate a better model fit. Using this methodology the following cross-entropy values were obtained: zeroth-order model = 2.22, first-order model = 1.87, harmonic model = 1.60. The lower cross-entropy score of the first-order model compared to the zeroth-order model demonstrates that the former is more accurate at predicting the melodic continuation. What is relevant to the primary hypothesis is that the harmonic model generates a substantially lower cross-entropy value compared to the first-order model. Notice that the difference in moving from the first-order model to the harmonic model is similar to the difference between the zeroth-order model and the first-order model.

5 This use of the term cross entropy may not fit the traditional definition (see Rubenstein & Kroese (2004), p. 29); however, it is consistent with Temperley’s usage in Music & Probability (2007).
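The cross-entropy calculation described above (sum the log probabilities, divide by the number of notes, and flip the sign) can be sketched briefly. The function name `cross_entropy` is hypothetical, and the logarithm base is an assumption: the text does not state which base was used, so base 2 is chosen here purely for illustration.

```python
import math


def cross_entropy(probabilities, base=2):
    """Mean negative log probability per melodic note.

    probabilities: the probability a model assigned to each note that actually
    occurred. The base of the logarithm is an assumption (not given in the text).
    """
    log_likelihood = sum(math.log(p, base) for p in probabilities)
    return -log_likelihood / len(probabilities)


# Hypothetical probabilities for four notes; lower scores mean a better fit.
print(cross_entropy([0.5, 0.5, 0.25, 0.25]))
```

Working with summed logarithms rather than the raw product avoids the numerical underflow that would otherwise occur when thousands of probabilities are multiplied together.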

Of course, a model with more parameters will always fit as well as, or better than, a model with fewer parameters. Thus, there is some possibility that the improved model fit might be due to chance. As such, it is appropriate to perform a statistical test. A log-likelihood ratio test was conducted to compare the first-order model to the harmonic model. This test compares a simpler model with a more complex model by evaluating the difference of log-likelihoods between models. This difference is multiplied by -2 to produce a value known as the deviance, which is known to be χ² distributed, with degrees of freedom equal to the difference in the number of parameters between the two models. This ratio test produced a deviance of 4840. Unfortunately, determining the degrees of freedom for this test is problematic. One would expect the degrees of freedom to represent the difference in the number of parameters in the two models, which in this case is 7424.6 However, many of these parameters are highly correlated with each other, and furthermore, many of the parameter estimates are zero (i.e., not every scale-degree actually appears with every possible chord type). Consequently, this number exaggerates the degrees of freedom. Since a theoretical χ² distribution could not be calculated, an empirical distribution was calculated through Monte Carlo computer simulation. Five thousand log-likelihood tests were conducted using the first-order model and a “scrambled” version of the harmonic model where the harmony paired with the antecedent scale-degree was randomized, producing 5000 null χ² values. The distribution of these 5000 χ² values was centered around 2350, with the top five percent of values falling within the range of 2406–2517. The χ² value of 4840 observed using the actual data is far above this range, allowing us to conclude that the improvement of the model is significantly better than chance, even without the precise calculation of a p value.

6 The first-order model has 16 × 16 parameters: 16 antecedent scale-degrees (e.g., ♯1̂ and ♭2̂ are considered separately) and 16 potential consequent scale-degrees. The harmonic model has 16 × 16 × 30 parameters (the same scale-degrees from the first-order model, but with 30 possible chord types).
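The deviance and the scrambled-harmony simulation can be sketched as follows. The nominal parameter difference follows from the counts given in the text: 16 × 16 × 30 − 16 × 16 = 7680 − 256 = 7424. Everything else in the sketch is an assumption made for illustration: the function names, the use of maximum-likelihood estimates from raw counts, and the toy data are hypothetical and do not reproduce the study's actual implementation.

```python
import math
import random
from collections import Counter


def max_log_likelihood(contexts, outcomes):
    """Log-likelihood of the outcomes under relative-frequency estimates of P(outcome | context)."""
    joint = Counter(zip(contexts, outcomes))
    totals = Counter(contexts)
    return sum(n * math.log(n / totals[context]) for (context, _), n in joint.items())


def deviance(antecedents, harmonies, consequents):
    """-2 times the log-likelihood difference between the first-order and harmonic models."""
    ll_first = max_log_likelihood(antecedents, consequents)
    ll_harmonic = max_log_likelihood(list(zip(antecedents, harmonies)), consequents)
    return -2.0 * (ll_first - ll_harmonic)


def null_deviances(antecedents, harmonies, consequents, n_sims=5000, seed=0):
    """Deviances obtained after randomly re-pairing harmonies with antecedents."""
    rng = random.Random(seed)
    scrambled = list(harmonies)
    values = []
    for _ in range(n_sims):
        rng.shuffle(scrambled)
        values.append(deviance(antecedents, scrambled, consequents))
    return values


# Toy data (hypothetical): harmony fully determines the continuation of scale-degree 1.
antecedents = ["1", "1", "1", "1"]
harmonies = ["I", "I", "V", "V"]
consequents = ["1", "1", "2", "2"]
observed = deviance(antecedents, harmonies, consequents)
null = null_deviances(antecedents, harmonies, consequents, n_sims=20)
print(observed, max(null))
```

The observed deviance would then be compared against the upper tail of the simulated values, as in the comparison of 4840 with the range 2406–2517 reported above.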

This procedure follows the traditional approach of statistical testing, where a critical test applied to the complete data set determines the likelihood of a given outcome appearing by chance. Modeling techniques in computer science, however, typically use a form of evaluation that reserves a portion of the data set to test the efficacy of the model using the probabilities calculated from the remainder of the data set. Thus, this complementary method was performed as well. The complete data set was divided into five portions, where each fifth was rotated through as a reserve set, and tested against the remainder which acted as the training set. If a parameter (i.e., scale-degree + harmony) in the test set was not encountered in the training set, the probability for that parameter was pulled from the lower-order (i.e., first-order) probability for the scale-degree alone. For each test set this was only necessary for approximately 0.5% of the data (or 10 in 1813) on average. Cross-entropy values were calculated for each model on each of the five test sets, and an overall value for each model was obtained by taking the average of all five. The cross-entropy values found using this method were: zeroth-order model, 2.22; first-order model, 1.88; harmonic model, 1.67. As can be seen by comparing with the earlier test, the cross-entropy values found (and the differences between them) using both testing methodologies are very similar.
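The five-fold procedure with a fallback to the melody-only probability can be sketched as follows. Several details are assumptions, since the text does not specify them: the function name, the interleaved assignment of notes to folds, the base-2 logarithm, and the small floor probability used when a continuation is unseen even in the melody-only counts.

```python
import math
from collections import Counter


def cross_validated_entropy(triples, k=5, base=2, floor=1e-6):
    """Average held-out cross-entropy of the harmonic model over k folds.

    triples: list of (antecedent, harmony, consequent); needs at least k items.
    floor: assumed probability for continuations unseen in both models.
    """
    folds = [triples[i::k] for i in range(k)]  # assumption: interleaved folds
    scores = []
    for i, test in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        harmonic_counts = Counter(train)
        harmonic_totals = Counter((a, h) for a, h, _ in train)
        melodic_counts = Counter((a, c) for a, _, c in train)
        melodic_totals = Counter(a for a, _, _ in train)
        log_sum = 0.0
        for a, h, c in test:
            if harmonic_counts[(a, h, c)]:
                p = harmonic_counts[(a, h, c)] / harmonic_totals[(a, h)]
            elif melodic_counts[(a, c)]:
                # Fall back to the melody-only (first-order) probability.
                p = melodic_counts[(a, c)] / melodic_totals[a]
            else:
                p = floor
            log_sum += math.log(p, base)
        scores.append(-log_sum / len(test))
    return sum(scores) / k


# Hypothetical data: scale-degree 1 over I continues to 1 or 2 equally often.
data = [("1", "I", "1"), ("1", "I", "2")] * 5
print(cross_validated_entropy(data))
```

Note that mixing fallback probabilities into the harmonic model in this way does not yield a normalized distribution; the sketch simply mirrors the substitution described in the text.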

These global tests show that evaluating a melodic tone along with its underlying harmony allows one to make more accurate predictions about an ensuing melodic tone than simply considering the first-order distributions of melody alone. This is consistent with the main hypothesis. Of course, this overall test does not tell us anything of interest about the specific interactions of harmony and scale-degree, and in particular, how the former might affect the latter. In order to investigate this, the subsequent paragraphs make an exploratory examination of the scale-degree distributions in each model.

3.3 Descriptive Statistics

3.3.1 Mode Classification

The complete corpus consists of pieces in both major and minor keys. In order to evaluate the summary statistics, a question arises as to whether to keep the corpus as a whole, or divide it into two parts, with one part comprising the pieces written in the major mode, and the other part comprising pieces in the minor mode. Of course, major key works can include brief transitions into minor keys and vice versa, so classifying entire works based on modality may appear to be an arbitrary decision. However, it is possible that the distribution of harmonies and/or scale-degrees might differ depending on the primary modality. Furthermore, if the corpus remains undivided, the tallying of all harmonies will produce an enormous table which may prove difficult to interpret. For instance, there would be three versions of the submediant chord: vi, VI, and bVI. This would make it difficult to distinguish, for example, the proportion of harmonies originating in minor key contexts from those that are a result of modal mixture. Accordingly, it was decided to split the corpus into two, based on the primary modality of the piece.

Finally, there remained one additional problem with mode classification, even after the division of the corpus into two. A given piece that modulates to a key of the opposite mode would contain both major and minor tonic harmonies, which would be tallied separately (i.e., “I” in a major key; “i” in a minor key). However, dominant function chords, and a few other harmonies such as applied chords or Italian 6th chords, share the same symbols regardless of the mode they are employed in. The result of tallying all instances of such chords is that while the tonic harmonies would be subdivided and grouped by modality, the dominant harmonies would all be pooled together, possibly distorting the true ratio of dominant to tonic function harmonies. As such, harmonies that commonly appear in either mode were specially labeled in the analysis according to the subsequent harmony in the context of the piece. For example, if “V” is followed by “i” then that V chord is marked with an additional symbol to indicate that it preceded a minor tonic. These V chords would be tallied separately from V chords that were followed by a major tonic. In this way, common harmonies with labels that could apply to either the major or the minor mode were distinguished so as to most accurately represent the proportion of harmonies in the corpus.

In the end, despite dividing the corpus in two, the total number of harmonies found in either part of the corpus remained unwieldy. In order to simplify the results, harmonies which represented less than one percent of their respective major or minor corpus were omitted from the graphs and tables. Overall, these rare harmonies tended to be modal mixture chords, Neapolitan chords, uncommon applied chords, and harmonies resulting from modulations to opposite-mode key areas. Lastly, in an attempt to retain as much musical information as possible, applied chords in the form of leading-tone chords were merged with dominant applied chords (e.g., counts of V/vi and vii°/vi were pooled together as dominant function applied chords).

3.3.2 Zeroth-order Probabilities

The zeroth-order probabilities (i.e., the overall distribution) of both scale-degrees and harmonies (independent of each other) were extracted from the corpus in order to determine their relative proportions. These are shown in Figure 3.1 and Figure 3.2. Recall that the corpus was divided into two sub-corpora representing the major mode pieces and the minor mode pieces, respectively. Thus, the percentages and counts shown in the figures below are labeled with regard to their respective corpora totals. The x-axis lists all possible scale-degrees (Figure 3.1) or harmonies (Figure 3.2) found in the corpus, and the y-axis represents the proportion of the corpus that is made up by each of those scale-degrees or harmonies. As mentioned above, harmonies making up less than one percent of the corpus have been omitted from the figures.

Not surprisingly, neither scale-degree nor harmony follows a simple uniform distribution. Rather, some scale-degrees and harmonies occur more often than others. Specifically, scale-degrees from the diatonic collection appear more often than chromatic scale-degrees. Even within the diatonic collection, the first five scale-degrees (1̂ to 5̂) are more common than 6̂ and 7̂. This finding is consistent with pitch distributions reported by other scholars (e.g., Huron, 2006; Temperley, 2007; Albrecht & Huron, 2014).

Figure 3.1: Zeroth order distribution of scale-degrees in the corpus. Bar graphs show the overall proportions of the scale-degrees as a percentage of the whole that each scale-degree makes up in the major or minor corpus, respectively. The numeric labels above each scale-degree tally the actual number of instances found for that given scale-degree. These tallies demonstrate the under-representation of minor key works in the corpus overall.

Figure 3.2: Zeroth order distribution of harmonies in the corpus. These charts show the overall proportions of the harmonies as a percentage of the whole that each harmony makes up in the major or minor corpus, respectively. The numeric labels above each harmony give a count of the actual number of instances found for that given harmony. These counts demonstrate the under-representation of minor key works in the corpus overall.

Similarly, the use of non-diatonic harmonies is mostly outweighed by diatonic harmonies. Interestingly, the use of dominant and tonic chords grossly outweighs all other harmonies combined. This finding is relatively consistent with the work of Budge (1943).7 Since the tonic chord supports 1̂, 3̂ and 5̂, and the dominant chord supports 5̂, 7̂, 2̂ (and sometimes 4̂), one might expect, given this abundance of tonic and dominant chords, that 6̂ ought to be the least common scale-degree. This does appear to be the case in the minor mode corpus, where ♭6̂ and 6̂ are used less frequently than nearly all other diatonic scale-degrees. (The least common diatonic scale-degree is ♭7̂, which is not surprising given the common practice of raising ♭7̂ to 7̂ in the minor mode.) However, this is not the case in the major mode corpus, where 7̂ is the least used, despite the abundance of dominant chords in the corpus. This suggests that composers may be avoiding scale-degree 7 in the melodic line. However, what appears as avoidance may simply be the result of voice-leading preferences, a possibility that cannot be ruled out with the information at hand. In any case, this finding is worthy of further study.

3.3.3 First-order Probabilities

The conditional first order probability for each scale-degree was independently tallied. That is, given some scale-degree x, what scale-degree is likely to follow? For the sake of clarity and brevity, and to ensure sufficient statistical power, the remaining analyses only consider the data from the major-mode portion of the corpus.

Figure 3.3 illustrates the percentage of the time that a given scale-degree proceeds to another. The x-axis shows the antecedent scale-degree (i.e., the note of origin) and the y-axis represents the consequent note. Circle sizes represent the likelihood (expressed in percent) for a given antecedent-consequent pair, with an empty space representing 0% and the largest circle representing 100% likelihood. Note that Figure 3.3 reveals certain musical features that may not be evident from examining purely vocal corpora. For example, the tendency for note repetition in these melodies is quite high, and to a lesser extent, so is the tendency for arpeggiation (moving between scale-degrees that are a third apart). A line with a slope of 1 represents note repetitions, with clustering around that line by ±1 representing motion by step. Note also that the scale-degrees with the most predictable melodic continuations are those that music theorists would classify as tendency tones (e.g., scale-degrees ♯2, ♯4, and ♯5 all have 70% or higher probabilities of moving upwards by one semitone). This is in part because there are relatively few instances of these scale-degrees in the corpus overall; nevertheless, they do have highly predictable behavior. This high predictability can be seen by looking at the vertical spread of each scale-degree. The diatonic scale-degrees show more spread (i.e., more possibilities for melodic continuation) compared with the non-diatonic tones, which show less spread. For instance, all 19 occurrences of scale-degree ♯2 move upwards to scale-degree 3. (Refer to Figure 3.1 for numeric counts of scale-degrees and their distributions.) Finally, this analysis is consistent with existing literature that shows step motion and repetition to be more common than motion by leap (Ortmann, 1926; Merriam et al., 1956; Dowling, 1967; Huron, 2001; Temperley, 2008). The trend for step-wise melodic motion can be seen in Figure 3.3 by the clustering of larger circles around the diagonal. Note also that, aside from the tendency tones which all move upward, there is slightly more clustering below the diagonal than above it, implying that downward step motion is slightly more common than upward step motion overall.

7 Budge found that all inversions of I and V combined made up roughly 45% of her corpus.

Figure 3.3: First-order probabilities for scale-degrees. Visual representation of the likelihood of a consequent scale-degree (y-axis), given the antecedent (x-axis). The size of the circle is roughly proportional to its probability (expressed in percent), as shown in the legend. For example, the probability of ♯2̂ moving to 3̂ is 100%, whereas the probability of 1̂ moving to 2̂ is 19%. Clustering of large circles around a line with a slope of 1 arises from high probabilities for note repetitions and step-wise motion, consistent with previous literature.

Notice that the zeroth- and first-order distributions of scale-degrees provide useful null distributions against which harmonically informed melodic practice can be contrasted. By comparing the distributions of purely melodic scale-degree continuations with those of scale-degrees set in a harmonic context, we can see what effect the harmonic accompaniment has on the likelihood of the ensuing melodic tone. In other words, the zeroth- and first-order probabilities of melody alone provide a sample of predicted melodic behavior which is then compared, using the melodic-harmonic data, with the observed melodic continuation for each scale-degree in its most common harmonic contexts.

3.3.4 Change in Melodic Probabilities when Harmony is Considered

In order to facilitate interpreting the results, Figure 3.4 shows an enlarged portion of Figure 3.5. Here, the predicted melodic continuations for scale-degree 1 are compared with the observed continuations for all scale-degree 1s when supported by tonic harmony (I). The y-axis shows the probability that the given scale-degree (in this case, 1̂) will proceed to any of the diatonic scale-degrees, listed along the x-axis. The shaded bars represent the predicted, or expected, melodic behavior (again, from considering only the first-order melodic information), whereas the white bars represent the observed melodic behavior when the given chord (in this case, I) is supporting the antecedent scale-degree.

Figure 3.4: Predicted and observed probabilities for 1̂ in tonic context. Probabilities for both the predicted and observed melodic continuations for scale-degree 1. The predicted probabilities, shown with shaded bars, are taken from the first-order distribution of melody alone. The observed probabilities, shown in white bars, are calculated from the first-order conditional probabilities of scale-degree 1 when it is embedded in a tonic (I) harmonic context. Reported sample size represents the total number of instances for the observed condition. The p value indicates the results of a χ² test (see footnote 8) comparing the predicted and observed distributions (df = 6).

Figure 3.4 shows that there is a statistically significant difference between melodic probability distributions for scale-degree 1 when embedded in a tonic harmony context compared with the corresponding melody-only distribution. Specifically, the melody-only distribution predicts that 1̂ is most often followed by a repeat of 1̂, with 7̂ and 2̂ being the second- and third-most likely consequent scale-degrees, respectively. When scale-degree 1 is embedded in a tonic harmony context, however, the probability for a repeat of 1̂ increases, and there is a decrease in probability for continuing to either 7̂ or 2̂. In fact, the observed distribution shows a roughly equal probability of moving to scale-degrees 7, 5, 3, or 2.

Figure 3.5 shows the predicted and observed continuations for all diatonic scale-degrees in the major mode. In the far left column, Arabic numbers represent the antecedent scale-degree. Each row of graphs examines the behavior of that single antecedent scale-degree in three different harmonic contexts. For instance, the first row of graphs shows the probabilities of the different melodic trajectories for scale-degree 1 when either I, vi, or IV is the supporting harmony. Using this graph, one can see that 1̂ is more likely to ascend to 2̂ when it is supported by IV or vi than when it is supported by a tonic harmony (I). A more dramatic example can be seen in the graphs for scale-degree 6, where it appears far less likely to move down to 5̂ when supported by submediant harmony (vi).

Some unexpected findings come from examining these graphs. For instance, in examining the probable continuations for scale-degree 2, we can see that 2̂ is most likely to repeat when supported by ii. (This is not surprising given the propensity for ii to move to V, and that 2̂ is a common tone to both harmonies.) However, we find this same likelihood for 2̂ to remain stationary when supported by vii. In comparison, 2̂ is most likely to descend to scale-degree 1̂ when paired with dominant harmonic support (V). Given that vii and V have the same harmonic function (i.e., dominant function), one might expect the distributions for scale-degree 2 supported by vii or V to look similar, yet they do not. Likewise, the consequent behavior of scale-degree 7 supported by either vii or V again shows differing distributions despite the fact that these chords share the same harmonic function and tendency for resolution. However, the observed distributions for 2̂ and 7̂ in these dominant function contexts are dramatically different from each other. In this latter example, 7̂ over vii has more than a 60% chance of moving to 1̂, the highest probability found in the results, whereas 2̂ over vii has only about an 8% chance of proceeding to 1̂. Recall that all inversion information was discarded from the model, and that seventh chords were collapsed into triads. Although it is possible that this is a result of unequal sample sizes, it suggests that for the melodic succession 7̂–1̂, scale-degree 7 may be more likely to be supported by vii (or some inversion of vii or vii°7), while for the melodic succession 2̂–1̂, scale-degree 2 may be more likely to be supported with V (or some inversion of V or V7).

Figure 3.5: Effect of harmony on scale-degree probability. Each graph shows the observed and predicted distributions of melodic continuations for a given antecedent scale-degree in three different harmonic contexts: as the root, 3rd, or 5th of a given harmony. (An exception is scale-degree 4, where V replaces vii.) The y-axes show the probability that the given scale-degree will proceed to any of the diatonic scale-degrees, listed along the x-axes. Shaded bars represent the predicted melodic probabilities (using melodic-only data), and white bars represent the observed (from the melodic-harmonic data). A missing graph indicates that there were fewer than 50 observations in the given context, and so a test was not performed. The counts represent the total number of instances for the observed conditions. p values indicate the results of χ² tests (see footnote 8) comparing the predicted and observed distributions (df = 6). Non-significant values are shown in parentheses.

A series of χ² tests was performed comparing the first-order melodic distribution of each diatonic scale-degree with the first-order conditional distribution of that scale-degree in three different harmonic contexts.⁸ The p values are reported on each graph, with non-significant values shown in parentheses. If there were fewer than 50 observations for a scale-degree in a given harmonic context, the graph was omitted and no tests were conducted. As evident in Figure 3.5, when the supporting harmony is taken into account, the observed melodic behavior differs significantly from what was predicted in 15 out of 19 cases. Of course, the tests themselves are not particularly important given the global test already conducted. Rather, the graphs were meant to provide an illustration of how the underlying harmony can impact the movement of specific scale-degrees. Since these tests follow the main hypothesis test and are exploratory in nature, the p values reported in Figure 3.5 have not been corrected for multiple tests.

⁸ The counts for the melody-only condition far outweighed the melody+harmony condition. In order to compute a χ² test, the total counts from the expected and observed distributions must match. The solution was to apply the overall proportions from the melody-only (predicted) distribution to the total count from the melody+harmony distribution, thereby reducing the counts for the melody-only condition while retaining its original proportions. In this way, the expected and observed distribution totals were matched. For example, if the melody+harmony count was 100, and the melody-only count was 1000, a hypothetical melody-only distribution of 100, 200, 300, 400 would be reduced to 10, 20, 30, and 40, respectively.
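The rescaling described in footnote 8 can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the observed counts are invented for the example, and the sketch assumes that every expected cell is non-zero.

```python
def rescale_expected(melody_only_counts, observed_total):
    """Shrink the melody-only counts to the observed total, keeping their proportions."""
    total = sum(melody_only_counts)
    return [count * observed_total / total for count in melody_only_counts]

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic; assumes every expected count is > 0."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The footnote's example: 1000 melody-only events rescaled to 100 observations.
expected = rescale_expected([100, 200, 300, 400], observed_total=100)

# Hypothetical observed counts for the same four categories (sum = 100).
observed = [20, 20, 20, 40]
statistic = chi_square_statistic(observed, expected)
```

With these invented counts the expected values are 10, 20, 30, and 40, and the statistic is 10 + 0 + 100/30 + 0, or about 13.33. A p value would then be read from the χ² distribution with degrees of freedom equal to the number of categories minus one (df = 6 for the seven diatonic continuations tested here), for instance with a routine such as `scipy.stats.chisquare`.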

3.4 Conclusions

This chapter presented a study in which melodic trajectories were examined empirically through use of a classical music corpus. Compared with other corpus-based approaches to the study of melodic probability, this study looked beyond the surface features of the melody itself and considered the impact of the harmonic accompaniment on the likely trajectory of a melodic tone. It was found that observing a melodic tone within its harmonic context conveys significantly more information about its continuation than looking at the melody in isolation. This is consistent with musical intuition.

In order to evaluate the main hypothesis, three models were created to test whether harmonic information contributed significantly to the successful prediction of melodic continuations. The first model used only zeroth-order information to predict melodic continuations, the second model used first-order information from melody alone, and the third model combined the first-order melodic information with information about the harmonic support of the antecedent scale-degree. For each model the cross-entropy (Temperley, 2007) was calculated. The cross-entropy values decreased significantly with each subsequent model, with the harmonic model showing a substantial decrease in cross-entropy. The improvement in moving from the first-order (melodic) model to the harmonic model is roughly comparable to the improvement between the zeroth-order model and the first-order model, suggesting that harmony is playing a sizable role in predicting melodic continuations.
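The logic of the three-model comparison can be illustrated with a short Python sketch. It is offered under explicit assumptions and is not the procedure used in this study: cross-entropy is taken here as the mean negative log2 probability that a model assigns to each observed continuation, the models are fit and evaluated on the same toy data, and all names and sequences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cross_entropy(probabilities):
    """Mean negative log2 probability of the observed events (bits per event)."""
    return -sum(math.log2(p) for p in probabilities) / len(probabilities)

def conditional_model(contexts, outcomes):
    """Maximum-likelihood estimate of P(outcome | context) from paired lists."""
    counts = defaultdict(Counter)
    for context, outcome in zip(contexts, outcomes):
        counts[context][outcome] += 1
    return {
        context: {o: n / sum(row.values()) for o, n in row.items()}
        for context, row in counts.items()
    }

# Hypothetical toy data: one scale-degree and one supporting chord per event.
melody = [1, 2, 3, 1, 1, 7, 1, 2, 1, 1]
harmony = ["I", "V", "I", "vi", "I", "V", "IV", "ii", "I", "I"]
antecedents, consequents, chords = melody[:-1], melody[1:], harmony[:-1]

# Zeroth-order: no context. First-order: antecedent tone. Harmonic: tone + chord.
m0 = conditional_model([()] * len(consequents), consequents)
m1 = conditional_model(antecedents, consequents)
m2 = conditional_model(list(zip(antecedents, chords)), consequents)

h0 = cross_entropy([m0[()][c] for c in consequents])
h1 = cross_entropy([m1[a][c] for a, c in zip(antecedents, consequents)])
h2 = cross_entropy([m2[(a, ch)][c]
                    for a, ch, c in zip(antecedents, chords, consequents)])
```

One caution about this sketch: when models are fit and scored on the same data, adding context can never raise the cross-entropy, so the ordering h0 ≥ h1 ≥ h2 is guaranteed here and is not evidence in itself. What matters in the analysis is the size and statistical significance of each decrease.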

In addition to this global test of the overall effect of harmony on melodic behavior, further analyses were carried out in order to investigate the effects for each diatonic scale-degree. Specifically, a distribution of predicted consequent melodic behavior was calculated for each diatonic scale-degree based on the purely melodic information from the major-only portion of the corpus. Each of the predicted distributions was then compared against an observed consequent distribution for each diatonic scale-degree in three different harmonic contexts, such that the given scale-degree was either the root, third, or fifth of the supporting harmony. These graphs indicate the specific changes in probability for each diatonic scale-degree under the different harmonic contexts. Many of the observed contexts had a small number of occurrences in the corpus, yet, when tested, many of those contexts produced very small p values, suggesting that the effect size may be quite robust for certain scale-degrees in certain harmonic contexts. Once again, the findings from this study are consistent with the notion that melodic organization is not independent of harmonic support.

Several caveats should be reiterated regarding the methodology for analyzing the corpus. As mentioned in Section 3.1.3, several decisions had to be made about how to best encode the musical analysis, given that these are complex works that contain tonicizations, temporary applied chords, and modulations to secondary keys. The works also contain a fair amount of repetition, which might be expected to reduce the data independence. The methods for analyzing and encoding necessarily involved making decisions such as including or discarding sections that were repeated, or how to encode and represent harmonies that bordered two different key areas. Of course, these decisions are subjective and therefore open to question. However, although some decisions were made in order to minimize the segregation of data and maximize power (such as pooling Roman numerals regardless of inversion), the majority of decisions were made with the intention of representing the music as one might hear it in real time. It is worth noting that in any large-scale analysis of complex corpora, this decision-making process is inevitable, and some interpretive decisions must be made in order to reach the final stage of testing the hypothesis.

Although music theorists may find the results of this chapter unsurprising, there is a wealth of literature in which conjectures are made about melodic expectations, statistical learning processes, etc., where typically the only musical parameter examined is melody in isolation. Furthermore, some might imagine that all the harmonic “tendency” information is contained in the melody alone, and therefore that considering the harmonic accompaniment does not add much information. However, the results of this study suggest that is not the case, as measured by the relative changes in log-likelihood from the zeroth, to first, to harmonic model. It is acknowledged that, in terms of modeling, using only one musical parameter (i.e., melody) makes the task at hand substantially easier. Furthermore, there are now several accessible corpora which are monophonic (e.g., Essen Folksong Collection), and therefore offer researchers a convenient sample from which to test a theory or build a model about melody. This chapter hopefully will not only contribute to the existing body of literature on melodic expectation, but also support existing research (e.g., Aarden, 2003; Pearce, 2005; Albrecht & Huron, 2014) promoting the point of view that melodic distributions do not come in “one size fits all”, and that factors such as rhythmic duration, metric position, phrase position, and even historical period, all contribute information to melodic expectancy. Although often challenging (as demonstrated by my own collapsing of information in the corpus), the more parameters that one can include, the more accurate those distributions will be.

3.5 Discussion

Given these findings of the effect of harmonic context on the organization of melody,⁹ a logical next step would be to consider the relative weightings of factors such as bassline (or chordal inversion) and voice-leading. Certainly voice-leading and chord doubling play an important role in governing the behavior of melodic tone succession, and it would be useful to know how important they are in terms of predicting melodic successions. For instance, perhaps the finding that scale-degree 7 appears to be avoided in the melodic line partly arises from the voice-leading principle which warns against the doubling of tendency tones. Or, take as another example the finding that scale-degree 6 is far less likely to move to scale-degree 5 when supported by submediant harmony: if submediant harmony frequently moves to dominant harmony, the avoidance of 5̂ as a consequent tone in this context may arise from an avoidance of parallel fifths. Unfortunately, in this study key pieces of information, such as that of chordal inversion, were discarded in the process of simplifying the musical texture in order to examine the main hypothesis. As such, questions pertaining to the importance of basslines and voice-leading cannot be tested here. Nevertheless, the role of voice-leading implied by the findings mentioned above warrants future study.

⁹ Note that this was a correlational study, and as such it might be thought that the melody could be equally influencing the harmony. However, recall that antecedent information was used to predict consequent information. As such, while an influence of the consequent on the antecedent is still possible, it is less plausible, since we typically assume that composers don’t write melodies in reverse.

Having found these statistical effects, it would be appropriate to test the perceptual effects of harmony on melodic continuations. As mentioned, the differences found in the exploratory analysis suggest that there may be moderate effect sizes for some scale-degrees in certain harmonic contexts. If this is true, and the sample used in the corpus is sufficiently representative of classical music, then perceptual effects would likely be quantifiable. Given the evidence in support of statistical learning, these findings suggest that listeners may be sensitive to the influence of harmony on melodic continuations. That is, if a given scale-degree regularly and consistently tends to move (in the context of real music) in a particular way when framed with harmonic support x, but not y, then a classical listener’s expectations could be tested, for example, with the use of reaction-time studies. More importantly for the present study, if a particular scale-degree appears to carry different probabilities for resolution depending on the harmonic framework, then a related question arises: can a melodic tone elicit a different feel or qualia in a listener when it is framed in different harmonic contexts? We turn next to this question in Chapter 4.

Chapter 4: A Perceptual Study of Scale-degree in Context

Abstract

A perceptual study investigated the ability of scale-degrees to evoke qualia, and the impact of harmonic context in shaping a scale-degree’s qualia. In addition, the following questions were addressed: What role does musical training have in shaping qualia? Are listeners consistent in their descriptions? Are experiences similar across participants, or are they individual and subjective? Listeners with or without music-theoretic training were asked to rate the qualia of scale-degrees following various chord progressions, each ending with a different final harmony. Scale-degrees were found to exhibit relatively consistent musical qualia; however, the local chord context was found to significantly influence qualia ratings. In general, both groups of listeners were found to be fairly consistent in their ratings of scale-degree qualia; however, musician listeners were more consistent than non-musician listeners. Finally, a subset of the musical qualia ratings were compared against Krumhansl and Kessler’s (1982) scale-degree “profiles”. While profiles created from the present data were, overall, correlated with the K&K profiles, the high ratings ascribed to tonic triad members, which Krumhansl and Kessler attribute to tonal stability, were found to be better explained by the effect of the local chord context.

4.1 Introduction

Qualia are, by definition, subjective. Yet, it seems feasible that a majority of individuals might describe the qualia of a sunset, or of eating a pear, in similar ways. Perhaps the qualia of scale-degrees, then, might similarly be described using common language across individuals?

Huron (2006) describes an informal study in which he asked ten experienced musicians (all professors and graduate students in a music department) to imagine each scale-degree, and to “free associate” words or phrases that they felt described that particular scale-degree. He then analyzed the responses by grouping words and phrases that were alike in meaning, and found that there was a clustering of similar responses according to scale-degree. This suggests that experienced musicians not only have the ability to hear qualia, but also that their personal experiences of the qualia of scale-degree appear to be similar.

When teaching scale-degree (or interval) identification in music, it seems that what we attempt to do as teachers is to bring about an awareness in our students of these shared attributes of experience. Identifying scale-degrees, however, comes more easily for some than for others. Perhaps students who have difficulties with scale-degree identification tend to confuse the qualia of scale-degrees? Or perhaps scale-degrees with similar functions (e.g., ♯4̂ and 7̂) tend to elicit similar qualia? However, there is a possibility that these labels that we attach to certain scale-degrees do not arise organically from experience, but rather (perhaps from generations of teachers passing down learned vocabulary) scale-degrees become imbued with the representations we attach to them. Perhaps the qualia are not useful for identification, and instead we come to identify scale-degrees by other means and then, once identified, gain access to all the associated features we have learned.

In this chapter, I aim to investigate, firstly, whether everyone is capable of experiencing scale-degree qualia. It may be that certain listeners are incapable of hearing in this way. For those that can distinguish the qualia of scale-degrees, do those listeners do so in consistently similar ways? And what about the role of learning and experience? Are the participants in Huron’s study merely responding according to learned associations? Can individuals without any music-theoretic training distinguish the qualia of scale-degrees in consistent ways? And if so, are their responses similar to those who do have musical training? Finally, do scale-degree qualia remain stable within real musical contexts, where typically, at least in Western music, a melody is embedded in a harmonic context? In addition to investigating the exploratory questions listed above, this chapter will also evaluate a formal hypothesis: that changes to the immediate harmonic context will modify the perceived qualia of scale-degrees.

In a series of well-known experiments, Carol Krumhansl, along with Edward Kessler and Roger Shepard, tested participants’ responses to perceived “goodness of fit” of a scale-degree after a given harmonic progression in an attempt to investigate the properties of scale-degrees as they relate to the overall key (Krumhansl & Shepard, 1979; Krumhansl & Kessler, 1982; Krumhansl, 1990). (This work directly led to the use of the famous Krumhansl and Kessler “key profiles” which are still widely in use today.) Krumhansl and Kessler claim that their goodness of fit ratings “confirm” a tonal hierarchy theory, in which they propose that what listeners are really responding to are the varying levels of tonal “stability” of the scale-degrees within a given key, which fit into three categories: tonic chord, remaining diatonic tones, and remaining chromatic tones. However, the “various harmonic contexts” in the experiments that led to the key-profiles were all variations of a predominant-dominant-tonic progression, and therefore all ended with the tonic chord. As discussed in Chapter 3, a listener’s expectations for a scale-degree’s resolution (or progression) may be tied to the harmonic context. Therefore, in order to ensure that their key profiles truly represent effects of key, it would be prudent to examine the “goodness of fit” using a greater variety of chord progressions, and specifically, using chord progressions that end on a chord other than tonic. Accordingly, this chapter will also examine ratings related to “goodness of fit” in various harmonic contexts, and compare them to Krumhansl and Kessler’s findings.

In sum, this chapter ultimately has three goals. The first is to re-investigate Huron’s (2006) findings that musicians generally agree on qualitative terms for the various scale-degrees, and then to explore whether listeners without musical training describe the qualia of scale-degrees using similar terms, or whether they can do the task at all, given that they cannot identify scale-degrees. If descriptions are relatively consistent across all levels of experience, it would suggest that scale-degree qualia are not dependent on training. The second goal is to test the role of harmony in the perception of scale-degree qualia. If scale-degree qualia are found to be altered by the harmonic context, then this finding would carry implications for models of melodic expectation. In addition, it might suggest a revision to current models of aural skills pedagogy. Lastly, “goodness of fit” judgments for scale-degrees in different harmonic contexts will be compared with the findings from Krumhansl and Kessler to re-examine the relation of scale-degree to chord and chord to key.

4.2 Method

Prior to the main experiment, an informal pilot study was conducted with 10 participants of mixed musical backgrounds. Some were undergraduate students pursuing a music degree, and others were persons with little-to-no musical experience and no training in music theory or aural skills. The primary purpose of the pilot study was to determine the descriptive terms that should be used in the main experiment, and to evaluate whether individuals without musical experience would be able to perform the task. Therefore, an approach similar to that of Huron (2006) was taken, where participants were asked to free-associate words with various scale-degrees. In contrast to Huron’s approach, however, the scale-degrees were actually heard (as opposed to imagined) with key contexts first established by performing simple scales or short chord progressions at the keyboard. After setting up a key context, a single scale-degree was played, after which the participant would respond with their free-associated terms, which were then written down by the experimenter.

Given the goals of the present research, and that perceived scale-degree qualia may be a highly individual experience, an appropriate design for the main experiment might take the large list of adjectives given by these participants, and ask listeners in the main experiment to check all that apply. However, it would be useful to establish, at first, whether scale-degree qualia are experienced in any kind of similar ways. Furthermore, since the task is rather abstract (according to feedback from the participants in the pilot study), responses are likely to contain a large amount of variation, and the more terms given, the more variation one is likely to find. Although there may be subtle differences between words like “jarring” and “harsh”, or “gloomy” and “sad”, a primary goal of this research is to establish whether there are general similarities in descriptions across participants, especially given differences in musical training. Therefore, the participants’ responses from the pilot study were subjected to content analysis, in an attempt to come up with a representative set of descriptive words that would be able to be rated by participants in the main experiment.

Thus, the complete set of vocabulary obtained in the pilot study was grouped by similarity into larger categories, and then the categories were given names. For example, the following words were grouped into a category designated as relating to the concepts of “strength and/or stability”: confident, centered, weak, heavy, strong, stable, light, unsure, unsteady, airy, tipsy, fragile, unstable. Remarkably, despite a few unusual and unexpected responses from the non-musician group, the results of the pilot study suggest that even when participants have no musical training, listeners came up with similar descriptive terms that could be put into similar categories as found in Huron (2006). The content analysis initially resulted in five categories: movement, strength/stability, emotional valence (primarily happy and sad), lightness/darkness, and tense/relaxed. After comparing these results with Huron’s categories (which were: certainty, tendency, completion, mobility, stability, power, and emotional valence), it was acknowledged that the “movement” category held the largest number of terms, and therefore could easily be divided into the subcategories: tendency and completion.

Thus, since the majority of descriptions given by participants could fit within these categories, it was decided that these seven category names themselves would be the terms used as dependent variables in the main experiment. Notice that many of the terms given by the participants in the pilot study, as well as the resulting vocabulary that makes up the dependent variables, can be considered metaphorical. Some scholars (especially those who believe in the strict definitions of qualia discussed in Chapter 2) might consider the reliance on metaphor in describing aspects of music as providing a supporting argument for the claim that musical qualia are ineffable. However, I have already noted that the language gathered and provided in these experiments is in no way meant to directly represent their qualia. Rather, the vocabulary used (and collected) is thought of as capturing aspects contributing to the overall qualia, and is simply used as a way of measuring converging evidence. In addition, metaphors of motion, energy, brightness or darkness, tension or relaxation have permeated pedagogical language for centuries (as evidenced by historical pedagogical treatises), and therefore appear perfectly appropriate to use as descriptive elements of music in these studies.

4.2.1 Participants

Sixty-five participants were recruited for this study. Of the total participants, 43 were second-year undergraduate music students, and 22 were graduate and undergraduate students from various other disciplines. All participants were given the Ollen Musical Sophistication Index (OMSI), primarily to distinguish those who had received formal music-theoretic training from those who had not. For the purposes of this study, only two questions from the OMSI questionnaire were considered: one which asks them to self-identify as either a non-musician, or a musician (of varying degrees of competency); and another which inquires about how many years (if any) of college-level coursework they have completed. The main factor for including persons in the “non-musician” group was that they had not been exposed to the language and terminology commonly used in music theory pedagogy, and that they would not have developed the skill of identifying scale-degrees by name. Thus, in addition, participants were informally asked during pre- and post-experiment interviews whether they had ever received any music theory or aural skills training of any kind, and whether they believed they possessed absolute pitch. Based on their responses to the above questions, participants were then divided into two groups: “musicians” and “non-musicians”. Note that a few individuals in the “non-musician” group did claim to have vocal or instrumental training, but did not consider themselves musicians, nor had they had any experience with music theory or aural training. Of the total participants, five identified themselves as having perfect pitch. These participants’ data were first examined in comparison with the musician group, to see if, as a group, their responses were markedly different from the rest of the musician group. A priori, it was determined that if the absolute-pitch group responded in a significantly different way, they would be considered separately from the other two groups. If not, their responses would be included in the musician group. In the end, two participants were excluded from the experiment due to technical malfunctions. Of the remaining participants, 41 music students fell in the “musician” group and the other 22 students fell in the “non-musician” group.

4.2.2 Stimuli

The study was broken into two blocks of 35 trials each. The stimuli followed a context+probe design. In each block, participants were presented with a key-defining progression in a major key. All progressions were four chords in length, began with the tonic triad, and implied the same key, but each ended with a different harmony. The possible chord progressions (contexts) consisted of: I–IV–V–I, I–V–I–IV, I–IV–V–vi, I–IV–I–V, or I–V–I–ii. Thus, each progression contained the tonic and dominant chord, but did not necessarily end on the tonic harmony. Each trial consisted of one such key-defining progression, immediately followed by a single probe tone, which could be any one of the seven diatonic (major) scale-degrees, or one of three chromatic scale-degrees: ♯1̂, ♯4̂, or ♯5̂. Note that since these chromatic notes are being presented aurally, they may equally be interpreted as their enharmonic equivalents. However, for the sake of clarity these chromatic scale-degrees will only be given a single representation (as sharp scale-degrees) in all figures and text.

It was decided that the sounds should be as realistic as possible. As such, all stimuli were recorded using a Yamaha P-90 keyboard, Sonar sound editing software, and VST instruments by Roland’s Sonic Cell (VST used was the “ultimate grand” 003). The stimuli were not quantized; however, they were recorded with the use of a metronome. Although this meant that the inter-onset intervals and coordination of voice onsets for chords would be inexact, it was decided that any effects resulting from these liberties would likely be negligible for this particular experiment, and, given the already abstract nature of the task, would in fact be preferred over more highly constrained and less realistic-sounding stimuli. However, the velocities of the stimuli were equalized such that all chord progressions and scale-degrees would have equal velocity (and so roughly equivalent apparent loudness). Chords were voiced in a traditional keyboard spacing, with the left hand playing a single bass note and the right hand playing a close-position triad approximately one octave above the left.

In order to prevent qualia responses from being biased towards describing pitch height, the original stimuli (which were all recorded in the key of C) were randomly transposed to a new key after every 10 trials. This was preferred over changing key after every trial, since it was informally noted in the pilot study that participants (especially those in the non-musician group) found the task more difficult immediately following a key change. Thus, this design introduced a randomization of keys while still allowing participants enough time in one key to familiarize themselves with it before moving on to a new one. Key presentation order, and the order of stimuli within each key group, were randomly assigned for each participant. The scale-degrees were all transposed along with the progressions in order to maintain equivalent spacing from the chord progressions across transposition levels. Scale-degrees (probes) were always higher than the upper-most tone of the chord progression by at least a fixed minimum interval, in order to minimize response biases due to pitch proximity (see Krumhansl, 1982). Participants heard each scale-degree (total 10) in each key-defining context (total 5) once each for a total of 50 trials, plus an additional 20 trials which were repeated and arranged randomly throughout the two blocks in order to gather a measure of within-subject variability.
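The trial structure described above (5 contexts × 10 probes, 20 repeated stimuli, two blocks of 35, and a new key every 10 trials) can be sketched as follows. This is only an illustrative reconstruction: the helper `build_trials`, the choice to draw transpositions without reuse, and the use of semitone offsets above C are assumptions, not details reported in the text.

```python
import random
from collections import Counter

# Contexts and probes as described in the text (sharps written as "#").
PROGRESSIONS = ["I-IV-V-I", "I-V-I-IV", "I-IV-V-vi", "I-IV-I-V", "I-V-I-ii"]
PROBES = ["1", "#1", "2", "3", "4", "#4", "5", "#5", "6", "7"]


def build_trials(rng, n_repeats=20, key_group_size=10, block_size=35):
    """Return one participant's randomized trial list (hypothetical helper)."""
    unique = [(prog, probe) for prog in PROGRESSIONS for probe in PROBES]
    repeats = rng.sample(unique, n_repeats)   # 20 distinct stimuli heard twice
    stimuli = unique + repeats                # 50 + 20 = 70 trials
    rng.shuffle(stimuli)

    # Assumption: one transposition (semitones above C) per group of 10 trials,
    # drawn without reuse.
    n_groups = -(-len(stimuli) // key_group_size)   # ceiling division
    offsets = rng.sample(range(12), n_groups)

    return [
        {
            "trial": i + 1,
            "block": i // block_size + 1,
            "transpose": offsets[i // key_group_size],
            "progression": prog,
            "probe": probe,
        }
        for i, (prog, probe) in enumerate(stimuli)
    ]


trials = build_trials(random.Random(2016))
counts = Counter((t["progression"], t["probe"]) for t in trials)
# Total trials, unique stimuli, stimuli heard twice.
print(len(trials), len(counts), sum(1 for c in counts.values() if c == 2))
# prints: 70 50 20
```

Note that with 70 trials and key groups of 10, the fourth key group straddles the two blocks of 35; whether the actual experiment handled the block boundary this way is not stated.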

4.2.3 Procedure

As explained in Section 4.2, it was decided that the best method of approach was to use a condensed version of the most commonly used adjectives to describe the qualia of scale-degrees, and have participants rate the appropriateness of the terms. Limiting the variables also makes the somewhat daunting task a bit easier for those with no musical training. For statistical purposes, a rating task simplifies the analysis by having numerical values on a continuous scale, rather than counts of categorical variables. In addition, using a rating scale rather than yes/no categories allows possible subtle differences between scale-degrees to come to the fore that might otherwise be described with similar terms. That is, using a checkbox approach, one might find that 1̂ and 3̂ both tend to be described as “stable”, “relaxed”, “solid”, etc. However, by using a rating scale it becomes possible to discern whether, for example, 1̂ might be more “stable”, “relaxed”, “solid”, etc., than 3̂.

Figure 4.1: Image of digital interface used in experiment. Rating terms are listed in opposite-facing pairs with a slider in-between. Participants were instructed to move the sliders to indicate the degree to which they believed a particular term described the qualia of the probe tone.

Participants were asked to rate the qualia of each scale-degree using a digital interface with sliders that could be dragged to one side or the other, indicating that a given scale-degree was heard as either more x or more y, where x and y represent opposite qualia terms (e.g., happy or sad). As can be seen in Figure 4.1, the interface contained 14 terms, arranged in opposite-facing pairs, and participants were instructed to move sliders to indicate the degree to which they believed each particular term described the qualia of the probe tone. Participants were also given the option to “opt out” of using any particular rating scale on any given trial by checking “does not apply / I don’t know” if they felt that it was not a useful or applicable descriptive term.

Although it is common in experimental designs to list terms on a unidirectional scale (e.g., from “tense” to “not tense”), this binary pairing was preferred because it introduced more descriptive terms (and therefore, theoretically, more options). That is, it might avoid participants thinking of scale-degrees along only one parameter (e.g., “tense”). Also, by presenting the terms in this binary manner, it was believed that this would simplify an already abstract and difficult task, and make the midway point on the sliders more likely to represent “complete neutrality”.

In between blocks participants were given a break during which they completed the OMSI questionnaire. The experiment was conducted in a sound-attenuated booth, and stimuli were presented through stereo headphones (Sennheiser HD 280 Pro), with the volume adjusted to a comfortable listening level. Prior to beginning the experiment, participants were shown the list of terms to be used in the experiment and were given basic explanations of how they were meant to be interpreted.¹⁰ In addition, it was explained that the chord progressions were only setting up the context for the probe tone and that they were to rate the note that followed and not the progressions themselves. The instructions to participants read as follows:

Please rate, using the sliders provided, how much each term describes the given note. If you feel that a term is not useful for describing the note, or you are unsure, please check “does not apply / I don’t know”.

¹⁰ For example, after a practice round, the first participant asked if she should be rating every note as “incomplete” since in real music one never encounters a single note by itself! After this it was determined that in order for the results to be meaningful, participants would have to be using the sliders in roughly equivalent ways. Therefore, a basic explanation for how to interpret each term was given at the beginning of the experiment.

4.3 Results

Recall that the main hypothesis predicts that the harmonic context will influence scale-degree qualia ratings. The remaining exploratory questions asked: How consistent are individuals in their qualia judgments? And do those with music theory training perceive scale-degree qualia in similar ways to those without? Mixed effects regression was used to evaluate the main hypothesis and exploratory questions. A separate test was conducted for each dependent variable (i.e., qualia rating or slider). The main factors were: the level of training (musician/non-musician),¹¹ scale-degree (10 possibilities), and progression (5 possibilities). The main result of interest, however, was the interaction between scale-degree and progression.

Indeed, for each dependent variable, a significant interaction was found for scale-degree and progression (all p < .01). Even with a Bonferroni correction for multiple comparisons, all p values remain significant. In other words, this is consistent with the notion that harmonic context can alter a scale-degree’s qualia. While not every comparison necessarily yields a meaningful result, the complete statistics are listed for each dependent variable in Table 4.1.
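As a reminder of what the Bonferroni correction involves, the sketch below computes the per-test threshold. The number of tests (seven, one per slider) and the conventional alpha of .05 are assumptions made for illustration; the text does not state how many comparisons entered the correction, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def survives(p_values, alpha=0.05):
    """Flag which p values stay significant after correcting for len(p_values) tests."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p < cutoff for p in p_values]


# Assumption: seven tests, one per dependent variable (slider).
cutoff = bonferroni_threshold(0.05, 7)
print(round(cutoff, 4))  # 0.0071
```

Under these assumptions, a p value has to fall below roughly .007 to remain significant after correction.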

Looking at Table 4.1, one can see a clear main effect for scale-degree across all dependent variables. This can be interpreted to mean that despite all differences in training, and despite the changes in harmonic context, participants tended to rate at least some scale-degrees in consistently different ways from other scale-degrees. That is to say, while scale-degree qualia can clearly be mediated by the harmonic context, they appear to elicit a certain degree of unique qualia in and of themselves.

¹¹ Note that responses from individuals with perfect pitch were not found to be different from the remaining musicians.

Table 4.1: Result statistics by dependent variable. Columns list the dependent variables, or sliders. Rows list the main effects and various interactions from the regression analysis. Each cell reports a p value category (*** p < .001; ** p < .01; * p < .05; n.s. = not significant).

                    Strong/  Complete/   Bright/  Expected/   Happy/  Tense/   Move/
                    Weak     Incomplete  Dark     Surprising  Sad     Relaxed  Stay
Scale-degree        ***      ***         ***      ***         ***     ***      ***
Progression         n.s.     n.s.        ***      n.s.        ***     n.s.     n.s.
Training            n.s.     *           n.s.     n.s.        n.s.    n.s.     ***
S-deg*Prog          ***      ***         ***      ***         **      ***      ***
S-deg*Train         ***      ***         ***      ***         ***     ***      ***
Train*Prog          n.s.     n.s.        n.s.     n.s.        n.s.    *        *
S-deg*Prog*Train    n.s.     n.s.        n.s.     *           n.s.    n.s.     n.s.

The main effect for progression was not significant except in two cases: Bright/Dark and Happy/Sad. This means that, for at least one of the five progressions, the progression itself caused participants to alter their Bright/Dark and Happy/Sad ratings of the probe tones in a consistent way. Indeed, as will become evident from examining Figure 4.2, when the progression ends with a minor chord (vi or ii), it tends to cause more scale-degrees (in particular, those that are chord tones) to be rated as both sadder and darker.

Recall also that participants were not given a forced-choice task, but rather could “opt out” of using any particular rating scale if they felt that it did not apply to the particular scale-degree in question. A simple tally by dependent variable (shown in Table 4.2) shows that Happy/Sad, and, to a lesser degree, Dark/Bright received the highest number of opt-outs. In post-experiment interviews it was commonly reported that chromatic tones proved difficult to rate as happy or sad. It should be noted that Happy/Sad and Dark/Bright are the two dependent variables that are not associated with tendency, strength, or melodic completion.

Table 4.2: Tally of “opt-outs” by dependent variable. Values indicate the number of times participants opted out of rating any particular scale-degree on any of the dependent variables.

Strong/  Complete/   Bright/  Expected/   Happy/  Tense/   Move/
Weak     Incomplete  Dark     Surprising  Sad     Relaxed  Stay
223      108         314      215         544     230      279

It should also be noted that in post-experiment interviews, participants regularly commented on how they tended to associate notes that were higher in pitch with “bright”, and notes lower in pitch with “dark” (recall that the stimuli were transposed so that the various scale-degrees appeared at a number of pitch heights). In Chapter 2 the argument was put forward that scale-degree qualia may largely be attributed to statistical regularities associated with tendency or motion. However, there appear to be some qualitative aspects that are not learned statistically. Specifically, the association of both happy/sad and bright/dark with high/low (respectively) has been linked to ethological cues (Kraepelin, 1899; Morton, 1977; Huron & Davis, 2013; Huron, 2015). Studies have suggested that the association between bright/dark and high/low (respectively) may be an intrinsic characteristic of perception (e.g., Marks et al., 1987).
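To put the tallies in Table 4.2 on a common footing, they can be expressed as a proportion of all ratings collected per slider. The sketch below assumes that every one of the 63 participants completed all 70 trials on every slider, which the text does not state explicitly; the resulting rates are therefore only approximate.

```python
# Opt-out tallies copied from Table 4.2.
OPT_OUTS = {
    "Strong/Weak": 223,
    "Complete/Incomplete": 108,
    "Bright/Dark": 314,
    "Expected/Surprising": 215,
    "Happy/Sad": 544,
    "Tense/Relaxed": 230,
    "Move/Stay": 279,
}

# Assumption: all 63 participants completed all 70 trials on every slider.
N_PARTICIPANTS = 63
N_TRIALS = 70
ratings_per_slider = N_PARTICIPANTS * N_TRIALS  # 4410 possible ratings

opt_out_rates = {name: n / ratings_per_slider for name, n in OPT_OUTS.items()}

# List the sliders from most to least frequently skipped.
for name, rate in sorted(opt_out_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {rate:6.1%}")
```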

Two oddballs were also found in the main effects for training. While it is possible that the two sliders where this effect was found to be significant (Complete/Incomplete and Move/Stay) may simply reflect an oddity in the data, it appears that those in the non-musician group were using these sliders in different ways than those in the musician group. Interestingly, of the correlations between all dependent variables, Move/Stay and Incomplete/Complete had the strongest correlation. Said another way, it seems that what a musician deems to be musically “incomplete” may not be heard as such by a non-musician. Somewhat surprisingly, in addition to the significant interaction for scale-degree and harmony, a significant interaction was also found for scale-degree and training across all dependent variables. This finding is slightly more difficult to interpret, but suggests that at least some scale-degrees are being rated as qualitatively different by the musician group compared to the non-musician group.

Recall that these tests merely compare means. Given the task that was required of participants, a large amount of variation in the responses is expected. In other words, what is of interest is not necessarily the means of the two groups, but how much their responses vary. For instance, perhaps individuals in the musician group tend to be in more agreement with each other compared to those in the non-musician group, in which case the main difference would be the spread in variation (or standard deviation). Thus the remaining graphs and figures attempt to re-examine the data in order to clarify and expand upon the results found from the statistical tests.

Figure 4.2 highlights differences in perceived scale-degree qualia across the various harmonic contexts for each slider (with one graph per slider). Each circle represents the average qualia rating for that particular scale-degree+harmony pair. Shading is used to designate which side of the slider (left or right) the average value falls on, and the size of the circle represents the strength of the rating. The first thing to notice is that there is a clear tendency for some scale-degrees to have somewhat similar qualia regardless of harmonic context. This can be seen by looking for consistent horizontal patterns. For instance, there are clear “stripes” for chromatic tones compared with diatonic ones; and scale-degree 1 tends to be rated as the more positive of the two options across all dependent variables (i.e., more happy, strong, bright, etc.), compared with all other scale-degrees, which tend to have a bit more variation.

These graphs also illuminate the main effect for chord progression, mentioned earlier, where the progressions ending with minor chords (vi and ii) tend to have the effect of making the ensuing scale-degree probe sound darker and sadder. The most interesting effect is the change in scale-degree ratings as the harmonic context changes. For many scale-degrees, changes in color and/or circle size can be seen in the graphs as one moves laterally through the harmonic changes. This tendency is especially marked, perhaps unsurprisingly, for scale-degrees that are chord tones of the final harmony. For example, scale-degree 4 shows many changes in size and color across the levels of harmony for almost all graphs; and scale-degree 6 shows clear changes in qualia when it becomes a chord tone in the graphs for Surprising/Expected, Dark/Bright, Weak/Strong, and Tense/Relaxed.

One caveat to mention is that probes in this experiment were presented after the harmonies, not concurrently. This was done for two reasons: first, as will be explained below, one goal of this experiment was to replicate a component of Krumhansl & Kessler (1982), which used the harmony-followed-by-probe design, in which case it was preferred to use as similar a methodology as possible; second, it was preferred to first investigate the effect of harmony on scale-degree qualia in isolation (rather than use complete melodies), and if a single scale-degree were presented simultaneously with the last chord of the progression, it would be impossible to know whether the evoked qualia was due to the chord, the scale-degree, or the extended harmony that would result from the combination of the two. Therefore, the present design was adopted in an attempt to isolate the individual scale-degrees, and it appears from these results that the final harmony was indeed still capable of having an effect. Future research, however, will be needed in order to address the effects of concurrently sounding harmonies and scale-degrees.

Figure 4.2: Scale-degree qualia ratings. Qualia ratings for each scale-degree across harmonic contexts. There is one graph per slider. On the x-axis, each Roman numeral represents the final chord of each possible progression. Scale-degrees are listed on the y-axis. Each circle represents the average rating across all participants for that particular scale-degree following the given harmonic context. Color codings indicate which side of the slider the value falls on. The larger the circle, the more extreme the average rating.

Since the circles in the graphs from Figure 4.2 represent the average rating across all participants, it will be useful to know how much the participants are in agreement with each other. Of course, this is also one of the main exploratory questions: How similarly are qualia perceived across individuals, and how much does this depend on one’s musical training?

Figure 4.3 illustrates how well participants were in agreement with each other, as well as how consistent they were when rating repeated stimuli. One way of evaluating agreement is to examine correlation values. Accordingly, a string of values was assigned to each participant representing their responses for all 70 trials across all seven dependent measures. This string of values was then compared with every other participant’s values. In addition, since each participant heard a subset of the stimuli (20) twice, their first responses to those stimuli were compared to their second responses to those same 20 stimuli to evaluate consistency. For the between-subject comparisons, since there were 63 participants in total, and each participant’s responses were compared with every other participant’s, there are a total of 3,906 correlation values. Figure 4.3b shows the distribution of these correlations (Pearson’s r values) centered around a mean of .18. However, in order to evaluate these correlations, they must be compared against something. Thus a randomized set of correlations was created by comparing each participant’s string of responses to randomized versions of the other participants’ responses. Since there should be no relationship between random pairs of responses, one would expect the distribution of these random correlations to be centered around zero. Indeed, randomized correlations of all responses produce the distribution seen in Figure 4.3a, which has a mean of .03 and 95% of the values falling below .15.¹² In comparison, then, while the average between-subjects correlation of .18 may not appear very strong, when those ratings are scrambled and re-assessed for the amount of correlation, fewer than 5% of the r values reach .18. This can be taken to mean that, overall, participants were in moderate agreement with each other in their qualia ratings. What about the within-subject agreement? How consistent were participants on repeat qualia judgments? Figures 4.3c and 4.3d show the actual within-subject correlations (one value per subject) for the 20 repeated stimuli across all dependent variables, compared against correlations made from a randomized version of the data. In this case, the within-subject correlations have a mean of .39, whereas the correlations from the randomized set of data show 95% of values falling under .15.
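The comparison against a randomized baseline can be sketched as follows. The response data here are synthetic stand-ins (a shared signal plus individual noise), not the ratings from the experiment, and the helper names, the single shuffle per participant, and the plain arithmetic mean used for brevity are all assumptions of this sketch rather than the procedure actually followed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_rs(targets, others):
    """Correlate targets[i] with others[j] for every ordered pair i != j."""
    return [
        pearson_r(a, b)
        for i, a in enumerate(targets)
        for j, b in enumerate(others)
        if i != j
    ]


def percentile_95(values):
    """Value below which (at least) 95% of the observations fall."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


rng = random.Random(0)
n_participants, n_values = 63, 70 * 7   # 70 trials x 7 sliders per participant

# Synthetic stand-in data (NOT the study's ratings).
signal = [rng.gauss(0, 1) for _ in range(n_values)]
responses = [[s + rng.gauss(0, 2) for s in signal] for _ in range(n_participants)]

# Randomized versions: each participant's responses in shuffled order.
shuffled = [rng.sample(r, len(r)) for r in responses]

actual = pairwise_rs(responses, responses)
baseline = pairwise_rs(responses, shuffled)

print(len(actual))  # 63 * 62 = 3906 ordered pairs
# Is the mean observed agreement above the 95% cutoff of the shuffled baseline?
print(sum(actual) / len(actual) > percentile_95(baseline))
```

Counting ordered pairs, as here, reproduces the 3,906 values mentioned in the text; each unordered pair of participants therefore contributes twice.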

Thus, participants are showing a good level of consistency in their qualia judgments. Recall that these graphs so far have included the results from all participants, regardless of musical training. It might be expected, however, that for the non-musician group, given the abstract nature of the task, their level of consistency might be worse than that of the musician group. Thus it is appropriate to examine these correlation values by splitting the participants according to their level of training.

¹² Fisher’s z transformation was used to convert the correlation values to an interval scale (in order to compute an average) before being converted back to r values. See Meyers et al. (2013, p. 298).
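The averaging procedure mentioned in the footnote can be sketched in a few lines. The function name is hypothetical, and the example values are arbitrary numbers chosen only to show that the Fisher-based mean differs from the plain arithmetic mean.

```python
import math


def mean_correlation(rs):
    """Average Pearson r values via Fisher's z transformation.

    Each r is mapped to z = atanh(r), the z values are averaged,
    and the mean is mapped back with tanh. Values of exactly +1 or -1
    are not supported, since atanh is undefined there.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


example = [0.10, 0.50, 0.90]            # arbitrary illustrative values
plain_mean = sum(example) / len(example)
fisher_mean = mean_correlation(example)
print(plain_mean, fisher_mean)
```

Because the transformation stretches values near ±1, the Fisher-based mean of these example values comes out higher than their arithmetic mean.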

Figure 4.3: Consistency between and across participants. Amount of intra- and inter-subject agreement as measured by correlation. Pearson’s r values are listed along the x-axes. (a) Random Between: a distribution of correlation values obtained by randomizing between-subject ratings, with 95% of the values falling under r = .15. (b) Between-subject: the distribution of between-subject correlations (each participant’s response correlated with every other participant’s response) with a mean of r = .16. (c) Random Within: a distribution of correlation values obtained by randomizing within-subject ratings, with 95% of the values falling under r = .15. (d) Within-subject: the distribution of within-subject correlations (each participant’s first response correlated with their second response to the same stimuli) with a mean of r = .38.

Figure 4.4 shows the consistency of each participant, divided by level of training. Each circle represents the single correlation value for each participant, determined by comparing their first and second responses to the repeated set of 20 stimuli. The horizontal line drawn through each graph represents the group’s mean. From this, it can be seen that the musician group is indeed, on average, more consistent than the non-musician group. However, there are also more participants in the musician group than the non-musician group, making it a slightly unfair comparison. In addition, if one looks at the scatter of individual circles, it is clear that there are individuals in the non-musician group who are actually more consistent than individuals in the musician group. Also to be noted are the few individuals whose correlation values hover around zero (note that there is an equal number from the musician and non-musician groups).

As pointed out in Chapter 2, it is possible that some individuals are not capable of hearing qualia in this way. Although it is a possibility that participants were simply tired or unmotivated, the results as presented in these graphs suggest that there may indeed be a minority of listeners who do not hear scale-degree qualia in a consistent manner.

Figure 4.4: Intra-subject correlations. Within-subject consistency across groups. Each circle represents a single correlation value for an individual participant, based on the comparison of their first and second responses to the 20 repeated stimuli. A horizontal line is drawn through the mean for each group.

Recall that some of the rating scales used in the experiment related to the concepts of resolution and closure (i.e., Surprising/Expected, Move/Stay, Complete/Incomplete). These ratings can be compared to the “goodness of fit” ratings (henceforth “key profiles”) from Krumhansl and Kessler (1982). Since Krumhansl and Kessler only used experienced musicians in their study, in order to make a fair comparison with the present study, only the results from the musician group will be considered. The K&K key profiles were obtained by taking the average “goodness of fit” rating (across all participants) of a probe tone (one of the 12 possible chromatic scale-degrees) following a predominant-dominant-tonic (PDT) progression in a major or minor key. Krumhansl and Kessler do not provide the unique profiles for the individual progressions; they only average them together, generating the profiles shown in Figure 4.5.

Figure 4.5: Key profiles from Krumhansl & Kessler (1982).

In order to compare findings, a similar set of scale-degree “profiles” was created for the three rating scales related to resolution and closure: Complete/Incomplete, Wants to Move/Wants to Stay, and Surprising/Expected. Since the present study only used 10 scale-degrees, the extra two scale-degrees from the K&K profiles were removed so that the number of scale-degrees in all profiles would match. In Figure 4.6, the Krumhansl and Kessler major key profile (4.6a) is compared with profiles created using data from the present study, with a separate profile (see legend) for each harmonic context.¹³ Similar to Figure 4.2, each point represents the average rating across participants (in this case, only musician participants) for that scale-degree+harmony pair. Each of these profiles was then correlated with Krumhansl and Kessler’s original profile.

As can be seen from Figure 4.6, the current profiles are all relatively well matched to the contour of the K&K profile, with the same dips for chromatic tones, and peaks for diatonic tones. However, the strongest correlations tend to be found between the K&K profile and the profiles for tonic context (see Table 4.3 for the complete list of Pearson’s r values). As already noted, when other chord contexts precede the scale-degree probe, the ratings for the scale-degrees are altered, with the most marked effects occurring for chord tones. This can be most easily seen in Figure 4.6 by looking at the ratings for scale-degrees 2̂, 4̂, and 6̂, which tend to “pop out” of the texture when those notes become chord tones. This “chord-tone” effect caused the correlations between the K&K profile and the non-tonic context profiles to decrease.

¹³ Although the profile in comparison technically belongs to a single major key, Krumhansl and Kessler found that the various major keys show profiles that, when arranged to start on tonic, are near-identical.

Figure 4.6: Present data compared to Krumhansl & Kessler’s (1982) key profiles. (a) Krumhansl & Kessler’s major key profile with two chromatic scale-degrees removed (to be consistent with present data). (b) Ratings across all chord contexts for “Complete/Incomplete.” (c) Ratings across all chord contexts for “Move/Stay.” (d) Ratings across all chord contexts for “Surprising/Expected.” To be consistent with Krumhansl & Kessler, the data reported in these graphs show only the responses from the “musician” group.

Table 4.3: Correlations with the K&K profile. Table of Pearson’s r values showing correlations between the K&K profile and each of the profiles from the current data (shown in Figure 4.6).

                       I     V     vi    IV    ii
Expected/Surprising    .96   .92   .84   .84   .86
Stay/Move              .88   .84   .86   .92   .85
Complete/Incomplete    .94   .90   .88   .94   .92
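The profile comparison behind Table 4.3 can be sketched as follows. The twelve K&K major-profile values are the commonly cited ones from Krumhansl & Kessler (1982); the mapping of the ten probes to semitones above the tonic is an assumption based on the probe list given earlier, and `toy_profile` is an invented three-level placeholder, not a profile from the present data.

```python
import math

# Krumhansl & Kessler (1982) major-key profile, in chromatic order from the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

# Semitones above the tonic for the ten probes: 1, #1, 2, 3, 4, #4, 5, #5, 6, 7.
# Assumption: the two removed scale-degrees are those at 3 and 10 semitones.
PROBE_SEMITONES = [0, 1, 2, 4, 5, 6, 7, 8, 9, 11]

kk_reduced = [KK_MAJOR[s] for s in PROBE_SEMITONES]


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder profile, NOT data from the study: tonic-triad members = 3,
# other diatonic degrees = 2, chromatic degrees = 1 (same order as the probes).
toy_profile = [3, 1, 2, 3, 2, 1, 3, 1, 2, 2]

r = pearson_r(kk_reduced, toy_profile)
print(f"r = {r:.2f}")
```

In the study itself, one such correlation would be computed for each slider and each final harmony, giving the fifteen values in the table.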

Recall that the progressions used by Krumhansl and Kessler all end with tonic, while in the present experiment they all imply the same key but end on different harmonies. Since in this experiment the interaction of harmony and scale-degree was significant, it implies that Krumhansl and Kessler’s key profiles might be better explained as “chord profiles” rather than “key profiles”. Said another way, while it is possible that they were testing tonal stability, as they claim, it appears at least a strong possibility that they may have largely been testing which scale-degrees are a good fit with a tonic triad context.

Krumhansl and Kessler argue that their profiles reflect scale-degree relations within a key, since the ratings for each scale-degree consistently match a supposed scale-degree “hierarchy”, with tonic being the highest rated, followed by dominant, followed by the mediant. But in the present experiment, the only profile that exactly matched the contours in the K&K profile (and therefore their hierarchical claim) was that of tonic context. This suggests that the K&K profile may not represent scale-degree relations to a key, but merely that tonic triad members are a good fit with a tonic triad context.

Aarden (2003) proposed that “the probe-tone method (from which the key profiles were derived) encourages listeners to hear the tone being tested as occurring in a phrase-final position ... since musical phrases typically end with harmonic ‘cadences’ or with stereotyped melodic figures.” Indeed, Aarden found that the K&K profile closely matched the zeroth-order distribution of scale-degrees at phrase-final positions. Obviously, the probe-tone method was also used here, implying that listeners may have heard the probes as an “ending”. However, unlike Krumhansl and Kessler’s, the present experiment consisted of lead-in progressions that ended with various harmonies (not only tonic), and as such, while possibly heard as a “phrase final position”, some of these contexts would be unusual at a cadence, providing at least some minor discouragement to listeners for hearing the final note as an “ending”. For instance, it would be very uncommon for a phrase ending to end with the chord. Moreover, the increase in chord tones rated as “expected” in some cases is not consistent with what would be expected for melodic tonal completion. For example, 4̂ and 6̂ are two of the least likely scale-degrees to end a phrase (Aarden, 2003), yet in the present experiment, when following IV or vi, the expectancy and completeness ratings increased for those scale-degrees. Since phrases are simply more likely to end on tonic (I), and the K&K profiles were created entirely from ratings following a tonic harmony, it makes sense that the K&K profile would more closely resemble the distribution of phrase-final scale-degrees. Thus, while Aarden’s proposal that the K&K profiles are more apt to be representative of phrase-final positions is appropriate, the findings from the present experiment are at least also consistent with the interpretation that participants may be largely responding to effects from local chord context.

First, it should be noted that, in fact, part of the hierarchy proposed by Krumhansl and Kessler can still be seen in the present data, despite changes in harmonic context; specifically, the “hierarchy” only seems to involve the first and fifth degrees of the scale, with the effect of the first being the strongest. In fact, even in Krumhansl and Kessler’s original experiment, the difference between ratings for scale-degrees 3̂ and 4̂ was only 0.3 (on a seven-point scale), which could easily have arisen by chance. Thus it seems that their claim that the ratings of scale-degrees match a tonal hierarchy (as influenced by Meyer, 1956) has some merit; however, the hierarchy may only involve the tonic and dominant notes of the scale. In fact, in the majority of profiles from the present data, scale-degree 4̂ tends to be rated (moderately) higher than the mediant on all three graphs, suggesting that it is possible that the hierarchy (if it exists) may have more to do with the cycle of fifths (i.e., a fifth above and below the tonic) than the tonic triad.

While the K&K key profiles remain widely in use, the remainder of their 1982 paper actually details additional, and more complex, experiments which attempt to demonstrate perceived key distances, as well as show converging evidence for their tonal stability theory. In these later experiments, the authors in fact do use a multitude of progressions, the majority of which do not end on tonic. While the authors’ primary claim is that the overall results of their experiments show that the effect of the key overrides that of the local chord, the evidence they provide in support of this claim is poor. First, their progressions ending with tonic always provided the strongest correlation with the key profile. Second, in non-modulating progressions with a tonic triad in the middle, the correlations with the key following the tonic harmony get worse, suggesting that either participants believed that they were moving to a new key, or they were simply rating the goodness of fit with the local chord. Moreover, Krumhansl and Kessler do not provide profiles for the scale-degrees in the individual chord contexts, nor do they examine the relationship between the profile at each point in the sequence and the profile for that local chord. Rather, they only show the correlations between those chord-context profiles and the key profile.

One caveat relates to the language used in Krumhansl and Kessler’s experiment compared with the present one: there may be a difference between rating “fit” and rating “expected” (or “complete,” etc.). First, it may be that the interpretations by the participants cannot be controlled in the experiment. Even when instructed to rate how a given scale-degree “fits” in a given context, participants may change their rating (consciously or unconsciously) to reflect how a given scale-degree might “complete” a given context. Thus, although “completing” and “fitting” are not the same, participants may be changing their interpretations such that, in the end, the two terms are largely used in the same way. In post-experiment interviews this question of interpretation was raised with participants, using the example of how they rated scale-degree 7 following a dominant versus a tonic context. Some participants reported being inconsistent, while others seemed unaware of any differences while completing the experiment. In the profiles from the data broken down by chord context, it is clear that a double usage was occurring: across ratings for “Surprising/Expected” following a dominant context, the highest rating went to 1ˆ, followed by 5ˆ and 7ˆ. This means that participants were rating notes that fit the current context as expected (i.e., 5ˆ and 7ˆ), but also notes that followed the current context as expected (i.e., 1ˆ). Of the three dependent variables that were thought most likely to match Krumhansl and Kessler’s “goodness of fit” task, “Surprising/Expected,” when the context ended with tonic, had the highest correlation with the K&K profile. However, of the dependent measures overall (averaged across all chord contexts), “Tense/Relaxed” and “Complete/Incomplete” show the strongest correlation with the K&K profile at .92, while “Surprising/Expected” is at .86. Overall, the relatively good correlations with the K&K profile suggest that, despite inconsistent interpretations of the terms, the difference in language used between the two experiments (i.e., “fit” versus “expected” or “complete”) is probably trivial.
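The profile comparisons reported above rest on a correlation between two rating profiles. As a minimal sketch of that arithmetic, the following Python snippet computes a Pearson correlation between a rating profile and the K&K major-key ratings, assuming that only the seven diatonic scale degrees are compared. The `hypothetical_profile` values are invented for illustration and are not data from this study.

```python
import math

# Krumhansl & Kessler (1982) major-key probe-tone ratings for the seven
# diatonic scale degrees (1 through 7).
KK_MAJOR_DIATONIC = [6.35, 3.48, 4.38, 4.09, 5.19, 3.66, 2.88]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean "Surprising/Expected" ratings for degrees 1-7
# (invented for illustration; NOT data from this study).
hypothetical_profile = [6.1, 3.9, 4.6, 3.8, 5.4, 3.5, 3.6]

r = pearson(hypothetical_profile, KK_MAJOR_DIATONIC)
print(f"r = {r:.2f}")
```

A profile correlated with itself gives 1, which is a convenient sanity check before comparing profiles from different chord contexts.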

4.4 Discussion

In this chapter, an experiment was detailed that attempted to continue a line of investigation on scale-degree qualia, following Huron (2006). The results of the pilot study suggest that even when listeners have limited musical experience, and are presented with physical stimuli (as opposed to using their imagination), they come up with similar types of descriptive terms (that can be categorized in similar ways) as those found in Huron (2006). Throughout the main experiment, the same stimuli in the same context were repeated to test how consistent listeners were in their judgments, and it was found that listeners were not only capable of performing the task, but showed relatively good consistency. With regard to the question of musical training, those with musical training appear to be more consistent.

However, understanding why musicians might perform “better” at this task is not straightforward. As Hansberry (in press) claims, “one learns only about what the participants believe their experience to be, and, in this case, a participant’s concepts of scale-degrees may well outrun their qualia”. It is true that participants are asked to reflect upon some experience, and attempt to convey that experience in words (or evaluate it with a set of given terms), which is admittedly a difficult and rather crude task. For those participants with music theory training, it is impossible to know for certain whether they might be identifying a scale-degree and responding in a way that simply echoes their conceptual knowledge of that scale-degree. It is possible, however, that it is not the conceptual knowledge that creates the difference, but perhaps some other factor, such as an increased amount of statistical learning. Presumably, even when considering only statistical learning arising from listening, someone who regularly practices an instrument and/or regularly participates in musical activities will have acquired more statistical knowledge. However, as pointed out, individuals in the non-musician group were still capable of rating scale-degree qualia, with a level of consistency greater than would be expected by chance. This is consistent with the notion that scale-degree qualia can exist apart from mere identification, although it is unclear how much conscious understanding of the concept of scale-degree might add to those qualia. One avenue for future research might attempt to pry apart these empirically entwined factors through the use of implicit measures, such as reaction time studies.

The most basic but perhaps the most critical component of this research was the inclusion of harmonic progressions that end with harmonies other than tonic. As it relates to qualia, the harmonic context was indeed shown to have a significant influence over listeners’ judgments. Although this finding itself may seem unsurprising, scale-degree qualia have until now only been considered in isolation, and given the strong influence of chord-tones on qualia judgments, this suggests that a large part of a scale-degree’s ability to influence qualia may be tied to a harmony (or perhaps harmonic function) with which it is most closely attached.

Krumhansl and Kessler’s work on the perception of key structures has become seminal in the field of music cognition. The methodology used in the present experiment allowed a conceptual replication of part of Krumhansl and Kessler’s 1982 study. In the present study, the regression results showing a main effect for scale-degree, in conjunction with the scale-degree profiles across multiple chord contexts, support the notion that participants understand and can keep in mind the prevailing key despite changes in local chord contexts, consistent with Krumhansl and Kessler’s claim. However, Krumhansl and Kessler’s discounting of the role of local chord effects may have been premature. The present work showed a significant interaction for scale-degree and progression, with subsequent analyses showing a strong effect for chord-tones being rated as “better fitting”. The primary claim of Krumhansl and Kessler, that scale-degrees 1ˆ, 5ˆ, and 3ˆ form a tonal-stability “hierarchy” (in that order), was only replicated following a tonic context. While scale-degrees 1ˆ and, to an extent, 5ˆ were consistently rated high, the remaining scale-degree ratings varied according to chord context.

Chapter 5: Rhythm Qualia

Abstract

In this chapter a perceptual experiment tested the hypothesis that metrical context influences rhythm qualia. Listeners heard musical samples of either composed rhythms or short rock/pop clips, having one of eight possible inter-onset interval (IOI) patterns. These IOI patterns were presented in a variety of metrical contexts. Listeners were given a set of descriptive terms and asked to check which ones applied to the given rhythm. The results suggested that different IOI patterns can elicit differing qualia, and that the metrical context may have a moderate impact on the shaping of a rhythm’s qualia. In general, listeners exhibited a fair amount of agreement in their ratings of rhythm qualia. Various features of the IOI patterns were examined to investigate any potential relationships between the musical structure and the evoked qualia. Syncopation was found to have a moderate impact on certain aspects of rhythm qualia.

5.1 Introduction

Rhythmic “feel” is a term that appears frequently in music scholarship. But what exactly does it mean? In many cases, authors appear to use the term to describe a regularly recurring rhythmic pattern that can be entrained to (often in the context of a jazz idiom), and it is often used synonymously with “groove” (Pressing, 2002). But there are potentially infinite combinations of rhythmic patterns, all of which can be executed at different tempi and in different meters, and which appear in many other genres besides jazz. Certainly they won’t all “feel” the same?

As discussed in Chapter 2, there are many aspects of rhythm, meter, and timing that appear to contribute to their qualia. However, this chapter will focus on the qualitative differences between different rhythmic patterns, and the effect of the metrical context. Specifically, this study attempts to address several questions and problems related to the perception of rhythm and meter. For instance, what musical attributes might contribute to a “groovy” or “stately” sounding rhythm? What features create rhythms with similar qualia? And most importantly, can we disentangle the relative contributions of both rhythm and meter in evoking specific musical qualia?

Several scholars have attempted to investigate and describe musical qualia as they relate to the pitch domain. Of course, as musicians and researchers we commonly also use descriptive vocabulary to characterize various rhythmic concepts and qualia. Yet, as mentioned in Chapter 2, little attention has been given to the qualia of basic rhythmic patterns. Discovering what makes rhythmic patterns similar or distinct in qualia is not only important for the study of music perception in general, but is also a basic component of musicianship. As will be discussed further in Chapter 6, it is possible that qualia help listeners to identify musical objects. As applied to rhythm and meter, perhaps qualia help distinguish subtly different rhythmic figures, or even musical styles.

How might qualia contribute to our understanding of rhythm perception? Before answering this question, it must be acknowledged that the term “rhythm” is used in multiple ways. On the one hand, it is frequently used as a general term to refer to all aspects of temporal musical organization. More narrowly, it is often used in contrast to the term “meter” to identify specific temporal patterns within some meter. If the same “rhythm” is shifted in a measure, that is, put within a new metrical context, is it the same rhythm? For instance, few musicians would consider the three examples presented in Figure 5.1 to be the same rhythm. Nevertheless, they share something in common, namely, an identical durational pattern. Since in this dissertation I am not considering the effects of articulation, it will be more precise to use the term “onset pattern.”

Meter, of course, has its own definition separate from that of rhythm, but it is rare that the two do not interact. London (2004) calls meter “a perceptual ground for rhythmic figures.” The grouping effects of meter influence which note forms the beginning of a group, which not only changes the amount of metrical stress it receives, but also serves as a type of position-orienting device, which is likely to alter how a rhythm is felt or perceived. In this chapter I investigate the relative contributions of an onset pattern and its metrical context to musical qualia. Specifically, I expect to find that different onset patterns elicit differing qualia, but I also test the conjecture that manipulation of the metrical context of an onset pattern results in a change of qualia.

To add to the confusion in terminology, while psychologists typically attempt to make a distinction between auditory and acoustic phenomena by using distinct vocabulary (such as pitch vs. frequency), there is currently no appropriate distinction between terms that describe symbolic or acoustic features and those that describe the perceptual experience of a metric-rhythmic event. While I do not propose to solve this problem here, it will be useful to provide a few definitions so as to clarify my use of terminology. In this chapter, an “onset pattern” (or “pattern”) is a category applied to a set of note values that share the same sequence of inter-onset intervals (IOI), and it is therefore used to describe a purely acoustic feature. I use the term “rhythm” to refer to a given onset pattern set within a clearly defined metrical context. While I use “rhythm” to refer to both acoustic and perceived rhythm, for the most part the context will make clear which meaning of the word I intend. For instance, “rhythm qualia” necessarily implies that the phenomenal reaction is to a perceived rhythm, while a description of “composed rhythm stimuli” refers to the acoustical or symbolic properties. While a distinction between the two meanings is important, in many cases the distinction is either irrelevant given the context, or else it is assumed that for practical purposes the subjective experience is isomorphic to the objective pattern, and so a distinction becomes unnecessary. For instance, in the experiment that will be detailed in the ensuing paragraphs, a series of rhythms were composed and performed by a computer; when listeners hear the rhythm, the experimenter assumes (as is true in most experiments) that the stimuli are perceived in the intended manner, that is, that the perceived rhythm will be the same rhythm as the one represented in . In the case that a clear distinction must be made between one meaning of rhythm or the other, I will specify “rhythm(a)” for acoustic rhythm, and “rhythm(p)” for perceived rhythm. In this way, Figure 5.1 can be described as showing three instances of the same onset pattern, but potentially evoking three different rhythms(p).

Figure 5.1: Three identical onset patterns in different metrical contexts. The second measure is a rotation or shift in phase of the first measure, and the third measure has the same phase as that of the second, but is notated in a different meter. Regardless of one’s definition of “rhythm,” the question posed is whether they elicit differing qualia.

5.1.1 Background

Several scholars have investigated the causes and effects of “metric malleability” (Vazan & Schober, 2000, 2004), a term coined by London (2004) to refer to a musical pattern that can be “construed in more than one metric framework.” This effect is often referred to in the literature as metrical “ambiguity.” Many scholars have investigated and discussed some of the preference rules that lead listeners to settle on or prefer one particular metric interpretation of a so-called “ambiguous” pattern or melody (e.g., Lerdahl & Jackendoff, 1983; Longuet-Higgins & Lee, 1984; Parncutt, 1994; London, 2004). However, the term “ambiguous” has two related, but different, meanings. One dictionary definition is that something “can be open to more than one interpretation.” This definition is clearly applicable to the rhythms that will be studied here. However, dictionaries also define ambiguous to mean something that is “unclear or inexact.”

Condit-Schultz & Arthur (2014) thus proposed using the term “multi-stable” — a term used in visual perception literature — since their research found that typically a perceived meter is initially felt strongly in one way or the other. Some composers and musicians seemingly take advantage of this proclivity of the listener to prefer one interpretation by creating or interpreting music that can be described as evoking a “garden path” experience — that is, initially implying one metric interpretation, but eventually other cues arrive to suggest an alternate interpretation. These are sometimes referred to as “fake-outs” (London, 2006).

An investigation of the attributes that lead to one interpretation or another of a multi-stable pattern is beyond the scope of this dissertation. What is of primary interest here is whether the metrical context of a pattern can affect its qualia. Research has shown that metric priming is a reliable way to induce a particular rhythmic interpretation (e.g., Desain & Honing, 2003). Multi-stable passages offer an opportunity to study the effect of metrical context on a passage’s qualia, since, with the appropriate count-in, the passage can be evoked in one of two different metrical contexts, yet the remaining acoustic information (such as timbre, microtiming, tempo, etc.) across the two interpretations is identical. In addition to metrical context, however, there are other features of an onset pattern that might influence its qualia, some of which are also investigated in this chapter. For instance, some factors may include: the length of the pattern, the amount of syncopation, the position of the first onset, perceived tempo or speed, the regularity or predictability of the pattern, the variety of durations (or IOIs) that make up the pattern, the ratio of the shortest to longest duration, or the average duration length. Accordingly, in addition to testing the main hypothesis that meter will interact with an onset pattern to affect qualia, these other potential influences on rhythm qualia will be investigated.
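Several of the candidate features listed above can be computed directly from an onset pattern expressed as integer IOI ratios. The following Python sketch is illustrative only: the function name `ioi_features`, the feature names, and the choice of abstract grid units are assumptions made here, not the feature definitions used in the analysis. Syncopation, first-onset position, and perceived tempo are omitted because they depend on the metrical context and tempo rather than on the ratios alone.

```python
from fractions import Fraction


def ioi_features(ratios):
    """Summarize an onset pattern given as integer IOI ratios, e.g. [3, 3, 2].

    Durations are in abstract grid units (hypothetical; the real unit
    depends on tempo and notation).
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive integers")
    return {
        "num_onsets": len(ratios),                  # onsets per cycle
        "cycle_length": sum(ratios),                # length of the pattern
        "distinct_iois": len(set(ratios)),          # variety of durations
        "shortest_to_longest": Fraction(min(ratios), max(ratios)),
        "mean_ioi": Fraction(sum(ratios), len(ratios)),
    }


features = ioi_features([3, 3, 2])
for name, value in features.items():
    print(f"{name}: {value}")
```

Exact fractions are used so that ratios such as 2/3 are not blurred by floating-point rounding when patterns are compared.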

5.2 A Perceptual Study: Rhythm in Context

5.2.1 Introduction

In this study, an attempt is made to investigate the relative contributions of onset pattern and meter on rhythm qualia. In particular, it is predicted that different onset patterns will elicit differing qualia, and that various metrical manipulations of each onset pattern will elicit differing qualia. Of course, it is very likely that a change to the onset pattern itself will have a larger effect than a change of phase of the same pattern. To give an example using pitch, a and a will sound very different, on account of their having different cardinalities as well as

different patterns of step sizes. A natural minor and a major scale, however, will sound more similar, since the pattern of whole and half steps that makes up the scales is identical, and only the phase of the pattern is different. That is, the phase is rotated with respect to which note is considered the tonic. Nevertheless, it would be hard to argue that there is no change of qualia between a major and minor scale. Thus, similarly, the main hypothesis tested in this study is that a “rotation” (change of phase) of an onset pattern will elicit differing qualia in a listener.

As in the qualia experiment presented in Chapter 4, it was preferred that, as an initial study of rhythm qualia, the experiment use a simple design that would test some very basic assumptions, and make use of commonplace vocabulary. That is, there may be a large amount of variation in the descriptive language used to describe the qualia of a rhythm, but in order to minimize the amount of variation in the data, and to be able to draw preliminary conclusions, it was decided that participants should simply have a list of descriptive terms to choose from, and that those words should be fairly general (but different enough that they would not apply to every rhythm). In order to decide which terms to use, a “crowd-sourcing” technique was employed. A small group of musicians¹⁴ were asked to write down any and all descriptive words they could think of that would specifically describe the “feel” of any rhythm. The responses were collected and then content-analyzed, looking for common responses and for terms that would be considered synonymous. The top 10 most common responses (and their synonyms) were ultimately chosen as the descriptive terms to be included in the experiment. (See Figure 5.4 in Section 5.2.2 for the list of terms.)

¹⁴ Since attendance during this informal meeting was not recorded, a precise number cannot be given with certainty, but a good estimate would be 10.

With regard to the actual patterns played to the participants, since the goal was to investigate the interaction of onset pattern and meter, at least some of the stimuli would have to be tightly controlled. Accordingly, it was decided that some of the stimuli would be composed and generated by the experimenter. Since IOIs are relative in proportion, and can be notated differently depending on and tempo, a common way of referring to them is by using simple integer ratios (e.g., half note followed by half note is 1:1; quarter followed by two eighths is 2:1:1). Of course, it is acknowledged that changing the tempo has been found to affect musical qualia (Hevner, 1936, 1937). However, due to time constraints, the composed rhythms were all performed at the same tempo. The onset patterns to be included in the experiment were chosen based on common ratio relationships that are found in different genres of music, including classical and . First, the most basic ratio relationships were chosen: 1:1, 1:2, 1:3, 1:1:2. Then, more complex ratio patterns commonly found in rock and Latin music were added: 3:3:2, 3:3:3:3:4, 3:5, 3:3:4:2:4. Of course, there are many more possibilities. These particular onset patterns were chosen primarily due to their ubiquity in music. It was assumed that listeners would be familiar with most, if not all, of the patterns.

For each given onset pattern, there were typically four metrical interpretations used, either by setting the pattern in a new meter, or, more commonly, by shifting the phase of the pattern. For the simplest pattern (1:1), only one shift was performed, that of switching from downbeat to offbeat. For the longer onset patterns, experimental time restrictions prevented the use of all possible rotations. Thus primarily the most representative or “familiar” rotations were used. The complete set of composed rhythms used in the experiment is illustrated in Figure 5.2. Note that there are multiple ways to represent the various IOIs in Figure 5.2 using common Western musical notation, and the method of notation chosen was somewhat arbitrary. Nevertheless, since musicians are accustomed to reading music notation, it is provided here as a visual aid. Participants were not shown any musical notation. In order to ensure that participants were hearing the intended rhythm, a count-in was provided in the form of a metronome-like stream that played a “bass drum sound” on the main beats (e.g., quarters), with a “woodblock sound” providing subdivisions of the beat (e.g., eighth notes). Participants were instructed to tap their foot along with the beat for the duration of the excerpt.

In order to add depth and ecological validity to the experiment, several brief song excerpts were borrowed from Condit-Schultz’s “fake-out” experiment (Condit-Schultz, 2016). These were excerpts that were representative of one of the ratio relationships described above. All excerpts were from purely instrumental segments of the songs. The excerpts were selected based on their metrical flexibility or multi-stability. That is, by giving the listener one of two possible metrical contexts (i.e., “count-ins”), the excerpt’s phase could be manipulated. These excerpts were extremely valuable to the experiment. Although they contain musical information such as timbre, harmony, and melody, the only variable that is changed from one hearing to another is the starting point or phase of the pattern. Thus if a listener reports a change in qualia, presumably the only factor that it could be attributed to is the change in rhythm(p). These recorded excerpts therefore provide ecologically valid musical examples that complement the more contrived stimuli. Figure 5.3 shows the musical examples written out in score notation.

Figure 5.2: Rhythm stimuli (composed). Complete set of composed rhythm stimuli used in the experiment. Each measure (total 32) represents a single rhythm(a). Inter-onset patterns on the same line (system) are either rotations of each other (i.e., phase-shifted), or re-notated in a different meter. Thus there are 32 rhythms(a), but only 8 unique onset patterns.
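The relationship between an onset pattern and its rotations can be made concrete with a small sketch. The Python code below expands integer IOI ratios into a binary grid (1 for an onset, 0 otherwise) and shifts that grid cyclically. The helper names and the assumption of one grid cell per ratio unit are illustrative choices made here, not the procedure used to generate the stimuli. Re-notating a pattern in a different meter leaves this grid unchanged, so that kind of manipulation is not captured.

```python
def ratios_to_grid(ratios):
    """Expand IOI ratios into a binary grid: 1 = onset, 0 = no onset."""
    grid = []
    for r in ratios:
        grid.append(1)
        grid.extend([0] * (r - 1))
    return grid


def rotate(grid, shift):
    """Cyclically shift the grid so the pattern starts `shift` cells later."""
    shift %= len(grid)
    return grid[-shift:] + grid[:-shift] if shift else list(grid)


def grid_to_string(grid):
    """Render a grid as a compact string such as '10010010'."""
    return "".join(str(cell) for cell in grid)


def same_onset_pattern(grid_a, grid_b):
    """True if grid_b is some cyclic rotation of grid_a."""
    return len(grid_a) == len(grid_b) and any(
        rotate(grid_a, s) == grid_b for s in range(len(grid_a))
    )


tresillo = ratios_to_grid([3, 3, 2])
print(grid_to_string(tresillo))             # 10010010
print(grid_to_string(rotate(tresillo, 1)))  # 01001001
print(same_onset_pattern(tresillo, ratios_to_grid([3, 2, 3])))  # True
```

Under these assumptions, 3:3:2 and 3:2:3 count as the same onset pattern at different phases, whereas 2:2:4 does not, even though all three fill an eight-cell cycle.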

At this point it will be beneficial to introduce terminology that will be used to describe the two interpretations of these song-clip stimuli. Note that although the songs are composed, it is unclear (and for the purposes of this dissertation, irrelevant) whether the artists intended for their songs to have the “garden path” effect, or whether they simply intended them to be heard with one consistent metrical interpretation throughout. In this way, the word “original” as applied to a given metrical interpretation is not helpful. Nevertheless, it will be useful to distinguish between the two possible interpretations for each song. Accordingly, the interpretation which would allow a listener to hear the song within a single metrical context is called the consistent version, and the interpretation which would require the listener to be “faked-out,” or have to switch metrical interpretations part way, is called the alternative version. Note that in the present experiment, since participants do not hear the whole song, but only a fragment of the song, they never have to switch metrical interpretations; rather, they will only hear one or the other (i.e., the clip ends before any metrical “switching” would occur). Therefore, although no perceptual fake-outs occurred in this experiment, for the sake of clarity the two interpretations will henceforth be referred to as consistent or alternative.

One might ask why the experiment used the composed rhythms at all. Firstly, it is much easier to build synthetic stimuli. The composed stimuli are experimenter controlled, and therefore lack any performer nuance or differences in dynamics that might bias the listener towards one interpretation, or make one interpretation more viable or realistic. More importantly, it is difficult to find purely instrumental excerpts of pop, rock, or Latin music that are metrically malleable, and can be believably interpreted in different metrical contexts; and even more difficult if one attempts to include a wide variety of rhythms. In fact, if multiple plausible metric interpretations were the norm, then fake-outs would be a much more common musical occurrence.

Figure 5.3: Rhythm stimuli (borrowed). Sample of the borrowed rock/pop stimuli from Condit-Schultz (2016), shown with the two possible metric interpretations. The “consistent” hearings are labeled as “0,” and “alternative” hearings are listed with a positive or negative number indicating the shifted position (in 8th notes) from the consistent version.

5.2.2 Method

Thirty-six undergraduate musicians from Ohio State University participated in exchange for partial course credit. All students were enrolled in the final year of aural skills (i.e. musicianship) classes. Participants chose this experiment from a list of possible experiments.

The experiment was conducted in three blocks. The stimuli in block one consisted of solo rhythm patterns of one measure, repeated for a total of four loops, performed with a woodblock sound. The stimuli in block two were the short clips of popular/. The stimuli in block three were identical to those of block one. The repetition of stimuli was used as a way of measuring within-subject variability. All sound files were edited in Sonar recording software (version: Producer), and the percussion instrument sounds (blocks 1 and 3) were taken from Sonar’s percussion sampler. There were no visual representations of any of the audio stimuli. The experiment was conducted in a sound-attenuated booth and participants listened to stimuli through stereo headphones.

In the first block, participants heard all 32 unique interpretations of the 8 possible inter-onset patterns illustrated in Figure 5.2. These were presented in a uniquely semi-randomized order for each participant. That is, the excerpts were arranged such that variations of the same pattern were maximally distant from one another, to avoid obvious recognition of the rotations that might occur if they were played one after the other. All composed rhythms were played at 65 beats per minute. In the second block, participants listened to the short pop/rock clips. However, for this portion of the experiment the two versions of the stimuli were divided between participants. This was because it seemed likely that these stimuli would be much more memorable, and therefore that even with time in between hearings, participants would recognize the manipulations. Therefore participants only heard one version (the consistent or the alternative) of the pop/rock stimuli. In the third and final block, participants heard the same 32 rhythms from block 1, but presented in a newly semi-randomized order. The instructions to participants were as follows:

You will hear a count-in in either 3/4, 4/4, or 6/8 time. A bass drum will signify the downbeat and a woodblock will tap out the subdivisions of the beat. As soon as you hear the count-in begin, we would like you to tap your foot along with the beat, and continue tapping your foot through the duration of the excerpt. This is to minimize the chance of hearing the given rhythm in the incorrect metrical context. Immediately following each rhythm you will be presented with this screen [see Figure 5.4]. Your task is simply to check any and all boxes beside the words that you feel best describe the rhythm that you just heard. As you can see, the words are grouped into synonyms. For the most part, we expect that if you perceive a rhythm as sounding “upbeat” that you will likely also hear it as “lively” and “bouncy.” However, even if you only agree with one of the words in the list of synonyms, we would like you to check the box anyway.

Figure 5.4: List of rhythmic descriptor terms. List of terms provided for the rhythm qualia experiment. Ten options were provided (including synonyms). Participants were instructed to check the box if they felt that any of the terms applied to the heard rhythm.

It should be noted that some redundancy was included in the set of terms for two reasons: first, it provides a way of ensuring that participants were using the terms appropriately, as one would not expect to see opposite terms as both applying to the same rhythm; second, because the task is binary (yes/no), by providing terms that are opposites it gives participants the opportunity to respond in an affirmative way to either category. For example, just because “fast” was left unchecked for some rhythm does not necessarily mean that it was heard as “slow.”

5.2.3 Results and Discussion

Comparisons were made for qualia ratings across all participants for each dependent variable, and for the composed rhythms as well as the songs. Recall that the rhythms can be categorized based on the IOI pattern, which are represented by simple integer ratios. Figure 5.5 shows the qualia ratings for all the composed rhythms, for each dependent measure (i.e., checkbox). Each rhythm is notated (in onsets and offsets) along the y-axis, with rhythms grouped according to onset pattern. The x-axis shows the proportion of participants that used the dependent variable (qualia term) to describe a given rhythm.

Figure 5.5: Rhythm qualia ratings - composed rhythms. Qualia ratings for each composed rhythm, with one graph per dependent variable (i.e., checkbox). Rhythms created by the same (rotated) onset pattern are plotted with the same shape. All rhythms are listed along the y-axis in the form of equally spaced onsets (1) and offsets (0). Position along the x-axis represents the proportion of participants (n=36) who applied the qualia rating (title of graph) to any given rhythm.

The main hypotheses predict that (1) metrical context can influence the qualia of a given onset pattern, and (2) different IOI patterns will elicit differing qualia. In order to test these hypotheses, three mixed effects logistic regression models were created and compared to evaluate which one best fit the given data. The first model attempts to predict the likelihood that a given dependent variable will be checked assuming there are no differences between any of the IOI patterns or individual rhythms (i.e., the intercept is the only predictor). The second model attempts to predict the likelihood that a given dependent variable will be checked based on the onset pattern (e.g., 1:1, 1:2, etc.). The third model attempts to predict the likelihood that a given dependent variable will be checked based on the individual rhythm. These models were then evaluated using a log-likelihood ratio test (see Chapter 3 for a description of the log-likelihood test). If the second model performs significantly better than the first model, it suggests that the various IOI patterns are generating qualia that are distinct from each other. If the third model performs significantly better than the second model, it suggests that the individual rhythms have qualia differing from each other; that is, that the individual rhythms would be providing additional information beyond that of the IOI pattern alone. A separate set of models was created for each dependent variable.
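The logic of the nested-model comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the analysis that was run: it ignores the participant random effects of the mixed effects models, it fits each model by giving every group its own checking probability, and the `counts` are invented. The mixed effects models themselves would be fitted with dedicated statistical software.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability via the regularized incomplete gamma."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    log_scale = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower incomplete gamma function.
        term = total = 1.0 / a
        denom = a
        for _ in range(10000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return max(0.0, 1.0 - total * math.exp(log_scale))
    # Continued fraction (modified Lentz method) for the upper incomplete gamma.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_scale)


def max_loglik(groups):
    """Maximized Bernoulli log-likelihood, one probability per (checked, shown) group."""
    total = 0.0
    for checked, shown in groups:
        p = checked / shown
        if 0.0 < p < 1.0:  # groups with p of 0 or 1 contribute zero
            total += checked * math.log(p) + (shown - checked) * math.log(1.0 - p)
    return total


def pooled(groups):
    """Collapse several (checked, shown) groups into a single group."""
    return (sum(c for c, _ in groups), sum(n for _, n in groups))


def lr_test(ll_small, ll_big, extra_params):
    """Likelihood-ratio statistic and p-value for two nested models."""
    stat = 2.0 * (ll_big - ll_small)
    return stat, chi2_sf(stat, extra_params)


# Invented counts for one descriptor (NOT data from this study):
# onset pattern -> [(times checked, times shown) for each rhythm].
counts = {
    "1:2": [(20, 36), (22, 36), (9, 36), (12, 36)],
    "3:3:2": [(30, 36), (28, 36), (15, 36), (31, 36)],
}
all_rhythms = [g for groups in counts.values() for g in groups]

ll_intercept = max_loglik([pooled(all_rhythms)])                  # model 1
ll_pattern = max_loglik([pooled(g) for g in counts.values()])     # model 2
ll_rhythm = max_loglik(all_rhythms)                               # model 3

stat_21, p_21 = lr_test(ll_intercept, ll_pattern, len(counts) - 1)
stat_32, p_32 = lr_test(ll_pattern, ll_rhythm, len(all_rhythms) - len(counts))
print(f"pattern vs intercept: chi2 = {stat_21:.2f}, p = {p_21:.4f}")
print(f"rhythm vs pattern:    chi2 = {stat_32:.2f}, p = {p_32:.4f}")
```

The degrees of freedom for each test equal the number of extra probabilities the larger model estimates, which is why the rhythm-level model is compared against the pattern-level model with six degrees of freedom in this toy example.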

Unsurprisingly, the second model performed better than the first model across all dependent variables (all p < .001), consistent with the hypothesis that different

IOI patterns elicit differing qualia. The results found the third model to be signifi- cantly better than the second model for: “Groovy,” “Lively,” “Powerful,” “Stable,”

“Stately,” “Slow,” and “Unstable” (p < .05); and not significant for: “Hurried,”

“Quick,” and “Laid-back.” (Note that each dependent variable is being referred to by the first term in the list, but applies equally to all synonymous terms. See Figure

5.4 for the list of synonyms.) One might wonder, given that there is some redundancy in the response items, whether some of the tests might be redundant. One way to investigate this is to examine the degree to which the responses are correlated. Recall, however, that the data collected in this experiment are binary. Therefore, taking a correlation of means may not be very effective. For example, hypothetically, a given rhythm could be rated by 50% of participants as “Quick,” and by 50% of participants as “Slow.” This might imply that participants disagreed about the qualia of this rhythm. However, a comparison of those two means would find them identical, and imply that quick and slow are redundant terms (which, in this hypothetical case, is not true). A more meaningful similarity measure, therefore, would examine the proportion of times that term x and term y were both applied to the same rhythm, ignoring (from the total) instances where neither term was applied.15 If this proportion is high, then it may be

15The logic behind ignoring null values is that measuring the similarity of avoided items doesn’t contribute to what they have in common, in much the same way that we wouldn’t compare two individuals’ shopping lists by investigating what they both didn’t buy.

Table 5.1: Similarity measures between dependent variables. Jaccard similarity coefficients were used to measure redundancy between dependent variables. Each coefficient in the table represents the proportion of times that the term listed on the x-axis and y-axis were both checked at the same time, ignoring (from the total) instances where neither term was applied.

           Groovy  Lively  Hurried  Powerful  Quick  Stable  Stately  Laid-back  Slow  Unstable
Groovy     -
Lively     .22     -
Hurried    .08     .25     -
Powerful   .10     .23     .27      -
Quick      .05     .26     .34      .23       -
Stable     .15     .17     .22      .16       .17    -
Stately    .01     .07     .17      .14       .15    .30     -
Laid-back  .36     .08     .02      .04       .02    .17     .03      -
Slow       .07     .01     .00      .03       .00    .10     .12      .13        -
Unstable   .12     .07     .06      .05       .06    .00     .03      .12        .10   -

more likely that the terms are redundant. Using this method (specifically, the Jaccard similarity coefficient) the similarity of qualia terms was compared. These values are shown in Table 5.1. Since the highest similarity coefficient is .36, it is assumed that this is not great enough to dismiss any one term as being redundant.
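The similarity measure just described can be sketched in a few lines. This is a minimal illustration rather than the code used for Table 5.1; the function name `jaccard` and the representation of responses as equal-length lists of 0s and 1s (one entry per trial) are assumptions made for the example.

```python
def jaccard(x, y):
    """Jaccard similarity between two equal-length binary (0/1) sequences.

    Counts the trials on which both terms were checked, divided by the
    trials on which at least one term was checked. Trials where neither
    term was checked are ignored.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Undefined when neither term was ever checked.
    return both / either if either else float("nan")
```

For instance, with the made-up response vectors `[1, 1, 0, 0, 1, 0]` and `[1, 0, 0, 1, 1, 0]`, both terms are checked on two trials and at least one term is checked on four, giving a coefficient of .5.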

The same similarity metric can be applied to the participants’ data to evaluate the reliability of their responses. Recall that block 3 repeated the stimuli from block

1, and as such a comparison can be made between first and second hearings for each participant. Using this method the average Jaccard similarity coefficient (across all dependent variables) was found to be .21 with a standard deviation of .08. While the similarity across repeat responses does not appear to be very strong, again, it is useful to have something to compare it against. As was done in Chapter 4 (see Section 4.3),

a simulation was performed, this time with 1000 Jaccard similarity coefficients (using the same average proportion of 1s and 0s as found in the actual data) in order to create a sampling distribution for the null hypothesis. The distribution of similarity coefficients from the randomized data was found to have a mean of .08 with 95% of values falling under .12. Therefore, the actual participants’ mean repeat-measures similarity score of .21 is considerably higher than would be expected by chance. As with the findings from Chapter 4, this result might be interpreted as suggesting that the task was somewhat difficult, but that participants nevertheless were able to perform the task with a moderate degree of consistency.
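A simulation of this kind can be sketched as follows. The sketch is illustrative only: the function names, the number of items per simulated response vector (`n_items`), and the proportion of checked responses (`p_checked`) are assumptions, since the exact values used in the study are not restated here. Each simulated "participant" produces two independent random response vectors, and the Jaccard coefficient between them is recorded.

```python
import random


def null_jaccard_distribution(p_checked, n_items, n_sims=1000, seed=0):
    """Sorted Jaccard coefficients between pairs of random binary vectors.

    p_checked: assumed proportion of 1s (checked boxes) in the data
    n_items:   assumed number of responses per simulated hearing
    """
    rng = random.Random(seed)
    coefficients = []
    for _ in range(n_sims):
        first = [1 if rng.random() < p_checked else 0 for _ in range(n_items)]
        second = [1 if rng.random() < p_checked else 0 for _ in range(n_items)]
        both = sum(1 for a, b in zip(first, second) if a and b)
        either = sum(1 for a, b in zip(first, second) if a or b)
        if either:  # skip the undefined case where nothing was checked
            coefficients.append(both / either)
    return sorted(coefficients)


def percentile(sorted_values, q):
    """Value below which roughly a proportion q of the sorted values fall."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]
```

The mean of the returned list and `percentile(values, 0.95)` then play the role of the chance-level mean and 95% cutoff against which an observed repeat-measures similarity can be compared. For independent random responses the expected coefficient is approximately p / (2 - p), where p is the proportion of checked responses.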

Recall that song clips were included in the experiment to provide a subset of more ecologically valid experimental stimuli, and that the rhythms in the song clips belong to one of the eight IOI patterns. Upon preliminary analysis of the data, it was noted that the song clips seemed to elicit greater perceived differences between conditions than the composed stimuli. In addition, there seemed to be more variation between songs (even for those with the same onset pattern), which might be expected given that each song has different instrumentation, tempo, etc. As such, it seemed appropriate to consider these groups of data (composed rhythms vs. song clips) separately.

Figure 5.6 shows examples for changes in qualia both by song and by dependent variable. Recall that there are two versions of each song, the consistent and alternative. Figure 5.6a shows which songs appear to have the greatest differences between versions. The songs are listed along the y-axis, and the proportion of responses along the x-axis. The consistent version of the song is represented by a circle and the alternative version with a triangle. Reading the graphs horizontally from left to right,

one can measure the mean perceived difference in qualia for each interpretation of the song by examining the difference in space between the circle and triangle.

Figure 5.6b shows which qualia terms were applied most/least frequently to a given song clip. Examples of qualia ratings are given for three songs, with all dependent variables listed on the y-axis and the proportion of responses along the x-axis. In these graphs one can see how the qualia might change between the two interpretations of a given song. Notice that the two songs on the right (“She’s A Woman” & “I’m Free”) show much more variation between interpretations than the song on the left (“All My

Life”).

To test whether the qualitative differences found across metric interpretations of the songs were significant or simply arose by chance, two mixed effects logistic regression models were created and compared to evaluate which one best fit the given data. This time, the first model attempts to predict the likelihood that a given dependent variable will be checked based on the unique song (e.g., “Kate,” “She’s A

Woman,” etc.). The second model attempts to predict the likelihood that a given dependent variable will be checked based on each unique rotation of the songs (e.g.

“Kate” - consistent, “Kate” - alternative, etc.). Like the earlier models, these were also evaluated using a log-likelihood ratio test. If the second model performs significantly better than the first model, it suggests that the two interpretations of the songs have qualia differing from each other. In other words, the metric interpretation (or version) would be providing additional information beyond that of the song itself.

Four of the ten dependent variables had the second model perform significantly better than the first: “Stable,” “Stately,” “Slow,” and “Unstable.” For the remaining dependent variables (“Groovy,” “Lively,” “Quick,” “Hurried,” “Laid-back,” and

(a) Example Qualia Ratings; All Songs

(b) Example Qualia Ratings; All Categories

Figure 5.6: Rhythm qualia ratings - song clips. Examples of changes in qualia ratings for song clips. Both upper and lower graphs have the proportion of checked responses plotted along the x-axis. Circles represent the “consistent hearing,” and triangles represent the “alternative hearing.” (a) Changes in qualia ratings across songs (plotted on the y-axis) using three dependent variables as examples. (b) Changes in qualia ratings across dependent variables (plotted on the y-axis) using three songs as examples.

“Powerful”), the second model did not perform significantly better than the first.

While this may not appear to be a very strong result, these models based on the song clips had almost one tenth of the amount of data compared to the composed rhythms. This is because there were far fewer song clips than composed stimuli, but also recall that the songs had to be heard between participants to avoid recognition.

Thus, while the rhythm models have 2,304 data points (36 subjects × 32 rhythms ×

2 hearings), the song models have only 245 data points (35 subjects16 × 7 songs ×

1 hearing, either consistent or alternative). As such, the weak results for the song data are likely due to a lack of power, and more data will have to be collected in order to test the reliability of the effect.

There are, of course, a few caveats to note regarding the results and methodology of this experiment. In the present experiment participants could only agree or disagree with the applicability of the given terms to the rhythms in question. As explained in Section 5.2.1, this was a preliminary investigation of the subject, and so a simple design was preferred at first. In looking for differences between rotations of patterns, a better design might ask participants to make ratings using a linear scale (similar to the design from Chapter 4). In this way, minor differences in qualia — which could not be captured using the present design — may come to the fore. In addition, the terms used as dependent variables in the experiment, while chosen with a reasonable and objective methodology, were relatively basic; and although synonyms were given in an attempt to capture as much variation as possible, participants were not allowed to distinguish between synonyms (this was done to conserve statistical power as well as to limit the total duration of the experiment). In fact, many participants expressed

16One participant’s middle block was not counted due to a technical error during the experiment.

that in a handful of cases, the terms were not felt to be synonymous, and that they would have preferred to distinguish between them. Therefore, while the present

findings suggest a small effect size (based on the relatively small differences as seen in the graphs), perhaps a more fine-tuned experiment that allows greater variation in terminology and/or finer control over the degree to which some qualia are felt would expose greater differences (i.e., a larger effect).

Due to the availability of participants and time constraints, this experiment used only musician listeners. It is therefore unknown whether training may have had any influence on their judgments. However, from personal observation, it appears that descriptive language in the classroom as applied to meter and rhythm is minimal; and, in fact, less time seems to be devoted to teaching rhythm and meter in general

(compared with other rudiments). As such, and given the minimal effect found for training in the qualia experiment from Chapter 4, it seems unlikely that learning would have a major effect in this experiment.

In determining the results for the two sets of data, 10 sets of models were created (one for each dependent variable) without correcting for multiple tests. However, a Bonferroni correction becomes increasingly conservative as the tests become more dependent on one another. Since the same rhythms were evaluated on the same task with the same set of participants, with some known redundancy in the dependent variables, these tests are not independent, and a Bonferroni correction would therefore be an over-correction at the cost of losing power. Nevertheless, if one thinks of each of these tests as another chance to test the same hypothesis, then there is an inflated chance of finding at least one significant difference by chance. One possibility, of course, is that each test is not testing the same hypothesis about the influence of metric context. For instance, it could be that some aspects of musical

qualia (e.g., “grooviness”) are influenced by changes in metrical context, while others are not (although they might be affected by other factors not tested here, such as tempo, articulation, etc.), in which case the second model would show no improvement. In this case, the assumption that each test has an equal chance of failing would be incorrect. Nevertheless, the experiment carried out here did make this assumption (for better or worse) and so the problem of multiple tests remains.

A different approach to the problem of multiple tests, however, might consider the probability of obtaining some proportion of significant results for the same test. In other words, if the null hypothesis were true, what are the odds that seven in ten tests would be significant? (Or, in the latter case, four?) In fact, even three in ten tests would be unlikely, with a probability of roughly 1 in 100. By this logic, finding four in ten tests to be significant (as was the case for the song data) is enough to surpass the 95% confidence level.
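The probability referred to above can be worked out from the binomial distribution. The sketch below assumes, for the sake of illustration, that the ten tests are independent and that each has a 5% chance of a false positive under the null hypothesis; as noted earlier, the tests in this experiment are not strictly independent, so the figures are approximate. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n=10, alpha=0.05):
    """Probability of k or more significant results among n independent
    tests when every null hypothesis is true and each test uses level alpha."""
    return sum(
        comb(n, i) * alpha ** i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )
```

Under these assumptions, `prob_at_least(3)` is approximately .0115 (roughly 1 in 100) and `prob_at_least(4)` is approximately .0010.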

Overall, the results found here are consistent with the notion not only that different IOI patterns elicit differing qualia, but that the metrical context is a significant factor contributing to a rhythm’s qualia. With the data at hand, however, it is difficult to determine which has the greater impact on the qualia.

5.2.4 Post-hoc Exploration of Rhythm Qualia

Rather than being content with simply assuming there is some difference, the rhythms and songs themselves can be analyzed to investigate what structural properties of the rhythms or songs might be responsible for the qualia in question. Given that some of the descriptive terms included “laid-back/relaxed” and “groovy/sexy/funky,” one obvious feature to consider is the amount of syncopation. A crude examination

of the amount of syncopation in each rhythm can be made by giving a “syncopation score” to each composed rhythm used in the experiment.

A syncopation score was given to each rhythm based on the onset position (in 16th notes) of each attack, and the beat (or subdivision of the beat) that was obscured.17

In addition, the present operationalization of syncopation closely resembles that of

“lacunae” in Huron & Ommen (2006). For example, take the following rhythm in 4/4 time: 1001 0000 1001 0000 (spaces have been added only to clarify the position of each beat). It should be rather uncontroversial to propose that either of the following two modifications to the pattern above should remove the syncopation: 1000 0000 1000 0000 or 1001 1000 1001 1000; or that the following two modifications should lessen the syncopation: 1010 0000 1010 0000 or 1000 0010 1000 0010. This is because syncopation is not merely created by the absence of attack on a strong(er) beat, nor simply by the presence of an attack on a weak beat, but rather it requires an attack at a lower level of subdivision prior to a proximal, unarticulated strong(er) beat. Of course, not all strong beats are created equal, and in 4/4 time beat 1 will be the strongest, followed by beat 3, followed by beats 2 and 4, and the 8th note level of subdivision will be weaker than that of beats 2 or 4, and the 16th note level weaker still than the 8th note subdivisions, and so on. Thus “beat strength scores” could be applied to each 16th-note position in 4/4 time as follows: 16,1,2,1,4,1,2,1,8,1,2,1,4,1,2,1. The syncopation score system applied here assumes that syncopation will be greater when stronger beats (compared to weaker beats) are obscured, and when the obscured beat is preceded by a proximal onset in a metrically weak position. Using this method, an

17While this method for measuring syncopation was independently designed, it bears a striking similarity to the method used by Fitch & Rosenfeld (2007), which was in turn based on a theoretical model proposed by Longuet-Higgins & Lee (1984).

algorithm applies “points” to each onset in a rhythm based on its syncopation properties (as described above), and those points are summed to create a syncopation score.

The greatest number of points, for example, would be given to a downbeat obscured by an onset at position 16 or 15; or beat three obscured by an onset at position 6 or

7. This resulted in a set of syncopation scores ranging from 0 to 21. Table 5.2 shows the syncopation score generated by the algorithm for each rhythm composed in 4/4 time.
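The general principle behind such a score can be sketched in code. The exact point values of the algorithm used here are not given in the text, so the following is a simplified stand-in loosely modeled on the approach of Longuet-Higgins & Lee (1984) rather than a reimplementation: it does not reproduce the scores in Table 5.2 in general (for example, it assigns 4 rather than 6 to 1001 0000 1001 0000). The function names and the scoring rule are assumptions. Each onset is compared with the strongest silent position that follows it before the next onset; if that silent position is metrically stronger, the difference in metric level is added to the score.

```python
# Beat strength scores for the sixteen 16th-note positions in 4/4 time,
# as listed in the text.
BEAT_STRENGTH = [16, 1, 2, 1, 4, 1, 2, 1, 8, 1, 2, 1, 4, 1, 2, 1]


def metric_level(strength):
    """Map a beat strength (1, 2, 4, 8, 16) to a level (0 to 4)."""
    return strength.bit_length() - 1


def syncopation_score(rhythm):
    """Simplified syncopation score for a one-measure rhythm in 4/4.

    rhythm: string of sixteen 0s and 1s (spaces are ignored), read
    cyclically so that the end of the measure leads back to the downbeat.
    """
    steps = [int(c) for c in rhythm if c in "01"]
    if len(steps) != 16:
        raise ValueError("expected sixteen 16th-note positions")
    score = 0
    for i, is_onset in enumerate(steps):
        if not is_onset:
            continue
        level_here = metric_level(BEAT_STRENGTH[i])
        strongest_silent = -1
        j = (i + 1) % 16
        while not steps[j]:  # walk through the silence after this onset
            strongest_silent = max(strongest_silent,
                                   metric_level(BEAT_STRENGTH[j]))
            j = (j + 1) % 16
        if strongest_silent > level_here:
            score += strongest_silent - level_here
    return score
```

As in the text, unsyncopated rhythms such as 1000 1000 1000 1000 receive a score of 0 under this rule, while rhythms whose onsets fall just before unarticulated strong beats receive higher scores.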

These syncopation scores can then be used to investigate a rhythm’s likelihood of evoking some qualia (e.g., “groovy”). Syncopation scores were used in a regression analysis to see if the amount of syncopation could predict the evoked qualia.

Figure 5.7 shows the relationship between the syncopation score and the odds that

“Groovy/Sexy/Funky” will be used to describe the qualia of the rhythm. Each rhythm is represented by its syncopation score, listed along the x-axis, and individual dots are plotted for each trial showing whether the given rhythm was checked as “groovy” or not (1 or 0). A logistic curve (plotted with 95% confidence intervals) shows that as the syncopation score increases, the likelihood that the rhythm will be heard as “groovy” also increases, or in other words, it shows how syncopation and

“grooviness” covary. Syncopation scores were also used as predictors for the qualia terms “laid-back,” “unstable,” and “lively,” with the results shown in Figure 5.8.
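The kind of logistic curve described above can be sketched as a simple logistic regression of a binary rating on the syncopation score. This is a minimal, illustrative version only: it fits a single intercept and slope by Newton-Raphson and ignores participant-level (random) effects, so it is not equivalent to the mixed effects models used elsewhere in this chapter. The function names `fit_logistic` and `predict` are hypothetical.

```python
import math


def predict(b0, b1, x):
    """Predicted probability that the term is checked at score x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(xs, ys, n_iter=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    xs: syncopation scores, one per trial
    ys: binary ratings (1 = checked, 0 = unchecked), one per trial
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = predict(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

A positive fitted slope `b1` corresponds to the pattern shown for “groovy”: the predicted probability of the term being checked rises as the syncopation score increases.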

Figures 5.7 and 5.8 suggest that the amount of syncopation (as operationalized by the syncopation scores) is a relatively important factor influencing qualia, especially the amount of “grooviness.” The amount of syncopation also showed moderate covariance with the likelihood of perceiving a rhythm as “laid-back” (or “relaxed”/“calm”) or “lively” (or “bouncy”/“upbeat”). Notice too, how the amount of syncopation can

Table 5.2: Syncopation scores for rhythm stimuli. The generic category (ratio) or pattern is written in brackets on the left. The unique rotation or rhythm is written out in 0s and 1s representing offsets and onsets in 16th notes (4/4 time). Spaces are added merely for visual aid in segregating beats. A syncopation score, generated by an algorithm, is listed to the right of each rhythm.

Rhythm                              Syncopation Score
(1:1)        1000 1000 1000 1000    0
(1:3)        1000 0010 1000 0010    0
(2:1:1)      1010 1000 1010 1000    0
(2:1:1)      1000 1010 1000 1010    0
(1:3)        0010 1000 0010 1000    3
(1:3)        1010 0000 1010 0000    4
(2:1:1)      1010 0010 1010 0010    4
(3:3:2)      1001 0010 1001 0010    6
(3:5)        1001 0000 1001 0000    6
(3:3:4:2:4)  1001 0010 0010 1000    7
(3:3:4:2:4)  1000 1001 0010 0010    8
(3:3:4:2:4)  0010 1000 1001 0010    8
(3:3:4:2:4)  1010 0010 0100 1000    9
(3:5)        0100 1000 0100 1000    9
(3:3:3:3:4)  1001 0010 0100 1000    10
(3:3:2)      1010 0100 1010 0100    10
(3:5)        0010 0100 0010 0100    10
(1:3)        0000 1010 0000 1010    12
(2:1:1)      0010 1010 0010 1010    12
(1:1)        0010 0010 0010 0010    13
(3:3:3:3:4)  0010 0100 1001 0010    13
(3:3:3:3:4)  0100 1001 0010 0100    14
(3:5)        0001 0010 0001 0010    15
(3:3:3:3:4)  0001 0010 0100 1001    18
(3:3:2)      0100 1001 0100 1001    20
(3:3:2)      0101 0010 0101 0010    21

Figure 5.7: Amount of syncopation and “grooviness.” Grooviness ratings are predicted by the amount of syncopation. Each rhythm is given a syncopation score between 0 and 21 (plotted along x-axis). The y-axis represents the likelihood that the term “groovy” will be applied to that rhythm. Each dot represents a participant’s rating (checked, 1; or unchecked, 0) for each rhythm. In this case, as the syncopation score increases, so does the likelihood of the rhythm being heard as “groovy.” Dotted lines represent 95% confidence intervals.

(a) “Laid-back/Relaxed/Calm” (b) “Unstable/Weird” (c) “Lively/Bouncy/Upbeat”

Figure 5.8: Additional examples of syncopation predicting qualia. As in Figure 5.7, the syncopation score was used to predict the likelihood of (a) “Laid-back,” (b) “Unstable,” or (c) “Lively” being checked, respectively. The x-axis shows the probability of a yes/no (1/0) rating, and the y-axis shows syncopation scores from 0 to 21.

vary dramatically between rotations of the same pattern, as can be seen from examining Table 5.2. While it is perhaps unsurprising that the amount of syncopation is directly related to “grooviness,” these figures suggest that there may be an optimal amount of syncopation desired in a rhythm, since it appears that a large amount of syncopation can actually create a very unpredictable, or “unstable,” rhythm. This is consistent with research by Fitch & Rosenfeld (2007), who found that highly syncopated rhythms were difficult to reproduce and often led to an incorrect interpretation of the meter.

What other factors might be contributing to the differences in rhythm qualia?

Looking back to the graphs in Figure 5.5, one can look for patterns in the graphs, and conjecture about what might explain certain groupings. For instance, some of the graphs show a tendency for the average ratings to form (more or less) two groups, with a bifurcation typically occurring between the lower portion of the graph

(ratios 1:1 through 2:1:1) and the upper portion of the graph. What, then, might be different about the patterns listed in the upper versus lower portion of the graph?

One difference lies in the amount of repetition (or, alternatively, the length) of the pattern. Although all rhythms are one measure long, there are varying amounts of repetition in the onset patterns. For instance, the shortest pattern (1:1) repeats at the level of the beat; patterns 2:1, 3:1, and 2:1:1 repeat every two beats, or mid-measure; and the remaining patterns have no repetition within the measure. Thus the binary groupings found in the results for some qualia ratings (e.g., “Groovy,” “Hurried,”

“Quick,” “Laid-back,” and possibly “Stately,”) might be attributed to the amount of repetition in the pattern.

One might also look for similarities and differences between the patterns themselves, and then examine the graphs for any evidence of groupings based on similarities and differences. For example, the number of unique durations in a pattern might affect its qualia, or perhaps the total number of onsets. With regard to unique durations18, the 1:1 pattern is the only one comprised of multiples of the same duration (e.g., quarter notes), and the pattern 3:3:4:2:4 is the only pattern comprised of more than two unique durations; the remainder have exactly two. There are several graphs in which the 1:1 pattern looks like an outlier: “Hurried/Driving,”

“Stately/Stiff/Serious,” “Stable/Solid,” and “Slow/Plodding.” However, if the number of unique durations were a factor contributing to a rhythm’s qualia, the pattern with the greatest number of unique durations (3:3:4:2:4) would be expected to have the opposite effect from the 1:1 pattern, and that is not the case.

18Inter-onset distance is more precise, but “durations” is used for simplification.

In terms of the total number of onsets in the pattern, the smallest number of onsets per measure occurs for the patterns 3:5 (two onsets) and 3:3:2 (three onsets), while the pattern 2:1:1 has the greatest number of onsets per measure (six). Recall that the rhythms were all played at 65 BPM. Therefore, the number of onsets might contribute to the sense of pace or speed, since patterns with more onsets in the measure will necessarily arrive closer together than rhythms with fewer onsets. Although there appears to be some tendency for the patterns 2:1:1 and 3:5 to elicit opposite effects across ratings for “hurried,” “quick,” and “powerful,” the remaining IOI groups do not seem to elicit any particular pattern that would suggest a linear relation between the total number of onsets and their qualia ratings, suggesting that the number of onsets is not likely a factor contributing to qualia judgements.

One final factor potentially related to perceived tempo might be the inter-onset space. While the total number of onsets has already been considered, there are some rhythms with maximally-spaced onsets (such as 1:1 or 3:3:3:3:4), and others with more variation in inter-onset distance (such as 3:1, 3:5, or 3:3:4:2:4). However, once again, the data do not present any clear patterns implying changes in qualia based on the distribution of onsets.

Note that although there is no clear relation between qualia ratings and each of the above factors taken independently, it remains a possibility that two or more such factors may work in combination to affect qualia, in which case a clear pattern would not necessarily emerge from the graphs. For instance, the tendency for the pattern

2:1:1 to stand out as more “hurried,” “quick,” and “powerful” could occur due to a combination of factors such as a lack of syncopation, combined with a high number of onsets. There may additionally, of course, be factors that were not considered that

may be contributing to the various effects. And of course, as already mentioned, there are most certainly other factors beyond the simple properties of the rhythms and the metric contexts that would likely contribute to a rhythm’s qualia (such as timbre, articulation, or tempo); however, these were not tested here.

Finally, it is important to bear in mind that the graphs only show mean proportions, and it is possible, therefore, that some other measure may uncover hidden relations between these various features of rhythms and their qualia.

5.3 Chapter Summary

In this chapter an experiment was carried out that investigated the roles of inter-onset interval and meter in influencing musical qualia. Various IOI patterns, either composed or taken from short clips of popular music, were presented in various metrical contexts, and participants were asked to check from a set of terms which words best described the qualia of the rhythm they were hearing. Based on the results, it appears that rhythms (not only pitch) are capable of evoking distinct qualia and that listeners appear moderately consistent in their qualia descriptions. Perhaps unsurprisingly, different onset patterns were found to elicit qualia that are distinct from each other. However, the onset patterns’ ratings occasionally “clustered together,” showing little variation in qualia. Also, certain other factors, such as the amount of syncopation, were shown to have a large influence on perceived qualia, possibly more so than the IOI pattern or metrical context.

The metrical context was, overall, found to be a significant factor influencing qualia judgments, consistent with the main hypothesis. However, the amount of perceived difference in qualia between the various metrical settings of a pattern seems

to depend largely on the pattern itself, and on the amount of syncopation that results from shifts in phase. When the effects of the composed rhythms were compared against those of the song clips with the same IOI pattern, it appeared as though song clips showed a greater effect of phase on qualia judgements, suggesting that perhaps the composed rhythms were simply not musical enough to generate large differences in qualia. Alternatively, it could be that the timing, apparent loudness, or particular timbres in the song clips combine to add to the strength of the overall qualia of a given interpretation (e.g., loud guitar strokes occurring on downbeats may sound more “serious” than a woodblock performing the same rhythm). Lastly, there could have been a simple arousal effect. Participants in post-experiment interviews reported that the task was somewhat repetitive, and that the middle block with the song clips was far more engaging. Therefore, it could have been simply that participants were paying closer attention to the song clips than the composed rhythms. However, when these differences suggested by the song data were tested, many test results for the effect of metric interpretation were in fact not significant. As noted, the second block of the experiment (i.e., the rock/pop clips) generated far fewer data points than the remaining blocks, and thus it is likely that this part of the experiment was simply underpowered. Additional data collection will be required in order to make a fair comparison of the two metric conditions, and across the two data sets.

Finally, many of the responses to the song clips showed dramatically less variation between the consistent and alternative versions than others. While it is possible that certain patterns (or songs) simply have similar feels across metrical contexts, it is also possible that in some cases the participants were not truly hearing the rhythm as intended. In Condit-Schultz’s experiment, participants often had a difficult time

hearing the consistent interpretation for some stimuli, even when given a demonstration. As such, it is a likely possibility (since a subset of the same song clips were used in the present experiment) that a sizable proportion of the participants were simply unable to hear the consistent (or alternative) interpretation. A likely culprit is the song “Kate,” which had a very challenging and unusual “consistent” interpretation with a high degree of syncopation. (Refer to Figure 5.3 for the notated examples.) Figure 5.6a shows almost no difference on any of the three qualia ratings between the two interpretations of “Kate.” In this case, it seems likely that despite the different count-in for each condition, the majority of participants may have actually experienced the same (“alternative”) metric interpretation. Future experiments should be careful to test the reliability of the induced metrical context, especially for complex, ecologically valid stimuli.

Chapter 6: General Summary

6.1 Recapitulation

This dissertation investigated two aspects of music, scale-degree and rhythm, in three studies that attempted to uncover whether individuals have consistent phenomenal reactions to these aspects of music, and how the musical context might influence that phenomenal experience. The first study, presented in Chapter 3, used a corpus of classical music to test the effect of harmonic accompaniment on the probability of melodic continuations. Implicit knowledge accrues through the unconscious observance of the probabilities of real-life events. These event probabilities are an important factor contributing to the generation of expectations. Thus, melodic probabilities and, in turn, melodic expectations have been theorized to play an important role in the generation of musical qualia (Huron, 2006). The results from the first study found that the harmonic context does significantly influence probabilities of scale-degree successions, suggesting not only that harmonic information can help improve statistically-informed models of melodic expectation, but also that changes in local harmonic context would likely affect changes in scale-degree qualia.

The hypothesis that harmonic context would affect scale-degree qualia was formally tested in Chapter 4 with a perceptual experiment that asked listeners to evaluate a scale-degree’s qualia while the local harmonic context was manipulated. The results suggested that scale-degrees do appear to carry some “independent” qualia related to their position in the key or scale, but that scale-degree qualia are also influenced by the local harmonic context. This experiment was carried out with participants both with and without music-theoretic training. While the non-musicians showed less consistency overall, their level of consistency was found to be greater than would be expected by chance, suggesting that the evaluation of scale-degree qualia was not merely a result of the post-hoc association of terminology with an identified scale-degree. This experiment also partially replicated a component of Krumhansl and Kessler’s 1982 experiment on the perceived “goodness of fit” of scale-degrees in a tonal context. While the results of the present research appear to be partially consistent with Krumhansl and Kessler’s finding that members of the tonic triad exhibit greater “tonal stability”, the present research, combined with the findings of Aarden (2003), suggests that the claim of tonal stability may be better explained by other factors. Specifically, this study suggests Krumhansl and Kessler’s findings were confounded by two factors: an implicit bias towards hearing the probes as phrase-endings, and the underestimation of the role of local harmonic context.

Finally, the third study carried out a perceptual experiment on the effect of metrical context on perceived rhythm qualia. Listeners rated the perceived qualia of composed stimuli, as well as ecologically valid musical stimuli, wherein a given inter-onset pattern was played in differing metrical contexts. The results suggest that metrical context influences rhythmic qualia, but that the effect may be more or less

132 pronounced depending on the particular rotation of the IOI pattern. In addition, it is possible that the metrical context may only affect certain aspects of musical qualia

— such as perceived “grooviness” — but not others. Post-hoc analyses of the data investigated other features of the rhythms aside from the changes in metrical context and found a strong relationship between the amount of syncopation and certain qualia ratings, such as “groovy,” “laid-back,” “unstable,” and “lively.” Regarding the eco- logically valid stimuli, while a strong effect for metrical context was initially implied in the data, many tests of the effect did not prove significant. This was likely due to the low power in these tests, as the rock/pop stimuli represented a small subset of the overall data, and had to be presented in a between-subjects design. Nevertheless, the number of overall tests (4 out of 10) that were significant was still greater than would be expected by chance (if the null hypothesis were true). More data will need to be collected to verify these results. Though finding appropriate stimuli is challenging, pursuing qualia research using ecologically valid stimuli appears promising.
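The claim that 4 significant tests out of 10 exceeds chance can be illustrated with a binomial tail probability. The following is only a sketch: it assumes a significance threshold of 0.05 and independent tests, neither of which is stated in this summary, and the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """Return P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i)
        for i in range(k, n + 1)
    )


# Assumed values: 10 independent tests, alpha = 0.05 (an assumption,
# not a figure reported in the text), 4 tests reaching significance.
p_chance = prob_at_least(4, 10, 0.05)
print(f"P(at least 4 of 10 significant under the null) = {p_chance:.4f}")
```

Under these assumptions the probability is roughly 0.001. If the tests were correlated, as they may be when they share participants or stimuli, the true figure would differ.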

6.2 Discussion

Goguen (2004) says, “A difficulty with qualia advocacy is that it tends to reify qualia, giving them independent existence as Platonic entities, and introducing a fundamental ontological distinction between subjective and objective aspects of experience.” If anything, this research has shown how interconnected the objective and subjective aspects of experience are. The results strongly suggest that certain structural elements of a musical object are linked to certain aspects of its qualia, such as a minor chord context evoking “darkness” or a syncopated rhythm evoking “grooviness.” Moreover, while qualia are by definition subjective, this work empirically investigated the amount of agreement between individuals’ subjective responses and, in the case of scale-degree and rhythm qualia, found the level of agreement to be significantly better than would be expected by chance. Therefore, while a common definition of subjectivity is that of “belonging to the thinking subject rather than the object of thought,” it must be acknowledged that in order for individuals to have largely overlapping responses to some stimulus, there must be some objective attributes of the stimulus itself that contribute to the phenomenal response. This work has contributed a small insight into some of the rudimentary features of musical stimuli that lead to a qualitative response, and has shown how the placement of those stimuli in various contexts can systematically alter that response.

Given that music simultaneously presents so many types of context (not only meter, rhythm, melody, and harmony, but also style, form, genre, instrumentation, etc.), it may be problematic to isolate “types” of qualia at all, since it seems plausible that the only thing that exists is the overall moment-to-moment qualia of some musical experience. Music, like all art, is complex, and it is difficult if not impossible to study the micro contexts and the macro contexts simultaneously. Listeners commonly react to a musical work as a whole, or at least to a complete musical phrase or idea (Gabrielsson, 2011); but with larger “slices” of music, it becomes difficult to understand what features of the music, exactly, lead to some experience. Presumably it is not simply one musical feature that is responsible, but a combination of many features.

In this dissertation I studied only two components of music, scale-degree and rhythm, and in order to study their effects, they were largely isolated and, for the most part, very far removed from actual music; even the interaction of these two components was not studied here. In addition, Chapter 3 relied heavily on probabilistic models, yet probabilities were defined using a very narrow, “state-based” approach. As Goguen (2004) notes, “traditional Shannon information theory ... has played a modest role in music theory, but has been rightly criticized for its inability to move beyond local features (such as so called n-grams) to larger grain structures, such as . More fundamentally, a theory that is based on probability necessarily makes some very dubious assumptions about the nature of music, such as that there are discrete atomic events, having fixed probabilities.”
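To make the “state-based” notion concrete, the sketch below estimates first-order continuation probabilities for a scale-degree, first ignoring harmony and then treating the (scale-degree, chord) pair as the state. It is an illustration only, not the procedure used in Chapter 3: the toy melody, the chord labels, and the function name `continuation_probs` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (scale degree, chord sounding beneath it),
# listed in melodic order. These are not values from the corpus.
melody = [("7", "V"), ("1", "I"), ("2", "V"), ("1", "I"),
          ("7", "V"), ("1", "I"), ("7", "iii"), ("6", "IV")]


def continuation_probs(events, use_harmony):
    """Estimate P(next degree | current state) from adjacent event pairs.

    The state is the scale degree alone, or the (degree, chord) pair
    when use_harmony is True.
    """
    counts = defaultdict(Counter)
    for (deg, chord), (next_deg, _) in zip(events, events[1:]):
        state = (deg, chord) if use_harmony else deg
        counts[state][next_deg] += 1
    return {
        state: {d: c / sum(ctr.values()) for d, c in ctr.items()}
        for state, ctr in counts.items()
    }


plain = continuation_probs(melody, use_harmony=False)
by_chord = continuation_probs(melody, use_harmony=True)

print(plain["7"])              # continuations of 7 with harmony ignored
print(by_chord[("7", "V")])    # continuations of 7 over V
print(by_chord[("7", "iii")])  # continuations of 7 over iii
```

In this toy data the leading tone resolves to the tonic two times out of three overall, always over V, and never over iii. Both versions remain local, fixed-probability models of the kind Goguen criticizes; adding harmony enlarges the state but does not address larger-grain structure.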

Throughout this document I attempted to make clear the goals of my research, to operationalize musical features and categories as best as possible given the intended local effects I was searching for, and to avoid generalizing results beyond their scope. While a complete understanding of musical qualia may be impossible, through the dissection and investigation of the unique components that give rise to musical qualia, we, as researchers, can begin to assemble some picture of a whole by examining the sum of its parts.

6.3 Implications for Music Pedagogy

As discussed in Chapter 2, some philosophical arguments claim that qualia should not be attributed to properties of the object. If this were true, it would seem inappropriate for music teachers to draw attention to, for example, scale-degree tendencies and their (apparently) associated qualia. However, the research presented in this dissertation suggests that there are intersubjectively reliable qualia that are evoked in certain contexts, and moreover that certain structural aspects of the stimuli appear to contribute to certain aspects of their phenomenal character. This suggests, therefore, that not only is it appropriate for instructors to discuss musical qualia in the classroom, but that the relationship between musical structure and evoked qualia is a topic that should be encouraged. For teachers who find that the emphasis in the classroom tends to lean more towards explaining and understanding musical structure and less towards the phenomenal aspects evoked by the structure, this research provides yet further support for the idea that the phenomenal content is a salient aspect of music, and should be bolstered in the classroom. The phenomenal content of the music is something that students already have intuitions about and often feel very passionately about. By discussing the evoked musical qualia in relation to certain structural elements of the music that are being taught (such as , suspensions, modal mixture, instrumentation, etc.), students may become more engaged with the theoretical material. In addition, fruitful discussions about the evoked qualia may lead to the generation of new theories about how musical structures evoke specific qualia.

Presently, musical qualia play their largest role as a pedagogical tool within the aural skills classroom. Teachers commonly call attention to qualia such as the “yearning” of tones that require resolution, the “feel” of a (e.g., how does 4 against 3 feel different from 3 against 4?), or the “crunchiness” created by dissonance. As instructors of a skills-based course largely focused on aural recognition of musical structures, teachers are often left with nothing but qualia when it comes to assisting students in learning the differences between musical objects, such as chord types, scale-degrees, rhythms, and progressions. Indeed, the fact that students learn qualitative associations was a concern raised in Chapter 4, where it was conjectured that students may only be recognizing scale-degrees and subsequently regurgitating learned, associated terms as the evoked qualia. While we cannot entirely rule out this possibility with the data at hand, it was found that even non-musically trained individuals could categorize the qualia of scale-degrees with a fair degree of consistency, suggesting that qualia identification can exist independently from object identification. This suggests that what aural skills instructors are doing is teaching students to use qualia to aid in the identification of musical objects, and not the other way around. There are, therefore, several questions with implications for aural skills pedagogy that arise from the present research. First, might students who have difficulties identifying qualia have the most difficulty with object identification? For example, perhaps the stability of 5̂ and 1̂ makes them easily confused, or the inherent “pulling” tendency of both 7̂ and ♯4̂ might explain why some students miss a seemingly obvious change of key during a melodic dictation. This, in my opinion, is a fascinating and important question that deserves serious inquiry. For instance, might there be a way to generate a similarity matrix for scale-degree qualia that could be tested against frequent errors made by students?
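One way such a similarity matrix might be built and compared against student errors is sketched below. Everything here is an assumption made for illustration: the rating scales, the rating values, the error counts, and the choice of cosine similarity and Pearson correlation are hypothetical and are not drawn from the experiments reported in this dissertation.

```python
from math import sqrt

# Hypothetical mean ratings per scale degree on three qualia scales
# (stable, pulling, dark). Illustrative values only.
ratings = {
    "1": [6.5, 1.5, 2.0],
    "5": [5.8, 2.0, 2.2],
    "7": [1.5, 6.4, 3.0],
    "#4": [1.8, 6.0, 3.5],
}

# Hypothetical counts of dictation errors in which the two scale
# degrees of a pair were confused with one another.
confusions = {("1", "5"): 14, ("7", "#4"): 11, ("1", "7"): 2,
              ("1", "#4"): 1, ("5", "7"): 3, ("5", "#4"): 2}


def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))


pairs = sorted(confusions)
similarity = {p: cosine(ratings[p[0]], ratings[p[1]]) for p in pairs}
r = pearson([similarity[p] for p in pairs], [confusions[p] for p in pairs])

for p in pairs:
    print(p, round(similarity[p], 3), confusions[p])
print("correlation between qualia similarity and confusions:", round(r, 3))
```

A positive correlation in real data would be consistent with the idea that scale-degrees with similar qualia are the ones students confuse, though it would not by itself establish the direction of the relationship.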

A related question of concern to aural skills curricula is: given the demonstrated effect of harmony on both the probabilities of scale-degree successions and scale-degree qualia, might it be beneficial to add harmonic context to melodic dictations? At present, it is most common to compartmentalize various tasks in aural skills, such as notating melodies separately from harmonic progressions. However, as already noted, in most Western music the melody and harmony are intertwined. It seems that it would therefore be in the student’s best interest to use examples that best represent “real” music. Of course, it is understood that part of the logic behind separating the tasks of melodic and harmonic dictation can likely be attributed to the level of difficulty inherent in each isolated task. One assumes the reason they are not commonly combined into a single task is the presumption that combining the two would only increase the difficulty, or perhaps the fear that one might mask the other. However, the research here suggests that harmonic context may actually play a facilitating role in some cases. Going back to the example of the student who confuses 5̂ for 1̂, if the student readily recognizes V7 chords, then presumably the same student would not make this mistake in a dictation which placed 5̂ over V7. Future research might investigate this issue by comparing student performance across various aural skills sections that either did or did not implement melodic dictations containing harmonic accompaniment.

A final thought concerns the relationship between conceptual knowledge and musical qualia. In Chapter 2 I presented examples supporting the notion that conceptual knowledge could influence qualia. In some ways, this argument parallels a common one in music scholarship; as noted by Montague (2011), “many undergraduate textbooks that ask ‘why study music?’ ... suggest that the formal study of music enhances musical enjoyment.” There are many reasons why the overall enjoyment of music might be enhanced by formal study, but the present research suggests that qualia may have something to do with it. If, as suggested above, part of learning music involves building an association between musical structures and their qualia, it would not be surprising if persons with greater theoretical knowledge of music experienced (at least some) music in a phenomenally different way than persons with no theoretical knowledge. In addition, the component of musical training that seemingly builds our awareness of subtle differences in musical qualia may result in a “sharpening” of the phenomenal content. Of course, this remains conjecture, and enhanced musical enjoyment can arise simply from developing an appreciation for music, which could of course come from learning and understanding. Nevertheless, perhaps future research, in particular biological or neurological approaches to music research, may be able to deepen our understanding of the effect of conceptual knowledge on musical qualia.

6.4 Areas for Future Research on Musical Qualia

In addition to the suggestions already made in the previous section, there are many other avenues for future research into musical qualia, of which only a few will be mentioned here. Goguen (2004) discusses an interesting “qualia hypothesis,” which he states as: “the weight of a unit corresponds ... to its saliency, and the saliency of a unit gives its strength as a quale.”¹⁹ Now, Goguen’s idea of “a unit” is complex, and based on multiple musical parameters which can exist at various hierarchical levels. However, the basic idea could be initially applied to relatively simple musical stimuli in an attempt to deconstruct the components that lead to some overall qualitative experience. For instance, various musical phrases could be independently assessed for both qualitative content (i.e., evoked qualia) and saliency content (i.e., which features or components appear most salient). The same phrases could then be recomposed or deconstructed into their constituent parts (e.g., only the pitch, rhythm, timbre, etc.) and those individual parts re-evaluated for qualitative content. If Goguen’s theory were correct, we would expect that elements rated as most salient would be the most likely to independently evoke the same qualia.

¹⁹ Note that the saliency hypothesis is only one component of Goguen’s overall theory intended to model how we understand music.

In Chapters 2, 3, and 4 I discussed the role of statistical learning in the generation of qualia; however, in Chapter 5 this discussion was largely absent. While statistical learning is likely playing an important role “behind the scenes” in the case of rhythm qualia, there appears to be no way of formally testing this suspicion on account of the lack of data available on the statistical probabilities of the rhythmic patterns used in the experiment presented in Chapter 5. However, through corpus study, we can investigate the statistical frequency of smaller rhythmic patterns and motives. For instance, within classical music, it could be that certain rhythms are associated with fast movements while others are more associated with slow movements. If this were the case, then rhythms associated with fast or slow movements, even when presented in isolation, may elicit qualia that are respectively “hurried” and “driving,” or “slow” and “plodding.”
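A corpus count of this kind could begin with something as simple as tallying short inter-onset-interval patterns separately for fast and slow movements. The sketch below is only a starting point under stated assumptions: the three-piece “corpus,” the tempo labels, and the function names `ngram_counts` and `counts_by_tempo` are hypothetical, and a real study would need encoded scores and a principled definition of tempo category.

```python
from collections import Counter

# Hypothetical corpus: (tempo label, inter-onset intervals in beats).
corpus = [
    ("fast", [0.25, 0.25, 0.5, 0.25, 0.25, 0.5]),
    ("fast", [0.25, 0.25, 0.5, 0.5, 0.5]),
    ("slow", [1.0, 0.5, 0.5, 1.0, 0.5, 0.5]),
]


def ngram_counts(iois, n):
    """Count every run of n consecutive inter-onset intervals."""
    return Counter(tuple(iois[i:i + n]) for i in range(len(iois) - n + 1))


def counts_by_tempo(pieces, n=3):
    """Accumulate rhythmic n-gram counts separately per tempo label."""
    totals = {}
    for tempo, iois in pieces:
        totals.setdefault(tempo, Counter()).update(ngram_counts(iois, n))
    return totals


totals = counts_by_tempo(corpus)
for tempo, counter in totals.items():
    print(tempo, counter.most_common(2))
```

Patterns that occur far more often under one tempo label than the other would be candidates for the isolated-presentation experiment described above.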

Finally, as noted, most of the investigations into musical qualia have focused on pitch. If researchers wish to build a comprehensive picture of the features that contribute to musical qualia, other musical features will also need to be isolated and studied for their potential to evoke qualia. In particular, the effects of timbre, tempo, and articulation would all make ideal starting places for future investigations of musical qualia, since they can be easily manipulated and make up the remaining “rudimentary” aspects of music that can be widely varied.

Works Cited

Aarden, B. J. (2003). Dynamic melodic expectancy. PhD thesis, The Ohio State University.

Abraham, N. (1995). Rhythms: on the work, translation, & psychoanalysis. (N.T. Rand & B. Thigpen, Trans.). Stanford, Calif: Stanford University Press.

Albrecht, J. & Huron, D. (2014). A statistical approach to tracing the historical development of major and minor pitch distributions, 1400-1750. Music Perception, 31(3):223–243.

Berlyne, D. E. (1971). Aesthetics and psychobiology. The Century Psychology Series. New York: Appleton-Century-Crofts.

Browne, R. (1981). Tonal implications of the diatonic set. In Theory Only, 5(6-7):3–21.

Budge, H. (1943). A study of chord frequencies based on the music of representative composers of the eighteenth and nineteenth centuries. New York: Bureau of Publications, Teachers College, Columbia University.

Burkhart, C., (Ed.). (1994). Anthology for musical analysis (5th ed). Fort Worth, TX: Harcourt Brace College Publishers.

Butler, D. & Brown, H. (1981). Diatonic as minimal cue cells. In Theory Only, 5(6-7):39–55.

Chalmers, D. J. (1996). The conscious mind: in search of a fundamental theory. Philosophy of mind series. Oxford: Oxford University Press.

Chiou, R. & Rich, A. N. (2014). The role of conceptual knowledge in understanding synaesthesia: Evaluating contemporary findings from a hub-and-spokes perspective. Frontiers in Psychology, 5.

Condit-Schultz, N. (2016). Beat and switch: Multi-stable meter, metric fake-outs, and the ‘first things strong’ rule. Manuscript submitted for publication.

Condit-Schultz, N. & Arthur, C. (2014). Beat and switch: Multi-stable rhythms, metric ambiguity, and rock & roll fake-outs. Paper read at the Annual Meeting for the Society for Music Theory, Milwaukee, WI.

Dennett, D. C. (1991). Consciousness explained. Boston: Little, Brown, and Co.

Desain, P. & Honing, H. (2003). The formation of rhythmic categories and metric priming. Perception, 32(3):341–365.

Devaney, J., Arthur, C., Condit-Schultz, N., & Nisula, K. (2015). Theme and variation encodings with Roman numerals (TAVERN): A new data set for symbolic music analysis. Proceedings of the International Society of Music Information Retrieval (ISMIR) conference.

Dowling, W. J. (1967). Rhythmic fission and the perceptual organization of tone sequences. PhD thesis, Harvard University.

Dowling, W. J. (2010). Qualia as intervening variables in the understanding of music cognition. Musica Humana, 2(1):1–20.

Fitch, W. T. & Rosenfeld, A. J. (2007). Perception and production of syncopated rhythms. Music Perception, 25(1):43–58.

Fruhauf, J., Kopiez, R., & Platz, F. (2013). Music on the timing grid: The influence of microtiming on the perceived groove quality of a simple drum pattern performance. Musicae Scientiae, 17(2):246–260.

Gabrielsson, A. (2011). Strong experiences with music: Music is much more than just music. Oxford: Oxford University Press.

Gjerdingen, R. O. (2014). Historically informed corpus studies. Music Perception, 31(3):192–204.

Goguen, J. (2004). Musical qualia, context, time and emotion. Journal of Consciousness Studies, 11(3-4):117–147.

Grant, S. (2010). Some suggestions for a phenomenology of rhythm. In Philosophical and Cultural Theories of Music. Brill, Leiden.

Hansberry, B. (In Press). What are scale-degree qualia? Music Theory Spectrum.

Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48:246–268.

Hevner, K. (1937). The affective value of pitch and tempo in music. American Journal of Psychology, 49:621–630.

Huron, D. (1995). The Humdrum Toolkit: Reference manual.

Huron, D. (2001). Tone and voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception, 19(1):1–64.

Huron, D. (2006). Sweet anticipation: music and the psychology of expectation. Cambridge, Mass: MIT Press.

Huron, D. (2015). Affect induction through musical sounds: An ethological perspective. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 370(1664).

Huron, D. & Davis, M. (2013). The harmonic minor scale provides an optimum way of reducing average melodic interval size, consistent with sad affect cues. Empirical Musicology Review, 7(3-4):103–117.

Huron, D. & Ommen, A. (2006). An empirical study of syncopation in American music, 1890-1939. Music Theory Spectrum, 28(2):211–231.

Iyer, V. (2002). Embodied mind, situated cognition, and expressive microtiming in African-American music. Music Perception, 19(3):387–414.

Janata, P., Tomic, S. T., & Haberman, J. M. (2012). Sensorimotor coupling in music and the psychology of the groove. Journal of Experimental Psychology: General, 141(1):54–75.

Jusczyk, P. W. & Krumhansl, C. L. (1993). Pitch and rhythmic patterns affecting infants’ sensitivity to musical phrase structure. Journal of Experimental Psychology: Human Perception and Performance, 19(3):627.

Kraepelin, E. (1899). Psychiatrie: Ein Lehrbuch für Studierende und Ärzte (6th ed). Leipzig: Barth Verlag. Reprinted (1990) as Psychiatry: A textbook for students and physicians (ed. J. M. Quen, trans. H. Metoiu & S. Ayed). Canton, MA: Science History Publications.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford: Oxford University Press.

Krumhansl, C. L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126(1):159–179.

Krumhansl, C. L. & Jusczyk, P. W. (1990). Infants’ perception of phrase structure in music. Psychological Science, 1:70–73.

Krumhansl, C. L. & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89(4):334.

Krumhansl, C. L. & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5(4):579.

Laitz, S. G. (2008). The complete musician: An integrated approach to tonal theory, analysis, and listening (3rd ed). New York: Oxford University Press.

Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, Mass: MIT Press.

London, J. (2004). Hearing in time. Oxford: Oxford University Press.

London, J. (2006). Metric fake outs. Excel spreadsheet posted at http://people.carleton.edu/~jlondon/.

London, J. (2016). [Review of the book Groove: A Phenomenology of Rhythmic Nuance, by Tiger C. Roholt]. The Journal of Aesthetics and Art Criticism, 74(1):101–104.

Longuet-Higgins, H. C. & Lee, C. S. (1984). The rhythmic interpretation of monophonic music. Music Perception, 1(4):424–441.

Marks, L. E., Hammeal, R. J., & Bornstein, M. H. (1987). Perceiving similarity and comprehending metaphor. Monographs of the Society for Research in Child Development, 52(1):1–102.

Matheisen, T. J. (1983). Aristides Quintilianus on music in three books. Music Theory Translation Series. New Haven, Conn.: Yale University Press.

Merriam, A. P., Whinery, S., & Fred, B. G. (1956). Songs of a Rada community in Trinidad. Anthropos, 51(1/2):157–174.

Meyer, L. B. (1956). Emotion and meaning in music. Chicago, Ill.: Univ. of Chicago Press.

Meyers, L. S., Gamst, G., & Guarino, A. J. (2013). Applied multivariate research: design and interpretation (2nd ed). Los Angeles: SAGE.

Montague, E. (2011). Phenomenology and the ‘hard problem’ of consciousness and music. In D. Clarke & E. Clarke, (Eds.), Music and Consciousness: Philosophical, Psychological, and Cultural Perspectives. Oxford: Oxford University Press.

Morton, E. (1977). On the occurrence and significance of motivation-structural rules in some bird and mammal sounds. American Naturalist, 111(981):855–869.

Narmour, E. (1990). The analysis and cognition of basic melodic structures: the implication-realization model. Chicago, Ill.: University of Chicago Press.

Narmour, E. (1992). The analysis and cognition of melodic complexity: the implication-realization model. Chicago, Ill.: University of Chicago Press.

Ortmann, O. (1926). On the melodic relativity of tones, volume 35, no. 1 of Psychological Monographs. Princeton, N.J.; Albany, N.Y.: Psychological Review Co.

Osborne, J. (1980). The mapping of thoughts, emotions, sensations, and images as responses to music. Journal of Mental Imagery, 5:133–36.

Palmer, C. & Krumhansl, C. L. (1987). Pitch and temporal contributions to musical phrase perception: Effects of harmony, performance timing, and familiarity. Perception & Psychophysics, 41(6):505–518.

Parncutt, R. (1994). A Perceptual Model of Pulse Salience and Metrical Accent in Musical Rhythms. Music Perception: An Interdisciplinary Journal, 11(4):409–464.

Pearce, M. T. (2005). The construction and evaluation of statistical models of melodic structure in music perception and composition. PhD thesis, City University London.

Pressing, J. (2002). Black Atlantic rhythm: Its computational and transcultural foundations. Music Perception, 19(3):285–310.

Quittner, A. & Glueckauf, R. (1983). The facilitative effects of music on visual imagery: A multiple measures approach. Journal of Mental Imagery, 7:105–20.

Raffman, D. (1993). Language, music, and mind. Cambridge, Mass.: MIT Press.

Reber, A. S. (1993). Implicit learning and tacit knowledge: an essay on the cognitive unconscious. New York: Oxford University Press.

Roholt, T. C. (2014). Groove: a phenomenology of rhythmic nuance. New York: Bloomsbury Academic.

Romberg, A. R. & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6):906–914.

Rubenstein, R. Y. & Kroese, D. P. (2004). The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. New York: Springer.

Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1):27–52.

Schiavio, A. & van der Schyff, D. (2016). Beyond musical qualia. Manuscript submitted for publication.

Shepard, R. N. (2009). One cognitive psychologist’s quest for the structural grounds of music cognition. Psychomusicology: Music, Mind and Brain, 20(1-2):130.

Temperley, D. (2007). Music and probability. Cambridge, Mass.: MIT Press.

Temperley, D. (2008). A probabilistic model of melody perception. Cognitive Science: A Multidisciplinary Journal, 32(2):418–444.

Thaut, M. (2008). Rhythm, music, and the brain: scientific foundations and clinical applications. New York: Routledge.

Tye, M. (2015). Qualia (Stanford Encyclopedia of Philosophy). Retrieved from http://plato.stanford.edu/entries/qualia/.

Vazan, P. & Schober, M. (2000). The garden path phenomenon in the perception of meter. Paper read at the Annual Meeting of the Society for Music Perception and Cognition, Toronto, ON.

Vazan, P. & Schober, M. (2004). Detecting and resolving metrical ambiguity in a rock song upon multiple hearings. In Lipscomb, S. D., Gjerdingen, R. O., Webster, P., & Ashley, R., editors, Proceedings of the 8th International Conference on Music Perception & Cognition, pages 426–432, Evanston, Illinois.

Zentner, M. (2012). A language for musical qualia. Empirical Musicology Review, 7(1-2):80–83.

Appendix A: The Corpus Data

Composer | Setting | Piece | Title
Bach | Chorale | No. 9 | Ermuntre dich, mein schwacher Geist
Bach | Chorale | No. 19 | Ich hab mein Sach Gott heimgestellt
Bach | Chorale | No. 24 | Valet will ich dir geben
Bach | Chorale | No. 28 | Nun komm, der Heiden Heiland
Bach | Chorale | No. 30 | Jesus Christus, unser Heiland
Bach | Chorale | No. 32 | Nun danket alle Gott
Bach | Chorale | No. 46 | Vom Himmel hoch, da komm’ ich her
Bach | Chorale | No. 48 | Ach wie nichtig, ach wie fluechtig
Bach | Chorale | No. 54 | Lobt Gott, ihr Christen, allzugleich
Bach | Chorale | No. 68 | Wenn wir in hoechsten Noeten sein
Bach | Chorale | No. 69 | Komm, heiliger Geist, Herre Gott
Bach | Chorale | No. 88 | Helft mir Gott’s Guete preisen
Bach | Chorale | No. 98 | O Haupt voll Blut und Wunden
Bach | Chorale | No. 101 | Herr Christ, der ein’ge Gott’s Sohn
Bach | Chorale | No. 110 | Vater unser im Himmelreich
Bach | Chorale | No. 117 | Nun ruhen alle Waelder
Bach | Chorale | No. 124 | Auf auf mein Herz und du mein ganzer
Bach | Chorale | No. 136 | Herr Jesu Christ, dich zu uns wend’
Bach | Chorale | No. 153 | Alle Menschen muessen sterben
Bach | Chorale | No. 157 | Wo Gott zum Haus nicht gibt
Bach | Chorale | No. 158 | Der Tag, der ist so freudenreich
Bach | Chorale | No. 165 | O Lamm Gottes, unschuldig
Bach | Chorale | No. 176 | Erstanden ist der heil’ge Christ
Bach | Chorale | No. 177 | Ach bleib bei uns, Herr Jesu Chris
Bach | Chorale | No. 183 | Nun freut euch, lieben Christen
Bach | Chorale | No. 187 | Komm, Gott Schoepfer, heiliger Geist
Bach | Chorale | No. 200 | Christus ist erstanden hat uberwunden
Bach | Chorale | No. 201 | O mensche, bewein’ dein’ Suende gross
Bach | Chorale | No. 217 | Ach Gott, wie manches Herzeleid
Bach | Chorale | No. 223 | Ich dank’ dir, Gott, fuer all Wohltat
Bach | Chorale | No. 224 | Das walt’ Gott Vater und Gott Sohn
Bach | Chorale | No. 248 | Se Lob und Ehr’ dem hoechsten Gut
Bach | Chorale | No. 255 | Was frag’ ich nach der Welt
Bach | Chorale | No. 258 | Meine Augen schliess’ ich jetzt
Bach | Chorale | No. 268 | Nun lob’, mein’ Seel’, den Herren
Bach | Chorale | No. 272 | Ich dank’ dir, lieber Herre
Bach | Chorale | No. 273 | Ein’ feste Burg ist unser Gott
Bach | Chorale | No. 276 | Lobt Gott, ihr Christen, allzugleich
Bach | Chorale | No. 282 | Freu’ dich sehr, O meine Seele
Bach | Chorale | No. 290 | Es ist das Heil uns kommen her
Bach | Chorale | No. 299 | Meinen Jesum lass ich nicht
Bach | Chorale | No. 303 | Herr Christ, der ein’ge Gott’s Sohn
Bach | Chorale | No. 306 | O Mensch, bewein’ dein’ Suende gross
Bach | Chorale | No. 323 | Wie schoen leuchtet der Morgenstern
Bach | Chorale | No. 328 | Liebster Jesu, wir sind hier
Bach | Chorale | No. 350 | Jesu, meiner Seelen Wonne
Bach | Chorale | No. 354 | Sei Lob und Ehr’ dem hoechsten Gut
Bach | Chorale | No. 361 | Du Lebensfuerst, Herr Jesu Christ
Bach | Chorale | No. 366 | O Welt, sieh hier dein Leben
Bach | Chorale | No. 368 | Hilf, Herr Jesu, lass gelingen
Beethoven | Sonata | Op. 13, No. 8/2 | Piano Sonata No. 8 in
Beethoven | Sonata | Op. 13, No. 8/3 | Piano Sonata No. 8 in C minor
Beethoven | Sonata | Op. 14, No. 1/2 | Piano Sonata No. 9 in
Clementi | Sonatina | Op. 36, No. 2/1 | Sonata No. 2 in
Haydn | String Quartet | Op. 74, No. 3 | String quartet in
Mendelssohn | Lieder ohne Worte | Op. 62, No. 3 | Trauermarsch
Mendelssohn | Lieder ohne Worte | Op. 19, No. 6 | Venetianishes Gondellied
Mozart | Sonata | K333, Mvt 1 | Piano Sonata No. 13 in Bb major
Mozart | Sonata | K284, Mvt 3 | Piano Sonata No. 6 in
Schubert | Lied | Op. 59, No. 3 | Du bist die Ruh
Schubert | Lied | Op. 7, No. 3 | Der Tod und das Maedchen
Schubert | Lied | Op. 32, DV550 | Die Forelle
Schubert | Lied | Op. 25, DV795 | Ungeduld
Schubert | Lied | Op. posth. DV957 | Staendchen (No. 4)
Schubert | Lied | Op. 106, No. 4 | An Sylvia
Schumann | Piano Miniature | Op. 68, No. 1 | Melody
Schumann | Piano Miniature | Op. 68, No. 6 | Poor Orphan Child
Schumann | Piano Miniature | Op. 68, No. 8 | The Wild Horseman
