
Worlds of Musics: Cognitive Ethnomusicological Inquiries

on Experience of Time and Space in Music-making

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Yong Jeon Cheong

Graduate Program in Music

The Ohio State University

2019

Dissertation Committee

Udo Will, Advisor

Georgia Bishop

Graeme Boone

Copyrighted by

Yong Jeon Cheong

2019


Abstract

This dissertation is a cognitive ethnomusicological investigation of how each individual creates his or her own world via different musical behaviors. The goal of this thesis is to contribute to a model of our experience of time and space from an interdisciplinary perspective. There is a long tradition of using two cognitive constructs,

‘time’ and ‘space’, when talking about the world. In order to understand how we construct our own worlds cognitively via music-making, I first distinguished two behaviors in music performance (singing vs. instrument playing). I looked at how the different modes of music-making shape our body in distinctive ways and modify our experience of time and space.

For the cognitive sections (chapters 2 & 3), I discussed not only the building blocks of temporal experience but also features of space pertaining to the body. In order to build a comparative perspective (chapter 4), I examined various ancient understandings of time and space in different cultures. In terms of music (chapter 5), I looked at the transformative power of music-making and speculated about potentially different modulatory processes between singing and instrument playing. The discussion in the cognitive sections provided the basic ideas for my ‘Hear Your Touch’ project consisting of two behavioral experiments (chapter 6). I focused not only on two elements of temporal experience: 1) event detection, and 2) perception of temporal order, but also on several elements of spatial experience: 1) body space, 2) audio-tactile integration, and 3) space pertaining to the hands. Both simple reaction time and temporal order judgment experiments provide supporting evidence for differences in spatiotemporal processing between musicians and non-musicians as well as between vocalists and instrumentalists. The simple reaction time experiment suggests that instrumental musical training contributes to enhanced multisensory integration through co-activation. The temporal order judgment experiment indicates not only that musical training changes responses to audio-tactile stimuli but also that instrumental training modifies the perception of temporal order.

Compared to non-musicians and vocalists, instrumentalists showed significantly lower absolute and difference thresholds. These results demonstrate different effects of specific musical training on our experience of time and space. My experimental findings support the view that, although they are often considered distinct cognitive constructs (chapter 4), time and space are established together through our bodily experiences. In connection with music evolution (chapter 5), it is highly likely that the use of both vocal and non-vocal sounds in a communication system had a significant influence on the development of human cognition by transforming our bodies, our perception of, and our action toward the world. This work suggests that there are many musics that allow us to have different worlds.


Dedication

To my family


Acknowledgments

I would like to express my sincere gratitude to my Cognitive Ethnomusicology guru, Dr. Udo Will, who has been patiently helping me to find my own path. Without his guidance and support, I would not have completed my studies here at The Ohio State University.

I thank my committee member, Dr. Georgia Bishop, for helping me build my foundation in neuroscience.

I thank my committee member, Dr. Graeme Boone, for broadening my understanding of music.

I sincerely thank Dr. Hyun Kyung Chae for always being supportive of my academic journey.

I profoundly thank Seymour Fink and his wife Beth Owen for having me and my Miss Daisy as part of their family.


I thank my friend, Darrell Joseph, who has enriched my Columbus life with his help, humor, and kindness.

I thank my djembe master, Mr. Balla Sy, who initiated me into a new world of music-making.

I thank all of my programmer friends, Stephan Wolf, Tim Vets, Qianli Feng, Gopi

Tummala, Jessie Zhao, Leon Durrenberger, and Jack McHugh. Without them, I could not have run my experiments.

I thank people who were willing to be my guinea pigs for tedious experiments.

I thank Dr. McCoy, the director of the OSU Voice Teaching and Research Lab, who helped me recruit my singer participants.

I thank Nancy McDonald-Kenworthy for her insightful editorial advice.

I thank Steven Brown, Daniel Everett, Lara Pearson, and Sundeep Teki for their quick reprint permissions.


I thank the department, the School of Music, and the College of Arts and Sciences for the financial support and all the opportunities offered to me.

I thank my Miss Daisy for being my perfect lab mate.

Above all, I thank my family, who have loved and supported me unconditionally throughout my life.


Vita

2014 – 2018 Graduate Research & Teaching Associate, The Ohio State University

2013 M.A. in Musicology & M.M. in Composition, The Ohio State University

2006 M.M. in Composition, Ewha Womans University

2003 B.M. in Composition, Ewha Womans University

Publications

Cheong, Y. J., & Will, U. (2018). Music, space and body: the evolutionary history of vocal and instrumental music. Proceedings of the 15th International Conference on Music Perception and Cognition & 10th triennial conference of the European Society for the Cognitive Sciences of Music. Montréal, Canada: University

Cheong, Y. J., Will, U., & Lin, Y-Y. (2017). Do vocal and instrumental primes affect word processing differently? An fMRI study on the influence of melodic primes on word processing in Chinese musicians and non-musicians. Proceedings of the 25th Anniversary Conference of the European Society for the Cognitive Sciences of Music, 35-39. Ghent, Belgium: University of Ghent

Klyn, N. A., Will, U., Cheong, Y. J., & Allen, E. T. (2015). Differential short-term memorisation for vocal and instrumental rhythms. Memory, 24(6), 766-791. doi: 10.1080/09658211.2015.1050400

Fields of Study

Major Field: Music

Area of Emphasis: Cognitive Ethnomusicology


Table of Contents

Abstract ...... ii
Dedication ...... iv
Acknowledgments ...... v
Vita ...... viii
List of Tables ...... xii
List of Figures ...... xiii
Chapter 1. Introduction ...... 1
Chapter 2. Concerning time ...... 7
Time in cognitive sciences ...... 9
Psychological building blocks of time ...... 11
Event detection ...... 11
Perception of temporal order: simultaneity vs. succession ...... 13
Duration perception ...... 17
Duration estimation ...... 21
Psychological present ...... 22
Rhythm perception ...... 24
Conclusion ...... 26
Chapter 3. Space and music-making bodies ...... 28
Body space ...... 30
Postural schema ...... 32
Superficial schema ...... 33
Body schema vs. Body image ...... 35
Peripersonal space ...... 37
Multisensory integration ...... 40
Body-part centered specificity ...... 41

Sensorimotor coupling ...... 44
Plasticity ...... 45
Two modes of spatial processing: sensorimotor vs. representational ...... 48
Embodied spaces in music-making bodies ...... 50
Chapter 4. The origins of time and space concepts ...... 55
Babylonia ...... 57
India ...... 61
China ...... 71
Greece ...... 88
Mythological views on time and space ...... 90
Paradigmatic views on time and space ...... 92
Conclusion ...... 104
Chapter 5. Transformative power of music-making and the origins of music-making ...... 107
Transformative power of music-making ...... 107
The origins of vocal and instrumental music in the human history ...... 117
The vocal and non-vocal communications in the human prehistory ...... 121
Prehistory vocal communication ...... 121
Prehistory non-vocal communication ...... 126
The vocal and non-vocal communications in the animal kingdom ...... 131
Animal vocal communication ...... 131
Animal non-vocal communication ...... 142
Functions of animal sound communication and their implications on the origins of music ...... 143
Competition-sexual selection hypothesis ...... 144
Cooperation-social cohesion hypothesis ...... 145
Emotion ...... 152
Music vs. Language ...... 157
Non-vocal language: Speech surrogate ...... 160
The design analysis of vocal vs. non-vocal music and speech vs. speech surrogate ...... 163
Do two modes of music-making transform our experience of the world differently? ...... 199
Chapter 6. Hear Your Touch: Experimental investigation of embodied time and space in music-making ...... 201


Simple Reaction Time (SRT) Experiment ...... 205
Methods ...... 206
Results ...... 211
Discussion ...... 226
Temporal Order Judgment (TOJ) experiment ...... 231
Methods ...... 232
Results ...... 236
Discussion ...... 257
General Discussion ...... 262
Chapter 7. Conclusion and suggestions for future research ...... 268
References ...... 273
Appendix A. Jajangga text with transliteration and translation ...... 304
Appendix B. Simple Reaction Time (SRT) experiment mean reaction time ANOVA table ...... 305


List of Tables

Table 1. Comparison of spaces pertaining to the singing vs. playing-an-instrument bodies ...... 53
Table 2. Design features of language and music ...... 198
Table 3. Simple Reaction Time (SRT) experiment ANOVA summary ...... 212
Table 4. Significant coefficient estimates of GLMM of SRT experiment ...... 216
Table 5. Temporal Order Judgment (TOJ) experiment reaction time (RT) ANOVA summary ...... 238
Table 6. Significant coefficient estimates of GLMM of TOJ experiment RT ...... 241
Table 7. TOJ experiment accuracy ANOVA summary ...... 244
Table 8. Significant coefficient estimates for GLMM of TOJ experiment accuracy ...... 250
Table 9. PSE and JND estimates for instrumentalists, non-musicians and vocalists ...... 254
Table 10. PSE and JND estimates for ALIGN, LOCATION, and ARM in instrumentalists ...... 255
Table 11. PSE and JND estimates for ALIGN, LOCATION, and ARM in vocalists ...... 256
Table 12. PSE and JND estimates for ALIGN, LOCATION, and ARM in non-musicians ...... 256


List of Figures

Figure 1. Unified model of time perception ...... 20
Figure 2. Two parieto-premotor peripersonal space (PPS) networks on a monkey brain template ...... 39
Figure 3. Ancient Mesopotamia geography ...... 58
Figure 4. Taittirīya upaniṣad’s model of continuum of cosmos ...... 66
Figure 5. Sāṅkhya’s model of continuum of cosmos ...... 68
Figure 6. Qin notation - excerpt from Flowing water (Liushui 流水) ...... 85
Figure 7. Human vocalization system ...... 123
Figure 8. Laryngeal duplication and migration model ...... 126
Figure 9. Phase relationship between singing and rocking in The Family’s lullaby performance ...... 150
Figure 10. Semiotic progression model ...... 174
Figure 11. Varying pitch in Karnatic vocal music rendition ...... 179
Figure 12. Three examples of the Kele Drum language and enphrasing technique ...... 184
Figure 13. Extended duality of patterning: Linearity and hierarchy ...... 187
Figure 14. Sixteen configurations of the SRT experiment ...... 207
Figure 15. Presentation and registration microcontrollers for SRT experiment ...... 208
Figure 16. BIOPAC tactile stimulator TSD 190 ...... 209
Figure 17. Simple Reaction Time experiment setup ...... 211
Figure 18. Reaction time (RT) for the factor MODALITY of SRT experiment ...... 213
Figure 19. RT for LOCATION:MODALITY interaction of SRT experiment ...... 214
Figure 20. RT for STATUS:MODALITY interaction of SRT experiment ...... 217
Figure 21. RT for STATUS:ARM:MODALITY interaction of SRT experiment ...... 218

Figure 22. RT for STATUS:LOCATION:MODALITY interaction of SRT experiment ...... 219
Figure 23. Redundancy gain plot: RT for multisensory and unisensory conditions in three participant groups ...... 221
Figure 24. Predicted multisensory facilitation violation of the RMI for instrumentalists ...... 222
Figure 25. Difference in the joint and multisensory cumulative probability for vocalists ...... 224
Figure 26. Difference in the joint and multisensory cumulative probability for non-musicians ...... 225
Figure 27. Group difference in the joint and multisensory cumulative probability between instrumentalists and non-musicians ...... 225
Figure 28. Eighty stimulus configurations of the Temporal Order Judgment (TOJ) experiment ...... 233
Figure 29. Presentation and registration microcontrollers for TOJ experiment ...... 234
Figure 30. TOJ experiment setup ...... 236
Figure 31. RT for the factor SOA of TOJ experiment ...... 239
Figure 32. RT for LOCATION:MODALITY interaction of TOJ experiment ...... 239
Figure 33. RT for ALIGN:MODALITY interaction of TOJ experiment ...... 240
Figure 34. RT for STATUS of TOJ experiment ...... 242
Figure 35. RT for STATUS:SOA interaction of TOJ experiment ...... 243
Figure 36. % correct for the factor MODALITY of TOJ experiment ...... 245
Figure 37. % correct for LOCATION of TOJ experiment ...... 245
Figure 38. % correct for ALIGN:LOCATION interaction of TOJ experiment ...... 246
Figure 39. % correct for LOCATION:MODALITY interaction of TOJ experiment ...... 247
Figure 40. % correct for the factor SOA of TOJ experiment ...... 248
Figure 41. % correct for LOCATION:ALIGN:MODALITY interaction of TOJ experiment ...... 248
Figure 42. % correct for the factor STATUS of TOJ experiment ...... 251


Figure 43. % correct for STATUS:SOA interaction of TOJ experiment ...... 252
Figure 44. % correct for STATUS:MODALITY:SOA interaction of TOJ experiment ...... 252
Figure 45. % correct of ‘sound first’ responses of instrumentalists, non-musicians, and vocalists ...... 255
Figure 46. % correct of ‘sound first’ responses for ARM in non-musicians ...... 256


Chapter 1. Introduction

In this dissertation, I intend to propose a cross-cultural model that can explain how music-making affects the cognitive construction of the world we live in, and how music-making contributes to the shaping of time and space in both perceptual and conceptual aspects.

Many people have reported that their experience of the world has been changed by music.

For instance, one of Gabrielsson’s (2011) interviewees described her experience of listening to Tchaikovsky’s Pathétique as “It moved me from the very first note. I felt as if

I was lifted up into another world. Time and space disappeared; perhaps that is what eternity is like” (p. 46). This personal report indicates that music-making transforms our experience of time and space and thereby leads to individually different mental representations of the world. As a cognitive ethnomusicologist, I want to have a holistic picture of the effects of music-making on the cognitive reconstruction of the world. The way I am going to approach the research questions leads me to apply different methodologies. Using a comparative approach, I will discuss how various cultures conceptualize time and space differently. I will also examine the roles of music-making in the establishment of time and space concepts in these cultures. For the cognitive component,


I will introduce two behavioral experiments with which I investigated the effects of specific musical training on spatiotemporal processing at the perceptual level.

In order to understand the relation between different musical behaviors and the emergence of different world reconstructions, I will first introduce two primary cognitive constructs of the world, time and space, from a cognitive perspective. In chapter 2,

I will discuss six psychological building blocks of time that shape our temporal experience.

They include 1) event detection, 2) perception of temporal order: simultaneity vs. succession, 3) duration perception, 4) duration estimation, 5) the psychological present, and 6) rhythm perception. As a precondition of the other building blocks, event detection is the recognition of change happening in the environment. Perception of temporal order allows us to have temporal experience. Determining whether two events are happening at the same time or successively requires the involvement of different processes. Duration perception relates to the detection of state changes, and I will discuss both duration-based and beat-based timing mechanisms for duration perception. Compared to duration perception, duration estimation is associated with memory. As the duration of the experiential process (Fraisse, 1984), the psychological present allows us to keep multiple percepts as one unit. These psychological building blocks enable rhythm perception.

Chapter 3 deals with spatial cognition with a focus on space as it pertains to the body, and introduces two concepts relevant for this thesis, that of body space and peripersonal space. Body space is the space of the body and peripersonal space is the space immediately surrounding the body. Body space consists of postural and superficial schemata. Superficial schema is related to body surface. For both schemata, a movement

component, action, plays an important role. We can see a distinctive character between postural and superficial schemata in terms of perception. The former is associated with proprioception and the latter is connected with touch. Peripersonal space serves two functions: 1) body protection and 2) goal-directed action. It is characterized by four features: 1) multisensory integration, 2) body-part centered specificity, 3) sensorimotor coupling, and 4) plasticity. Sensory inputs retrieved from different modalities are integrated.

Various body parts (i.e., head, hand, trunk, etc.) function as different reference frames.

Perceived objects or events via multisensory integration lead to possible actions. The scope of peripersonal space is plastic; for example, tool use can enlarge it. I will discuss potential spatial differences between singing and instrument-playing. The features of audio-tactile perception and hand-centered space (i.e., perihand space) constitute two conspicuous differences between them.

In order to have a better understanding of diverse views of our worlds, I will discuss time and space as cognitive constructs. I will examine how various cultures conceptualize time and space in chapter 4. For this, I will analyze early documents on cosmology from selected cultures. They inform us of how each culture sees time and space because ancient cosmology talks about how temporal and/or spatial order was established from chaos. Despite culturally different views on the world, the fragments altogether suggest the importance of rituals where music plays a role, which further implies the significance of human action in the construction of a world that is meaningful for each culture. The ancient texts allow us to understand each culture’s own view of time and space because time and space concepts have changed over human history. My discussion about the origins

of time and space concepts is heavily indebted to numerous scholarly works on ancient

Babylonian, Indian, Chinese, and Greek cosmology and philosophy. My analysis will show not only cultural diversity but also some shared characteristics (i.e., eternity, ephemerality,

& cyclicity) in ancient understanding of time and space.

In chapter 5, I will return to two behaviors in music performance (i.e., singing and instrument playing) and trace back their different evolutionary trajectories. I will try to demonstrate consistency in the distinction between singing and instrument playing from historical, prehistorical, and comparative perspectives. We use vocal and non-vocal modes of acoustic communication in music and language, which are human-specific. Using design features of Hockett (1960a, 1960b), I will analyze vocal music, instrumental music, speech and speech surrogate (i.e., non-vocal form of language) in order to have a better understanding regarding how human language, music, and cognition have evolved together.

Chapter 6 introduces two behavioral experiments that are directly connected to the discussions in chapters 2 & 3. For my ‘Hear Your Touch’ project, I conducted a simple reaction time (SRT) and a temporal order judgment (TOJ) experiment. The most important question of these experiments concerns the effects of specific musical training on spatiotemporal processing. Therefore, I recruited three different participant groups: vocalists, instrumentalists, and non-musicians. In terms of psychological building blocks of time, the

SRT experiment is based on event detection. For this, I delivered unisensory auditory, unisensory tactile, and multisensory audio-tactile stimuli. The TOJ experiment is based on the perception of temporal order. For this, audio-tactile stimuli are coupled and presented in pairs with different stimulus onset asynchronies. In connection with chapter 3, both

experiments consider several features of space pertaining to the body. In terms of body space, the experimental task of crossed vs. uncrossed arms was used to test the effect of postural schemata. In terms of peripersonal space, I focused on audio-tactile integration and hand-centered specificity. For this, I presented audio stimuli near the hand and tactile stimuli on the hand. The experimental results provide supporting evidence for the different effects of specific musical training on spatiotemporal processing.
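
To make the logic of the TOJ measures concrete before chapter 6: the PSE (point of subjective simultaneity) and JND (just noticeable difference) values of the kind reported in Tables 9-12 can be derived by fitting a psychometric function to ‘sound first’ response proportions across stimulus onset asynchronies (SOAs). The Python sketch below uses a cumulative Gaussian, one common choice, and entirely invented numbers; it illustrates the general method, not the analysis code actually used for the experiments.

```python
# Hypothetical sketch (not the chapter 6 analysis code): estimating the
# point of subjective simultaneity (PSE) and just noticeable difference
# (JND) from TOJ data by fitting a cumulative Gaussian psychometric
# function. SOA values and response proportions below are invented.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Negative SOA = touch first, positive SOA = sound first (in ms).
soas = np.array([-200.0, -90.0, -55.0, -20.0, 20.0, 55.0, 90.0, 200.0])
p_sound_first = np.array([0.05, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 0.97])

def psychometric(soa, pse, sigma):
    """Probability of a 'sound first' response as a function of SOA."""
    return norm.cdf(soa, loc=pse, scale=sigma)

(pse, sigma), _ = curve_fit(psychometric, soas, p_sound_first, p0=(0.0, 50.0))

# JND: half the distance between the 25% and 75% points of the fitted
# curve, i.e., norm.ppf(0.75) * sigma for a Gaussian.
jnd = norm.ppf(0.75) * sigma
print(f"PSE = {pse:.1f} ms, JND = {jnd:.1f} ms")
```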

Pickering (2017) introduced the term ‘different worlds’ to mean “the fact that other social groups understand and act in the world different from ‘us’” (p. 2). He further pointed out that ‘different worlds’ are associated with plural ontologies and argued that ontologies are not schemes of classification and representation but are enacted or performed in practice.

My discussion on time and space as cognitive constructs and my experimental finding of differential spatiotemporal processing between vocalists and instrumentalists seem to be in line with Pickering’s (2017) notion of different worlds. This dissertation shows that time and space arise from our minds and bodies that can be shaped in a distinctive way through specific modes of musical practice. Although there are multiple ways of music-making

(e.g., performance, listening, writing, dancing, etc.), this dissertation shows that, at least in music performance, two different modes (i.e., singing vs. instrument playing) allow us to have multiple mental representations of the world. This recalls my conversation with Dr.

McCoy, the director of the Voice Teaching and Research Lab. While I conducted my experiments, he helped me recruit singer participants. The other day I ran into him in the elevator. I told him “I found singers are different from instrumentalists in my

experiments”. Then he said “It isn’t surprising at all, is it?”. There are indeed many worlds and many musics.


Chapter 2. Concerning time

I wondered to what extent the cosmic phenomena of night controlled our periods of sleep and activity. In short, I wanted to investigate time - that most inapprehensible and irreversible thing. I wanted to investigate the notion of time which has haunted humanity since its beginning.

From Michel Siffre’s Beyond Time (1964, p. 25)

What is time? Before being asked this question, most of us think that we know time well. A report by a French geologist, Michel Siffre, challenges our naivety about time. In 1962,

Siffre spent two months of a hot summer in a cave of the Alps to conduct underground experiments while being absolutely isolated from the external world. He might have thought that he had comprehended time until he stepped out of the cave and discovered that he had lost track of time. He was twenty-five days behind compared to the outside world.

Personally, a lecture by my college professor, Dr. Lee, prompted me to think seriously about time. As a composition student, I sat in his ‘Advanced Music Analysis’ class. One day, arguing that music is the art of time, he asked us a series of questions about the relationship between time and music. One of his questions I have never forgotten is whether my composition wastes the time of other people. Dr. Lee, a follower of Arnold Schönberg

(1874 - 1951), Theodor W. Adorno (1903 - 1969), and a student of Milton B. Babbitt (1916

- 2011), emphasized that we, composers, should be responsible for what we write. While regarding ‘craftsmanship’ as the virtue of a composer, he said that a composer should

respect others’ time. According to him, bad music can steal five hours of fifty people’s precious time with a six-minute long piece. Although I accepted his view on a composer’s responsibility to write good music, I could not agree with his perspective on time. To me, it seems that he ignored the subjectivity of the human experience of time and the influence of music on it. In terms of the subjective experience of time and music, it is common that people say different things about the same music. As shown in the introduction section, listening to Tchaikovsky’s Pathétique moves some people. Other people may not experience it in the same way. They may say “Suddenly a terrible feeling of confusion came upon me. In what

I believe was only a few seconds, a whole series of conflicting thoughts began rushing around in my mind” (Gabrielsson, 2011, p. 143). This leads me to keep questioning what time is and how music changes our experience of time. Now as a cognitive ethnomusicologist and researcher of time, I investigate how we experience time, how time relates to music, and how music transforms human temporal experience from different perspectives rather than a composer’s view.

To begin with, I would like to examine definitions of time from two widely accepted reference works in order to get general ideas about how time has been seen in the modern world. According to Encyclopædia Britannica, time is “a measured or measurable period, a continuum that lacks spatial dimensions” (Markowitz, Smart, & Toynbee, n.d.).

This definition reflects three features of time. It is 1) quantifiable, 2) a period or duration that exists in the world, and 3) independent of space. Citing various sources including

Oxford, Merriam-Webster, Collins, American Heritage, the Internet Encyclopedia of


Philosophy, and The Stanford Encyclopedia of Philosophy, Wikipedia defines time more comprehensively:

Time is the indefinite continued progress of existence and events that occur in apparently irreversible succession from the past through the present to the future. Time is a component quantity of various measurements used to sequence events, to compare the duration of events or the intervals between them, and to quantify rates of change of quantities in material reality or in the conscious experience. Time is often referred to as the fourth dimension, along with the three spatial dimensions.

Similar to Encyclopædia Britannica, time as presented by Wikipedia is quantifiable.

The duration of events is the measurable component of time. Wikipedia also distinguishes time from space and implies that time and space are two components that constitute the world. Furthermore, Wikipedia considers time to have an absolute existence with a linear sequence of past, present, and future. It is interesting to me that Wikipedia alludes to the existence of time in both the external (i.e., material reality) and internal (i.e., conscious experience) worlds. This examination of definitions of time shows how confusing our concept of time is. In the following discussion, my focus is the last feature of time in

Wikipedia, that is, how time is experienced in the human mind. For this, I will discuss the notion of time in cognitive sciences and examine psychological building blocks of time.

Time in cognitive sciences

In the late 19th century, the term ‘time sense’ (Zeitsinn in German; sens du temps in

French) referred to the apprehension of any attributes of temporal experience (e.g., duration, change, order of events, etc.). It was studied with an instrument called a ‘time-sense apparatus’ in order to determine the accuracy of time estimation. The early time researchers sometimes equated time sense with time perception. According to Roeckelein’s (2000,

2008) review of the concepts of time in psychology, however, time sense was applied to the capacity of apprehending the attributes of time while time perception denoted specific occurrences of apprehending. Later, time researchers rejected the term ‘time sense’ due to the misrepresentation of the term ‘sense’. Ornstein (1969) pointed out, for instance, that there is no specific sensory organ that processes time information. He also noted that the perception of duration differs across sense modalities. In line with Ornstein, Friedman (2000) remarked that there is no sensory organ that receives temporal stimulation in the way that eyes or ears transduce light or sound respectively. He argued that time perception is just a metaphor. Current cognitive science studies about time are heavily indebted to Paul Fraisse

(1911 - 1996). According to him, we do not have direct experience of time but only of sequences and rhythms, therefore there is no time sense (Roeckelein, 2000, 2008). Fraisse

(1984) clarified the notion of time:

The notion of time applies to two different concepts which may be clearly recognized from our personal experience of change: (a) the concept of succession, which corresponds to the fact that two or more events can be perceived as different and organized sequentially; it is based on our experience of the continuous changing through which the present becomes the past; (b) the concept of duration, which applies to the interval between two successive events. Duration has no existence in and of itself but is the intrinsic characteristics of that which endures. (p.2)

Here a prerequisite of both succession and duration is event detection. This is the most fundamental one among psychological building blocks that allows us to have temporal experience (Will, 2017). In the following section, I will discuss six psychological building blocks that include 1) event detection, 2) perception of temporal order, 3) duration perception, 4) duration estimation, 5) psychological present, and 6) rhythm perception.


Psychological building blocks of time

Event detection

We all live in a world which continuously changes. Change in the world is the precondition of our experience of time. Studies about human temporal experience have shown the importance of perceptible changes, that is, the interaction between the senses and the external world. In the caveman story above, for instance, Siffre completely isolated himself for sixty-three days. Having no access to external cues from the environment (e.g., light) disturbed his biological clock. His biological sleep-wake cycle was prolonged due to the absence of outside cues. Therefore, his circadian clock was de-coupled from the external cycle of day and night. At the end of the experiment, researchers of the caveman study also observed that what Siffre estimated as two minutes in the cave actually took five minutes, in addition to the twenty-five day difference between the actual calendar and Siffre’s estimation of the date. This demonstrates that both sensory deprivation and biological changes significantly affect our experience of time.

We do not detect every single change as an event. Sensory organs detect events depending on features that are specific to each modality. A change of features should reach a threshold to be perceived as an event. Visual features include luminance, size, and position, etc. Auditory features include amplitude, frequency, and source location, etc. Tactile features include temperature, texture, and size, etc. Additionally, an interaction between sensory information retrieved from different modalities affects our perception of events.

Using a simple reaction time task, researchers have reported faster reactions to multisensory inputs

than to unisensory inputs. This is known as the Redundant Target Effect. I will investigate this facilitatory effect of multisensory stimuli in chapter 6.
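
As an illustration of how such facilitation is commonly tested, the sketch below evaluates a race model inequality in the spirit of Miller (1982), the criterion behind the RMI violation shown in Figure 24: if the cumulative probability of redundant-target reaction times exceeds the sum of the two unisensory cumulative probabilities, a simple race between independent channels cannot explain the speed-up. All reaction times here are simulated; this is not the dissertation’s analysis code.

```python
# Hypothetical sketch of a race model inequality (RMI) test in the spirit
# of Miller (1982). All reaction times are simulated; none of these
# numbers come from the SRT experiment itself.
import numpy as np

rng = np.random.default_rng(0)
rt_audio = rng.normal(230, 30, 500)    # unisensory auditory RTs (ms)
rt_tactile = rng.normal(250, 35, 500)  # unisensory tactile RTs (ms)
rt_both = rng.normal(195, 25, 500)     # redundant audio-tactile RTs (ms)

def ecdf(samples, t):
    """Empirical cumulative probability P(RT <= t) at each time point t."""
    return np.mean(samples[:, None] <= t, axis=0)

t = np.arange(120, 401, 10)  # evaluation grid (ms)
race_bound = np.minimum(ecdf(rt_audio, t) + ecdf(rt_tactile, t), 1.0)
violation = ecdf(rt_both, t) - race_bound

# Positive values: the redundant-target CDF exceeds what any race between
# independent unisensory channels allows, suggesting co-activation.
print("max violation:", violation.max())
```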

Not all detectable events exist in the external world; some are generated internally. For example, memory can drive some sensations. Anderson-Barnes, McAuliffe,

Swanberg, & Tsao (2009) proposed proprioceptive memory in order to explain the symptoms of phantom limb pain, with which patients report lingering sensations or severe pain in their missing limbs after amputation. According to the authors, proprioceptive memory refers to long-term memory where proprioceptive information has been stored sub-consciously. In other words, this long-term memory contributes to the generation of phantom limb pain. Musical imagery and earworms are other examples of internally generated detectable events. Composers have musical imagery. When writing my first orchestral piece Color of Sky, I heard in my mind a piercing sound followed by strings playing a major chord. As Sacks (2007) reported in his book Musicophilia, earworms, sticky music or catchy tunes in the mind, can be a compulsive, even pathological phenomenon.

In sum, event detection is one of the fundamental psychological building blocks of human temporal experiences. We detect events, changes in the world, via our senses. Event detection is based on a feature analysis of stimuli. Stimuli can be either external or internal.

Memory, imagery, etc., may generate internal ones.


Perception of temporal order: simultaneity vs. succession

If multiple events are detected, how does our mind organize them? Let me take a musical example. Imagine two different notes, namely, C and E. When these notes are sounding simultaneously, we call it a chord. If they are sounding successively, we call it a melody. As this example shows, the perception of event order is important for us to construct time because the temporal order gives us hints of how to interpret detected events.

Simultaneous events seem to require a simpler process than successive events, which may be causally related; this indicates the involvement of different processes.

Simultaneity

Citing Poincaré, who had claimed that the perception of simultaneity would require an infinite and omnipresent intelligence, Fraisse (1963) pointed out that the order of events in the physical world does not correspond to the order in which our sensory organs detect them. For instance, we hear the sound of thunder after seeing lightning, although both thunder and lightning happen at the same time in the same place. One might argue that our different perceptions are due to the difference in speed of transmission between sound and light. However, it is important to note that sensory organs have their own modality-specific receptors. Each receptor needs a different time interval to convert external stimuli into electric and/or chemical signals so that neurons can transmit the transduced signals to the central nervous system. Moreover, some senses have more than one receptor type. This suggests that, even within one sensory modality, different conduction times may exist. For instance, the somatosensory system

has four different types of mechanoreceptors. They are anatomically distinctive and their response behaviors are different. Specifically, the mechanoreceptors show different firing rates depending on the stimuli to which their neural responses adapt (Gescheider, Wright,

& Verrillo, 2009). In particular, there are several types: the Meissner corpuscle (i.e.,

Rapidly Adapting (RA)), the Merkel neurite complex (i.e., Slowly Adapting (SA) type I), the Ruffini corpuscle end organ (i.e., SA type II), and the Pacinian corpuscle. The somatosensory system has four groups of peripheral axons. Each group is classified by the type of somatosensory information it carries and the thickness of its axon myelin. The thicker the myelin, the faster the conduction velocity. Carrying proprioceptive information, Group

I has the thickest myelin. Group II is associated with touch information and has smaller axon diameters than Group I. Group III, which is less myelinated than Group II, involves pain and temperature. Unmyelinated Group IV carries information regarding temperature, pain, and itchiness (Bautista & Lumpkin, 2011).

A minimum spacing between events is required to identify them as separate events.

This spacing, called a simultaneity threshold, varies depending on types of sensory modalities and stimulus features as discussed above. Since the late 19th century (e.g., Exner,

1875 cited in Hirsh, 1959; Hirsh & Sherrick, 1961), the auditory system has been considered to have superior acuity in the processing of temporal order.

Researchers have noted that, in the auditory system, events separated by less than 2 to 3 ms are perceived as simultaneous if the duration of the stimuli is brief (e.g., two clicks) (Hirsh,

1959). In their classical study on cross-modal order perception, Hirsh & Sherrick

(1961) concluded that the auditory system is the one with the best temporal resolution,

the tactile system is less accurate than the auditory, and the visual system has the lowest resolution for deciding whether two stimuli are simultaneous or not. According to Occelli, Spence, & Zampini’s 2011 review of audio-tactile temporal order judgment studies, Gescheider performed a series of experiments to measure auditory and tactile temporal resolution for the first time in the late

1960s. He concluded that simultaneity thresholds for auditory and tactile stimuli are about 2 ms and 10 ms, respectively.

If two events are perceived as simultaneous although they are physically apart, it means that the events are fused. Multiple features of stimuli can affect fusion. Fusion occurs due to neural responses. For the auditory system, binaural fusion is associated with the subjective perception of one or two acoustic events. In other words, each ear is independently stimulated at the peripheral level, but the brain does not distinguish two events within a certain time window. In binaural fusion studies, stimuli are binaurally presented with interaural time or interaural intensity differences. In experiments investigating binaural fusion, the duration and amplitude of stimuli are controlled because these features allow us to detect an acoustic event. Arguing that the temporal perception of a sound’s onset is influenced by its duration, its spatial configuration, and its intensity,

Schimmel & Kohlrausch (2008) investigated the onset perception of binaural sounds and the role of interaural differences. They noted a systematic relationship between perceived and physical onsets in connection with interaural differences: a longer duration of a sound delays its perceived onset whereas a shorter one does not produce a significant delay. This finding implies that perceptual synchrony can be achieved when a long stimulus starts earlier than a short one.


Succession

Regardless of the type of sensory modality, we can determine the order of successive events. Early studies reported that at least 20 ms is required to decide the order of two events (Hirsh & Sherrick, 1961; Fraisse, 1984). Interestingly, there is a time window called temporal fusion within which we can detect multiple events but cannot determine the order of successive events. Temporal fusion occurs between the simultaneity (i.e., 2 to

3 ms) and temporal order thresholds (i.e., 30 to 50 ms). Gescheider found 30 ms as the threshold for temporal order decisions in both the auditory and tactile systems (Occelli et al.,

2011). Using the temporal order judgment task, I investigated the audio-tactile temporal order threshold (see chapter 6). With more than three events, the threshold increases to 100 to 300 ms depending on the relatedness of the events (Will, 2017).

According to Fraisse’s 1984 review, a decision about the temporal order of multiple events requires an integration of various mechanisms including attention and perception. For example, the decision time for the order of successive events may correspond to the length of time that is required to attend to the sequence of stimuli. This explanation is based on the attention-switching hypothesis. According to the discrete moment hypothesis, however, the decision time involves two separate processes: 1) the features of stimuli affect event detection, and 2) the detected events should be treated separately in order to determine the order of events. This perspective postulates that the perception of time is not continuous but discrete. The discrete moment hypothesis also posits that events below the temporal order threshold cannot establish discrete functional units, so temporal integration occurs. It is important to note that each event should be marked as a functional

unit so as not to fuse the order of the events. This implies that the threshold of temporal order varies depending on what processes are involved in building a functional unit for each event. Building a functional unit for temporal order can be associated not only with perception (e.g., stimulus features, the relatedness across events, etc.) but also with action

(e.g., sensorimotor synchronization) (Will, 2017).

To conclude, the mind classifies multiple events in three ways: 1) simultaneous events, 2) distinct but not temporally ordered events (i.e., temporal fusion), and 3) temporally ordered successive events. Events are fused below the simultaneity threshold even when they are physically separated. Beyond the simultaneity threshold, events can be perceived with or without temporal order depending on whether the events reach the temporal order threshold. Temporal fusion refers to the fact that people can detect multiple events but cannot decide on the order in which the events happened. Determination of temporal order involves several processes including event detection, attention, and the discretion of functional units. The establishment of functional units relates to processes of perception and/or action.

Duration perception

Duration, the interval between two successive events, is an intrinsic characteristic of an organization of succession (Fraisse, 1963, 1984). Given that we perceive changes through our senses (i.e., event detection), the term ‘duration’ can be generalized as the length from one state-change to the next. There are two types of durations: the duration between state changes and the duration (extent) of a state change itself. These durations are

frequently labeled ‘empty’ and ‘filled’. An empty interval is bounded by two successive brief sensory signals. A filled interval is characterized by a continuous signal that lasts from its onset to its offset. Since the late 19th century (e.g., Meumann, 1896 cited in Fraisse,

1963), researchers have reported a phenomenon called the filled duration illusion (FDI). Filled intervals are perceived as longer than empty ones although the intervals are of equal length.

The FDI can be explained by the three components of scalar timing theory: clock, memory, and comparator. The clock consists of a pacemaker that generates regular pulses. Memory stores the number of pulses generated by the pacemaker. The comparator determines whether the current number of pulses matches that stored in memory. Accepting scalar timing theory and focusing on the pacemaker rate, Wearden, Norton, Martin, & Montford-Bebb

(2007) argued for a difference in pacemaker rates between filled and empty intervals. The authors suggested that the rate for filled stimuli is faster than for empty ones. In other words,

the FDI occurs because filled intervals accumulate more pulses than empty ones do.
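
A minimal simulation makes this account concrete. The only assumption taken from Wearden et al. (2007) is that the pacemaker runs faster during filled intervals; the specific rates below are arbitrary illustrative values.

```python
# Minimal pacemaker-accumulator sketch of the FDI, following the
# clock-memory-comparator logic of scalar timing theory. The two pacemaker
# rates are arbitrary illustrative values; the only assumption taken from
# Wearden et al. (2007) is that the rate is higher for filled intervals.
import numpy as np

rng = np.random.default_rng(1)

def accumulated_pulses(duration_s, rate_hz, n_trials=10000):
    """Clock stage: Poisson pulse counts emitted during one interval."""
    return rng.poisson(rate_hz * duration_s, size=n_trials)

duration = 0.8  # both intervals are physically 800 ms long
filled = accumulated_pulses(duration, rate_hz=120)  # faster pacemaker
empty = accumulated_pulses(duration, rate_hz=100)   # slower pacemaker

# Memory/comparator stage: more pulses are read as a longer duration,
# so the filled interval is judged longer despite equal physical length.
print("mean pulses - filled:", filled.mean(), "empty:", empty.mean())
print("P(filled judged longer):", np.mean(filled > empty))
```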

Additionally, perceived duration can be influenced by various factors. In terms of signal features, stronger stimuli seem to lengthen the perceived duration compared to weaker ones (e.g., Goldstone, Lhamon, & Sechzer, 1978, cited in Fraisse, 1984). By modifying Israeli’s 1930 study, Nakajima, ten Hoopen, & van der Wilk (1991) first demonstrated an effect called the time-shrinking illusion, that is, a substantial underestimation in duration judgment. When two auditory empty intervals are presented serially, participants’ judgment of the second interval is substantially influenced by the duration of the first. When the first interval is shorter than the second one, the second duration is remarkably underestimated. Investigating the influence of actions on

perceived duration, Press, Berlot, Bird, Ivry, & Cook (2014) argued that a sensory prediction mechanism, essential for action, can distort the perceived duration of sensory events produced by actions. They also demonstrated that we overestimate the duration of our own actions.

There are two distinct timing mechanisms that have been proposed for duration perception: one is absolute, duration-based timing, and the other is relative, beat-based timing.

The former is associated with discrete encoding of the absolute duration of each time interval (ΔTi). The latter has been alluded to by a founding father of American psychology,

William James. In his discussion of empty time, James (1890) wrote, “subdividing the time by beats of sensation aids our accurate knowledge of the amount of it (time) that elapses”

(vol.1, p.619). This beat-based timing presumes not only that duration perception is facilitated by a regular beat but also that individual time intervals are encoded relative to the beat (ΔTi/Tbeat). Investigating the relationship between timing and the control of movements in the cerebellum, Ivry, Spencer, Zelaznik, & Diedrichsen (2002) proposed event timing, which is based on explicit temporal representation. In contrast, temporal regularity without explicit temporal representation is achieved via emergent timing, which requires emergent or secondary properties to control motor movements. In other words, event timing is associated with a discrete representation of a time interval. It is needed for the temporal control of a series of discrete movements. Emergent timing relates to the processing of a temporal regularity or a beat. It is transformed into another control parameter that allows participants to perform continuous movements or beat-based timing tasks in experimental set-ups (e.g., rhythm discrimination task in Grahn & Brett, 2007;

tempo judgment task in Grahn & McAuley, 2009). Teki, Grube, Kumar, & Griffiths (2011) argued for distinct brain areas subserving 1) the perception of the absolute duration of discrete time intervals, and 2) the perception of time intervals relative to a regular beat. They found that duration-based timing is correlated with neural activation in the olivo-cerebellar network whereas beat-based timing involves the striato-thalamo-cortical circuits (see fig. 1). The olivo-cerebellar network includes the inferior olive, the cerebellar lobules IX and X, the vermis, and the deep cerebellar nuclei such as the dentate nucleus, the superior temporal gyri, and the cochlear nucleus. The striato-thalamo-cortical network consists of the putamen, the caudate nucleus, the thalamus, the supplementary motor area, the dorsal premotor cortex, and the dorsolateral prefrontal cortex. Teki, Grube, & Griffiths (2012) proposed a unified model of time perception in which the neural networks for both duration-based and beat-based timing mechanisms are interconnected and jointly process timing signals with precision.
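
The contrast between the two proposed codes can be stated compactly: duration-based timing stores each interval’s absolute length (ΔTi), whereas beat-based timing stores intervals relative to an inferred beat period (ΔTi/Tbeat). A toy sketch with invented onset times:

```python
# Toy contrast between the two proposed codes, with invented onset times:
# duration-based timing stores each absolute interval dT_i, beat-based
# timing stores intervals relative to an inferred beat period T_beat.
import numpy as np

onsets = np.array([0.0, 0.5, 1.0, 1.75, 2.0, 3.0])  # event onsets (s)
intervals = np.diff(onsets)                         # dT_i (absolute code)

t_beat = np.median(intervals)     # crude stand-in for an inferred beat
relative = intervals / t_beat     # dT_i / T_beat (beat-based code)

print("absolute (s):    ", intervals)  # [0.5  0.5  0.75 0.25 1.0 ]
print("relative to beat:", relative)   # [1.0  1.0  1.5  0.5  2.0 ]
```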

Figure 1. Unified model of time perception (Teki et al., 2012): = the striato-thalamo-cortical network; = the olivocerebellar network; Orange = dopaminergic pathways; = inhibitory projections; Solid = excitatory projection; Dashed black = anatomical connections; IO = inferior olive; VTA = ventral tegmental area; GPe = external globus pallidus; GPi = internal globus pallidus; STN = subthalamic nucleus; SNpc = substantia nigra pars compacta; SNpr = substantia nigra pars reticulata. Reprint permission granted by the first author.

To conclude, studies of perceived duration have demonstrated various types of distortions in time interval judgment (e.g., the filled duration illusion, the time-shrinking illusion, etc.). Timing mechanisms proposed for duration perception include duration-based and beat-based mechanisms, subserved by the olivocerebellar and the striato-thalamo-cortical circuits, respectively.

Duration estimation

In his The Principles of Psychology, James (1890) established a foundation of the concept of time by making a distinction between experienced time and remembered time, although some of his terminology (e.g., time sense) is questionable. As James (1890) wrote

“I shall deal with what is sometimes called ‘internal perception’ or ‘the perception of time’, and of events as occupying a date therein, especially when the date is a past one, in which case the perception in question goes by the name of ‘memory’” (vol.1, p. 605). He dealt with these two aspects of temporal experience in the chapters entitled “The Perception of

Time” and “Memory”. Experienced time is based on sensory inputs and associated with the psychological building blocks of time. In contrast, remembered time involves internally generated events in the mind. According to James (1890), these events created by the mind are ideas, as he noted “To remember a thing as past, it is necessary that the notion of ‘past’ should be one of our ‘ideas’” (vol.1, p. 605). Our chronological construction of time is based on these ideas that are connected like a string of beads.

Similarly, Fraisse (1984) distinguished duration perception from duration estimation on the basis of the involvement of memory. In his discussion about duration estimation, Fraisse (1984) did not specify a type of memory, but noted “estimation of

duration takes place when memory is used either to associate a moment in the past with a moment in the present or to link two past events, whereas perception of duration involves the psychological present” (p. 9) and “it will nonetheless be seen that in the case of durations which go beyond perception, new problems arise” (p. 19). These quotes refer to the involvement of sensory, working and/or long-term memory. Researchers have considered estimation as one form of time judgment, in addition to reproduction and production, in experimental contexts (Fraisse, 1984; Friedman, 2000; Grondin, 2010).

These forms have been further explored in prospective and retrospective timing paradigms.

In a prospective timing task, participants are informed in advance that they will be asked to make a time-related judgment. In a retrospective timing task, which is associated primarily with memory, participants do not know in advance that they will be asked to judge an interval of time. They are asked to judge the remembered duration. Methods of both prospective and retrospective paradigms include verbal estimation, reproduction, and production of an interval of time.
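
Because the three methods yield responses on different scales, they are often compared as a judged-to-target ratio. The following toy illustration uses invented response values:

```python
# Toy illustration of the three judgment methods, converted to a common
# judged/target ratio (values > 1 = overestimation). All numbers invented.
target = 4.0  # target interval (s)

judgments = {
    "verbal estimation": 5.0,  # participant says "five seconds"
    "reproduction": 3.4,       # holds a key for 3.4 s after hearing 4 s
    "production": 4.6,         # asked to produce 4 s, holds key for 4.6 s
}

for method, judged in judgments.items():
    print(f"{method}: judged/target = {judged / target:.2f}")
```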

Psychological present

The discourse on the psychological present has shown that it has many names including the ‘specious present’, the ‘sensible present’, the ‘psychic present’, the ‘mental present’, the ‘actually present’, the ‘perceived present’ (Fraisse, 1963), and the ‘subjective present’ (Pöppel, 2009). We can find the origin of these terms in The Principles of

Psychology. James (1890) introduced Mr. E. R. Clay’s term ‘specious present’. Clay noted

“Time, then, is considered relatively to human apprehension, consists of four parts, viz.,

the obvious past, the specious present, the real present, and the future” (James, 1890, vol.1., p. 609). It is unclear how Clay made a distinction between the specious present and the real present. After his speculation about the specious present and its duration, James proposed the sensible present: “the original paragon and prototype of all conceived time is the short duration of which we are immediately and incessantly sensible” (1890, vol.1, p. 631). The mental present of Piéron (1923) is “a durable present…in which we apprehend a succession of diverse facts in a single mental process which embraces, in the present, a certain interval of time” (Fraisse, 1963, p. 86). Currently, the most widely used term is the psychological present. Fraisse described it in various ways, including “the duration of experiential process” (1984, p. 10) and “the duration of the organization of stimuli which we perceive as one unit” (1963, p. 84). In other words, multiple percepts are kept as a whole within the psychological present. For this, several mechanisms work together.

Sensory memory, allowing us access to information obtained from the senses, plays an important role for the psychological present. Other mechanisms including affect (Fraisse,

1963), attention (Fraisse, 1963; Allman & Mareschal, 2016) and working memory (Fraisse,

1984; Allman & Mareschal, 2016) contribute to the integration of various percepts.

Given the discussion above, a question comes up regarding how long the psychological present lasts. As the involvement of multiple mechanisms implies, the duration of the psychological present is plastic. Fraisse (1984) asserted that there is no fixed duration for the psychological present but that it ranges from 100 ms to 5 s. Converging results from different experiments suggest that the psychological present lasts approximately 2 to 3 s, with an upper limit of about 5 s (Pöppel, 2009). The shorter duration

of the psychological present relates mainly to sensory memory whereas the longer duration suggests the involvement of additional temporal mechanisms such as affect, attention, and working memory (Pöppel, 2009).

Rhythm perception

Rhythm refers to the perceived and conceived temporal relationships of successive events. Rhythm perception is complex because it encompasses the psychological building blocks we discussed above. Within the width of the psychological present we can directly perceive rhythm but, beyond the psychological present, we no longer perceive a sequence of successive events as a temporal gestalt. Following the early time studies in which rhythm was discussed with regard to human movements (e.g., Mach, 1865 & Vierordt, 1868, cited in Fraisse, 1982), Fraisse (1982) focused on tempo, a perceptual aspect of rhythmic organization. He further explained that rhythm generated by periodic activities (e.g., walking, swimming, etc.) relates to spontaneous tempo, that is, the tempo that people select when they are asked to tap with minimal instructions. MacDougall & Moore (2005) found that the temporal spectrum of human locomotor movements shows a dominant periodicity of 2 beats/sec, corresponding to 2 Hz, which is known as the resonant frequency or internal periodicity. This can be used as a reference in perceptual and motor tasks for timing.

Interestingly, van Noorden & Moelants (1999) found that the tempo of many musical pieces is centered around a periodicity of 2 beats/sec (i.e., 2 Hz). In his analysis of qin performance, Will (2014, 2018) argued that the performer’s action on this instrument also reflects the resonance frequency of body movement, that is, 2 Hz.


Noting that synchronization is fundamental in dancing and music-making (e.g.

McNeill’s (1995) muscular bonding), Fraisse (1982) argued that sensory inputs affect periodic movements via synchronization, in which temporal anticipation, not regularity of events, plays an important role. In terms of temporal anticipation, tapping experiments have reported that participants’ taps tend to precede isochronous tones by a few tens of milliseconds. This is known as negative mean asynchrony (for a review, see Repp, 2005;

Repp & Su, 2013). This phenomenon can be explained by how external and internal periodicities interact. The relationship between external and internal periodicities has been investigated in entrainment studies. Entrainment refers to the process by which two or more independent temporal systems synchronize with each other through mutual interaction

(Kung, 2017). Jones (1976) and her colleagues (Jones & Boltz, 1989; Large & Jones, 1999) argued that attentional cycles mediate these two periodicities. Clayton, Sager, & Will (2005) introduced the concept of entrainment to ethnomusicology. Their chronometric analyses of music performances from different cultures showed the influence of cultural and personal factors on entrainment.
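
For concreteness, negative mean asynchrony is typically quantified as the signed difference between tap and pacing-tone onsets. The sketch below simulates taps that anticipate an isochronous tone sequence by a few tens of milliseconds; the numbers are illustrative, not data from any study cited here.

```python
# Sketch of how negative mean asynchrony (NMA) is quantified in tapping
# studies: asynchrony = tap onset minus pacing-tone onset, so anticipation
# yields a negative mean. Tap times are simulated under the assumption
# that taps precede the tones by a few tens of milliseconds.
import numpy as np

rng = np.random.default_rng(2)
ioi = 0.6                                        # inter-onset interval (s)
tones = np.arange(20) * ioi                      # isochronous pacing tones
taps = tones - 0.035 + rng.normal(0, 0.015, 20)  # anticipatory taps

asynchronies = taps - tones
print(f"mean asynchrony: {asynchronies.mean() * 1000:.1f} ms")  # negative
```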

In terms of rhythm processing, a study by Hung (2011) explored whether acoustic rhythms are processed differently depending on the type of sound source. For the past few decades, cognitive studies have consistently reported that vocal sounds are processed differently from non-vocal ones. Neuroimaging studies have observed different activations in the superior temporal sulcus and the superior temporal gyrus in response to human vocal sounds than to non-vocal sounds (Belin, Zatorre, Lafaille, Ahad, & Pike, 2000;

Belin, Zatorre, & Ahad, 2002; Belin & Zatorre, 2003). A magnetoencephalographic study

by Gunji et al. (2001) showed that a greater source strength of the sustained field is elicited by vocal stimuli around Heschl’s gyri. In line with those studies, Lee, Peelle, Kraemer,

Lloyd, & Granger (2015) reported that the brain activity pattern for the human voice is distinguishable from that for non-vocal sounds in the superior temporal gyrus. Levy and his colleagues provided neurophysiological evidence for differential processing between vocal and non-vocal sounds. They identified a voice-specific response, a positive component with a latency around 320 ms elicited only by voices (Levy, Granot, & Bentin, 2001,

2003). Expanding voice sensitivity and specificity into a temporal domain, Hung (2011) demonstrated differential processing between vocal versus instrumental rhythms functionally and behaviorally. Extending her finding to a memory domain, Klyn, Will,

Cheong, & Allen (2015) showed behavioral evidence for differential memorization between vocal and instrumental rhythms.

In sum, rhythm perception involves not only the mechanisms underlying all the psychological building blocks of time but also biological factors (i.e., the resonance frequency of bodily movements) and cultural factors (e.g., Clayton et al., 2005). Acoustic rhythm perception, moreover, interacts with the source of the sounds.

Conclusion

Our experience of time emerges from the way we experience the ever-changing world. In other words, time arises from our interaction with a world that constantly changes. The caveman experiment shows how disconnection from the external world confuses not only the body but also the mind. In order to understand how our mind constructs time, I examined various psychological building blocks of time, including 1) event detection, 2) perception of temporal order (simultaneity vs. succession), 3) duration perception, 4) duration estimation, 5) the psychological present, and 6) rhythm perception.

As mentioned briefly, I investigated event detection and perception of temporal order with my “Hear Your Touch” project and studied empirically the effect of different types of musical training on these two building blocks with behavioral experiments (see chapter 6).

This discussion suggests that both perceptible changes in the environment within the range of the psychological present and our actions and rhythmic bodily movements play a role in our experience of time. My examination of these psychological building blocks of time also demonstrates that temporal experience relies on multiple cognitive processes of perception, attention, memory, etc., that are intertwined in a complex way.

Chapter 3. Space and music-making bodies

Space plays a role in all our behavior. We live in it, move through it, explore it, defend it. We find it easy enough to point to bits of it: the room, the mantle of the heavens, the gap between two fingers, the place left behind when the piano finally gets moved.

From O’Keefe & Nadel’s The Hippocampus as a Cognitive Map (1978, p. 5)

I sit in the armchair in the living room and place my MacBook on my lap. I see my dog, Miss Daisy, enjoying a sunbath at her favorite spot on the front patio. I hear the siren of a firetruck passing by. I smell a fresh brownie on a plate. I touch and grab a coffee mug on the side table. Vision lets me know where Daisy is and how close my laptop is to the mug. Olfaction informs me of the existence of the coffee and the brownie around me. Their sweet smells let me localize them and lead me to reach for them, so I enjoy them without spilling coffee over the laptop. As this ordinary scene shows, integrated sensory information establishes space. Inputs retrieved via different sensory modalities guide us in what to do next. Knowing not only where things are with regard to the body but also how to interact with the environment gives rise to a meaningful perception of space.

Although we conceive of space as a homogeneous phenomenon, it emerges from multiple sensory data. In other words, a unitary mental representation of space is constructed from inputs from all sense organs. However, it is important to note that there are significant differences in the roles the sensory modalities play with respect to space. Compared to the other senses, the somatosensory system has a distinctive role when it comes to space pertaining to the body. The other sensory modalities provide spatial information about the world outside the body, whereas the somatosensory system relates directly to the space of the body. This is not only because proprioception, one part of the somatosensory system, gives us the sense of the position of the body and its parts, but also because touch, another part of the system, requires direct physical contact with objects in the environment. Skin, the only sensory organ entirely exposed to the environment, collects information from the surface of the body and thereby complements proprioception. Writing that “active touch refers to what is ordinarily called touching. This ought to be distinguished from passive touch, or being touched.... Active touch is an exploratory rather than a merely receptive sense” (p. 447), Gibson (1962) made an interesting point about active touch: direct contact with the surroundings via the somatosensory system is closely tied to movement.

In the following sections, I will first introduce two types of space pertaining to the body, that is, body space and peripersonal space, and examine each with its components. Firstly, body space consists of two schemata: the postural and the surface-related (i.e., superficial) schema. In terms of perception, these schemata are associated with proprioception and touch respectively. In the history of research on space pertaining to the body, ‘body schema’ has been used interchangeably with ‘body image’; I make a distinction between the two terms. Secondly, peripersonal space, that is, the space that immediately surrounds the body, is characterized by 1) multisensory integration, 2) body-part centered specificity, 3) sensorimotor coupling, and 4) plasticity. Thirdly, I briefly examine two different modes of processing spatial information: the sensorimotor and the representational mode. Finally, I will apply this discussion of space pertaining to the body to singing vs. instrument playing in order to introduce two types of embodied spaces in music-making.

Body space

In terms of space, the body is a special object. Similar to our temporal experience, we perceive the body on the basis of sensory inputs collected internally and externally, which contributes to the establishment of space. One moves one’s body, which changes the space outside the body (i.e., external space). Although the perception of the body often remains un- or subconscious, some practices, like playing music, dancing, or doing yoga, let us perceive our body consciously. For example, I have been practicing yoga for years. I have learned the garudasana (i.e., the eagle pose), for which I put my left leg on top of the right while placing my right arm on top of the left, or vice versa. When I learned the eagle pose the first time, my standing leg was wobbling, but now I can take this pose quite firmly. It was interesting for me to see how my everyday practice consolidates the memory of the pose. In other words, training can shift the awareness of the bodily experience of space toward the un- or subconscious. This shows the complexity of our experience of space, as it is associated with many psychological processes (e.g., perception, consciousness, memory, etc.) and other contributing factors (e.g., training, culture, etc.).

The perception of the space of the internal body also relies on the somatosensory system. When feeling sick, for example, we can tell whether a stabbing pain comes from the head or the stomach, although internal localization of pain is not as precise as that of touch. In spite of the spatial components of the other sub-modalities of our somatosensory system (e.g., temperature and pain), as Berlucchi & Aglioti (2010) noted, touch and proprioception have been the main concerns in research on the body and space. I will follow this tradition and concentrate on touch and proprioception. The reason is that one of the goals of this dissertation is a better understanding of space pertaining to the music-making body. Spaces emerging from singing and instrument-playing bodies are different, and music-making seems to be primarily associated with touch and proprioception, although pain may accompany the mastery of certain musical instruments.1 Other sub-modalities of the somatosensory system may be less relevant than proprioception and touch in terms of music-related behaviors. In terms of body space, the first constituent of space pertaining to the body, I would like to introduce the concept of body schema, consisting of a postural schema and a surface-related (i.e., superficial) schema, which are primarily associated with proprioception and touch respectively. I will then discuss the differences between body schema and body image in detail, which should minimize the confusion between these two terms.

1 While writing this dissertation, I have been taking djembe lessons from Mr. Balla Sy, a director of GOREE Drum and Dance. Playing a djembe is associated with pain: it caused my hands to bleed from cuts. Once thick calluses are established, there is no longer pain (only fun!) and the hands become insensitive to touch. I found that my arm movement became automatized. Once the sequence of arm movements was internalized, I controlled my performance only with acoustic information (e.g., bass, slap, & tone). In other words, my playing is not based on touch.

Postural schema

Philosophers and scientists have long been intrigued by the question of how the body is represented in the mind. Early studies suggested the term ‘coenesthesia’ for a sense of the body arising from deep sensory impressions of the viscera, muscles, joints, and skin (e.g., Brissaud, 1895; Deny & Camus, 1905, cited in Maravita, 2006). In other words, coenesthesia designates bodily sensation. The French neurologist Bonnier (1893, cited in Maravita, 2006) proposed the term ‘schema’ to indicate some spatial qualities in addition to the bodily sensation: Bonnier’s (1893) schema involves the spatial orientation and localization of the body with regard to external objects. However, the term ‘body schema’ became famous with the work of Head & Holmes (1911-2):

For this combined standard, against which all subsequent changes of posture are measured before they enter consciousness, we proposed the word “schema.” By means of perpetual alterations in position we are always building up the postural model of ourselves which constantly changes. Every new posture or movement is recorded on this plastic schema, and the activity of the cortex brings every fresh group of sensations evoked by altered posture into relation with it. Immediate postural recognition follows as soon as the relation is complete. (Head & Holmes, 1911-2, p. 187)

Head & Holmes (1911-2) described several important aspects of the postural schema, one component of the body schema. First, the postural schema does not involve consciousness; rather, it is associated with sub- or unconscious awareness of body postures and movements. For some researchers (e.g., Gallagher), the involvement of consciousness in bodily sensation is a criterion for the distinction between body schema and body image; this will be discussed in detail below. Second, the postural schema is not static but plastic, because each body posture changes through body movements over time, and movements successively lead to new postures in a continuous manner. Seemingly static yoga postures also involve continuous changes due to minuscule body movements. When practicing the garudasana, a yogi holds the posture for several minutes. Since the practitioner stands on one foot, keeping balance is important: a yogi must constantly adjust his or her body so as not to fall. In particular, the standing foot makes incessant micro-movements to control balance.

Additionally, Head & Holmes (1911-2) implied that the postural schema is associated with both perception and movement, which recalls Gibson’s (1962) active touch. The perceptual aspect of the postural schema is similar to the proprioceptive system, which receives inputs from mechanoreceptors in joints, muscles, and tendons. Although proprioceptive inputs provide an estimate of the relative relationship between body parts, they are insufficient to localize body parts precisely with respect to external space (Longo, Azañón, & Haggard, 2010; Dijkerman, 2017). With regard to the motor component of the postural schema, different configurations of body parts change not only body space but also the space outside the body. Only when both motor and proprioceptive components work together does the postural schema fully function.

Superficial schema

Head & Holmes (1911-2) observed that their patient Hn (case 14) could not report his hand position but was able to localize stimulated spots on the surface of his body. On the basis of this observation, they argued that there is another schema besides the postural schema and proposed that this additional schema involves the localization of stimulated spots on the surface of the body. Later researchers called this surface-related schema the superficial schema. According to Paillard (1999), the superficial schema is a central mapping of somatotopic information from tactile inputs, which suggests that the superficial schema is associated primarily with touch.

The superficial schema probably relates to somatotopy, since the somatotopic map refers to spatial patterns in the functional organization of neuronal responses in the somatosensory cortex (Wilson & Moore, 2015). It is important to note that the organization of somatotopy is not proportional to the physical size of body parts (e.g., the cortical homunculus); it rather reflects tactile acuity and the size of the tactile receptive fields (RFs). In the primary somatosensory cortex, for instance, a hand is represented larger than the torso because the hand’s RFs are smaller and more densely packed than those of the torso. Longo et al. (2010) proposed the term ‘superficial map’, with which they emphasized a bi-directional localization, in bottom-up and top-down fashion, between a neural representation of the body surface and the actual surface of the body.

As mentioned previously, Gibson (1962) distinguished passive touch from active touch. The former is known as tactile perception, whereas the latter is called haptic, exploratory, or dynamic perception. Promoting an ecological psychology, Gibson (1966) wrote that “active exploratory touch permits both the grasping of an object and a grasp of its meaning” (p. 123). Haptic touch is associated not only with the perception of the physical properties of an object in the environment but also with awareness of its ecological meaning. Some tangible properties of an object cannot be grasped by other sensory systems. For example, both vision and touch allow us to perceive the geometrical features of an object (e.g., shape, dimensions, proportions, etc.), but touch gives us more precise information about its surface texture (e.g., rough vs. smooth) and material features (e.g., heavy vs. light). Through these tangible properties we can obtain a holistic understanding of an object, and with this we can interact with the environment. Therefore, haptic touch encompasses tactile perception, motor capacity, and cognition. Developing Gibson’s idea, Aho (2016) discussed haptic exploration in the context of music-making. In his book The Tangible in Music, Aho (2016) pointed out the importance of tactile perception arising from bodily movements in the performance of the traditional Finnish instrument, the kantele.

In sum, I have discussed body schema in terms of body space. The two components of body schema, the postural and superficial schemata, are primarily associated with proprioception and touch respectively. Although sensory disturbance studies (e.g., the case of patient Hn) showed that the postural and superficial schemata can operate independently, the body schema relies on proprioception, touch, the interaction between the two systems, and other systems (e.g., the motor system) as well.

Body schema vs. Body image

In the literature of body research, the term ‘body schema’ has often been used interchangeably with another term, namely ‘body image’. This has led to confusion in various disciplines, including neurology, psychology, and phenomenology, and to the production of a series of new terms. According to Gallagher (1986), Fisher, the inventor of the Fisher Body Distortion Questionnaire, for instance, used the terms body concept, body scheme, body perception, and body image interchangeably. Body image or body percept, for Gibson, refers to physical poses, which is equivalent to the postural schema (Gallagher, 1986). The disagreement about the definitions of these terms and the multiplicity of associated terms has become an obstacle to the development of research on the body.

Following Head & Holmes (1911-2), several researchers have endeavored to distinguish body schema from body image on the basis of the involvement of consciousness (Gallagher, 1986, 2005; Paillard, 1999; Longo et al., 2010). Paillard (1999) succinctly defined body image as an internal representation, in conscious experience, of visual, tactile, and motor information of corporeal origin. For Gallagher (1986), body schema is never fully represented in consciousness or conceptualized, because the dynamics of the body organize its own spatiality within its surroundings. According to Gallagher (1986), body image is more complex than body schema because it requires three different components: perception, cognition, and emotion. In Gallagher’s (1986) idea of body image, the body needs to be consciously perceived, conceptually constructed, and related to emotional attitudes and feelings. According to Berlucchi & Aglioti (2010), who accept Gallagher’s view, a malfunction of any of the three components will produce psychological problems such as anorexia nervosa, an eating disorder caused by disturbances of one’s body image.

Interestingly, Gallagher (1986, 2005) noted an interaction between body schema and body image. As discussed earlier, training plays a role in the shift between body image and body schema. Let us imagine that you are a yogi practicing the ardha chandrasana, known as the half-moon pose, for which the limbs radiate in different directions. If you are a novice yogi, you must check your limb positions while projecting your arms, legs, and head so as not to fall. You may look in a mirror or compare your pose with your instructor’s. In this checking process, you may notice that your leg in the air is not parallel to the ground. This requires the involvement of your consciousness, that is, body image. When you master the ardha chandrasana, your body knows where to position your limbs automatically. At this stage, only your postural schema is active, because you can accomplish the pose without conscious reflection.

In sum, there are many terms associated with body schema. Body schema and body image can be distinguished by the involvement of consciousness. Given its un- or subconscious status, body schema is strongly associated with the postural schema, which also works closely with the motor system. Space pertaining to the body emerges from continuous changes of the postural schema. Haptic touch has a strong tie with perception and action. Training can turn body image into body schema.

Peripersonal space

Space outside of the body is not homogeneous. External space can be divided into at least two kinds: peripersonal vs. extrapersonal space. Some researchers (e.g., Lourenco, Longo, & Pathman, 2011) use a near vs. far space distinction for this heterogeneous external space. One criterion for this distinction in the field of space research is whether the hands can reach or grasp objects. Peripersonal space is associated with an egocentric frame of reference (i.e., in relation to the body), whereas extrapersonal space is characterized by the use of an allocentric frame of reference (i.e., relations between other objects or events) (de Vignemont & Iannetti, 2015). In this dissertation, I will focus on peripersonal space, because human music-making primarily involves control of and action in body space and peripersonal space.

The earliest notice of a special zone around the body was made in 1955 by Heini Hediger, the biologist and director of the Zurich zoo. He observed that an animal escapes when an enemy or predator approaches within a certain distance: it is not the mere detection of a potential danger, but the danger’s intrusion within a certain distance of the animal’s body, that leads the animal to escape. Hediger formulated this as an escape or flight distance, which corresponds to a margin of safety around the body. Cléry, Guipponi, Wardak, & Hamed (2015) and de Vignemont & Iannetti (2015) summarized that peripersonal space subserves two functions: 1) body protection and 2) goal-directed action. Hediger’s observation on escape distance illustrates the defensive, body-protective function of peripersonal space. When it comes to hand-centered peripersonal space, examples of goal-directed actions include grasping an object, playing a musical instrument, etc.

The two distinctive functions of peripersonal space are associated with different neural representations. In their review, Cléry et al. (2015) discussed two distinct parieto-premotor networks of peripersonal space. One network subserves the body-protection function by encoding a safety boundary around the body; it consists of the ventral section of the intraparietal sulcus (VIP) and area F4 (i.e., the caudal part of the ventral premotor cortex) of the monkey brain, and the space it encodes is accordingly known as protective or defensive space. In contrast, the network for goal-directed actions contains area 7b (i.e., a subregion of the inferior parietal lobule), the anterior part of the intraparietal sulcus (AIP), and area F5 (i.e., the rostral part of the ventral premotor cortex). This network is important for the functions of reaching space, grasping space, working space, action space, etc. Fig. 2 visually presents how the networks are involved in different functions in a monkey brain.

Figure 2. Two parieto-premotor peripersonal space (PPS) networks on a monkey brain template: AIP= anterior part of the intraparietal sulcus; VIP= ventral section of the intraparietal sulcus

The term ‘peripersonal’ originates from a series of electrophysiological studies by Rizzolatti, Scandolara, Matelli, & Gentilucci (1981), who discovered the existence of bimodal neurons responding to both tactile and visual stimuli in the arcuate sulcus of a macaque monkey. They not only called these multimodal neurons peripersonal but also reported that the neurons are activated by stimuli in the space within the animal’s reaching distance.

As implied in Hediger’s (1955) and Rizzolatti et al.’s (1981) observations, peripersonal space is the space that pertains to the animal’s body. In other words, peripersonal space, like body space, uses an egocentric frame of reference. Incorporating existing studies, Coello, Bourgeois, & Iachini (2012) defined peripersonal space most comprehensively:

Peripersonal space contains the objects with which one can interact in the here and now, specifies our private area during social interactions and encompasses the obstacles or dangers to which the organism must pay attention in order to preserve its integrity.

Researchers have come to the conclusion that the features of peripersonal space (henceforth PPS) include 1) multisensory integration, 2) body-part centered specificity, 3) sensorimotor coupling, and 4) plasticity (Brozzoli, Makin, Cardinali, Holmes, & Farnè, 2012; Cléry et al., 2015; de Vignemont & Iannetti, 2015). In the following sections, I will examine these features of PPS.

Multisensory integration

Since the seminal work of Rizzolatti et al. (1981), different disciplines, including electrophysiology, psychology, and neuropsychology, have presented converging evidence for an important role of multisensory integration in the representation of PPS (for reviews, see Maravita, Spence, & Driver, 2003; Brozzoli et al., 2012). The perception of events and objects within PPS triggers multisensory integration (de Vignemont & Iannetti, 2015). In other words, PPS is characterized by a high degree of integration of sensory inputs retrieved from different modalities. Multimodal processing involves both subcortical and cortical components. Subcortically, the superior colliculus (SC) plays a role as a mediating station for visual, auditory, and somatosensory inputs in early-stage processing (Maravita et al., 2003; Stein & Stanford, 2008; Stein, Stanford, & Rowland, 2009; Stein, Stanford, & Rowland, 2014). In terms of the laminar structure of the SC, Stein and his colleagues reported that the superficial layers (I to III) receive unisensory inputs while the deep layers (IV to VI) are associated with multimodal information. Brozzoli et al. (2012) argued that the putamen, rather than the SC, is relevant for processing multisensory events in PPS. As alluded to by the two PPS networks, the cortical components responsible for multimodal integration include sub-regions of the parietal and premotor cortices. This will be discussed in detail in the sensorimotor coupling section.

Body-part centered specificity

About a decade prior to the study of Rizzolatti et al. (1981), a neurophysiological single-cell study by Hyvärinen & Poranen (1974) reported that some neurons in parietal area 7 of an awake monkey respond both to tactile stimuli delivered to a specific body part and to visual stimuli presented close to the same body part (for reviews, see Cardinali, Brozzoli, & Farnè, 2009; Dijkerman, 2017). Since then, researchers have not only found that multisensory neurons have tactile receptive fields (RFs) centered on specific body parts but also reported that the tactile RFs overlap with visual and/or auditory RFs. Rizzolatti, Fadiga, Fogassi, & Gallese (1997) noted that the visual RFs work with the tactile RFs in coding body-part coordinates. They pointed out that body movements not only affect both the visual and tactile RFs but also establish our experiential PPS.

A reference frame refers to the center of a coordinate system used to represent the locations of events or objects (Cohen & Andersen, 2002). Although there are disagreements about how many body-part-centered reference frames exist and which body part functions as a common reference frame, it is accepted that a peripersonal representation consists of multiple body-part-specific reference frames (Holmes & Spence, 2004). Cohen & Andersen (2002) noted four different reference frames: 1) body-centered, 2) eye-centered, 3) head-centered, and 4) limb-centered. Considering the eye-centered reference frame as the common system, Cohen & Andersen (2002) proposed that the posterior parietal cortex is the brain area where the body-part reference frames are transformed into the common reference frame. Pointing out that the brain constructs multiple and modifiable representations of space centered on different body parts, di Pellegrino & Làdavas (2015), however, proposed only three reference systems: 1) head-centered, 2) hand-centered, and 3) trunk-centered reference frames, known as perihead, perihand, and peritrunk space respectively. di Pellegrino & Làdavas’s (2015) proposal is interesting with regard to the two different modes of music-making. Although both singing and playing an instrument need control of the whole body, which involves perihead and peritrunk spaces, playing an instrument, compared to singing, relies strongly on perihand space because we use our hands to play musical instruments. Therefore, it is possible that the two types of musical training lead to different PPS representations depending on the body part involved.

A behavioral study by Serino et al. (2015) provides a better understanding of body-part-centered reference frames and a common reference system of PPS. In order to measure the scope of PPS, Serino and his colleagues developed an audio-tactile interaction paradigm. The main assumption of this paradigm is that PPS is constructed multisensorially; its proxy for PPS is the region in which sounds enhance tactile processing. Participants are asked to respond as fast as possible to a tactile stimulus while task-irrelevant approaching or receding auditory stimuli are presented; the target touch stimulus is delivered at different temporal delays after sound onset. First, the authors found that the size of PPS varies depending on which body part is stimulated: trunk stimulation produces the largest volume of PPS, while hand stimulation produces the smallest. Second, approaching auditory stimuli modulate tactile processing for the head and trunk (i.e., a multisensory facilitation effect), whereas receding auditory stimuli alter only hand tactile processing, which implies that different body-part-centered spaces respond differently to moving objects within PPS. Third, perihead, perihand, and peritrunk spaces are not completely independent of each other; rather, these spaces interact in specific ways. Serino et al. (2015) noted that perihead space is based on visual RFs anchored to the head, so eye position and head direction are important for perihead space. The authors argued that an interaction of perihead and perihand spaces (i.e., arm-anchored visual RFs) is associated with the computation of arm position relative to both eyes and head. Lastly, the findings imply that the trunk, not the eyes, serves as the common reference frame, because the size of peritrunk space is constant compared to that of perihand and perihead spaces, which vary with relative positioning and stimulus congruency. Further, Serino et al. (2015) argued that perihand space collapses into peritrunk space when the hands are placed close to the trunk. Interestingly, Brozzoli et al. (2012) mentioned that near vs. far upper-limb movements are represented separately in F4 and F5, two sub-areas of the premotor cortex, which implies that F4 is probably associated with peritrunk space while F5 may be involved in perihand space independently. Given the stability and constancy of trunk-centered PPS, Serino et al. (2015) concluded that peritrunk space comprises a whole-body reference frame relative to which a global egocentric representation of space is formed.
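
The logic of this paradigm can be summarized in a short sketch: if tactile reaction times (RTs) speed up once a looming sound comes close enough to the body, the inflection point of a sigmoid fitted to RT as a function of sound distance can serve as a proxy for the PPS boundary. The data and parameter names below are hypothetical illustrations, not Serino et al.'s actual values.

```python
import numpy as np
from scipy.optimize import curve_fit

def rt_curve(d, rt_far, gain, boundary, slope):
    # Far sounds leave tactile RT at rt_far; sounds inside PPS speed it
    # up by up to `gain` ms; `boundary` marks the sigmoid's inflection.
    return rt_far - gain / (1.0 + np.exp((d - boundary) / slope))

distance = np.array([10, 25, 40, 55, 70, 85, 100.0])  # sound distance (cm), assumed
rt = np.array([312, 318, 334, 355, 368, 372, 374.0])  # hypothetical mean RTs (ms)

params, _ = curve_fit(rt_curve, distance, rt, p0=[375.0, 60.0, 50.0, 10.0])
print(f"estimated PPS boundary: {params[2]:.0f} cm from the stimulated body part")
```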

Sensorimotor coupling

Sensorimotor coupling links to the two features of PPS discussed above. First, with regard to multisensory integration, Rizzolatti et al. (1981) pointed out that multimodal neurons are involved in organizing sequences of movements. Second, body-part-centered coordinates play an important role in the sensory guidance of motor behavior during interaction with objects near the body.

Sensorimotor coupling refers to the fact that perceived objects or events can be represented in terms of possible actions: objects or events in PPS guide body movements. Paillard (1987, 1991) proposed two modes of spatial processing; he argued that, in the sensorimotor mode, spatial information is neurally encoded and derived from body movements in space. Making a connection between sensorimotor coupling and body-part-centered specificity, Cohen & Andersen (2002) argued that sensorimotor movements require neural computations that transform body-part-centered reference frames into a common reference frame in order to guide movements.
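
A minimal sketch can make the idea of such a transformation concrete. The example below is illustrative only; the offsets are invented values, and eye and head rotations are omitted for brevity. It re-expresses a target encoded in an eye-centered frame in a trunk-centered frame by chaining body-part offsets.

```python
import numpy as np

# Re-express an eye-centered target location in a trunk-centered frame.
# A full transform would also apply eye and head rotations.
target_eye = np.array([0.10, 0.05, 0.60])      # target in eye-centered frame (m)
eye_in_head = np.array([0.03, 0.00, 0.07])     # assumed eye position in head frame
head_on_trunk = np.array([0.00, 0.25, 0.00])   # assumed head position in trunk frame

target_trunk = target_eye + eye_in_head + head_on_trunk
print("trunk-centered target (m):", target_trunk)
```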

Neurophysiologically, di Pellegrino & Làdavas (2015) explained that the multimodal neurons in the putamen and in cortical components, including the ventral section of the intraparietal sulcus (VIP) and macaque inferior area 6, have both multisensory and motor functions. Additionally, mirror neurons, a special class of motor neurons with visual properties, exist in area F5 (see fig. 2). According to the summary of peripersonal space by Brozzoli et al. (2012), a subset of the mirror neurons discharges both during the execution of a specific motor act (e.g., precise grasping) and during the observation of the same action performed by other bodies. This implies that mirror neurons play an important role in the parieto-premotor network subserving goal-directed action (see the blue network in fig. 2).

Plasticity

The boundary of PPS is plastic, flexible, and dynamic. Hediger’s (1955) flight distance, for example, lets an animal run away from a potential danger. However, this is not true for my dog, Daisy. She has a flight distance of zero towards cars, which wild animals treat as a potential danger: squirrels on campus climb up a tree when a car approaches within about five meters. It seems that Daisy’s previous experience with cars has modified her flight distance to cars. This raises the question of what factors can modulate PPS. de Vignemont & Iannetti (2015) argued that 1) emotions and 2) tool use have modulatory effects on the PPS representation.

Concerning the effect of emotion on space: claustrophobia is a persistent and irrational fear of enclosed places or of being confined (APA Dictionary of Psychology, 2015). Patients with claustrophobia report panic symptoms (e.g., feelings of suffocation, sweating, fears of losing control, etc.). Correlating claustrophobic fear with the size of near space in terms of its defensive function, Lourenco et al. (2011) argued that claustrophobia distorts the representation of PPS. Behaviorally, they found that a relatively large near space correlates with elevated anxiety in normal, healthy people. The authors further argued that an enlarged PPS easily activates the defense mechanism, which causes anxiety about enclosed spaces, that is, claustrophobic emotion. Coello et al. (2012) investigated how dangerous objects affect the size of PPS. In their behavioral experiment, participants rated whether objects (e.g., scissors, etc.) were dangerous depending on whether the objects were pointing away from or towards them. They found that the size of PPS is reduced when participants perceive an object as threatening. With regard to the study of Lourenco et al. (2011), the results can be interpreted to indicate that the perception of danger may cause anxiety, which subsequently leads to a shrinkage of PPS.

In terms of the two functions of PPS, both Lourenco et al. (2011) and Coello et al. (2012) explored the relation between anxiety and defensive space. Graydon, Linkenauger, Teachman, & Proffitt (2012) investigated the effect of anxiety on goal-directed PPS. Investigating whether anxiety alters the perception of action capability (e.g., reaching, grasping, passing hands through holes, etc.) in near space, Graydon et al. (2012) showed that participants experiencing anxiety made more conservative judgments of their action capability compared to controls, which reduces the size of the working PPS.

In addition to the modulatory effect of emotions (e.g., anxiety) on the size of PPS, tool use is another well-studied factor contributing to PPS plasticity. While anxiety is primarily associated with a reduction of the space, tool use allows an animal to incorporate extrapersonal space into its PPS. In other words, tool use stretches the size of near space. Neurophysiological studies suggest that the brain treats tools as extended body parts by activating neural networks associated with the putative body schema and consequently changing the PPS (Maravita & Iriki, 2004). In the first study reporting plasticity of the PPS due to tool use, Iriki, Tanaka, & Iwamura (1996) found that a monkey’s manipulation of a long rake with the hand in order to acquire distant food rewards modulates the PPS representation. In other words, the rake extends the animal’s reaching distance by being assimilated with the body. Activation of bimodal neurons has been observed in the medial anterior intraparietal sulcus and in the post-central gyrus (Iriki et al., 1996; Maravita & Iriki, 2004). Using positron emission tomography (PET), Obayashi et al. (2001) investigated which areas of the monkey’s brain would be activated after rake-use training for several weeks. They found that tool use is associated with neural activity in the intraparietal region, the basal ganglia, the presupplementary motor area (especially F4), the premotor cortex (especially F5), and the cerebellum.

In their review of the relationship between training and tool use in terms of PPS, Brown & Goodale (2013) proposed a motor knowledge hypothesis. They argued that motor knowledge acquired through training plays a significant role in the appearance of the near-tool effect, which is associated with the adapted representation of PPS around the tool. The acquisition of motor knowledge requires connecting planned motor movements with their corresponding sensory consequences after execution. In other words, motor knowledge is established through a reliable predictive relationship between motor commands and their sensory consequences. Similarly, motor learning for tool manipulation involves the reinforcement of this predictive ability, based on sensorimotor coupling, as animals use tools. Motor knowledge of tool use allows the user to predict the spatial location of the tool as it is moved, by linking limb, hand, and tool positions. According to Brown & Goodale (2013), an animal with an unfamiliar tool will not show altered spatiotemporal processing, because its sensorimotor system cannot yet predict the relationship between the tool’s action, controlled by the animal, and the sensory signals produced by the moving tool.
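
The predictive relationship at the heart of the motor knowledge hypothesis can be sketched as a simple forward model (my own toy illustration, not Brown & Goodale's implementation): a mapping from motor commands to expected sensory consequences that yields low prediction error for a trained tool and high error for an unfamiliar one.

```python
import numpy as np

# Forward model acquired through training: expected tool-tip displacement
# for a motor command. All gains and noise levels are assumed values.
learned_gain = 1.5

def predict(command):
    return learned_gain * command

rng = np.random.default_rng(2)
commands = rng.uniform(0.0, 1.0, 100)          # arbitrary command magnitudes

for label, true_gain in [("familiar tool", 1.5), ("unfamiliar tool", 2.3)]:
    observed = true_gain * commands + rng.normal(0.0, 0.05, commands.size)
    error = np.abs(observed - predict(commands)).mean()
    print(f"{label}: mean prediction error = {error:.2f}")
```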

In sum, PPS plays an important role in body protection and goal-directed action, which involve different neural networks. PPS is characterized by four features that are inseparable from each other and from the two schemata of body space. First, multisensory integration has been identified in multimodal neurons. Second, body-part-centered specificity shows multiple and modifiable representations of PPS centered on different body parts; this feature is associated with the postural schema and indicates how body space interacts with PPS. Third, multisensorially perceived objects or events in PPS guide actions or movements. Last, anxiety and tool use can modulate PPS. The last two features relate to the discussion of haptic touch.

Two modes of spatial processing: sensorimotor vs. representational

As mentioned previously, Paillard (1987, 1991) proposed that there exist both a sensorimotor and a representational mode of spatial processing. Although these two modes coexist, each generates and stores its own mapping of space. The sensorimotor mode concerns the direct dialogue between an animal and the physical world, attuned by its sensorimotor apparatus. Thus, the sensorimotor mode contributes to the continuous updating of a body-centered mapping of external space, in which things are located and toward which actions are guided. Similar to Gibson’s active touch, Paillard (1987, 1991) noted that perceived sensory information directs body movement in space, and he coined the term ‘action space’.

In contrast, the representational mode derives from neural activities that explore and consult an internal representation of the physical environment embodied in memory stores. This mode is associated with mental representations of local maps (e.g., the 18th Avenue Library is on the OSU central campus), spatial relationships of routes relative to landmarks (e.g., the Arps garage from the library), relative positions between objects (e.g., the Arps garage entrance and the exits), and the position of the body itself in relation to its stationary environmental frame (e.g., which floor of the garage I am on). The representational mode is similar to what is known as a cognitive map.

In their seminal book, The Hippocampus as a Cognitive Map, O’Keefe & Nadel (1978) argued that psychological space is produced intrinsically and operated by the mind. Psychological space includes 1) a particular set of sensations transduced by a specialized spatial sense organ, 2) organized sensory arrays that derive their structure from the nature of peripheral receptors, 3) organizing principles that impose unified perceptions upon otherwise diverse sensory inputs, 4) abstractions from sensations, and 5) the concepts that the mind builds on the basis of reflections on experience. The first three components of O’Keefe & Nadel’s psychological space seem to correspond to Paillard’s (1987, 1991) sensorimotor mode. The significant difference between the two accounts is that O’Keefe & Nadel’s (1978) psychological space focused only on the perception of sensory inputs, thus missing the motor component, whereas Paillard (1987, 1991) emphasized an action space guided by perception. Paillard’s (1987, 1991) representational mode seems to be associated with the last two elements of O’Keefe & Nadel’s (1978) psychological space.

Additionally, the different modes of spatial processing seem to parallel two modes of temporal processing. As discussed in chapter 2, many researchers of time (e.g., James) have distinguished perceived time from remembered time; time perception is different from the representation of time (Friedman, 2000). Like the sensorimotor mode of spatial processing, time perception emerges from sensory inputs and body movements; like the representational mode of spatial processing, the representation of time relies heavily on patterns acquired from previous experience, namely, memory.

Embodied spaces in music-making bodies

With regard to music performance, one can distinguish two types of music-making: 1) singing and 2) playing an instrument. In the previous chapter, I briefly discussed the differential processing not only of vocal vs. non-vocal sounds but also of vocal vs. non-vocal rhythms. The distinction between singing and instrument playing offers us a better understanding of the origins of music; this, and the different evolutionary paths of singing and playing an instrument, will be discussed in chapter 5.

This chapter proposes a new perspective on the differences between singing and instrument playing in terms of space pertaining to the body. From an ethnomusicological perspective, Baily & Driver (1992) developed their argument for spatio-motor thinking on the basis of a connection between musical creativity and the way we play an instrument. The idea behind their spatio-motor thinking is that the manner of music-making shapes the representation of the body in a distinctive way and transforms our spatial experience according to the instrument, which further influences musical styles. Aho (2016) proposed that tangibility may play an important role in instrumental music performance, which implies different transformative powers for different types of music-making. He wrote:

the instrument can even seem to evolve into an organic extension of the body. In fact, a basic feature distinguishing playing an instrument from singing, the oldest and the most basic means of producing music, is the way a musical instrument provides the player with a means of transcending the limits of the physiological body, producing a sound that ultimately may be enough to fill a sport stadium. (Aho, 2016, p. 4)

Would singing and instrument playing lead to different spatial experiences? If so, how? I limit the application of the postural and superficial schemata to perceptual considerations, despite the importance of the motor component in body space and peripersonal space. This is for a purely practical reason: to see clearly the differences between the spaces of the two performing bodies. In terms of the postural schema of the singing body, we un- or subconsciously control our vocal organs to produce vocal sounds. Singing does not make use of the superficial schema of the hands, which seems to play an important role in active touch. Singing may lead to a vibrating surface of the body, but the vibration is limited to the torso and probably the head. Therefore, the singing body may have a different peritrunk space compared to the instrument-playing body. In contrast, the instrument-playing body involves all of the components of both body space and peripersonal space. It requires both the postural and superficial schemata of body space, owing to the important role of direct tactile contact with, and haptic exploration of, a musical instrument. For the instrument-playing body, multiple streams of sensory information, predominantly auditory and tactile inputs, are constantly integrated. Although visual inputs play a role in music-making in general, I will not discuss vision and its interaction with the other sensory systems, because my main research interest here lies in the potential differences in spatial processing between the singing and the instrument-playing body. Another remarkable difference between them, in addition to audio-tactile integration, concerns perihand space, where I cannot think of any role for the visual system. In instrument playing, the hands play an important role, and perihand space can be expected to occupy a prominent position in the body-part-centered specificity of peripersonal space. The coupled audio and tactile inputs in perihand space imply an interaction between the perihead and perihand spaces. In terms of sensorimotor coupling, an instrument-playing body interacts with a musical instrument in an action-perception feedback loop. Musical instruments, among the most special tools that humans have ever invented, seem to alter the peripersonal space around our limbs and the instruments themselves. As Brown & Goodale (2013) pointed out, playing an instrument involves motor knowledge, specifically of limb movement, that is combined with specific spatial information near and on an instrument. Furthermore, some instruments are equipped with tools (e.g., drum sticks, string bows) that extend perihand space (see Table 1).

Spaces pertaining to the body        Singing body               Playing-an-instrument body
and their components

Body space
  Postural schema                    ✓                          ✓
  Superficial schema                 ?                          ✓

Peripersonal space
  Multisensory integration                                      ✓ (audio-tactile)
  Body-part specificity              ? (peritrunk, perihead)    ✓ (perihand)
  Sensorimotor coupling                                         ✓
  Plasticity & near-tool effect                                 ✓ (e.g., drum stick, string bow, etc.)

Table 1. Comparison of spaces pertaining to the singing vs. playing-an-instrument bodies

As some music researchers (e.g., Schäfer, Fachner, & Smukalla, 2013) have noted, although space plays an important role in music-making, it has been largely ignored in music research. This is probably because music has been considered an art of time, so space has not been a primary interest in music research. Consequently, space has not been well defined in music research, and its use differs depending on researchers’ arguments (e.g., pitch-class space, etc.). Early studies on space in music research do not give us a clear idea of how to study space in music-making bodies. At various conferences, I proposed that we should look at space pertaining to music-making bodies. However, some researchers I encountered at these conferences preferred to see this differently, depending on their interests. For instance, I met a composer who runs an electronic laptop project in which people collaboratively improvise through remote access. For him, virtual space is the main concern, and his understanding of space (i.e., virtual space) is different from the space I propose in this chapter. Space pertaining to the body is neither abstract nor virtual; it emerges from our perception of and action on the world surrounding the body. I hope my proposal here contributes to future studies on space in music-making.

Chapter 4. The origins of time and space concepts

One of the main arguments of the previous chapters is that time and space emerge from integrated sensory information retrieved from our interaction with environments. In other words, time and space are shaped by human experience. Our temporal and spatial experiences primarily rely on our perception of and action on the world. On the basis of our perception and action, we cognitively construct and/or reconstruct time and space, that is, the world, in a meaningful way. If we agree that time and space are cognitive constructs, then they should be discussed in terms of biology, experience, and environment, that is, culture (Will, 2017). In chapters 2 and 3 I approached time and space from a cognitive science perspective, which covers a portion of the biological and experiential aspects of time and space. In order to understand the human experience of time and space holistically, it is worth seeing how time and space have been viewed in different cultures.

However, it is not easy to find cultural differences in the understanding of time and space, because these two concepts have changed throughout human history (e.g., the shift from Newton’s absolute time to Einstein’s relative time in the West). Among many approaches, one possible way to find cultural differences regarding time and space is to look at ancient written fragments about cosmology. The ancient texts on the creation of the world tell us about the origins of the time and space concepts from each culture’s own view. Rosen (2004) asserted the necessity of comparative studies of time in early human history in the opening chapter of Time and Temporality in the Ancient World, because such studies allow us to speculate how individual ancient cultures reconstructed, conceptualized, and formulated time differently. Although there are some comparative studies on time (e.g., Rosen, 2004), there seems to be no equivalent research on the concept of space in antiquity. Therefore, I will discuss how various cultures have understood time and space differently through an examination of a few fragments from ancient Babylonia, India, China, and Greece, as well as relevant commentaries on those texts. My analysis here builds on scholarly works by experts in these cultures’ cosmology and philosophy. To begin with, it is worth noting that the majority of the ancient scripts describe the creation of the world as setting or recovering temporal and/or spatial order out of chaos. This means that the creation of the world has been considered the creation of an ordered world, that is, the cosmos or universe. Ephemerality, eternity, and cyclicity play an important role in the achievement of temporal and/or spatial order; however, each culture put different weight on these properties. As temporal and spatial orders are critical in the cosmos, time and space in ancient societies served didactic roles in controlling human action and behavior, for example, how to make music in ritual ceremonies. Therefore, I will also look at the role of music described in the ancient cosmological scripts where related documents are accessible.

Babylonia

When skies above were not yet named,
Nor earth below pronounced by name,
Apsu, the first one, their begetter,
And maker Tiamat, who bore them all,
Had mixed their waters together,
But had not formed pastures, nor discovered reed-beds;
When yet no gods were manifest,
Nor names pronounced, nor destinies decreed,
Then gods were born within them.
Lahmu (and) Lahamu emerged, their names pronounced.
As soon as they matured, were fully formed,
Anshar (and) Kishar were born, surpassing them.
They passed the days at length, they added to the years.
Anu their first-born son rivalled his forefathers:
Anshar made his son Anu like himself,
And Anu begot Nudimmud in his likeness.

(Tablet I:1-16, Dalley, 2008, p. 233)

This is the opening of the Babylonian genesis, Enūma Eliš, literally meaning ‘when above’ (Heidel, 1951; Jacobsen, 1957).2 The Enūma Eliš indicates how the ancient Babylonians experienced and constructed the world on the basis of their everyday experience, and it reflects the unique geographical environment of the Mesopotamian civilization. Mesopotamia was built on the silt between the Tigris and Euphrates rivers thousands of years ago (see fig. 3). Although the rivers contributed to the formation of fertile marshlands, which played a crucial role in the rise of civilization, sweet water from the two rivers might often have blended with sea water from the Persian Gulf (Jacobsen, 1957). It is very likely that the ancient Mesopotamians suffered from the violence and severity of water-related natural disasters, including torrential rains and devastating floods, which might have been almost impossible to control (Whitrow, 2004).

2 It was recorded on seven clay tablets around the second millennium BCE in Akkadian, a Semitic language whose dialects include Assyrian and Babylonian.

Figure 3. Ancient Mesopotamia geography

The epic beautifully recounts the early Babylonians’ experience of the cruelty of nature. According to the quote above, in the beginning of the world there was neither heaven nor earth; there existed only chaos, made by the sweet water and the salty water, which were deified as Apsu and Tiamat respectively. The mixed waters brought chaos, but they also generated silt, which was represented as the twin gods Lahmu and Lahamu. Anshar and Kishar, derived from the twin gods, brought a horizon, which was circumscribed by heaven and earth. Accordingly, the god of heaven, Anu, was born from Anshar and Kishar, and the god of earth, Nudimmud, was born from Anu. This genealogy of the gods is suggestive of how significant it was for the ancient Babylonians to secure a living space safe from water disasters. Notably, ordered space seems to precede time in the story of the creation of the world. In Babylonia, the ordered world, the cosmos, was heavily indebted to Marduk, the son of the earth god, who was born within the line of the sweet water.3 Marduk has been known as the savior of the world from the destructive forces of the salty water (i.e., Tiamat), which brought chaos and evil to the world.4

Championing the battle against Tiamat, Marduk acquired the power to control violent nature, including floods, winds, storms, etc. (Jacobsen, 1968). Throughout the history of Babylonia, Marduk’s creation of the cosmos was extolled on the fourth day of the akītu festival,5 at which the Enūma Eliš was recited (Robson, 2004):

He [Marduk] fashioned stands for the great gods.
As for the stars, he set up constellations corresponding to them.
He designated the year and marked out its divisions.
Apportioned three stars each to twelve months.
When he had made plans of the days of the year,
He founded the stands of Neberu to mark out their courses,
So that none of them could go wrong or stray. . . .
He made the crescent moon appear, entrusted night (to it).
And designated it the jewel of the night to mark out the days.
‘Go forth every month without fail in a corona,
At the beginning of the month, to glow over the land
You shine with horns to mark out six days;
On the seventh day the crown is half.
The day shall always be in the mid-point, the half of each month.

3 And inside Apsu, Marduk was created; Inside pure Apsu, Marduk was born. Ea, his father created him, Damki[na] his mother bore him. (Tablet 1: 81-4, Dalley, 2008, p. 235). Nidmmud’s another name is Ea, the father of Marduk. 4 How did Tiamat become the origin of chaos and evils in the epic? When the children of Apsu and Tiamat, the younger gods, played in her belly, they were so boisterous that Apsu could not quell them. He requested Tiamat to kill the children, which enraged Tiamat. Apsu convened the counsel of gods to kill the children. After the younger gods knew Apsu’s plan, Ea, the god of magic and Marduk’s father, casted a spell that made Apsu sleep. While Apsu was sleeping, Ea took his power and killed him. Marduk played with winds, which disturbed Tiamat who was enraged and promised to avenge her husband’s death by destroying everything. For this, she created demonic monsters. 5 According to an extensive survey on the akītu festival by Bidmead, (2014), the akītu is one of the oldest ceremonies in the ancient Near East. Records show that the early akītu began as an agricultural harvest festival and took place a year for the grain harvest and for wheat harvest. The festival developed from two semiannual agricultural celebrations to the one new year celebration. In the first millennium BCE, the festival ended up with twelve days of celebrations involving rituals, prayers, sacrifice, royal processions, recitation of the Enūma Eliš, and prophecies for the upcoming year. 59

When Shamash looks at you from the horizon,
Gradually shed your visibility and begin to wane.
Always bring the day of disappearance close to the path of Shamash,
And on the thirtieth day, the [year] is always equalized, for Shamash is (responsible for) the year.’

(Tablet V:1-7 & 12-22, Dalley, 2008, pp.255-7)

The fifth tablet demonstrates how Marduk set the cosmic order. As delineated above, Marduk's first act was to put the heavenly bodies in their proper positions. Given that each star stands for one of his ancestral gods, Marduk's first work could be interpreted as placing his ancestral tablets in his family shrine. After setting the spatial order, Marduk set the stars in regular motion, which gives rise to cyclic time (Robson, 2004). The stars designated the twelve months. Next, Marduk created the moon and prescribed how it ought to change the shape of its body throughout a month within a year. According to the temporal order established by Marduk, the ideal month consisted of thirty days, so the year had 360 days in total, which implies a cyclicity of time. However, reality does not correspond to the ideal model of cyclical time set by Marduk. Speculating about how the early Mesopotamians conceived time and space with regard to celestial divination,

Brown (2000) argued that there was a significant development in the measuring of celestial movements by way of temporal and spatial units, as indicated by the Mesopotamian cuneiform records. Observations of the movements of the heavenly bodies in these units revealed systematic deviations from Marduk's sacred system. The difference between reality and the ideal system was interpreted as ominous.
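To make the size of this discordance concrete, consider the following back-of-the-envelope comparison (the solar-year figure is a modern approximation, not a value taken from the cuneiform sources):

$$12 \text{ months} \times 30 \text{ days} = 360 \text{ days}, \qquad 365.25 - 360 \approx 5.25 \text{ days of drift per year},$$

so Marduk's schematic 360-day calendar falls behind the seasons by roughly a month every six years unless intercalary adjustments are made.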

Therefore, the discordance between the two systems was thought to be reconciled through ritual ceremonies. Babylonian rulers also used this discrepancy between the systems for administrative purposes. In line with Brown (2000), Robson (2004) argued that the

Babylonians held religious ceremonies at the temples in order to reconcile the real and the sacred times and adjusted the calendars, moving clothing ceremonies and animal sacrifices forward or backward a day according to the length of the month in order to avoid bad omens.

In sum, the Enūma Eliš reflects the ancient Babylonians' experiences with a severe environment and their desire for ordered space. This epic describes how Marduk arranged the spatial order of his ancestral gods in heaven and determined the operation of the star movements, which are associated with a cyclical temporal order of the cosmos. For the

Babylonians, the temporal order created by Marduk was considered ideal and sacred.

The discrepancy between the ideal time established by the god and worldly time was considered ominous; therefore, the ancient Babylonians tried to recover and restore the cosmic order through ceremonies, including the akītu festival.

India

The ancient Indian understanding of the world is complex. In this section, I will look at that complexity through the Ṛgveda, a hymn collection dedicated to several deities and the oldest text of the four vedas (ca. 1,200 - 1,000 BCE). Yanchevskaya & Witzel

(2017) examined the Ṛgveda in order to reconstruct the ancient Indian views of the world.

The authors argue that the Ṛgveda is the root of the Indian concepts of time and space. For example, Prasad (1992) argued that, building on the vedas, the Upaniṣads and the Vedānta, the post-vedic writings, show not only a radical development of thinking about time but also a diversity of understandings of the world in ancient India. This divergent

development can be seen in the emergence of the Brahmanical and Buddhist schools. Each school formulated its own perspective on time and space after contemplating the origin of the world (Balslev, 2009). In regard to the concept of time, Yanchevskaya & Witzel (2017) pointed out two important properties of time in the ancient Indian perspective of the world:

1) eternity and 2) ephemerality. In the following section, I will touch upon those two temporal properties and add another one, cyclicity, which is associated with eternity. Next,

I will discuss the ancient Indian understanding of space within a continuum of time and space in the Ṛgveda. In the context of the ritual ceremonies, the continuum connects the inner world with the external world.

According to Yanchevskaya & Witzel (2017), there was no abstract term for time in the Ṛgveda: kāla, which denotes 'in a [proper] moment of time', appeared only once, and samaya, another word for 'time', was not used at all. Examining the Ṛgveda,

Yanchevskaya & Witzel (2017) noted two different understandings of time in ancient India and proposed the term 'two times' to discuss two properties of time, that is, eternity and ephemerality. Eternity refers to something everlasting; it is an imperceptible feature of time. In chapter 2, I discussed the fact that human temporal experience is based on the perception of changes. Through perception, we detect events; an ensemble of different sensory modalities allows us to notice changes around us. As a fundamental psychological building block of time, event detection involves discerning a change in the state of something. However, eternity or everlastingness is associated with an unchanging, stable, and constant state of an object, implying that there is no event to be detected. As a purely

abstract concept, eternity is characterized by its atemporality. The penultimate hymn of the tenth maṇḍala of the Ṛgveda, titled 'creation', represents this eternal property of time:

1. From fervor (tapas) kindled to its height Eternal Law (ṛta) and Truth (satya) were born: Thence was the Night (rātri) produced, and thence the billowy flood of sea arose. 2. From that same billowy flood of sea the Year (saṃvatsara) was afterwards produced, Ordainer of the days and nights, Lord over all who close the eye. 3. Dhātṛ, the great Creator, then formed in due order Sun and Moon. He formed in order Heaven and Earth, the regions of the air, and light. (RV 10.190, Griffith, 1896)

In the Ṛgveda, eternity is associated with ṛta. As a principle, ṛta is the cosmic order operating in collaboration with satya (Truth/Reality). Another important term in this hymn is tapas (fervor/heat), which stands for certain spiritual or religious practices; it produced ṛta and satya. Both ṛta and satya were the preconditions for rātri (the night) and the billowy flood of sea. From the primordial chaos of the night and the billowy flood of sea, saṃvatsara (the year) was born. According to Yanchevskaya & Witzel (2017), saṃvatsara denotes "all-powerful time-eternity in the Ṛgveda" (p. 24). After the birth of saṃvatsara, the world found its temporal organization, which has governed the movements of the heavenly bodies.

In sum, ṛta is the organizing principle of the universe and its medium is saṃvatsara.

Saṃvatsara implies that cyclicity is an immanent property of ṛta. Although cyclicity is not considered in Yanchevskaya & Witzel's (2017) discussion, Brown (1968) argued that the Indian idea of cyclical time, within the ever-revolving wheel of time, goes back to the Ṛgveda. For example, a famous riddle-of-the-universe hymn, asya vāmasya

(RV 1.164), alludes to the cyclicity of ṛta by using a wheel metaphor. This riddle hymn gives a description of the sun's chariot, whose wheel runs around heaven. Like the eternal world, heaven never changes, so the wheel running around heaven transcends both time and space.

The wheel is composed of several parts that are specified with certain numbers symbolizing different things in the cycle of a year. For example, the three naves of the wheel symbolize three seasons or three worlds, which are associated with the solemn vedic ritual.6 The five spokes or five feet of the wheel relate to the five seasons that are distinguished in the vedic tradition.7 The twelve spokes or fellies of the wheel indicate the twelve months.8 The 360 pegs that are used to put the wheel spokes together relate to the days of a year (see footnote 6).

Within the cycle of a year, which is represented as the wheel and involves the sun's rotation, all creatures come to life and die.9 In terms of time, the atemporal ṛta reveals its temporal manifestations

(e.g., solar movement) through cyclicity. The strong connection of eternity and cyclicity is also reflected in Indian music-making. In his discussion of time in Indian music, Clayton

(2000) said, "the ultimate nature of the rāg is thought of as unchanging, while it is constantly renewed in performance as cycle inevitably follows cycle" (p. 16).

Let me return to the Ṛgveda’s creation hymn in order to understand the ancient

Indian perspective on space. There is an interesting difference in the process of world creation between ancient Babylonia and India. I argued that the Enūma Eliš reflects the geographic environment in which the early Babylonians probably suffered from frequent

water floods, so they might have valued space more than time. I speculated that this may be the reason why the establishment of the spatial order by Marduk preceded that of the temporal order in the fifth tablet of the Enūma Eliš. In contrast, the Ṛgveda prioritizes time: time organized space. Associated with eternity and cyclicity, time regulated everything in the world and surpassed space. Although both the Ṛgveda and the Enūma Eliš described chaos as the primordial state associated with water, the Ṛgveda's prioritization of time over space is opposite to the Babylonian origins story and its Jewish parallel, the Book of Genesis. The Enūma Eliš set spatial order out of the watery chaos before temporal order.

6 Twelve are the fellies, and the wheel is single; three are the naves. What man hath understood it? Therein are set together spokes three hundred and sixty, which in nowise can be loosened (RV 1.164.48, Griffith, 1896). The three seasons include summer, monsoon, and winter, while the three worlds are composed of heaven, earth, and underworld.
7 Upon this five-spoked wheel revolving ever all living creatures rest and are dependent. Its axle, heavy-laden, is not heated: the nave from ancient time remains unbroken (RV 1.164.13, Griffith, 1896).
8 Formed with twelve spokes, by length of time, unweakened, rolls round the heaven this wheel of during Order (RV 1.164.11, Griffith, 1896).
9 The wheel revolves, unwasting, with its felly: ten draw it, yoked to the far-stretching car-pole. The Sun's eye moves encompassed by the region: on him dependent rest all living creatures (RV 1.164.14, Griffith, 1896); also see footnote 6.

In this sense, commonalities can be found in the Book of Genesis, where God made the heavens and the earth by clearing the waters. Then God filled space with his creations in an ordered temporal sequence:

1. In the beginning, when God created the heavens and the earth— 2. and the earth was without form or shape, with darkness over the abyss and a mighty wind sweeping over the waters— 3. Then God said: Let there be light, and there was light. 4. God saw that the light was good. God then separated the light from the darkness. 5. God called the light "day," and the darkness he called "night." Evening came, and morning followed—the first day. (Genesis 1:1-5, New American Bible)

With regard to space as represented in the Ṛgveda, Yanchevskaya & Witzel (2017) pointed out the importance of the Puruṣasūkta hymn of the tenth maṇḍala. Puruṣa is the cosmic giant who sacrificed himself10 and whose body gave rise to all the elements that filled space.11 For example, heaven, earth, and the interspaces, the sun and the moon, and living

beings originated from his sacrificed body. However, it is important to note that Puruṣa's sacrifice gave him eternity12 and cyclicity13 so that the cosmic giant could create the spaces and the materials filling the time-space continuum.

10 When Gods prepared the sacrifice with Puruṣa as their offering, Its oil was spring, the holy gift was autumn; summer was the wood (RV 10.90.6, Griffith, 1896).
11 A thousand heads hath Puruṣa, a thousand eyes, a thousand feet. On every side pervading earth he fills a space ten fingers wide (RV 10.90.1, Griffith, 1896).
12 This Puruṣa is all that yet hath been and all that is to be; The Lord of Immortality which waxes greater still by food (RV 10.90.2, Griffith, 1896).
13 From him Virāj was born; again Puruṣa from Virāj was born. As soon as he was born he spread eastward and westward o'er the earth (RV 10.90.5, Griffith, 1896).

Over time, this time-space continuum model was developed further in ancient Indian philosophy. One of the earliest models is found in a section of the Yajurveda, one of the four Vedas, which describes actions related to rituals. For example, the Taittirīya upaniṣad, part of the Yajurveda, suggests five layers within the continuum of the cosmos (see fig. 4).

The layers contain the elements coming from the sacrificed Puruṣa's body. Above all, the Taittirīya upaniṣad's model introduced another continuum, between the inner cosmos and the outer cosmos. I will discuss a modified and more elaborate model below.

Figure 4. Taittirīya upaniṣad's model of a continuum of the cosmos (Rowell, 1992, p. 16): Reprint permission granted by the publisher.

Another property of the ancient Indian time is ephemerality. This property reflects the transitory and concrete aspect of time. Time is often marked by an event or by event units of the year, such as seasons, months, days, nights, etc. As implied in the second meaning of

tapas, that is, spiritual or religious practices, we can demarcate various time units, especially in religious settings. In doing so, a proper moment (ṛtu) in time becomes conceivable to people participating in a certain ritual ceremony. Pointing out that Sāyaṇa, the commentator of the Ṛgveda, described a proper moment with the term kāla, referring to a moment in time, Yanchevskaya & Witzel (2017) argued that the term ṛtu captures the ephemerality of time and emphasized its connection with rituals in the Ṛgveda. For example, the second hymn dedicated to Agni, the god of fire, reads, "To the Gods' pathway have we travelled, ready to execute what work we may accomplish. Let Agni, for he knows, complete the worship. He is the Priest: let him fix rites and seasons" (RV 10.002.3, Griffith,

1896). Just as the akītu festival plays an important role in the recovery of the cosmic order, the Priest, Agni, organizes time via rites. In his monumental work The Ritual Process,

Turner (1977) analyzed three different phases of rituals: 1) separation, 2) liminality, and 3) aggregation. In the second phase of a ritual, that is, in liminality, a person experiences a moment of being neither here nor there. To me, this phase seems closely related to the proper moment.

Eliade (1992) proposed sacred time and its counterpart, profane time. The former takes eternity into consideration, while the latter is "the continuous and irreversible time of our everyday, desacralized existence" (p. 97). On the basis of Eliade, Prasad (1992) argued that sacred and profane times relate to each other because the re-actualization of myth, namely rituals, brings sacred time back. Again, this is reflected in the Babylonian akītu festival. In both cultures, rituals bring back the ideal time, that is, sacred time. Then how can rituals revive it? A possible explanation can be found in Kak's (2009)

investigation of the role of the vedic temples. To begin with, Kak's (2009) interpretation of the vedic temples is based on Sāṅkhya's model of a continuum of the cosmos (see fig. 5).

Therefore, let me discuss Sāṅkhya first and then return to Kak (2009).

Figure 5. Sāṅkhya’s model of a continuum of the cosmos modified from Rowell (1992, p. 30): Reprint permission granted by the publisher.

Over many centuries, models of a time-space continuum developed in various ways depending on different schools in ancient India. One of the earliest models is shown in

Taittirīya upaniṣad (fig. 4). According to Rowell (1992), Sāṅkhya, one of the Brahmanical schools,14 suggested the most fully developed view of how the outer cosmos transforms into the inner one by introducing the tripartite doorkeeper model. Compared to the Taittirīya upaniṣad, where each of the five layers has its own inner world and corresponding outer world in a cosmos continuum, the Sāṅkhya model consists of the inner

world, associated with the mind, and the outer world, related to the external world.

14 Other schools include Nyāya, Vaiśeṣika, Yoga, Mīmāṁsā, and Vedānta.

Furthermore, the inner world is refined with the three doorkeepers, which refer to the mind. The five layers of the Taittirīya upaniṣad model were reorganized, too. Sāṅkhya suggested the five layers of the sense and motor organs, which correspond to perception and action respectively, while the layers of the five subtle and the five physical elements represent the external world. The most important aspects of Sāṅkhya's expanded model are not only the connection of the senses (i.e., perception) with the motor organs (i.e., action) but also the distinction of mind from ego in terms of the involvement of consciousness. This relates to the concepts of time and space that are widely accepted in the field of cognitive science. In my previous chapters, I discussed a strong tie between perception and action in the human experience of time and space. We perceive changes in our surroundings. We react to the changes. Our bodily movements create other changes in the surrounding environment. We perceive these new changes. This ongoing perception-action feedback loop enables temporal and spatial experience; Gibson's active touch and Paillard's action space are the best examples of this. In addition, I also discussed the role of consciousness in the perception of space (e.g., body schema vs. body image). In the Sāṅkhya model, manas and ahaṅkāra seem to correspond to body schema and body image respectively. Manas interprets raw data retrieved from the sensory modalities; this does not require conscious awareness, so manas is close to body schema. Ahaṅkāra involves consciousness, so it may relate to body image. Sāṅkhya could thus be considered a pioneer of the view of time and space now current in cognitive science.
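To make the structure of this perception-action feedback loop explicit, the following minimal sketch simulates it in Python; the function names (sense, detect_events, act) and the toy environment are my own illustrative placeholders, not terms from Gibson, Paillard, or the Sāṅkhya sources discussed above:

def sense(environment):
    # Sample the current state of the surroundings (multisensory input).
    return environment["state"]

def detect_events(previous, current):
    # Event detection: a perceived change of state, the psychological
    # building block of temporal experience discussed in chapter 2.
    return previous is not None and current != previous

def act(environment):
    # Bodily action alters the environment, producing new changes to perceive.
    environment["state"] += 1

environment = {"state": 0}
previous = None
for _ in range(3):  # three cycles of the perception-action loop
    current = sense(environment)
    if detect_events(previous, current):
        print("event detected:", previous, "->", current)
    act(environment)  # action feeds back into what is perceived next
    previous = current

The point of the sketch is only structural: perception and action form a closed loop, and 'events', and hence temporal experience, arise from changes detected within that loop.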


Let me return to the discussion of the relationship between sacred time and rituals in ancient India. With regard to the vedic temple, Kak (2009) not only described the temple as the place binding time and space but also argued that ritual ceremonies unite the inner cosmos (i.e., the human mind) and the outer cosmos (i.e., the external physical world). Kak

(2009) focused on architectural aspects of the vedic temples and noted three different shapes of altars: circular, half-moon, and square. The circular altar is a symbol of the earth for the outer cosmos and of the body for the inner cosmos. The half-moon altar is a symbol of the atmosphere for the outer cosmos and of the heart for the inner one. The square altar is a symbol of heaven for the outer cosmos and of the head for the inner one (Kak, 1995). Additionally, the agnicayana altar (i.e., the fire altar), the main altar in the vedic rituals, consists of a thousand bricks built in five layers. The five layers signify the five great elements (i.e., earth, water, fire, air, and ether), which are associated with the five senses (Kak, 2009); this coincides with Sāṅkhya's cosmos continuum model.

To sum up, the ancient Indian understanding of time is more complex than that of the Babylonians. The Ṛgveda indicates three different properties of time: 1) eternity, 2) cyclicity, and 3) ephemerality, while the Enūma Eliš suggests that time is marked by the movements of the heavenly bodies. Time in ancient India seems to have been more powerful than Babylonian time. Ancient Indian time, as eternal time, was the law that organized the orders of the world. However, this required a sacrifice: by sacrificing himself, the cosmic giant Puruṣa acquired eternity and cyclicity, so he was able to create space and all the elements in it. In terms of the ephemerality of time, a concrete event or event unit was often marked by the ritual ceremonies. Through the rituals, sacred time came back

and the outer and inner cosmos were brought into accord. Models of the continuum of the cosmos developed from the Taittirīya upaniṣad of the Yajurveda to Sāṅkhya. In terms of time and space, Sāṅkhya's expanded model shows similarities to cognitive science. First, both

Sāṅkhya and cognitive science consider that time and space emerge from a sensorimotor connection. Second, both views differentiate mental representations of the world depending on the involvement of consciousness.

China

Many sinologists (e.g., Marcel Granet, Joseph Needham, and A. C. Graham) have speculated about the ancient Chinese view of a continuum of time and space. They pointed out that the ancient Chinese view of the world might have been shaped by a correlative, associative, or metaphoric mode of thinking about the world.15 Bodde (1991) argued that correlative thinking creates a series of connected ideas and is based on metaphor. In the same vein, Wu (1995) discussed the fact that the ancient Chinese constructed space and time within the web of experience, that is, in a contextualized world. According to Pankenier

(2004), who applied the metaphorical term 'fabric of space-time', the early Chinese synthesized time and space via a metaphoric way of thinking, as an art of weaving. These accounts show that the early Chinese built the concepts of space and time as a reflection of their holistic understanding of the world, by connecting interrelated items and

grouping them. One of the earliest notes about this was made by Granet, one of the most influential figures in the study of the Chinese mind.

15 Fung (2010) reviewed the ideas of correlative thinking developed by Western sinologists, including Granet, Needham, Graham, Hall, and Ames, and pointed out a problem with their dualistic view of correlative vs. analytic ways of thinking. As shown in the main text, recent scholars tend to avoid the term 'correlative' and to replace it with 'metaphoric'. Although Fung's (2010) arguments on the premise of correlative thinking are valuable, in this chapter I do not intend to delve into different scholars' assumptions.

In his La Pensée chinoise, Granet (1934) asserted that the ancient Chinese not only saw time and space as consisting of blocks but also thought that the correlated blocks form an assemblage.

Although Granet's remark on blocks of time and space seems to reflect the conception of space and time as containers of the physical material world, a conception related to modern science (Mondragon & Lopez, 2012), the most important aspect of Granet's discussion is that time and space in ancient China were associated with events caused by concrete actions:

All [Chinese thinkers] prefer to see in time an assemblage of eras, seasons and epochs, and in space a complex of regions, climates and directions. In each such direction, extension [i.e., space] particularizes itself by assuming the attributes peculiar to a single climate or region. In the same way, duration [i.e., time] differentiates itself into varied time periods, each bearing the characterization appropriate to a single season or era. (Granet, 1934, p. 86, trans. by Bodde, 1991, p. 104)

Time and space are never conceived apart from concrete actions…. The words shih ["occasion" or "timeliness"] and fang ["direction" or "regions"] apply respectively to all portions and parts of duration and extension - each and every one of which, however, is in each instance viewed under its own distinctive aspect. The two terms are evocative neither of space nor of time per se. Shih calls to mind the idea of circumstance or occasion (which may be either propitious or unpropitious for a given action); fang, that of direction or location (which may be either favorable or unfavorable for a particular instance). Thus time and space form a complex of symbolic conditions, both determining and determined; they are always imagined as an assemblage of concrete and diverse groupings of locations and occasions. (Granet, 1934, pp. 88-9, trans. by Bodde, 1991, p. 104)

Granet's argument regarding the role of correlative relationships in building an assembled block is also reflected in the Chinese word for space, which consists of the two Han characters yu (宇) and chu (宙). They share a radical, 宀, that designates a 'roof' and establishes a link between yu and chu. When the two characters are independent from each other, yu means 'eaves', 'room', or 'world' while chu refers to 'roof timbers', 'house', or

'eternity'. The compound word yuchu (宇宙) denotes the eternal world, which can be understood as an infinite universe, i.e., the cosmos. Yuchu also signifies all things and happenings in nature. This suggests that the early Chinese understanding of the cosmos was different from that of ancient India. In my earlier discussion I pointed out that, in ancient India, time had three properties: eternity, cyclicity, and ephemerality. As the eternal law, ṛta organizes the world. Space and all the elements filling it could not be created without the cosmic giant's sacrifice; Puruṣa attained eternity through his sacrifice. The idea of an acquisition of eternity via a god's self-sacrifice is absent in the Chinese yuchu.

According to Granet, the early Chinese might not have considered eternity because there is no concrete action that can be correlated with eternity. As alluded to above, abstract time was difficult for the ancient Chinese to conceive (Bodde, 1991). Then, how did the early

Chinese understand time? This can be seen etymologically in the word shijian (時間). The first character, shi (時), consists of three components: the radical ri (日), standing for the sun; tu (土), meaning the earth; and cun (寸), denoting something small or the radial artery pulse at the wrist. Shi refers to a moment or the happening of an event established by a concrete action such as budding or blossoming. Composed of gate (men 門) and light (ri 日), jian (間) means 'between'. Putting these components together, shijian indicates a temporal interval between specific moments. In other words, duration as a temporal block is defined by a relationship between particular events. This understanding of time demonstrates that the ancient Chinese conceived of time holistically while associating it with concrete actions or events.


In contrast to this holistic and correlative understanding of time and space, there existed a philosophical school looking at space and time as abstract concepts (Bodde, 1991).

This school, called Mohism, was founded by Mozi (墨子, ca. 479-381 BCE). The

Mohists' extensive discussions showed a different speculation on time and space compared to the holistic views (Harbsmeier, 1995). They considered time and space in relation to the movements of an object. This abstract way of thinking about time and space

(Needham, 1966) was not widely accepted by the early Chinese (Bodde, 1991):16

Time
Canon I 40: Duration (chiu 久) includes all the particular (different) times (shih 時).
Exposition: Former times, the present times, the morning and the evening, are combined together to form duration. (Needham, 1966, p. 93)

Space
Canon I 41: Space (yu 宇) includes all the different places (so 所).
Exposition: East, west, south and north are all enclosed in space. (Needham, 1966, p. 93)

Movement in space (frames of reference)
Canon II 63: When an object is moving in space, we cannot say (in an absolute sense) whether it is coming nearer or going further away. The reason is given under "spreading (fu)" (i.e., setting up coordinates by pacing).
Exposition: Talking about space, one cannot have in mind only some special district (chhü). It is merely that the first step (of a pacer) is nearer and his later steps further away. (The idea of space is like that of) duration (chiu 久). (One can select a certain point in time or space as the beginning, and reckon from it within a certain period or

16 By analyzing the characters etymologically, Yuan (2006) argued that the Later Mohist Canon does not deal with time as an abstract concept. She also argued that the comparative philosophers' approach to the text not only misrepresents the ancient Chinese concept of time but also excludes the subjective character of time. However, Yuan's interpretation of subjective time in Mohism seems problematic. First, the Mohists' distinction of duration from time marked by events is not properly reflected in Yuan's translation. Second, the etymology of chiu (久) does not take the Mohists' main argument on time into account.

region, so that in this sense) it has boundaries, (but time and space are alike) without boundaries. (Needham, 1966, p. 93)

Movement and duration
Canon II 64: Movement in space requires duration. The reason is given under "earlier and later (hsien hou 先後)."
Exposition: In movement, the motion (of an observer) must first be from what is nearer, and afterwards to what is further. The near and far constitute space. The earlier and later constitute duration. A person who moves in space requires duration. (Needham, 1966, p. 94)

In terms of time, the Mohists differentiated duration (chiu 久) from event (shih 時).21 This seems to correspond to the two psychological building blocks, duration perception and event detection, discussed in chapter 2. Although the Mohists' perspective on time and space was different from that of other schools, they probably did not assume a complete independence of time and space. Needham (1966) wrote,

"Perhaps the Mohists envisaged something like what we should now speak of as a universal space-time continuum within which an infinite number of local reference frames coexist, and guessed that the universe would look very different to different observers according to their positions in the whole" (p. 95):

21 Here I did not adopt Needham's translation of shih (時) as 'change'; I would rather interpret it as 'event'. The reason for this can be found in the above etymological discussion of shih (時).

Space and Time
Canon II 13: The boundaries of space (the spatial universe) are constantly shifting. The reason is given under 'extension (chhang).'
Exposition: There is the South and the North in the morning, and again in the evening. Space, however, has long changed its place. (Needham, 1966, p. 94)

Canon II 33: Spatial positions are names for that which is already past. The reason is given under 'reality (shih 實).'
Exposition: Knowing that 'this' is no longer 'this', and that 'this' is no longer 'here', we still call it South and North. That is, what is already past is regarded as if it were still present. We called it South then and therefore we continue to call it South now. (Needham, 1966, pp. 94-5)

In the following section, I will return to my discussion of the representative ancient Chinese view of time and space, with historical evidence. For this, I will first examine the early documents, including the Book of Change (yijing 易經) and its commentaries. This will allow us to understand the ancient Chinese holistic views of the world. I will also discuss how time and space are established in music-making in qin performance.

The transition from the Feudal age (Zhou dynasty, 1030-221 BCE)24 to the early imperial periods (Qin dynasty, 221-207 BCE, and Han dynasty, 202 BCE-220 CE) contributed significantly to the formation of the two branches of Chinese thought, that is,

Confucianism and Taoism. Although the Book of Change, an ancient Chinese divination text from the early Zhou, is too enigmatic to derive the definitive Chinese concepts of time

and space, its commentary, called the Ten Wings (shiyi 十翼), provides us with some hints regarding how the early Chinese shaped their own views of the world. After his review of several excerpts of the Book of Change and the Ten Wings, Lin (1995) argued that the former used time and space to deliver oracles and that the latter demonstrates how scholarly interpretations transformed the divination text into a philosophical discussion of life and the world.

24 According to Needham (1962, Vol. 4, part 1, p. 431), the Zhou dynasty consists of three different periods: the Early Zhou (1030-772 BCE), the Chhun Chhiu (722-480 BCE), and the Warring States period (480-221 BCE).

To begin with, the Book of Change describes the world as neither stable nor fixed.

Rather, it continuously changes. However, the principle that mandates the changes of the world is constant. In other words, two opposite but complementary life forces (qi 氣), yin

(陰) and yang (陽), are engendered by the great ultimate (taiji 太極). The dynamics between yin and yang lead to the change of all creations in the world (Chan, 1963). Then, how could the principle, which is unchanging and stable, create changes? Both yin and yang wax and wane, but neither completely overwhelms the other. The constant changes arising from the two opposite forces create the repetitive cycles of nature. The Book of Change indicates that the constancy of the principle is achieved by the cycles of day and night, the four seasons, etc. The relationship between the unchanging principle and its manifestation, cyclicity, seems parallel to that between the eternality of ṛta and its associated cyclicity in ancient India. The early Chinese also considered wood, fire, soil, metal, and water as the five elements (wu xing 五行). These elements change gradually in the course of the cycle.25

This seems to correspond to the five layers of a continuum of the cosmos described in the models of the Taittirīya upaniṣad and Sāṅkhya.

25 Bodde (1991) discussed how the yin-yang theory was combined with the five elements theory in ancient China. He viewed the two theories as entirely independent of each other in the beginning. The yin-yang theory first appeared during the early Zhou period and had no cosmic significance. Similarly, Redmond (2017) speculated that the yin-yang theory arose from multiple sources and argued that the Book of Change may be one of them. The five elements theory was discussed a little later, in the Zuozhuan (左傳), which describes only material substances that are not regarded as constituents of the world. Zou Yan (鄒衍, 305-240 BCE) united the two theories for the first time. However, the five elements were not treated in philosophical writings before the late third century BCE. After the Qin dynasty was established, the ideas of the two theories were adopted in all schools of thought and integrated with Confucian moral and social values into an all-embracing system.

Let me examine how the Book of Change and its commentaries explain the principle in the context of Chinese philosophy. In the Book of Change, yin and yang are transformed into the eight trigrams26 and expanded into the 64 hexagrams. The first hexagram is qian (乾), a symbol of heaven and the dragon. Its judgment text is yuan heng li zhen (元亨利貞),27 which has been known to all educated Chinese people. Translating yuan heng li zhen as "begin with an offering; beneficial to divine" (p. 63), Redmond (2017) interpreted this phrase as an introductory invocation for the divination act and a proclamation to the spirits. However, earlier translations of this invocation suggested that qian is the fundamental organizing principle of the world. For example, Legge (1882) regarded qian as "what is great and originating, penetrating, advantageous, correct and firm," and Wilhelm & Baynes (1967) translated yuan heng li zhen as "the creative works sublime success, furthering through perseverance" (p. 6). The first of the Ten Wings, the Tuan zhuan (彖傳), commented on this judgment as follows:

26 The eight trigrams are qian (乾, heaven), zhen (震, thunder), kan (坎, water), gen (艮, mountain), kun (坤, earth), xun (巽, wind), li (離, fire), and dui (兌, a collection of water). Each trigram is combined with another, thus making 64 hexagrams in total (Redmond, 2017).
27 Although the hexagram qian (乾) is assigned to the fourth month, from May to June (Wilhelm & Baynes, 1967, p. 3), its judgment yuan heng li zhen (元亨利貞) is discussed in terms of the seasons. The first character, yuan (元), as spring, denotes the beginning of things in the universe. Second, heng (亨) is associated with summer and signifies the growth of all things. As fall, li (利) refers to achievement. For winter, zhen (貞) means completion. Yuan heng li zhen thus signifies a cycle of life.

Vast is the 'great and originating (power)' indicated by Qian! All things owe to it their beginning: - it contains all the meaning belonging to (the name) heaven. The clouds move and the rain is distributed; the various things appear in their developed forms. (The sages) grandly understand (the connexion between) the end and the beginning, and how (the indications of) the six lines (in the hexagram) are accomplished, (each) in its season. (Accordingly) they mount (the carriage) drawn by those six dragons at the proper times, and drive through the sky. The method of Qian is to change and transform, so that everything obtains its correct nature as appointed (by the mind of Heaven); and (thereafter the conditions of) great harmony are preserved in union. The result is 'what is advantageous, and correct and firm.' (The sage) appears aloft, high above all things, and the myriad states all enjoy repose. (Legge, 1882)

It is important to note that qian as the principle of the world not only governs cyclical movements in nature (e.g., clouds, rain, etc.) but also acts on the world of men.

The parallels between the universe and the world of men (Bodde, 1991) imply the Chinese correlative way of thinking. Specifically, the Wenyan zhuan (文言傳),29 the seventh of the

Ten Wings, focuses on the world of men. Chan (1963) discussed the fact that the early Confucianist Chinese might have attempted to operate on the forces that they could control rather than the forces governed by nature. Qian in the human world is the assemblage of goodness (shan 善), excellences (jia 嘉), righteousness (yi 義), and action (shi 事).

(shi ). According to the Wenyan zhuan’s interpretation, yuan heng li zhen is:

What is called (under qian) 'the great and originating' is (in man) the first and chief quality of goodness; what is called 'the penetrating' is the assemblage of excellences; what is called 'the advantageous' is the harmony of all that is right; and what is called 'the correct and firm' is the faculty of action. The superior man, embodying benevolence, is fit to preside over men; presenting the assemblage of excellences, he is fit to show in himself the union of all propriety; benefiting (all) creatures, he is fit to exhibit the harmony of all that is right; correct and firm, he is fit to manage (all) affairs. The fact that the superior man practises these four virtues justifies the application to him of

29 The Wenyan zhuan provides commentary only for two hexagrams, qian (乾, heaven) and kun (坤, earth).

the words – 'Qian represents what is great and originating, penetrating, advantageous, correct and firm'. (Legge, 1882)

Lewis (2006) focused on how early China constructed a spatial order and emphasized the role of human actions. The spatial order can be set up hierarchically by cultivating bodies, organizing families, building cities, forming regional networks, and establishing an empire. To begin with, he introduced luan (亂), referring to chaos. In ancient

Chinese history, luan and unity (i.e., the ordered state) are associated with the Warring States period and the Imperial period respectively. It is important to note not only that spatial order is not naturally given but also that there is a repetitive cycle between luan and unity. This means that the early Chinese continuously strove to achieve or recover the spatial order through their actions, a striving that might have developed as a combination of the two theories of Taoism and Confucianism. The early Chinese viewed the ordered spaces as morally organized and achieved via self-cultivation (e.g., the way, dao 道). Morality shows a

Confucianist influence while self-cultivation relates to Taoism. The ancient Chinese perspectives on the ordered spaces appear at the beginning of the second chapter of the

Great Learning (da xue 大學), one volume of the Book of Rites (li ji 禮記):

The ancients who wished to illustrate illustrious virtue throughout the world, first ordered well their own states. Wishing to order well their states, they first regulated their household. Wishing to regulate their household, they first cultivated their body. Wishing to cultivate their body, they first rectified their mind. Wishing to rectify their mind, they first sought to be sincere in their thoughts. Wishing to be sincere in their thoughts, they first extended to the utmost their knowledge. Such extension of knowledge lay in the investigation of things. Things being investigated, knowledge became complete. Their knowledge being complete, their thoughts were sincere. Their thoughts being sincere, their hearts were then rectified. Their hearts being rectified, their persons were


cultivated. Their persons being cultivated, their families were regulated. Their families being regulated, their states were rightly governed. Their states being rightly governed, the whole kingdom was made tranquil and happy. From the Son of Heaven down to the mass of the people, all must consider the cultivation of the person the root of everything besides. It cannot be, when the root is neglected, that what should spring from it will be well ordered. It never has been the case that what was of great importance has been slightly cared for, and, at the same time, that what was of slight importance has been greatly cared for. (modified from Legge, 1882)

This quote makes several points in connection to the previous discussions about the ancient Babylonian and Indian views on time and space. First, both ancient China and

Babylonia seem to consider spatial order to be important. Compared to the Babylonian spatial order set by the god, Marduk, Chinese spatial order was created by human actions.

Second, the Chinese idea of space seems unstable because of its alternation between luan and unity due to the dynamics between yin and yang. The Enūma Eliš, by contrast, described that, once the spatial order was set up from the primordial watery chaos, there was no reversal.

Third, each unit of space in ancient China is placed on a continuum of the cosmos (Bodde, 1991). The units include the world (tianxia 天下, literally 'under the sky'), state (guo 國), household (jia 家), body (shen 身), mind (xin 心), thought (yi 意), and knowledge (zhi 知).

These units of the continuum are similar to the ancient Indian views of the world. In particular, the last three units seem to correspond to the three doorkeepers in the Sāṅkhya model: xin (心, mind), yi (意, thought), and zhi (知, knowledge) match manas (mind), ahaṅkāra (ego), and buddhi (knowledge, intellect).


Then what was the role of music in ancient China in conceiving of space, time, and the cosmos? In terms of the Chinese idea of space, music plays an important role in building the spatial order by bringing peace to the world. More precisely, music was composed by the ancient kings as part of a ritual service that let the world restore its equilibrium through enthusiasm. The sixteenth hexagram, yu (豫), in the Book of Change alludes to this calming function of music. The top trigram, zhen (震), signifies the activity of thunder at the beginning of summer, which causes the arousal of the earth. The bottom trigram, kun (坤), symbolizes the earth. Therefore, the hexagram yu (豫) is a symbol of enthusiasm, the devotion to the movement of nature. The third of the

Ten Wings, the Xiang zhuan (象傳), provides the commentary on the image of yu as follows:

(The trigrams for) the earth and thunder issuing from it with its crashing form yu. The ancient kings, in accordance with this, composed their music and did honor to virtue, presenting it especially and most grandly to God, when they associated with Him (at the service) their highest ancestor and their father. (Legge, 1882)

The above comment clearly demonstrates that the ruler makes music to commemorate virtue so that he can please the spirits of God and the ancestors. Music in rites bridges the gap between the universe and the world of men. A post-Confucian work, the Records of

Music (yueji 樂記), which is a part of the Book of Rites, reflects Confucian views on the role of music in establishing the spatial order discussed above. The Records of Music not only suggests a connection between music, governance, and the natural order but also implies the importance of harmony among them (Cook, 1995). Specifically, Cook (1995)

argued that the Records of Music differentiates sheng (聲), yin (音), and yue (樂).

According to him, there is no equivalent English term corresponding to these different levels. He wrote that the three terms function "at times as different stages in the development of music in terms of its moral qualities, and its relation to governance, society, and the harmony of the natural world" (p. 20).33 For example, sheng is just an audible sound, like a human scream or an animal call. When sounds are ordered and have a meaning, they are no longer sheng but yin. Yue is the final stage of musical development because it carries virtue. The transformations from sheng to yin and from yin to yue are associated with the level of virtue, the moral quality. Yue correlates with the human action that contributes to virtue.

33 In all cases, the arising of music (yin) is born in the hearts of men. The movement of men's hearts is made so by [external] things. They are touched off by things and move, thus they take shape in [human] sound (sheng). Sounds respond to each other, and thus give birth to change. Change forms a pattern, and this is called music (yin). The music is brought close and found enjoyable, and reaches the point of shields and axes, feathers and pennants, and this is called music (yue) (Cook, 1995, pp. 24-5).

DeWoskin (1982) suggested correlative thinking in the Chinese views on the arts, stating that "both music and ritual address the twin concerns of self-cultivation and social order, appropriately contextualized in the prevailing cosmic order"

(p. 175). As implied in the hexagram yu, the principal concern of music and rituals is human emotion, with the aim of reaching the ideal state of mind and an ideal life. For this, music and rituals should be properly coordinated. In discussions of the role of music, self-discipline and the self-cultivation of emotion are important for achieving the ideal state of mind, just as rites aim to manage social concerns. Therefore, musical harmony is as important as cosmic harmony. There are shared values between music and ritual. Ya (雅), which means

elegance, is one of them. Pointing out that there is no duality between culture and nature in the

Chinese way of thinking, DeWoskin (1982) explained that elegance means being a unity with nature. As Lewis's (2006) argument above shows, being a whole with nature requires human actions, namely, practice. Strange as it may sound, being a unity with nature is not naturally given. This seems to resonate with Granet's remark on the importance of action in the

Chinese understanding of time and space by stating “Time and space are never conceived apart from concrete action” (trans. by Bodde 1991, p. 104).

This view that space and time emerge from concrete actions, more precisely, that ordered space and time emerge from human action aimed at being in harmony with nature, is reflected in qin (琴) practice. Since the qin, the seven-string zither, has been the most favored musical instrument of the elite class, it is well documented in literary sources and manuscript notation.34 The most characteristic feature of qin notation is that it notates neither temporal/durational values nor pitches. Rather, it informs a player how to act on or interact with the instrument. Specifically, it tells how each finger of both hands moves on the instrument. The finger movements are described with symbols known as zhifa (指法), that is, finger techniques. Yung (1997) exemplified zhifa with the symbol

that stands for the ideogram mo (抹), which indicates plucking a string inward with the right index finger. He explained that qin notation consists of a series of clusters of these

symbols.35 This notation is called jianzipu (減字譜), meaning 'abbreviated ideogram notation', because the symbols are taken from the written Chinese script.36 Specifically, Yung (1997) provided an excerpt from the composition Flowing Water (Liushui 流水). The cluster consisting of five symbols is read as "the right thumb hooks the sixth string inward while the left thumb stops the same string at the seventh hui (徽)37; as the right thumb plucks, the left thumb slides from a point left of the seventh hui to the seventh hui" (see fig. 6).

34 One of the earliest scholars who discussed the aesthetics of playing the qin and contributed to qin notation is Ji Kang (嵇康, 223-263). He was a member of the seven sages of the bamboo grove (竹林七賢), an anti-Confucian intellectual group supporting Taoist ideology. He wrote two important treatises on music. First, the Poetic Essay on the Qin (qinfu 琴賦) discusses qin music ideology and matters of performance practice. Second, the essay Sound Expresses neither Sadness nor Happiness (Shengwuailelun 聲無哀樂論) is famous for its theories of Chinese music aesthetics in terms of Taoist philosophy (Liang, 1985, pp. 93-94).

Figure 6. Qin notation excerpt from Flowing water (Liushui 流水) (Modified from Yung, 1997, p. 4): Reprint permission granted by the publisher.
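To make the action-based character of this notation concrete, the cluster just described can be represented as a small data structure, as in the following minimal Python sketch; the Cluster class and its field names are my own illustrative assumptions, not Yung's (1997) terminology or any actual encoding standard:

from dataclasses import dataclass

@dataclass
class Cluster:
    right_hand: str  # plucking technique of the right hand
    left_hand: str   # stopping/sliding action of the left hand
    string: int      # which of the seven strings is acted upon
    hui: int         # which of the thirteen position markers is referenced

    def describe(self) -> str:
        # Reconstruct a prose instruction in the manner of Yung's reading.
        return (f"right thumb {self.right_hand} string {self.string}; "
                f"left thumb {self.left_hand} hui {self.hui}")

# The cluster Yung reads from Flowing Water: note that nothing here fixes
# pitch or duration; only actions on the instrument are encoded.
liushui_cluster = Cluster(right_hand="hooks inward",
                          left_hand="stops, then slides to",
                          string=6, hui=7)
print(liushui_cluster.describe())

The design point mirrors the text: a qin 'score' is a sequence of such action clusters, and pitch and rhythm only emerge when a player executes and interprets them, which is precisely what the dapu process discussed below involves.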

In terms of the performance practice of the qin, dapu (打譜) refers to the process of transforming qin notation into music. Through dapu, a player brings written music into being in the real world. Yung (1997) described dapu as a re-creation because it involves not only reading and interpreting the symbols in qin notation and literary sources but also

making musical decisions (e.g., regarding pitch, rhythm, etc.). For Liang (1985), re-creation also seems to be an important attribute of musical decisions in qin performance.

Although he did not use the term 'dapu' explicitly, Liang (1985) argued for the importance of re-creation in qin performance. This is in line with Yung, who emphasized the importance of the activities of reading and interpretation in qin performance. All of these re-creative activities lead to a unique personal style that is understood as the integration of tradition and self-cultivation. To explain this process, Liang (1985) introduced qindao (琴道), or the way of the qin. Qindao consists of several components, and the most interesting one is the kinesthetic-acoustic sensibility, which is acquired from finger techniques and makes qin players physically experience various types of sound. Therefore, the qin is "not only an art of listening but also an art of touching" (Liang, 1985, p. 209). This reminds us of Gibson's active touch and Aho's argument for tangibility in music performance. I believe that this statement of Liang's (1985) is not limited to qin performance and can be extended to instrumental music in general. Translating dapu (打譜) as "doing the scores", Qing (2016) briefly mentioned an aspect of the spatiality of qin notation: "qin notation constructs a three-dimensional space that includes technique and fingering" (p. 131). I would not agree with

Qing's (2016) conception of space as a three-dimensional one, because it reflects the idea of space as a container, but the author made an important point: space emerges from the actions that are indicated in qin notation. In addition, Liang's (1985) term "kinesthetic-acoustic" may also explain the spatial and temporal experience of qin performance.

35 Qin notation contains a series of symbol clusters written in vertical columns from right to left.
36 Wenzipu, or 'full ideogram notation', preceded jianzipu, or 'abbreviated ideogram notation'. The instructions for a qin performance are fully written out in wenzipu. Chinese music scholars speculate that the transition from wenzipu to jianzipu occurred gradually during the Tang dynasty (Yung, 1997).
37 The qin has thirteen inlaid markers called hui (徽), which are placed at the outer edge of the upper surface of the body of the instrument. The carefully measured distances between hui let a player produce harmonic tones. The thirteen hui symbolize the twelve pitches (lu 律) and the intercalary month (run 閏). According to DeWoskin (1982), lu (律) also means 'law' as well as 'regulation', so it was regulated by the government and codified in the Book of Music.


Another component of qindao is learning. Yung (1987) not only described how a student and his or her teacher play a composition together in unison until the teacher confirms that the student properly expresses the nuances of the music, especially its rhythm and phrasing, but also asserted that the role of qin notation is secondary in this learning process. In line with this, Will (2014) suggested not only that action sequences are guided by the resonance frequency of bodily movements (i.e., 2 Hz; see the Rhythm perception section in chapter 2) but also that action memory acquired during the learning process in the oral-aural tradition could contribute to the experience of time in music. In the traditional context of learning, a pupil observes and imitates his or her teacher's performance. Will

(2014) concluded that there is no need for an external and abstract framework to organize time in music-making. Therefore, qin performance requires the recollection of a sequence of kinesthetic movements that are transmitted primarily from the teacher. Yung’s dapu and

Liang’s qindao seem to be in line with the main argument of the previous chapters, which is that time and space emerge from (multisensory) perception and bodily actions.

To conclude, the ancient Chinese viewed the world holistically, with space and time intertwined, as implied by correlative thinking. There is no explicit notion of eternity in the ancient Chinese writings, but an analysis of the word for space, yuchu, indicates that eternity may exist in the spatial domain rather than the temporal one. An etymological examination of the word for time, shijian, shows that the ancient Chinese understanding of time is based on concrete events or actions. This is a different understanding of space and time compared to that of India, where time has eternity and space was created by a cosmic giant who sacrificed himself and attained eternity.


The earliest Chinese divination text, the Book of Change, and its commentaries described a world that continuously changes due to the dynamics between yin and yang but whose changes are regulated by a stable and constant fundamental principle, qian.

This principle organizes all things in the universe on the basis of cyclicity. Both the Great Learning and the Book of Rites showed the Chinese understanding of space as alternating between chaos and the state of unity. Some of the different units of space (i.e., xin

(心, mind), yi (意, thought), and zhi (知, knowledge)) in ancient China seem to correspond to manas (mind), ahaṅkāra (ego), and buddhi (knowledge, intellect) of Sāṅkhya's cosmos model. Furthermore, the achievement of ordered space is associated with morality, virtue, and dao (道). In terms of music-making, both the kinesthetic-acoustic and the learning components of the way of the qin (qindao) show how human actions on the instrument

(re)create space and time.

Greece

Ancient Greece seems to have had at least two different ways of looking at time and space. One is mythological and the other paradigmatic. The mythological view is primarily narrative-based, as we have already seen in the Babylonian Enūma Eliš and the Indian Ṛgveda. The most representative examples from ancient Greece are

Theogony38 and Works and Days by Hesiod in the eighth century BCE. In contrast, the

paradigmatic view is characterized by reasoning. According to this view, the world is no longer subject to supernatural powers such as the Greek god Chronos.39 The Mohists in ancient China were on the way to such a paradigmatic view. On the basis of the paradigmatic way of thinking about time and space, the pre-socratic40 philosophers sought to provide an account of the world as arranged in a natural order. Above all, they believed that the world is inherently intelligible. In terms of time, the transition from a mythical to a paradigmatic mode has been regarded as the emergence of a notion of time and as a rationalization process in Western thought. This shift happened gradually, from Homer and Hesiod to the pre-socratics (Cornford, 1957; Corish, 1986).41

38 According to Theogony, Gaia (the Earth) emerged from chaos, which is expressed as mixed sweet and salty waters as in the Enūma Eliš. In both cultures, the beginning was chaos, and then the earth or the sky came out of it. Gaia and her son Ouranos (the Sky) had their children. Ouranos hated his children, as Apsu did (see footnote 4). Ouranos imprisoned his children, which made Gaia suffer and further led her to ask her youngest son, the Titan Chronos, to kill Ouranos. In the Enūma Eliš, Marduk killed his wicked ancestor Tiamat. Chronos castrated his vicious father with a sickle given by Gaia, which separated the Earth from the Sky.

In this section, two Hesiodic poems, Theogony and Works and Days, will be examined in order to understand the mythological views of ancient Greece. Although they are narrated in a mythical mode, Hesiod’s poems present a systematic view of the world, which became a foundation for the pre-Socratics’ thoughts on time and space (Clay, 2003).

The mythological aspect of the Hesiodic poems is that Theogony and Works and Days deal with the divine world and the world of men respectively (Clay, 2003). Hesiod seemed to think about time in the divine world and time in the world of men separately because they reflect different aspects of time. In Theogony, I will distinguish two attributes of time in the divine world: one is eternity and the other is ephemerality, which seem similar to the two properties of time represented in the Ṛgveda. Time in the world of men will be examined through Hesiod’s other poem, Works and Days, which indicates cyclicity and ephemerality.

39 The Greek word ‘chronos (χρόνος)’ is translated as time. Before the pre-Socratics, ‘chronos’ meant a length of time and, according to Corish (1986), Homer and Hesiod showed a concrete rather than an abstract understanding of time.
40 The term ‘pre-Socratic’, coined by Hermann Diels, is not a chronological term. Rather, it was created to distinguish Socrates from others, including his predecessors and contemporaries. Socrates was concerned with logical, moral, and ethical issues, while the other philosophers speculated about the world cosmologically and physically (Curd, 2016).
41 Scholars of early Greek philosophy (e.g., Laks, Long, Lloyd) have not only pointed out that the dichotomous transition from mythological to philosophical cosmology rests on problematic assumptions but also argued that such a dualistic account is oversimplified and dangerously misleading: for example, that myth and reason are directly comparable, that myth can replace reason or vice versa, or that a mythological, pre-rational, or pre-logical mentality differs in kind from a scientific or logical mentality.

Ephemerality is associated with the proper and precise correspondence between human activities and the cyclical order set by nature. This recalls the Chinese views on space and time, which are associated with concrete actions. Finally, I will discuss how the pre-Socratic philosophers liberated the subject of time and space from myth and made it a subject of science.

Mythological views on time and space

As stated before, Hesiod composed the two poems that we know of, the Theogony and the Works and Days. These works are considered a conversion from an older oral tradition to Greek literature (Johnson, 2017). Theogony deals with the world of the gods on Mt. Olympus, whereas Works and Days engages the human world. Accordingly, a different time exists for each world. Purves (2004) argued that Hesiod’s Theogony reconstructs a divine time that has two distinctive attributes, which may correspond to the eternity and ephemerality of Yanchevskaya & Witzel’s (2016) two times. The first attribute of the Greek divine time is eternity. According to Purves (2004), the gods are described as those ‘who always (αἰὲν) are’; divine time is therefore all-encompassing and omnipresent. Secondly, there is neither ‘past and future’ nor ‘first and last’ in the world of the Greek gods. In other words, divine time is always ‘now’. This seems similar to one of the ancient Indian time concepts, the proper moment (ṛtu), which characterizes ephemerality. Purves (2004) called this divine time the eternal ‘now’ or everlasting ‘present’. The ancient Greek version of two times is expressed as:

They breathed into me an oracle voice,
In order that I might celebrate the things of the future and the past
And they bid me to hymn the race of the gods who always are
But to sing always of themselves both first and last.42 (lines 31-4, Purves, 2004, p. 156)

In contrast, Hesiod’s Works and Days concerns the realm of the human world. As implied by its title, this poem is didactic. It instructs how people, especially farmers, can recognize the time of year and perform the proper jobs at that time:

When the Pleiades, fair star-seed of Atlas, are rising,
Start on the harvest, and plow when they set.
For forty nights and forty days they hidden lie
Until, when the year on its annual round,
They flaunt forth again when first you edge your sickle-iron.43 (line 383-5, Johnson, 2017, p. 104-5)

For example, at the latitude where Hesiod lived, the constellation of the Pleiades44 is observed above the horizon before sunrise in early May and sets in November. The lines quoted above from Works and Days thus tell farmers to start the harvest when the Pleiades rise and to plow when they set. The poem further elaborates what to do when the almighty Zeus gives autumn downpours (in October: line 415), when the dog star rises (in July and August: line 417), when the sun lies at the southernmost horizon (during the winter solstice: line 479), when the first cuckoo calls (in March: line 486), and so on. These examples show that Works and Days provides guidance for human activities in accordance with an annual cycle and teaches people to seize the ‘now’ at the proper moment of the year. The attributes of human time derivable from Works and Days are thus cyclicity and ephemerality. It is interesting that this poem indicates the importance of human actions in accordance with the order of nature, which parallels the ancient Chinese thoughts on time and space discussed above.

42 ἐνέπνευσαν δέ μοι αὐδὴν θέσπιν, ἵνα κλείοιμι τά τ᾽ ἐσσόμενα πρό τ᾽ ἐόντα. καί μ᾽ ἐκέλονθ᾽ ὑμνεῖν μακάρων γένος αἰὲν ἐόντων, σφᾶς δ᾽ αὐτὰς πρῶτόν τε καὶ ὕστατον αἰὲν ἀείδειν.
43 πληιάδων Ἀτλαγενέων ἐπιτελλομενάων ἄρχεσθ᾽ ἀμήτου, ἀρότοιο δὲ δυσομενάων. αἳ δή τοι νύκτας τε καὶ ἤματα τεσσαράκοντα κεκρύφαται, αὖτις δὲ περιπλομένου ἐνιαυτοῦ φαίνονται τὰ πρῶτα χαρασσομένοιο σιδήρου.
44 The Pleiades are the seven daughters of the Titan Atlas, who carries the heavens after being defeated by Zeus.

Paradigmatic views on time and space

About one century after Hesiod, there were attempts to separate the divine world from the world of humans. These attempts were made in order to understand the natural world in its own terms. Avoiding mythological explanations of a world occupied by the anthropomorphized gods of Olympus, Thales (c. 624 - 546 BCE), Anaximander (c. 610 - 546 BCE), and Anaximenes (c. 585 - 525 BCE) of Miletus proposed different ways of looking at the world. The fundamental difference between these Ionian physicists and their predecessors, including Hesiod, is that Thales, Anaximander, and Anaximenes tried to describe the world in terms of a basic material, namely, a primary substance. As a space-filling substance, a basic material can transform into multiple other things. The Ionian physicists’ primary substances are comparable to the five physical elements in Sāṅkhya’s continuum of the cosmos and to the ancient Chinese discussion of the five elements (i.e., wu xing). Each physicist chose a different element as his primary substance and elaborated in his own doctrine the reasons why it should be primary. Above all, it is worth noting that the transformation of the primary substance, as change, is associated with time.

Thales, one of the seven sages of Greece, believed that the universe originated from water and that the world floats on water.45 It is not clear how Thales conceived of time within his primary substance doctrine. However, Thales’s astronomic activities let us imagine that he might have seen time as cyclical and measurable (White, 2008). Thales, the first astronomer,46 is credited with predicting the solar eclipse of 585 BCE.47 He made a significant advance in measuring the interval between the summer and winter solstices, and he also determined the correct number of days for each season of the year. In contrast to Hesiod, who used astronomic events to instruct human activities, Thales tried to match his astronomic observations with a calendrical scheme. Although some testimonies about Thales’s works are inconsistent, it seems reasonable to conclude that Thales laid the groundwork for the classical solar system through his observation of heavenly bodies and viewed the world as a vast mechanical system (White, 2008).

45 Although there is no writing by Thales, his life and works were documented by Herodotus, Aristotle, Plato, and the doxographers. From their testimony, Nautical Astronomy, On the Solstice, and On the Equinox are ascribed to Thales. Thales’ doctrine of water as the primary substance is found in Aristotle’s Metaphysics; Aristotle called Thales the founder of philosophy (Freeman, 1953; Curd & McKirahan, 2011). According to Thales, water can be converted to vapor, a gas state of water, and vapor can return to water as well. Water can also transform into ice, a solid state of water, and with heat ice can change back into water. Thales’ primary concern was how water changes.
46 Eudemus, a student of Aristotle and the author of Research in Astronomy, called Thales the first astronomer.
47 In his Histories, Herodotus reported that Thales had predicted the solar eclipse.

A pupil or follower of Thales, Anaximander, is thought to have used the word ‘time’ for the first time. He claimed that the changes occurring all around us have regular patterns. With regard to the primary substance of the world, Anaximander had a different idea from Thales. Anaximander argued that only apeiron, an indeterminate source underlying the four substances of earth, air, fire, and water, is the primary substance (i.e., arkhē) of the world. The term ‘apeiron’ means unlimited or boundless. In Anaximander’s system, the primary substance, apeiron, encompasses all things and is self-sufficient, self-generative, and self-destructive. White (2008) argued that the self-sufficiency of apeiron may have influenced the concept of abstract time in the West. As for the self-generativeness and self-destructiveness of apeiron, all existing things of the world are generated from their source, apeiron, and return to it when they perish. The idea that apeiron perpetuates the process of generation and perishing is similar to the Indian ṛta and the Chinese qian as organizing principles of the world. Apeiron itself does not change but governs the changes of all other things via a cycle of reciprocal opposites of generation and perishing (Turetzky, 1998; Curd & McKirahan, 2011). This may correspond to yin and yang in Chinese philosophy. In addition to the cyclical process of generation and perishing, Anaximander referred to apeiron’s moral quality. The role of time is thus to judge the injustice of a thing and to let it be punished:

Of those who declared that the arkhē (‘originating point’ or ‘first principle’) is one, moving and apeiron, Anaximander . . . said that the apeiron was the arkhē and element of things that are, and he was the first to introduce this name for the arkhē [that is, he was the first to call the arkhē apeiron]. (In addition he said that motion is eternal, in which it occurs that the heavens come to be.) He says that the arkhē is neither water nor any of the other things called elements, but some other nature which is apeiron, out of which come to be all the heavens and the worlds in them. The things that are perish into the things from which they come to be, according to necessity, for they pay penalty and retribution to each other for their injustice in accordance with the ordering of time, as he says in rather poetical language. (Simplicius’ Commentary on Aristotle’s Physics 24.13–21, Curd & McKirahan, 2011, p.16-7)


For Anaximenes, Anaximander’s apeiron was too vague: it is unclear how it governs the generation and disappearance of things in the world that cause cyclical changes. Revealing himself as a monist, Anaximenes considered air to be the primary substance and investigated the processes by which air changes. For him, air is transformed through condensation and its reverse, rarefaction. These processes change the other substances of the world along the continuum between the hot and the cold:

Anaximenes . . . like Anaximander, declares that the underlying nature is one and unlimited [apeiron] but not indeterminate, as Anaximander held, but definite, saying that it is air. It differs in rarity and density according to the substances. Becoming finer, it comes to be fire; being condensed, it comes to be wind, then cloud; and when still further condensed, it becomes water, then earth, then stones, and the rest come to be from these. He too makes motion eternal and says that change also comes to be through it. (Theophrastus, quoted by Simplicius, Commentary on Aristotle’s Physics 24.26–25.1, Curd & McKirahan, 2011)

With regard to the process of change, Anaximenes recognized a connection not only between density and temperature but also between motion and change of form (Freeman, 1953). Anaximenes argued that two reciprocal opposites change through the processes of condensation and rarefaction. In terms of density and temperature, rarefied air corresponds to the hot while condensed air corresponds to the cold. When air is rarefied, for example, it becomes fire. When air is condensed, it transforms into wind, and so on. Wind, as a form of mobile vapor, is located foremost on the continuum of rarefaction and condensation. Next, cloud is a less mobile vapor, and water is less mobile than cloud. Water is followed by earth and stone. Air is the only substance that persists through each change and can be transformed into distinct states from fire to stone. In terms of motion and change of form, Anaximenes speculated about air as being in continual and everlasting motion. Anaximenes’s theory of change, in which change is determined by the degree of rarefaction or condensation, influenced scientists of the following generations because it enabled them to develop a quantifiable model of change (White, 2008).

In sum, an examination of the three Milesian thinkers demonstrates that, in spite of the differences among their views of the world, their approaches differ from those of their predecessors, who understood the world from mythological perspectives. These physicists initiated scientific inquiries about the substances that compose the world. Their discussions about the changes of primary substances led to further questions about motion and time, which continue to be asked today.

Heraclitus was dissatisfied with the Milesian thinkers’ accounts of the world as consisting of a primary substance that changes systematically by itself. He speculated about how the world works and focused on how humans react to it; Heraclitus approached the world through the human condition. According to Heraclitus, all things are in flux. His doctrine of flux is famously known through the aphorism, “You could not step twice into the same river” (Plato, Cratylus, 402a), which was further elaborated as “Upon those who step into the same rivers, different and again different waters flow” (Arius Didymus, fr. 39.2 = Dox. Gr. 471.4–5, Curd & McKirahan, 2011, p. 45) and “We step into and we do not step into the same rivers. We are and we are not” (Heraclitus Homericus, Homeric Questions 24, Curd & McKirahan, 2011, p. 45). The flux doctrine explains the nature of our temporal experience in terms of change. Whereas the Milesian thinkers, especially Anaximenes, had discussed the connection of change and motion in terms of the physical world, Heraclitus focused on the connection between change and subjectivity in the human experience of time. In other words, our temporal experience is shaped by an interaction between certain environmental conditions and human perception; time is a collection of perceptual changes in our surroundings.

As Heraclitus said, “Eyes and ears are bad witnesses to people if they have barbarian souls” (Sextus Empiricus, Against the Mathematicians 7.126, Curd & McKirahan, 2011, p. 42); the flux cannot be fully understood with human perception alone. Warning us not to be deceived by perceptual changes, Heraclitus insisted that the flux follows a rational principle, the logos, which is the hidden harmony behind all changes.48

Literally, the term ‘logos’ refers to the word. According to Graham’s (2008) interpretation, logos is knowledge, that is, synthesized human intelligence. Heraclitus’s account of logos reminds me of knowledge (zhi), one of the three units of the Chinese space, and buddhi (knowledge, intellect), one of Sāṅkhya’s three door keepers:

Although this logos holds always humans prove unable to understand it both before hearing it and when they have first heard it. For although all things come to be [or, “happen”] in accordance with this logos, humans are like the inexperienced when they experience such words and deeds as I set out, distinguishing each thing in accordance with its nature (physis) and saying how it is. But other people fail to notice what they do when awake, just as they forget what they do while asleep. (Sextus Empiricus, Against the Mathematicians 7.132, Curd & McKirahan, 2011, p. 40)

Heraclitus’ description of the logos shows several similarities with Anaximander and Anaximenes. One similarity is that they are all monists: each of these thinkers posited a single material as the primary substance. For Anaximander it is apeiron, and for Anaximenes air; Heraclitus argued that fire is the primary substance. For Heraclitus, the world arises from an ever-living fire and is consumed by fire through all eternity in certain cycles (Danielson, 2000). However, Heraclitus did not rely on the idea of a primary substance as the governing principle of the world. Rather, he emphasized the human capacity for cognition. The type of knowledge called logos allows us to conceive of a combination of the opposites (e.g., the hot vs. the cold of Anaximander) or of the opposite processes (e.g., condensation vs. rarefaction of Anaximenes) in flux (Turetzky, 1998; Danielson, 2000).

48 Nature (Physis) loves to hide. (Themistius, Orations 5.69, Curd & McKirahan, 2011, p. 42)

In sum, Heraclitus seemed to consider time in terms of human experience that relies on both perceptual changes and human intelligence, that is, logos.

The founder of the school of Elea, Parmenides, claimed that whatever can be spoken of or inquired into must exist (Turetzky, 1998); that is, it must be real. Seeking the reality of the world, Parmenides argued that we are misled about reality by perceptual illusion.49 What, then, is reality for him? Accepting the idea of eternity in Anaximander’s apeiron, Parmenides argued that reality is:

Just one story of a route is still left: that it is. On this [route] there are signs very many, that what-is is ungenerated and imperishable, a whole of a single kind, unshaken, and complete. Nor was it ever, nor will it be, since it is now, all together one, holding together. (Simplicius, Commentary on Aristotle’s Physics 1-6, Curd & McKirahan, 2011, p. 59)

Parmenides’ reality is characterized as 1) ungenerated and imperishable, 2) whole, complete, all together and holding together (i.e., ‘what-is’),50 3) neither ‘was’ nor ‘will be’ but ‘now’, and 4) fixed and steadfast51 (McKirahan, 2008). Therefore, for Parmenides, human temporal experience is unreal because it is associated with perceptual changes. In terms of the Chinese space units and Sāṅkhya’s cosmos model, Parmenides might accept only knowledge (zhi) and buddhi (knowledge, intellect) as reality, because the other components of the Chinese space units and of the door keepers involve sense data. He believed not only that all the changes seemingly created by the opposites are illusions of the senses but also that the things of the world do not arise from something different from themselves or from another state in a cycle of change. This suggests that Parmenides’ reality does not involve cyclicity. Real time for Parmenides is independent of any type of change, whether external or perceptual: real time exists ‘now’ and is ‘atemporal’. The closest parallel to Parmenides’ understanding of time is Yanchevskaya & Witzel’s (2017) two times.

49 For in no way may this prevail, that things that are not are; but you, hold your thought back from this route of inquiry and do not let habit, rich in experience, compel you along this route to direct an aimless eye and an echoing ear and tongue, but judge by reasoning (logos) the much-contested examination spoken by me. (Plato, Sophist 242a; Sextus Empiricus, Against the Mathematicians 7.114, Curd & McKirahan, 2011, p. 59)
50 Nor is it divisible, since it is all alike, and not at all more in any way, which would keep it from holding together, or at all less, but it is all full of what-is. Therefore it is all holding together; for what-is draws near to what-is. (Simplicius, Commentary on Aristotle’s Physics 22-5, Curd & McKirahan, 2011, p. 60)

According to McKirahan (2008), Parmenides’ notion of real time (i.e., the ‘what-is’ now) can be approached from an ontological perspective:

For what birth will you seek out for it? How and from what did it grow? From what-is-not I will allow you neither to say nor to think: For it is not to be said or thought that it is not. What need would have roused it, later or earlier, having begun from nothing, to grow? In this way it is right either fully to be or not. (Simplicius, Commentary on Aristotle’s Physics 6-11, Curd & McKirahan, 2011, p. 59)

The same principle of reality can also be applied to space. McKirahan (2008) interpreted Parmenides’ real space as not based on physical materials. For Parmenides, ‘what-is’ is not fixed in the physical world, but it must be properly bounded, and thereby occupies space. This may be close to the Chinese yuchu, which refers to the eternal world or an infinite universe. Real space is unlimited and endless, yet uniform in all directions. In this way, Parmenides’ space achieves inviolability (McKirahan, 2008):

51 Remaining the same and in the same and by itself it lies and so remains there fixed (steadfast); for mighty Necessity holds it in bonds of a limit which holds it in on all sides. (Simplicius, Commentary on Aristotle’s Physics lines 29-31, Curd & McKirahan, 2011, p. 60)

But since the limit is ultimate, it [namely, what-is] is complete from all directions like the bulk of a ball well-rounded from all sides equally matched in every way from the middle; for it is right for it to be not in any way greater or lesser than in another. For neither is there what-is-not—which would stop it from reaching the same—nor is there any way in which what-is would be more than what-is in one way and in another way less, since it is all inviolable; for equal to itself from all directions, it meets uniformly with its limits. (Simplicius, Commentary on Aristotle’s Physics 42-9, Curd & McKirahan, 2011, p. 60-1)

Exploring and extending Parmenides’ arguments, Melissus also posed a question about the reliability of sense data with respect to truth. In particular, he developed and clarified the inviolability of Parmenides’ space. According to Parmenides, ‘what-is’ is full, so there is no space (kenon) in which to move.52 Here, no space alludes to an external emptiness. Melissus approached this from the other direction: ‘what-is-not’ is nothing, so no space (kenon) is required.53 Here, no space refers to an internal emptiness. For Melissus, ‘what-is-not’ is characterized by no space, and no space means no motion and, furthermore, fullness.54 Connecting the no space of ‘what-is’ and of ‘what-is-not’, Sedley (1982) and Algra (1995) argued that the empty space proposed by Melissus is characterized by ‘immobility’ and ‘fullness’. Therefore, Melissus’ empty space delineates the ontological aspects of space in Parmenides’ inviolability:

52 With the introduction of three spatial terms, chôra, topos, and kenon, Algra (1995) argued that chôra and topos in general correspond to space and place respectively, depending on the context. However, the term kenon is more complicated than the other terms. Chôra and topos are nouns, but kenon is associated with the adjective kenos, meaning empty. Therefore, kenon by itself can refer to three different things: 1) space, 2) empty space or empty place, and 3) an empty thing or an empty part of a thing. According to Algra (1995), it is better to translate kenon as place. My intention in this text is to show how Melissus developed Parmenides’ ideas of ‘what-is’ in terms of space in general, so I did not differentiate kenon’s meanings.
53 Algra (1995) suggested empty space or empty place for kenon.
54 [After saying of what-is that it is one and ungenerated and motionless and interrupted by no void, but is a whole full of itself, he goes on:] Now this argument is the strongest indication that there is only one thing. But the following are indications too. If there were many things, they must be such as I say the one is. . . . Hence these things do not agree with one another. For although we say that there are many eternal things that have definite forms and endurance, we think that all of them become different and change from what we see at any moment. Hence it is clear that we do not see correctly and we are incorrect in thinking that those many things are. For they would not change if they were real, but each one would be just as we thought. For nothing can prevail over what is real. But if it changes, what-is was destroyed, and what-is-not has come to be. Thus, if there are many things, they must be such as the one is. (Simplicius, Commentary on Aristotle’s On the Heavens 558.19–559.12, Curd & McKirahan, 2011, p. 129-30) The comment in square brackets is from Simplicius.

Nor is any of it empty (or void). For what is empty is nothing, and of course what is nothing cannot be. Nor does it move. For it cannot give way anywhere, but it is full. For if it were empty, it would give way into the empty part. But since it is not empty it has nowhere to give way. It cannot be dense and rare. For it is impossible for the rare to be equally full as the dense, but the rare thereby proves to be emptier than the dense. And we must make this the criterion of full and not full: If something yields or is penetrated, it is not full. But if it neither yields nor is penetrated, it is full. Hence it is necessary that it is full if it is not empty. Hence if it is full it does not move. (Simplicius, Commentary on Aristotle’s Physics line 7-10, Curd & McKirahan, 2011, p. 129)

In reaction to the Eleatic school’s objections to motion and change (e.g., to empty space), Leucippus, the founder of Atomism,55 and his pupil Democritus56 sought to understand not an ontological but a physical reality through atoms.57 According to these mechanical philosophers, atoms, meaning simply ‘un-cuttable’, are things that cannot be cut, split, or divided. According to Leucippus and Democritus, each atom differs from the others: an atom is defined not only by intrinsic properties such as its shape but also by relational properties, including position and arrangement. Leucippus and Democritus’ arguments on the intrinsic aspects of the atom seem to relate to Parmenides’ reality. In terms of its inner aspects, each atom can be considered a ‘what-is’ that is full,58 eternal, and sticking together,59 because its inner core is unchangeable. However, the relational properties claimed by the atomists are not acceptable for Parmenides, because they cause motion of the atom, even though they do not damage its inner core.

55 Two books, On Mind and the Great World System (Makrokosmos), are attributed to Leucippus, but nothing of them is left. Additionally, there is controversy about his birthplace because of insufficient evidence about his life.
56 Democritus’ writings include the Little World System (Mikrokosmos), an homage to his teacher (Curd & McKirahan, 2011).
57 Leucippus . . . did not follow the same route as Parmenides and Xenophanes concerning things that are, but seemingly the opposite one. For while they made the universe one, immovable, ungenerated, and limited, and did not even permit the investigation of what-is-not, he posited the atoms as infinite and ever-moving elements, with an infinite number of shapes, on the grounds that they are no more like this than like that and because he observed that coming-to-be and change are unceasing among the things that are. Further, he posited that what-is is no more than what-is-not, and both are equally causes of things that come to be. For supposing the substance of the atoms to be compact and full, he said it is what-is and that it moves in the void, which he called “what-is-not” and which he declares is no less than what-is. His associate, Democritus of Abdera, likewise posited the full and the void as principles, of which he calls the former “what-is” and the latter “what-is-not.” For positing the atoms as matter for the things that are, they generate the rest by means of their differences. These are three: rhythm, turning, and touching, that is, shape, position, and arrangement. For by nature like is moved by like, and things of the same kind move toward one another, and each of the shapes produces a different condition when arranged in a different combination. Thus, since the principles are infinite, they reasonably promised to account for all attributes and substances—how and through what cause anything comes to be. This is why they say that only those who make the elements infinite account for everything reasonably. They say that the number of the shapes among the atoms is infinite on the grounds that they are no more like this than like that. For they themselves assign this as a cause of the infiniteness. (Simplicius, Commentary on Aristotle’s Physics 28.4–26, Curd & McKirahan, 2011, p. 112-3)

Moreover, they say that the differences are three: shape, arrangement, and position. For they say that what-is differs only in “rhythm,” “touching,” and “turning”—and of these “rhythm” is shape, “touching” is arrangement, and “turning” is position. For A differs from N in shape, AN from NA in arrangement, and Z from N in position. Concerning the origin and manner of motion in existing things, these men too, like the rest, lazily neglected to give an account. (Aristotle, Metaphysics 1.4 985b, Curd & McKirahan, 2011, p. 111)

As implied by the internal and relational properties of the atom, atoms are separated by external empty space. As discussed previously, for Parmenides external empty space makes an object lose its status of ‘what-is’, so empty space refers to ‘what-is-not’; for Melissus, ‘what-is-not’ is nothing, an internal emptiness, so his no space is motionless. In contrast, for the atomists the existence of external empty space allows atoms to set boundaries among themselves, and this void allows atoms to move, a point highly credited by Aristotle:

58 Leucippus and his associate Democritus declare the full and the empty [void] to be the elements, calling the former “what-is” (to on) and the other “what-is-not” (to mē on). Of these, the one, “what-is,” is full and solid, the other, “what-is-not,” is empty [void] and rare. (This is why they say that what-is is no more than what-is-not, because the void is no less than body is.) These are the material causes of existing things. (Aristotle, Metaphysics 1.4 985b4, Curd & McKirahan, 2011, p. 111)
59 Democritus believes that the nature of the eternal things is small substances (ousiai) infinite in number. As a place for these he hypothesizes something else, infinite in size, and he calls their place by the names “the void,” “not-hing” (ouden) and “the unlimited” [or, “infinite”], and he calls each of the substances “hing” (den) and “the compact” and “what-is.” … The grounds he gives for why the substances stay together up to a point are that the bodies fit together and hold each other fast. For some of them are rough, some are hooked, others concave, and others convex, while yet others have innumerable other differences. So he thinks that they cling to each other and stay together until some stronger necessity comes along from the environment and shakes them and scatters them apart. (Aristotle, On Democritus, quoted by Simplicius, Commentary on Aristotle’s On the Heavens 295.1–21, Curd & McKirahan, 2011, p. 112)

The most powerful explanation of this is the unified theory of Leucippus and Democritus, who take as their starting-point what is according to nature. For some of the ancients thought that what-is is by necessity one and motionless; for since the void is not, and it is not able to move without a separate void, nor is there any plurality since there is nothing to divide things… But Leucippus thought he had arguments which, in a way consistent with sensation, would not do away with coming into being or perishing, motion or the plurality of existing things. He agrees with appearance in these things, but also with those who establish the one in holding that there could be no motion without void; and he maintains that the void is not-being and nothing of what-is is not-being. For what-is in the primary sense is completely full. But such being is not one, but infinite in multitude and invisible because of the smallness of the masses. These things travel in the void (for there is void) and as they combine they produce generation, as they dissolve they produce destruction. For they act and are acted on insofar as they happen to come into contact with each other, for in this way they are not one. When they combine and become entangled they produce something. For from what is truly one a plurality could not come to be, nor from what is truly many a unity, but this is impossible. But as Empedocles and certain others say about things being affected through pores, all alternation and all modification happen in this way, with destruction or dissolution resulting from the void, and likewise growth through gradual accretion of solid bodies. (Aristotle, On Generation and Corruption 324b35-325a6, a23-b5, Graham, 2010, p. 528-9)

To summarize, Hesiod and the pre-Socratics demonstrate a diversity of ancient Greek perspectives on time and space. This diversity results from a shift from mythical to paradigmatic views of the world. In terms of time, the Hesiodic poems describe the divine world as associated with eternity and ephemerality, whereas the human world corresponds to ephemerality and cyclicity. His Works and Days guides human activities in accordance with the cycle of the year. The pre-Socratics looked at the world from different perspectives. The three Milesian thinkers were concerned with what the primary substance of the world is, how it moves or changes, whether the substance is eternal or not, and what processes are responsible for change. In terms of time, Heraclitus took both human perception and logos into consideration. From an ontological perspective, Parmenides investigated a reality that cannot be understood via perceptual changes and proposed the ‘what-is now’. Melissus further developed Parmenides’ real space with the concept of inviolability, characterized by motionlessness. The two atomists, Leucippus and Democritus, explored empty space in terms of the unchangeable atom. This shift in the pre-Socratic philosophers’ views of the world gradually contributed to the foundation of the Western understanding of the world.

Conclusion

Time and space are cognitive constructs; with them, we understand the world. As a complement to chapters 2 and 3, the aim of this chapter has been to understand the cultural factors of these cognitive constructs from cosmological and philosophical perspectives. For this I examined the origins of the notions of time and space in four selected cultures (i.e., Babylonia, India, China, and Greece) as reflected in their ancient texts. The ancient fragments from these cultures describe an ordered world that arises from chaos. The Babylonian Enūma Eliš and the ancient Chinese texts imply the priority of space. The Babylonian genesis sets spatial order first, probably due to the influence of environmental conditions. The Chinese word for space, yuchu, contains the concept of eternity. On the contrary, the Indian Ṛgveda describes eternal time as the governing rule of the world, and space can be created with eternity via the sacrifice of a god. In Hesiod’s Theogony, the divine world represents eternal time.


When it comes to cyclicity, the Ṛgveda does not clearly state this property. Through an examination of the wheeled chariot metaphor in the famous riddle-of-the-universe hymn, I speculated about the cyclicity that the eternal law presumes. The Enūma Eliš describes cyclicity as a by-product of the movement of the spatially ordered god-stars. The Chinese divination text, the Book of Change, and its commentaries, reflecting Confucianist and Taoist philosophy, find a cyclicity in time, which further leads to a continuous cycle between chaos and the recovery of spatial order, due to the dynamics of yin and yang.

With correlative thinking, the space units establish a continuum of ‘world – states – household – body – mind – thoughts – knowledge’, and the achievement of harmony between the world and humanity through cultivating the body is emphasized. A similar idea can be found in Hesiod’s Works and Days, which provides a guide for human activities on the basis of an annual seasonal cycle.

In terms of ephemerality, I found an emphasis on rituals in the akītu festival, the solemn vedic rituals, and the Book of Rites. Ancient India developed the idea of a continuum of the cosmos on the basis of the proper moment, the ephemeral time of the vedic rituals. I have discussed Sāṅkhya’s cosmos model, which may correspond to the Chinese continuum of space units. These two models encompass external and internal worlds. In particular, both divide the internal world into three levels consisting of mind (i.e., xin or manas), thought or ego (i.e., yi or ahaṅkāra), and knowledge (i.e., zhi or buddhi).

Among the pre-Socratic philosophers, Heraclitus’ consideration of both human perception and logos seems to parallel two of these three levels. On the contrary, Parmenides’ ontological discussion of reality negates the mind, due to its involvement with sense data, and focuses on knowledge.

This chapter demonstrates the cultural diversity of time and space concepts in spite of shared attributes of the world, that is, eternity, cyclicity, and ephemerality. It shows that each culture’s views of the world and its understandings of time and space rely significantly on the way its people interact with their environments and experience the world.


Chapter 5. Transformative power of music-making and the origins of music-making

Music is a special kind of time and the creation of musical time is a universal occupation of man.

From Wachsmann’s “Universal perspectives in music” in Ethnomusicology (1971, p. 384)

The ability of music to transcend social, spatial, and psychological distance without an accompanying physical presence.

From Seeger’s “What can we learn when they sing?” in Ethnomusicology (1979, p. 384)

Transformative power of music-making

List (1984), a vanguard of the non-universalist position on music, denied that the power of music transforms human experience, concluding in his paper “Concerning the concepts of the universal and music” that the “Nirvana of the universal in music, alas, unattainable” (p. 47). To begin with, I would like to discuss how we can understand nirvana before thinking about the universals of music-making. In Buddhism, the term ‘nirvana’ refers to a transcendent state in which an individual emancipates him- or herself from the sufferings and desires arising from everyday life.60 This indicates that nirvana is the ultimate freedom that can be achieved by controlling one’s own mind. Therefore, nirvana is a psychological phenomenon. Although the relation between music and the mind is complex, Wachsmann’s (1971) discussion of the universals of music suggests how the nirvana of the universal in music can be attainable. First of all, Wachsmann (1971) regarded sound as a byproduct of human musical behavior and argued that the mind is complexly connected to sound. The complexity arises from the interrelatedness of 1) the physical properties of sound, 2) the physiological response to the acoustic stimuli, 3) the recognition of sound by the human mind, established on the basis of past experiences, and 4) the response to the environmental pressures of the moment. Although these components allude to the transformative power of music on us, how they relate to each other is still unknown. Sāṅkhya’s cosmos model (see fig. 5) may give us some clues to Wachsmann’s (1971) theoretical layout.

60 Wŏnhyo (원효, 元曉, 617-686), one of the most influential Buddhist thinkers, lived in the transition period from the Three Kingdoms of Korea to the Unified Silla, whose established religion was Buddhism. He contributed to the development of the early Korean Buddhist doctrines and to the popularization of Buddhism. His doctrine on nirvana is understood through his famous anecdote of drinking water from a skull. In 650, he set out for Tang China to follow the traces of Indian Buddhism with his fellow monk Uisang (의상, 義湘, 625-702). On their way, they had to stay in a dark cave for one night due to heavy rain and storm. Wŏnhyo was thirsty before sleeping; he found water in a bowl near him and thankfully drank it. The next morning, he discovered that the cave they had stayed in was an old tomb and that the water he had drunk was rotten rainwater collected in a skull, which made him sick. The storm did not stop, so they had to stay one more night. During the second night, they were not able to sleep and suffered from nightmares. Wŏnhyo reflected on his experience of the past two nights in the tomb and realized that nothing is clean or dirty in itself; everything is created by the mind. His great awakening later became the epitome of his doctrine: the perception of the world is based on the perceiver’s mind, and therefore nirvana does not exist outside but inside our minds (for details see Biography of State Preceptor Wonhyo 元曉國師傳 by Vermeersch, 2012).

Interestingly, in an earlier writing, List (1971) remarked: “At that time there was a particular passage in Brahms’ First Symphony which invariably communicated electric tingles to my spine” (p. 400). List’s (1971) recognition of his bodily responses to Brahms satisfies the second and third points of Wachsmann’s (1971) model of the universal perspective of music. Given the definition of nirvana and the famous debates on the universals of music in the 1970s, if music has any effect on the body and the mind, then List’s conclusion is problematic. If nirvana refers to a psychological state and is one of the universals of music, List’s 1971 and 1984 writings show a self-contradiction, because his personal anecdote demonstrates that listening to music transforms human experience.

Then why did List (1971, 1984) conclude that there are no universals of music?

Probably, for List, the influence of music on us is subjective and inconsistent: a given piece of music does not have the same effect on all people. This suggests that his understanding of the universals of music does not consider the effects of music on the mind and the body. In other words, List’s understanding of the universals of music accounts only for the first point of Wachsmann’s (1971) model. Still today, there are attempts to find universal acoustic features of music without consideration of cultural factors, namely, the third and fourth points of Wachsmann’s (1971) model. For example, Mehr, Singh, York, Glowacki, & Krasnow (2018) argued for a universal link between certain acoustic features of vocal music and the functions of song in their Natural History of Song project. The project has provoked a dispute in ethnomusicology. For instance, Sakakeeny criticized it because it does not consider songs’ social, political, and cultural contexts (Marshall, 2018). His criticism reflects Wachsmann’s (1971) consideration of past experiences and environmental pressures. In this chapter, I argue that music-making transforms our experience of the world and that this is an undeniable universal of human music-making.

This is not a simple but a complex process, as shown in Wachsmann’s (1971) and Sāṅkhya’s models, and the transformation occurs at multiple levels: neuronal, psychological, behavioral, and cultural. McAllester (1971) focused on the experiential aspect of music (e.g., heightened excitement or soothed tensions) in terms of emotion and noted the transformative power of music-making:


I am especially interested, however, in what music does to people. I would say that one of the most important of the universals, or near-universals, in music is that music transforms experience. Music is always out of the ordinary and by its presence creates the atmosphere of the special. Experience is transformed from the humdrum, the everyday, into something else. Music may heighten excitement or it may soothe tensions, but in either case it takes one away into another state of being. There are many of these musical states of being. Just to mention a few of these, one can think of nationalistic fervor, consumer fervor or religious fervor (p. 380).

Then how can we study the transformative power of music-making, involved as it is in these complex processes? One way to look at it can be found in the fields of (ethno)musicology and music theory. The transformative power of music-making has been one of the primary research topics in these fields, and ethnomusicologists and music theorists have been trained to speculate about why we make music and how music influences us. Therefore, they may have better explanations of human musical behavior at the metaphysical level (e.g., Wachsmann’s theoretical framework).

Another way to look at the transformative power of music-making can be found in cognitive psychology. This discipline allows us to break the transformative power of music-making down into multiple levels that we can test. From a cognitive psychological perspective, Harwood (1976) introduced information processing to the field of music research. Information processing is the study of how one perceives the world, constructs knowledge about the world, memorizes it, and retrieves that knowledge when it is needed. Each of these processes is a transformation of information. A more important point that Harwood (1976) made is that each of these processes is not universal but context dependent, which corresponds to the second and third points of Wachsmann’s model.

Another way to study the transformative power of music-making is found in cognitive neuroscience. Based on the findings of neuroscientific research over the past decades, Patel (2008, 2010) proposed the transformative technology of the mind (TTM) theory. His TTM theory points to the important role of biological components in musical experience and to the involvement of complex processes in the transformative power of music. Patel (2008, 2010) argued that musical behaviors make use of biological mechanisms serving other brain functions like perception, attention, memory, and language, and, reciprocally, that music has significant influences on those mechanisms.61 Therefore, for him, music is “biologically powerful”. These effects of music can be seen in structural and functional changes of the brain. For example, a study by Schlaug, Marchina, & Norton (2009) shows effects of music on brain areas for language. Using a diffusion tensor imaging technique, Schlaug et al. (2009) investigated the effects of melodic intonation therapy on the brains of aphasia patients. They reported a significant increase in the number of arcuate fasciculus fibers connecting Wernicke’s and Broca’s areas in the treatment group compared to aphasia patients who did not receive melodic intonation therapy. Loui, Patterson, Sachs, Leung, Zeng, & Przysinda (2017) investigated how music affects emotion. For this, Loui and her colleagues studied a musically anhedonic participant, that is, a person unable to enjoy music; musical anhedonia is associated with low sensitivity to music-induced reward. Using the same technique, Loui et al. (2017) showed differences in structural connectivity in the auditory-reward network (e.g., between the superior temporal gyrus and the nucleus accumbens) between the musically anhedonic participant and normal controls.

61 Like those of other cognitive scientists, Patel’s TTM theory has been developed from a comparison of language and music. What, then, are the shared mechanisms between language and music? According to Patel (2008), tonality processing would be one such shared mechanism. Tonality relates to long-term knowledge of tonal hierarchies, which are comparable to linguistic syntax. In addition, the underlying psychological mechanisms for tonality processing may include the use of cognitive reference points and mechanisms of statistical learning. Cognitive reference points play an important role in our cognitive categories and classification systems. Statistical learning involves tracking patterns in the environment and the acquisition of implicit knowledge of their statistical properties without any direct feedback.

In his TTM theory, Patel (2008, 2010) argued that, in terms of evolution, humans are unique among animals in our ability to invent things that transform our own existence. In language, for example, writing can be considered an important transformative technology. The development of a writing system allows us to share complex thoughts beyond time and space, which further contributes to the maintenance and distribution of knowledge. Taking up Ong’s (1982) claim that literacy restructures human consciousness and transforms the mind, Lawson (2014) elaborated the TTM theory from an ethnomusicological perspective. She examined the effects of a music writing system on shuochang performance, a northern Chinese narrative form. Specifically, Lawson (2014) compared different shuochang performances: one practiced on the basis of an oral/aural tradition, the other practiced with the support of a Chinese logographic script. She hypothesized that visual orthography stabilizes the shuochang performance. Although the results were not conclusive due to a small sample size, Lawson (2014) reported that the shuochang performance based on a written script was less variable than the one based on oral tradition.

From an embodied perspective, Kung (2017) investigated how cultural factors affect meter perception. Pointing out that both pulse and meter are cognitive constructs, she challenged the most influential Western metrical theory, which is based on the assumption of a hierarchical structure. She argued not only that the Western metrical concept is based on music notation (i.e., a visual-spatial representation of time) but also that the hierarchical metrical structure is a byproduct of the analysis of notated music. In contrast, music-making in oral cultures is associated with bodily actions, which allows us to overcome the limits of the psychological present or of the memory span (Kung, 2017). From my personal experience of learning the djembe from a teacher who never learned music from a written score, I observed that his memory span for drumming is extraordinarily long and that it is often associated with chunks of musical onomatopoeia. I will discuss musical onomatopoeia in detail in the ‘Music vs. Language’ section.

In sum, there are plenty of examples supporting the idea that music transforms our experience at various levels and in complex ways. The transformative power of music-making is a universal phenomenon. This brings us to the most important question in music research: What is music? In retrospect, I initially accepted a definition of music as organized sound. This definition was the foundation of my music: as a composer, I set my own rules to arrange notes on paper, and at that time I believed that I had created a new sonic world. According to Kung’s (2017) argument, however, writing music is the arrangement of abstract sound symbols in the visual-spatial domain. Over the past years, my education in ethnomusicology has taught me that my old definition of music is too narrow to cover all the human activities that we call music. Human music-making consists of a multitude of different human behaviors. With regard to defining music, Nettl (2000) pointed out not only that there is no agreement on definitions of music but also that different cultures show various understandings of music. In the Islamic world, for example, there is a clear distinction between Musiqi and non-Musiqi. Musiqi includes lullabies, wedding songs, work songs, military music (Tabl Khanah), vocal/instrumental improvisations (Taqasim, Layali, Qasidah, Awaz), serious metered music (Muwashshah, Dawr, Tasnif, Batayihi), music related to pre-Islamic or non-Islamic origins, and sensuous music. On the contrary, non-Musiqi is associated with the chanting of religious or sacred texts. It consists of several types of chant: qur’anic chant (Qirā’ah), the call to prayer (Adhan), pilgrimage chant (Tahlil), eulogy chants (Madih, Na’t, Tahmid), and chanted poetry with noble themes (Shi’r) (Neubauer & Doubleday, 2001). In contrast to the many subdivisions of music in Islamic culture, the conception of music of the Blackfoot Indians in North America is more comprehensive: their term saapup encompasses singing, dancing, and ceremony (Nettl, 2000).

In spite of the behavioral diversity of music-making, there are two general types of human activity associated with sound production, as Nettl (2000) noted: “All societies have vocal music. Virtually all have instruments of some sort” (p. 468). Given the discussions of the transformative power of music in chapters 2 & 3, this distinction in music performance, namely between vocal and instrumental music-making, raises a question: do the different modes of music-making transform our experience of time and space differently? Findings in the cognitive sciences over the past two decades have given some indications of this, as mentioned in chapter 2. In terms of acoustic processing, differential processing of vocal and non-vocal sounds has been reported (e.g., Belin et al., 2000; Belin et al., 2002; Belin & Zatorre, 2003; Gunji et al., 2001; Lee et al., 2015; Levy et al., 2001; Levy et al., 2003). Vocal and non-vocal sounds also affect rhythm and rhythm memorization processing differently (Hung, 2011; Klyn et al., 2015). These studies suggest that vocal vs. instrumental music-making transforms at least the human experience of time differently. I speculated about the potentially different transformative powers of singing and instrument-playing on spatial experience in chapter 3.

At any rate, this distinction between the vocal and the instrumental in music-making has attracted scholars who investigate communication systems and human cognition. While Nettl (2000) noted the co-existence of vocal and instrumental music-making in all human societies, Fitch (2005, 2006) argued that the co-evolution of vocal and instrumental music is a human-specific phenomenon. This is because we are the only species using both vocal and non-vocal sounds simultaneously and interchangeably (e.g., whistle languages, talking drums, and musical onomatopoeia), whereas other animals communicate either vocally (e.g., bird song or whale song) or non-vocally (e.g., bimanual drumming by the chimpanzee Pan troglodytes). Therefore, knowing the differences between vocal and instrumental music-making not only can give us a better idea of the transformative power of music but may also deepen our understanding of the origins of music and the evolution of human cognition.

How can we comprehend the differences between vocal and instrumental music- making? In this chapter, I will investigate this question from the interdisciplinary perspectives with regard to the evolution of music and language. First, I will approach this distinction of vocal and instrumental music-making historically. Since antiquity, various music cultures explicitly or implicitly make the distinction between the vocal and instrumental music. The fact that ancient societies made a distinction between vocal and instrumental music can be considered an indication of their different origins. Then, is there any evidence that allows us to speculate about these two different trajectories of music-

115 making in the prehistoric time? One way is to look at archeological evidence and the other way is to compare different animal species using vocal and non-vocal communication systems. Therefore, I will discuss the evolution of vocal and non-vocal communications in the human prehistory mainly from the perspective of archeology. The two different modes of vocal and non-vocal sound communications are prevalent in animal kingdom. In terms of acoustic communication, some animal species rely primarily on vocal sounds but others use non-vocal sounds. Next, we humans use both vocal and non-vocal sounds in two different behaviors, that is, music and language. However, human vocal sound communication has often been considered as language while non-vocal communication has been regarded as music. Therefore, a non-vocal form of language (e.g., speech surrogacy) and vocal music have largely been ignored. This may be due to the fact that the early cognitive scientists were primarily linguists. All four forms of human acoustic communication system (i.e., vocal form of language, speech; non-vocal form of language, speech surrogate; vocal music; non-vocal music, instrumental music) should be systematically investigated with equal importance in order to have a more comprehensive picture of the development of human cognition in the context of .

Therefore, I will compare design features of vocal vs. non-vocal music and language. I believe that historical, archeological, and comparative tracing of the evolutionary trajectories of different modes of music-making may give us some hints about how music transforms our experience and, further, how music contributes to the development of human cognition.


The origins of vocal and instrumental music in human history

Since antiquity, a distinction between vocal and instrumental music has been made in various societies. For example, the earliest written distinction is found in the Nāṭyaśāstra, a treatise on the performing arts traditionally attributed to the sage (muni) Bharata, who treated singing and instruments separately. In the twenty-eighth chapter of the Nāṭyaśāstra, Bharata distinguished different types of music:

Thus the song (gāna), the instrumental music (vādya) and the acting (nāṭya) having different kinds of appeals (vividhāśraya, lit. depending on different kinds) should be made by the producers of plays like a brilliant entity (alātacakra-pratima). That which is made by the stringed instruments and depends [as well] on various other instruments, and consists of notes (svara), Tāla (time-measure) and verbal themes (pada) should be known as the Gāndharva. As it is very much desired by gods and as it gives much pleasure to Gandharvas, it is called Gāndharva. Its source is the human throat (lit. body), the Vīṇā and the flute (vaṃśa). I shall describe the formal aspects of (lit. arising from) their notes. (Chap. XXVIII, 7-10, trans. by Ghosh, 1961, pp. 2-3)

The above quote demonstrates the origin of the Indian conception of vocal and instrumental music. Bharata made a distinction between vocal and instrumental music on the basis of the different sources of the musical notes (svara), that is, the human throat and instruments respectively. Śārṅgadeva, an Indian music theorist of the thirteenth century and the author of the Saṅgītaratnākara (Ocean of Music), followed Bharata's tradition.

The Saṅgītaratnākara has been regarded as one of the most important and influential works in the history of Indian music. Śārṅgadeva discussed how saṅgīta is constituted by song (gītam), instrument (vādyam), and dance (nṛttam). Among these three components, song precedes the others:


(v) the definition of saṅgīta: its classification as mārga and désī: 21c-24b Gītam (vocal melody), vādyam (playing on instruments) and nṛttam (dancing), all the three together are known as saṅgīta which is twofold, viz. mārga and désī. That which was discovered by Brahmā and (first) practiced by Bharata and others in the audience of lord Śiva is known as mārga (saṅgīta), which definitely bestows prosperity; while the saṅgīta comprising gītam, vādyam, and nṛttam, that entertains people according to their taste in the different regions, is known as désī. (v) the predominance of gītam: 24c-25b Dancing is guided by instrumental music, which in its own turn, follows the vocal practice. Therefore, the vocal melody (i.e., gītam), being the main constituent (of saṅgīta), is expounded in the first instance. (Chapter 1:21c-25b, trans. by Shringy & Sharma, 1996, Vol. 1, pp. 10-11)

Such a distinction between vocal and instrumental music can also be found in the West. For example, Anicius Manlius Severinus Boethius's (480-524) De institutione musica implies this distinction. Following the Ancient Greek philosophical tradition, Boethius synthesized his predecessors' ideas in the De institutione musica, which later became a foundation of Western music theory and philosophy.

As one component of the 'quadrivium', music is one of the fourfold paths to the knowledge of essence, together with arithmetic, geometry, and astronomy (Bower, 2001). Boethius classified and ranked music into three categories: musica mundana (music of heaven), musica humana (music of human), and musica instrumentalis (music of instruments).

Musica mundana is an omnipresent force of the universe that governs the courses of the spheres, the structure of the elements, and the seasons of the year. Boethius argued that musica mundana is acoustically imperceptible because the fast motion of the universe does not generate audible sound. As a unifying force, musica humana integrates the body and soul into a harmonious entity. According to Boethius, music of instruments (musica instrumentalis) includes strings, winds, and percussion. Although Boethius was not explicit about vocal music in his early sixth-century discussion, the surviving musical notations suggest that already the Ancient Greeks differentiated vocal and instrumental music. Hagel's (2009) analysis shows two complete sets of music notation, one for instrumental and one for vocal music.

One of the most interesting Western treatments of the different origins of music, accompanied by an elaborate speculation, can be found in the Essay on the Origin of Languages62 by Jean-Jacques Rousseau (1712-1778). As one of the most influential political philosophers, music theorists, and educators of the Age of Enlightenment, Rousseau also provoked a heated debate about music and language. He argued that language and music originate from the expression of human passion and discussed the various means of communication that are associated with different types of senses:

The general means by which we can act upon the senses of others are limited to two: namely, movement and the voice. Movement is immediate through touch or is mediate through gesture; the first, having an arm's length for its limit, cannot be transmitted at a distance, but the other reaches as far as the line of sight. That leaves only sight and hearing as passive organs of language among dispersed men. Although the language of gesture and that of the voice are equally natural, nonetheless, the first is easier and depends less on convention: for more objects strike our eyes than our ears, and shapes are more varied than sounds; they are also more expressive and say more in less time. (Rousseau, 1761, trans. by Scott, 1998, p. 290)

The above citation illustrates that Rousseau made a distinction between movement and voice, with movement comprising touch and gesture. In connection with Śārṅgadeva's saṅgīta, the voice in Rousseau's discussion may correspond to the vocal melody (gītam), while touch-based movement seems to correspond to instrumental music (vādyam).

62 In chapter 12 of the Essay on the Origin of Languages, Rousseau (1761) elaborated his discussion of the passion from which both song and speech originate: Along with the first voices were formed the first articulations or the first sounds, depending on the kind of passion that dictated the one or the other. Anger wrests menacing cries which the tongue and the palate articulate; but the voice of tenderness is gentler, it is the glottis that modifies it, and this voice becomes a sound. Only its accents are more or less frequent, its inflections more or less acute depending on the feeling that is joined to them. Thus cadence and sounds arise along with syllables, passion makes all the vocal organs speak, and adorns the voice with all their brilliance; thus verses, songs, and speech have a common origin. (Rousseau, 1761, trans. by Scott, 1998, pp. 317-18)

Additionally, gesture-based movement may correspond to acting (nāṭya) or dancing (nṛttam). It is also interesting that Rousseau's movement can be understood in terms of the body space and peripersonal space of chapter 3. First, Rousseau distinguished movement depending on the involvement of touch. This seems similar to the distinction between the two components of body space, that is, the postural schema and the superficial schema. Second, Rousseau (1761) mentioned that the arm's length can limit movement, which recalls one of the names of peripersonal space, namely, arm-reaching space.

Although in East Asian cultures there is no written document that shows a distinction between vocal and instrumental music, these cultures nevertheless seem to make the distinction. The names of various musical genres indicate whether a genre is instrumental (e.g., Korean Sanjo, Chinese Jiangnan sizhu), vocal (e.g., Korean p'ansori, Chinese shuochang), or a combination of them (e.g., Japanese sankyoku) (Witzleben, 2001).

The ancient Chinese instrumental classification system, bayin ('eight tones' or 'eight timbres'), was constructed during the Zhou dynasty. This system classifies instruments on the basis of the materials they are made of: metal, stone, skin, gourd, bamboo, wood, silk, and earth (Liang, 1985).

To conclude, various cultures have made clear distinctions between vocal and instrumental music. The writings about music from India and the West and the genres of East Asian music indicate that these cultures have distinguished vocal from non-vocal activities.

In particular, the Indian treatises see the sounds of vocal and instrumental music arising from the throat and from musical objects, respectively. In a broad sense, music-making also includes a non-acoustical component, namely dance, action, or gesture. Interestingly, Rousseau's (1761) discussion of non-vocal communication in the Essay on the Origin of Languages shows a connection to the body space and peripersonal space discussed in chapter 3. The documents I examined in this section distinguish different origins of vocal vs. instrumental music. From that, can we trace the different origins of vocal and instrumental music-making back into prehistoric times? Is there any non-written evidence for the different origins of the two modes of music-making? In the following section, I shall look at archeological data that suggest early humans' capacities for vocalization and non-vocal sound communication.

Vocal and non-vocal communication in human prehistory

With man song is generally admitted to be the basis or origin of instrumental music. As neither the enjoyment nor the capacity of producing musical notes are faculties of the least direct use to man in reference to his ordinary habits of life, they must be ranked amongst the most mysterious with which he is endowed.

From Darwin's The Descent of Man (1871, p. 333)

Prehistoric vocal communication

The ethno-organologist Montagu (2017) raised the question "vocalization versus motor impulse: which came first, singing or percussive rhythms?" (p. 2). Associating vocal and instrumental music with distinctive acoustic parameters, pitch and rhythm respectively, he speculated about two different origins of music. Furthermore, he distinguished vocal music from speech within human vocalization, discussing the differences in physical requirements between the two types of vocalization. According to him, the first type, vocal music, is mainly associated with controlling discrete pitches. The second type, speech, requires the ability to produce various consonants and vowels in addition to the capability of pitch control. Therefore, Montagu (2017) hypothesized that vocal music may precede speech because the former involves one process whereas the latter is associated with two.

The most systematic approach to the human capability for music-making in prehistoric times can be found in Morley (2013). He dealt with the evolution of music from a cognitive archeological perspective and investigated the underlying mechanisms of different types of music-making and their contribution to the development of human cognition. First, Morley (2013) noted differences in the archeological evidence for vocal and instrumental music from which we can speculate about human music-making in prehistoric times.

With regard to vocal music, the human body parts primarily responsible for vocal production are made of biodegradable tissues and are therefore not durable. As the famous aphorism among archeologists, 'absence of evidence is not evidence of absence', warns, any interpretation of the musical capabilities of our ancestors from archeological data must be made with caution. Although there is no direct clue to the vocalization abilities of early humans, their preserved bone structures allow archeologists to speculate about early human capabilities for vocalization. With regard to instrumental music, some musical instruments made of materials resistant to degradative processes have been preserved.

Similar to Montagu’s (2017) distinction of vocal music vs. speech, Morley (2013) distinguished vocalization from verbalization (i.e., speech). Morley (2013) discussed the development of vocalization on the basis of two points: 1) the change of vocal physiology and its potentials for vocalization and 2) the neurological development to control the vocalization.


To begin with, human vocalization involves an interplay of three subsystems: the supralaryngeal vocal tract resonating system, the laryngeal vibrating system, and the subglottal respiratory system (see fig. 7). The supralaryngeal vocal tract resonating system consists of the nasal cavity, oral cavity, and pharynx. The vibratory system includes the larynx, whose vibrations turn air pressure into the sound waves of the voice. The subglottal respiratory system comprises the lungs, diaphragm, and trachea. All three systems play important roles in varying the pitch, intensity, contour, and duration of sounds. In terms of differences in physiological control between singing and speech, singing involves a more open vocal tract and requires greater control of airflow than speaking does (Frayer & Nicolay, 2000).

Figure 7. Human vocalization system

In terms of changes in the vocal anatomy of early humans, Morley (2013) discussed the articulatory organs that are critical for controlling pitch (intonation), duration, timbre, and intensity. He explained how the development of vocal anatomy is associated primarily with pitch regulation. The larger supralaryngeal space and the lower placement of the larynx play an important role in pitch regulation. Two anatomical structures are indicative of laryngeal position control. First, the hyoid bone, the small bone that connects to the tongue base, supports the larynx (see fig. 7). The positions of the hyoid bone and the larynx are indicated by the angle of the mylohyoid groove, the groove on the inside of the mandible. The fossil evidence for the hyoid bone dates back to 'Lucy's baby', the juvenile Australopithecus afarensis of 3 million years ago. The second anatomical structure involved in pitch regulation is the basicranium, the bottom of the skull. The degree of curvature of the basicranium is indicated by the basicranial flexion. This is found not in Australopithecus but in the genus Homo.

In addition to the development of vocal anatomy, Morley (2013) argued that human vocalization became fully functional together with the development of the brain and ears in Homo ergaster. Previous fossil endocast studies of hominin brains suggest the development of language-associated brain regions (e.g., Broca's area). The connection of the subglottal respiratory system with the periaqueductal grey matter (PAG) points to emotional components in vocalization. Using an fMRI technique, Brown, Ngan, & Liotti (2008) attempted to localize the larynx-specific brain areas while their participants performed a series of tasks requiring articulatory organ movements: 1) phonation (i.e., vocalization) for vocal cord abduction, 2) glottal stops for vocal cord adduction, 3) lip protrusion, and 4) vertical tongue movement. They found not only significant activation in the precentral gyrus for the phonation and larynx-specific tasks but also a significant difference in the location of the larynx motor area between non-human primates and humans. In the Rhesus monkey, the frontoparietal operculum is identified as a larynx motor area and seems to be associated with the extrinsic laryngeal musculature. On the basis of the finding that the dorsal-posterior representation of the larynx overlaps with the lip area of the precentral gyrus in humans, Brown and his colleagues proposed a duplication and migration model in which the representation for extrinsic laryngeal muscle movement has been preserved ventrally in the precentral gyrus whereas that for intrinsic laryngeal muscle movement has migrated dorsally (see fig. 8). In recent comparative studies, researchers have argued that brain development plays a more important role in speech acquisition than the development of the vocalization system. Pointing out a methodological problem of early studies whose data were based on postmortem primates, Fitch, de Boer, Mathur, & Ghazanfar (2016) found evidence that a living macaque monkey's vocal tract can produce speech sounds. The authors concluded that, compared to the human brain, the non-human primate brain did not develop the neural networks for sophisticated vocal control, which is in line with Brown's laryngeal duplication and migration model.

Dunn & Smaers (2018) investigated the relationship between the complexity of vocal repertoires and cortical association areas in non-human primates. Cortical association areas (e.g., the prefrontal cortex) lie within the neocortex and are associated with higher cognitive processes. Dunn & Smaers (2018) found a positive association between the cortical association areas and the hypoglossal nucleus in hominoids. Given that the hypoglossal nucleus is related to the motor control of the tongue, lower jaw, and the areas of neck and chest, the authors concluded that vocal complexity co-evolved with an increase in capacities for higher cognitive functions.


Figure 8. Laryngeal duplication and migration model (Brown, 2019): AF = arcuate fasciculus; IFG = inferior frontal gyrus; LMC = larynx motor cortex. Reprint permission granted by the author.

The upright posture of Homo ergaster requires continuous head adjustment during locomotion. Morley (2013) speculated that bipedalism and locomotion may have led to significant changes in ear morphology, the shortening and flexing of the basicranium, and the expansion of brain size in the later hominin species.

Prehistoric non-vocal communication

Deliberating on the origins of dance and instrumental music as motor impulse, Montagu (2017) hypothesized that flint tools may be the oldest preserved musical instruments to have served as rhythmic accompaniment. This hypothesis has been examined and tested for decades under the Lithoacoustic project led by Cross and his colleagues (Cross, Zubrow, & Cowan, 2002; Blake & Cross, 2008). Through an examination of stone artifacts whose sonic and musical significance was experimentally assessed, they argued that flint blades might have been used as portable sound tools (i.e., lithophones) in Upper Paleolithic contexts.63

While the capacity for vocal music-making in prehistoric times has been surmised from anatomical evidence (e.g., the hyoid bone) and neuroscientific findings (e.g., the larynx motor cortex), some musical objects excavated from archeological sites give us some idea of the ability to produce those instruments (i.e., technology) and of the possible forms of prehistoric non-vocal music. In the beginnings of non-vocal music-making, it may be that the early humans produced sounds using natural objects.

Our ancestors may have beaten their bodies for music-making (e.g., hand clapping). They might have picked up a tree branch and hit a rock with it. One day, they might have found that some rocks made pleasant sounds and have drawn paintings on the resonating rocks. Such rock gongs have been discovered worldwide (Fagg, 1956; Jackson, Gartlan, & Posnansky, 1965). All of these scenarios are plausible. Mithen (2006) speculated that the Neanderthals might have used natural materials to produce sound by blowing into an object, swirling a string of animal or plant fibers in the air, beating objects, etc. The ability to produce non-voice sounds with natural materials might have emerged much earlier than the production of artificial musical instruments. In the field of archeology, however, whether these objects are man-made musical instruments or not is a primary concern, because artificial objects, involving tool technology, reflect different stages of human cognition in terms of evolution. In the following, I make a distinction between natural materials, used presumably for non-voice sound production, and the bone flutes regarded as the first musical instruments. The latter, as man-made objects, demonstrate the complexity of the technology involved.

63 The Upper Paleolithic dates back approximately 45,000 to 12,000 years ago and is famous for the musical instruments found in Europe. The earliest of the distinct technology repertoires, including flint tools, that characterize the Upper Paleolithic coincided with the arrival of anatomically modern Homo sapiens. The Paleolithic population maintained a hunter-gatherer lifestyle (Morley, 2013).

Although the above discussion by Montagu (2017) and the Lithoacoustic project by Cross and his colleagues suggested flint tools as among the earliest musical instruments, which seems quite plausible, the primary function of flint stones is not sound production. It seems reasonable to think that flint tools were made for practical uses and only co-opted for producing sound. Then what is the earliest human artefact made solely for music-making? As alluded to in my discussion of vocalization in the prehistoric era, durability plays an important role here, too, in what can be claimed to be the oldest musical instrument. The objects that we can identify as musical instruments are mainly made of bone. In 1995, at the Divje babe I cave site in Slovenia, a group of archeologists were excited by their discovery of a femur of a young cave bear with holes in it. The holes seem equidistant from each other, which could be an indication of one of the oldest musical instruments (Kunej & Turk, 2000), and this bone has been called the Neanderthal flute. Other researchers, however, regarded the holes as tooth marks of a carnivorous animal (d'Errico, Villa, Llona, & Idarraga, 1998; Morley, 2013). This means that the bone may not be a human artifact but made by chance. Investigating the tooth marks on the Neanderthal bone flute, Diedrich (2015) identified that the femur was punctured by an ice age spotted hyena and concluded that the Divje babe I flutes are incorrectly dated. Diedrich (2015) also argued that the incorrect estimate of the Neanderthal flute's date occurred because the researchers did not take the archeological context, that is, the specific environment of that time, into consideration. So far, the oldest known man-made musical instruments beyond any doubt were found at Hohle Fels, Vogelherd, and Geißenklösterle in the southwestern part of Germany (Conard, Malina, & Münzel, 2009). Conard et al. (2009) pointed out the polished surfaces of the flutes, the carved notches, and the refined finger holes made by a thinning technique. These features are lacking in the Neanderthal flute excavated in the Divje babe I cave. The German flutes were made from either bird bones or mammoth ivory. The dates of these materials are estimated at approximately 35,000-40,000 years ago, when modern humans lived in southern Europe. Presumably, they could not only manufacture but also play those instruments.

The existence of these bone flutes, which are thought to have existed only in Europe, is associated with a specific environment. Although there is evidence for bone technology and the use of bone at Blombos Cave in Africa around 70,000 years ago (d'Errico et al., 2003; Morley, 2013), the technology for manufacturing bone artifacts had somehow been lost in Africa and yet continued in the modern humans who had been migrating from Africa to Europe. Birds were a main subsistence source for larger mammals, including our ancestors, in Europe. This demonstrates the importance of environmental factors for the flute-production abilities of the early humans. Furthermore, d'Errico et al. (2003) argued that the complexity of bone technology must have relied on some kind of linguistic communication, which indicates a transmission of shared technological knowledge within a community.

Morgan et al. (2015) hypothesized that even stone tool production, before the development of bone technology, might have involved linguistic communication. Investigating how stone tool making contributes to the development of human cognition, Morgan and his colleagues explored possible connections between stone tool production and language. Specifically, they argued that the production of stone tools might have led to the evolution of language and teaching. Morgan et al. (2015) proposed five different transmission chains of stone tool production depending on different types of social learning mechanisms and experimentally tested the efficacy of the proposed five chains. Participants were asked to transmit the knowledge of stone tool production via one of the following five chains: 1) reverse engineering, 2) imitation/emulation, 3) basic teaching, 4) gestural teaching, and 5) verbal teaching. In all conditions, participants were given a core stone and a hammer stone and asked to make a flake with the given tools. Each participant learned how to make a flake and then taught the next participant in the same chain. In the reverse engineering setting, a participant was asked to make a flake without any instruction in a social setting; participants in this setting were allowed to see flakes that had already been manufactured. In the imitation or emulation setting, a student participant saw finished flakes and observed how his or her tutor participant made a flake, but no interaction between student and tutor was allowed. In the basic teaching setting, a tutor participant demonstrated the production of a flake and could adjust the student participant's movements when manipulating the tools. In the gestural teaching setting, a tutor and a student could interact using any gesture, but vocalization was not allowed. Last, in the verbal teaching setting, teachers and students were permitted to speak. Participants' performance was measured by the number of viable flakes, the proportion of viable flakes, the probability of a viable flake per hit, etc. The best performance in flake production was found in the verbal teaching condition. Morgan et al. (2015) interpreted the experimental results as showing that stone technology allowed the early hominins to develop complex communication on the basis of abstract concepts such as symbols. Given that the early humans acquired bone technology after stone technology, and in light of the findings of Morgan et al. (2015), the production of musical instruments might also have relied on verbal communication.

To conclude, I have reviewed archeological data from prehistoric times that are suggestive of different origins of vocal and instrumental music-making. In the following section, I will look at animal communication systems in order to see whether there is any comparative indication of the different origins of vocal vs. non-vocal human music-making.

Vocal and non-vocal communication in animals

Animal vocal communication

When she turned and saw her brother to be a bird, the older sister was very upset. ‘Oh adε,’ she said, ‘don’t fly away.’ He opened his mouth to reply, but no words came out, just the high cooing cry of the muni bird, the Beautiful Fruitdove (Ptilinopus pulchellus). He began to fly off, repeating the muni cry, a descending eeeeeeeeee. …The boy was now a muni bird and continued to cry and cry. After a while the cry became slower and more steady.

From Feld’s Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression (1982, p. 20)

Two types of animal vocal communication are distinguished, namely calls and songs. There are several general differences between calls and songs. Acoustically, animal calls are shorter and less complex than songs. Calls are almost genetically fixed, which implies no involvement of learning (Tomasello, 2008). Functionally, the animal calls found in many social animals are associated with signaling potential danger and/or keeping conspecifics in contact. In contrast, animal songs show acoustic features that make us regard them as musical (e.g., temporal regularity and slight variations in sound patterns). Some animal species can learn songs, and these songs can evolve through time. Sexual display is known to be one of the functions of animal songs.

In the following, I will first focus on the calls of non-human primates and then discuss the vocalization of primate communication and its implications for the origins of human language. Then, I will look at analogous traits between singing animals and humans and examine the vocal learning hypothesis in connection with the two timing mechanisms (i.e., event timing and emergent timing) that were introduced in chapter 2.

Animal calls

Animal calls have been observed in a variety of social animals. The vocal production of calls differs depending on social environments and contexts (Cheney & Seyfarth, 1985; Townsend, Rasmussen, Clutton-Brock, & Manser, 2012). Geissmann (2000) reported that loud/long calls (e.g., the pant-hooting of chimpanzees, the great call of gibbons, Hylobates, etc.) are generally used in a territorial or alarm context. Geissmann (2000) also observed that the gibbons' duet occurs not only for territorial advertisement but also for pair bonding. Acoustically, the male gibbon in the duet produces distinctive types of short phrases that become gradually more complex, while the female inserts great calls consisting of rhythmic series of long notes uttered with increasing tempo and/or increasing peak frequency. From his personal communications with other researchers and his own observations, Geissmann (2000) made an important point regarding the ability to keep a pulse, stating that "non-human primates, unlike humans, do not seem to be able to keep a steady pulse in their song vocalizations" (p. 119). The inability of non-human primates to extract a beat will be discussed in detail in connection with the vocal learning hypothesis in the animal song section below.

In addition, non-human primate vocalization has been a main interest among many researchers due to its implications for the origins of language in terms of signs. Cheney & Seyfarth (1985) observed that vervet monkeys produce different vocal signals depending on the type of predator, which implies shared acoustic conventions, that is, acoustic symbol use. Furthermore, Arnold & Zuberbühler (2006) observed that putty-nosed monkeys combine two basic existing call sounds in order to create a variety of messages, although the calls themselves are innate and structurally fixed vocalizations. By combining the two basic calls, they can convey information about the presence of different types of danger and induce actions in the group. After observing five populations of wild orangutans, Wich et al. (2012) reported that orangutans occasionally invent arbitrary calls that can spread when orangutans have shared needs. Individuals in an orangutan group can understand from the context what the function of the calls should be. These examples suggest that non-human primates are at rudimentary stages of language development and that language, understood as symbolic communication, is not human specific.

Noting a strong tie between vocalization and gestures in non-human primate communication, Tomasello (2008) pointed out that gesture has been ignored in studies of primate communication and human language because researchers have focused only on the vocal channel. Considering both vocal and gestural forms of communication in great apes in terms of the origins of language and the development of symbols, Cissewski & Boesch (2016) recently discussed how great apes can communicate using displacement and productivity, both of which have been considered human-specific design features (e.g., Hockett, 1960a, 1960b). The design features will be discussed in the Music vs. Language section below. In short, displacement is the possibility of making reference to objects, events, or ideas that are distant in space and/or time; productivity is the possibility of creating new symbols and an unlimited number of symbol combinations. Cissewski & Boesch (2016) proposed a "population-specific semantic shift" in order to explain the use of displacement and productivity in great ape communication. Some wild chimpanzee populations are capable of modifying the meanings of existing communicative signals within a population. There is no change in the form of the signal, but the population-specific semantic shift results in an alteration of vocabulary items with modified meanings. This occurs through a social learning process. Additionally, Cissewski & Boesch (2016) proposed a "covertly intentional provision of eavesdroppers with natural meaning" (p. 233). Through eavesdropping, which refers to monitoring and interpreting conspecifics' behaviors, great apes gather information about those behaviors, including not only the sounds that conspecifics make but also movements, body postures, and the flow of eye gaze. The proposal of Cissewski & Boesch (2016) that non-human primates' abilities to communicate vocally and non-vocally (e.g., by gesture) within their own social system allow them to acquire abstract and mental concepts may explain an emergence of language, although it is restricted compared to human language.


Animal song

The second type of animal vocal communication is animal song, which is further distinguished depending on whether it is innate or learned. Animal songs as learned behavior are observed in some birds and whales. At any rate, bird songs inspire many human cultures in different ways, as illustrated by Feld's Sound and Sentiment (1982). Compared to animal calls, animal songs have been understood as having a closer connection with music. As the Kaluli muni story shows, birdsongs become the source of myths (Feld, 1982). It is also well known that the French composer Olivier Messiaen (1908-1992) was preoccupied with bird songs and wrote Oiseaux exotiques on the basis of his transcriptions of them.

From developmental and cultural perspectives, there are commonalities in learning between birdsong and human language acquisition. In terms of development, some bird species have a sensitive period for song memorization (Brenowitz & Beecher, 2005), which is comparable to the critical period of language acquisition in humans. Zebra finches (Taeniopygia guttata) have to learn to produce a single stereotyped conspecific song before reaching maturity. If a zebra finch is isolated and loses the chance to learn the song during this critical period, the result is abnormal song behavior. Nottebohm (2005) discussed not only that song learning in birds starts with a stage called "subsong" that is comparable to human infant babbling, but also that song learning in this stage is characterized by imitation.

With regard to the cultural aspects of animal song, some bird species have easily discernible song dialects. For example, white-crowned sparrows (Zonotrichia leucophrys) in one area show homogeneity of song patterns. Marler & Tamura (1964) argued that the geographical variability in white-crowned sparrows' songs is due to cultural transmission. Similarly, whale songs are learned and culturally transmitted (Slater, 2000; Whaling, 2000; Payne, 2000; Noad, Cato, Bryden, Jenner, & Jenner, 2000). Specifically, Payne (2000) found that the songs of humpback whales evolve during their migrations. Noad et al. (2000) further observed that their songs change within a year when new songs are introduced by a neighboring population.

Researchers have found that some birds can dance to music as we do, although non-human primates cannot, as Geissmann (2000) noted. Dancing requires beat perception: when an animal can extract beats from music, it can couple its body movements with the extracted beats. Dancing is an example of emergent timing, as I pointed out in chapter 2, which is of interest to many cognitive scientists. Then, what is the characteristic of musical beat that leads us and these birds to dance? Musical beats are established on regular pulses that allow periodic expectancies, and these expectancies are the basis of motor synchronization to the beat. One of the first scientific investigations of animals' abilities to synchronize to musical beats was conducted by Patel, Iversen, Bregman, & Schulz (2009). They performed an experiment with a dancing sulphur-crested cockatoo (Cacatua galerita eleonora) named Snowball, an animal belonging to one of the vocal learning species. Eliminating the possibility that Snowball might simply imitate human movements, Patel et al. (2009) found evidence for beat perception and synchronization (BPS) across a broad range of tempi of song excerpts in Snowball's body movements (e.g., head bobs, footsteps, etc.) that we can consider as dance. Fascinated by Snowball's periodic motor responses to complex sound sequences, Patel and his colleagues proposed the vocal learning and rhythmic synchronization hypothesis, according to which BPS may build on the neural structures for complex vocal learning.


Then, what is vocal learning? And what are the components of the neural circuitry for vocal learning? To begin with, there are two different types of learning. Jarvis (2007) made a distinction between vocal learning and auditory learning. According to him, auditory learning is the ability to make associations with heard sounds, whereas vocal learning is the ability to alter or modify the acoustic and/or syntactic structures of produced sound (e.g., imitation, improvisation, etc.). Vocal learning encompasses auditory learning. Jarvis (2007) pointed out a neurobiological difference between vocal non-learning and vocal learning animals. In vocal non-learners, the midbrain and medulla regions govern vocalization. Vocal learners have specialized anterior vocal pathways, in addition to auditory pathways, that include brain regions in the cerebrum controlling vocal behavior. The forebrain regions of songbirds appear to be divided into two sub-pathways with different functions. First, a vocal motor pathway is primarily used to produce learned vocalizations. Second, a pallial-basal-ganglia-thalamic loop is mainly involved in the learning and modification of vocalizations. Humans have two analogous forebrain pathways. The first, which serves the learning of speech and syntax, consists of the premotor cortex, the anterior insula, Broca's area, the anterior dorsolateral prefrontal cortex, the anterior pre-supplementary motor areas, and the anterior cingulate. Speech production is accompanied by activations of the components of the second pathway, which is composed of the anterior striatum, the globus pallidus, and the anterior dorsal thalamus.

Recently, the vocal learning and rhythmic synchronization hypothesis has been challenged by several studies reporting that vocal non-learning animals may have the capacity for beat perception and synchronization. For example, domestic horses (Equus ferus caballus) seem to be able to entrain their gaits to auditory beats (Bregman, Iversen, Lichman, Reinhart, & Patel, 2013), although they may not have the ability to extract beats at different tempi (Fitzroy, Lobdell, Norman, Bolognese, Patel, & Breen, 2018). A trained California sea lion (Zalophus californianus) named Ronan can move her head up and down in time to auditory beats; moreover, Ronan, like Snowball, can adjust her bobbing to a range of different tempi (Cook, Rouse, Wilson, & Reichmuth, 2013). In addition, trained rhesus monkeys can press a response button in time to a rhythmic auditory stimulus (Zarco, Merchant, Prado, & Mendez, 2009). Hattori, Tomonaga, & Matsuzawa (2013) reported that Ai, one of their three chimpanzees, spontaneously aligned her tapping with the sounds, but only when hearing beats with 600 ms inter-stimulus intervals. Dufour, Poulin, Curé, & Sterck (2015) observed that manual drumming on a barrel by a chimpanzee named Barney shows the characteristics of long-lasting, dynamically changing rhythms and evenness of beat. Interestingly, in terms of the relationship between vocalization and drumming gestures, Remedios, Logothetis, & Kayser (2009) reported that both vocal and drumming sounds of macaque monkeys activate the caudal auditory cortex and the amygdala.

At any rate, in order to identify to what extent we humans share the neurocognitive mechanisms of temporal processing with non-human primates, Merchant & Honing (2014) reviewed studies of both event timing and emergent timing tasks (see fig. 1). On the basis of their comparative examination, Merchant & Honing (2014) proposed the gradual audiomotor evolution hypothesis, arguing that the neural circuits engaged in emergent timing, which allows rhythmic entrainment, are not deeply linked to vocal learning. Rather, the authors noted that the superior temporal areas of the human brain not only have massive reciprocal connections with the premotor areas but also project intensively to the basal ganglia. Compared to humans, the non-human primates show diminished connectivity in the audio-premotor and audio-basal-ganglia circuits. Using an electroencephalography method, Honing, Bouwer, Prado, & Merchant (2018) showed that the event-related potentials of rhesus monkeys appear to be influenced only by event timing related stimuli (i.e., isochrony). In line with the unified model of time perception by Teki et al. (2012), Merchant & Honing (2014) hypothesized that humans fully use the two timing mechanisms whereas other primates depend fully on event timing and only partially on emergent timing.

In connection with the vocal learning and rhythmic synchronization hypothesis, Patel & Iversen (2014) summarized several features of human beat perception and synchronization (BPS). First, it is predictive. In many tapping experiments, for example, participants' responses to musical beats demonstrate that taps occur very close to the beats in time, but in a predictive manner: taps precede the actual beats by a few tens of milliseconds. This is known as negative mean asynchrony (Repp, 2005; Repp & Su, 2013). Second, human BPS is flexible across a wide range of tempi. As shown in the discussion of spontaneous tempo by Fraisse (1982) in chapter 2, we prefer a tempo of 120 BPM (500 ms, 2 Hz), but humans can adjust to tempi from 30 BPM (2 s, 0.5 Hz) to 240 BPM (250 ms, 4 Hz). Third, it is constructive. Patel & Iversen (2014) argued that beat perception is not just finding periodicities in sound but can be consciously altered. When people are asked to tap to highly syncopated rhythms, for example, individuals pick up different acoustic features as a reference for their tapping (Patel & Iversen, 2014).

Fourth, human BPS is hierarchical. Following contemporary Western metrical theories, including those of Cooper & Meyer, Lerdahl & Jackendoff, and London, Patel & Iversen (2014) argued that the hierarchical patterning of beats is meter. To show this, they played a simple Western melody containing a hemiola, asked participants to tap to the beat of the melody, and found a strong beat every two or every three beats. In Western music notation, a hemiola is notated with two notes in triple meter or three notes in duple meter. Approaching meter perception from the cognitive ethnomusicological perspective, however, Kung (2017) argued that notation-based Western music theories lead to the hierarchical view of meter. She wrote that "metrical understanding is neither the result of one-way influences nor the sum of individual musical figures, but the outcome that is largely shaped by the listener's subjectivity nurtured in his or her previous experience" (p. 38). She pointed out that the Western literary tradition is based on abstract principles rather than actual listening experiences, whereas oral culture is associated with the handling of sequential acoustical flows under the limits of the psychological present. She therefore cross-culturally investigated meter perception with three different groups, one familiar with Arab music, one familiar with Indian music, and one unfamiliar with complex meter, in a behavioral experiment in which the participants were asked to tap to the beat of Middle Eastern rhythms. I participated in her experiment. There were two different tasks: one was to tap to the rhythm, and the other was to tap to the beat. I found that tapping to the rhythm was fine but tapping to the beat was extraordinarily difficult, and I was not able to complete it. After the experiment, I had a conversation with Kung (2017). She told me that many participants with long Western-music training had difficulties tapping the beat, while participants without serious Western-music training expressed no problem at all. My experience showed a strong effect of cultural factors on meter perception. Her experimental results showed no supporting evidence of hierarchical processing. I agree with Patel & Iversen's (2014) argument that BPS is constructive, but not that it is hierarchical.

Fifth, it is modality-based. According to Patel & Iversen (2014), humans can tap to beats retrieved from an acoustic metronome more easily than to those retrieved from a visual one. In comparative experiments, Zarco et al. (2009) demonstrated that human subjects showed greater accuracy and less temporal variability with auditory stimuli than with visual ones, while rhesus monkeys (Macaca mulatta) did not show evident differences between the two modalities. Last, human BPS engages the motor system. Neuroimaging studies have shown not only that beat perception involves activations of the premotor cortex, the basal ganglia (putamen), and the supplementary motor areas, but also that there is enhanced coupling between auditory and motor regions (Grahn & Brett, 2007; Grahn & McAuley, 2009; Teki et al., 2011; Teki et al., 2012).
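Two of these features lend themselves to simple quantification. The sketch below, in Python with NumPy, is my own illustrative code rather than anything used in the cited studies, and its function names are invented for this purpose: the first function computes the mean tap-beat asynchrony underlying negative mean asynchrony, and the second performs the BPM, period, and frequency conversions quoted above.

    import numpy as np

    def mean_asynchrony_ms(tap_times, beat_times):
        """Mean tap-to-beat asynchrony in milliseconds; a negative value
        means taps precede the beats (negative mean asynchrony)."""
        taps = np.asarray(tap_times, dtype=float)
        beats = np.asarray(beat_times, dtype=float)
        # pair each tap with its nearest beat in time
        nearest = beats[np.argmin(np.abs(taps[:, None] - beats[None, :]), axis=1)]
        return float(np.mean(taps - nearest) * 1000.0)

    def bpm_to_period_and_hz(bpm):
        """Convert tempo in BPM to inter-beat period (s) and rate (Hz);
        e.g., 120 BPM -> (0.5 s, 2 Hz), 30 BPM -> (2 s, 0.5 Hz)."""
        period_s = 60.0 / bpm
        return period_s, 1.0 / period_s

Applied to beats spaced 500 ms apart and taps arriving, say, 30 ms early, mean_asynchrony_ms returns approximately -30, the order of magnitude reported in the tapping literature (Repp, 2005).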

Modifying his earlier vocal learning and rhythmic synchronization hypothesis, Patel and his colleagues proposed the theory of action simulation for auditory prediction (ASAP), in which they integrated the new observations of vocal non-learners' capability to entrain and the summarized features of BPS. Similar to the gradual audiomotor evolution hypothesis of Merchant & Honing (2014), the Patel group's new ASAP hypothesis emphasizes a strong functional connection between auditory and motor planning regions, with particular emphasis on the dorsal auditory pathway for musical BPS.

Animal non-vocal sound communication

In this section, I will concentrate on sounds produced non-vocally by non-human primates. An interesting feature of the vocalization of non-human primates is that, at the climax of their loud/long calls, a transition occurs from the calls to special locomotor activities. For example, gibbons are famous for their acrobatic arm movements through the trees, called brachiation. Chimpanzees also show comparable behaviors, which will be discussed further in the next paragraph. These ritualized activities seem to contribute to the production of non-vocal sounds.

Looking comparatively at the biological and evolutionary aspects of rhythm production capabilities, Fitch (2011) discussed the African great apes, including chimpanzees, bonobos (Pan paniscus), and gorillas (Gorilla gorilla), which are able to generate non-vocal sounds by pounding their limbs either on external objects, as chimpanzees and bonobos do, or on their own bodies, as gorillas do. Fitch (2011) not only noted that among the non-human primates this drumming serves functions of entertainment in young individuals and of display in male adults, but also speculated that rhythmic behaviors arising from a quasi-periodic motor impulse might predate the early humans. Recently, Fitch (2018) summarized that, as a homologous trait between the African great apes and humans, percussive drumming may have evolved more than seven million years ago. However, ape drumming does not provide sufficient evidence of emergent timing, due to the absence of an ability to entrain limb movements to extracted beats (Merchant & Honing, 2014), in spite of some evidence of mutual entrainment in vocalization (e.g., the bonobo's hooting).

In addition to the studies of drumming, researchers have observed that wild orangutans may use tools (i.e., tree leaves) in their kiss squeak calls, which are part of an aggressive display (Peters, 2001; Lameira, Hardus, & Wich, 2012). Peters (2001) reported that wild orangutans can use tools for other purposes, like cleaning the body or making an umbrella, but the squeak call is the only tool-using behavior that involves sound production. Lameira et al. (2012) called these instrumental-gesture calls and argued that the ability to modify oro-laryngeal sound production with or without tools may be indicative of the origins of finger-assisted whistling or of playing an aerophone.

Functions of animal sound communication and their implications for the origins of music

Animal sound communication serves many functions. Among them, two functions, competition and cooperation, have been well investigated in connection with evolution (Brown, 2000; Keller, König, & Novembre, 2017). In this section, I will discuss the functions of competition and cooperation with regard to the sexual selection and social cohesion hypotheses respectively, and then look at their implications for the origins of music. In addition, animal sound communication (e.g., alarm calls) is strongly associated with emotion, and human music has been claimed to be a language of emotion. Therefore, I will also examine different theories that consider emotion in relation to the origins of music.


Competition-sexual selection hypothesis

In chapter 14 of The Descent of Man (1871), Darwin noted that the enhanced biological capacities for singing (e.g., elongated vocal cords) by which various male animals attract females should not be neglected, although this is not a main function of singing in humans. Nonetheless, we humans show similar musical behavior. For example, a serenade originally refers to a musical greeting, a song sung by a lover outside the beloved's window (Unverricht & Eisen, 2002). In an opera scene featuring the famous serenade "Deh vieni alla finestra (Ah, come to the window)" in Mozart's Don Giovanni (Rushton, 2002), Don Giovanni serenades Elvira's maid to his own mandolin accompaniment.

Calling our attention to Darwin's suggestion that human music is a biological trait and may be a product of sexual selection, Miller (2000) argued that music is an adaptation because it is functionally analogous to sexually selected courtship displays. According to Miller (2000), the psychological mechanisms used in mating may be associated with emotion, and music has the power to trigger emotion; therefore, music may influence emotional preferences in mate choice. Competition is important in this sexual selection hypothesis because females select a male as a mating partner on the basis of certain features. For example, a peahen will prefer a male with large, colorful, and iridescent tail feathers, while a female robin will select as her mating partner the male that can sing the most complex songs. Miller (2000) considered not only that music is operational in the evolutionary process of sexual selection but also that musical performance constitutes a protean display of courtship behavior. However, this view has been criticized by Cross (2001a, 2007), who argued that the role of music in courtship is insignificant compared to other musical activities such as healing, praying, mourning, or instructing. In addition, sexual selection theory reflects the teleological evolutionary perspective of the early comparative musicologists (e.g., Alexander Ellis and others mentioned in Schneider (1991) and Cross (2001b, 2007)) and adopts a problematic definition of music as patterned sound or the perception of patterned sound.

At any rate, a recent study by Keller, König, & Novembre (2017) provides supporting evidence of competition in human music-making. The St. Thomas Choir of Leipzig in Germany performed music by J. S. Bach twice in front of different audiences while the authors recorded the repeated performances. In the first concert, the choir was requested to sing in front of an audience consisting of males only; in the second concert, the choir performed the same piece but four females were added to the audience. The authors analyzed the audio recordings retrieved from individual members of the choir. Their analysis demonstrates that the boys singing the bass part increased the brightness of their vocal sounds under the female-present condition. The brightness of vocal timbre was measured by the proportion of energy in the singer's formant spectral region (2,500-3,500 Hz). It is worth noting that the boys singing the bass part are older than the boys singing the other voice parts. The authors' interpretation of the data is that the female presence in the audience may elicit competitive behavior exclusively in the bass singers.
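As a rough illustration of such a band-energy measure, the sketch below computes the proportion of spectral energy falling within the singer's formant band from a mono audio signal. This is my own simplified, FFT-based approximation, not the analysis pipeline of Keller et al. (2017), and the function name is invented for this purpose.

    import numpy as np

    def formant_energy_ratio(signal, sample_rate, band=(2500.0, 3500.0)):
        """Proportion of total spectral energy in the singer's formant band,
        a crude proxy for the vocal 'brightness' measure described above."""
        power = np.abs(np.fft.rfft(signal)) ** 2
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
        in_band = (freqs >= band[0]) & (freqs <= band[1])
        total = power.sum()
        return float(power[in_band].sum() / total) if total > 0 else 0.0

Under a measure of this kind, a higher ratio for the bass singers in the female-present condition would correspond to the reported increase in brightness.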

Cooperation-social cohesion hypothesis

In addition to competition, music has a cohesive effect on a group or society. Cooperation is central to the social cohesion hypothesis because group music-making benefits the individuals within the group. Among the best examples of such cooperation are the Afro-American work songs in the Ellis Unit of a Texas prison in Huntsville, filmed and documented by Seeger, Seeger, Seeger, & Jackson (1966). The prisoners sang the work songs together in a call-response form while wielding axes to cut trees or digging the ground with pickaxes in synchrony with the beat of the song and, hence, with each other. These behaviors kept the prisoners from becoming too exhausted by the hard work.

Brown, Merker, & Wallin (2000) noted in their introductory article that the specificity of the human ability to keep time lies not in moving rhythmically but in entraining movements to an external timekeeper.

Merker (2000) developed this idea by paying attention to the musical pulse, which he considers a principal device for coordinating people's behavior in joint and synchronized performance. He raised the question of why behavioral synchrony to a regular beat is not common in other higher animals, only in humans, although there are synchronous behaviors in animals including fireflies, crickets, frogs, and crabs. On the basis of the social structure of a hunter-gatherer society, Merker (2000) argued that synchronous chorusing may have played an important role in the emergence of hominids. In the early societies, synchronous vocal signaling may have benefited members of a group because collective music-making may contribute to strengthening emotional bonding (e.g., pair-bonding between sexual partners, mother-infant interaction, etc.). As discussed in the animal call section, vocal signaling itself plays an important role in the survival of species, too. Focusing on pair-bonding between sexual partners via synchronous vocal signaling, Merker (2000) considered that this form of cooperation may attract females. Merker's (2000) perspective thus differs from Miller's (2000) sexual selection hypothesis, in which competition is the main factor. Referring to the brain expansion of hominids, Merker (2000) further argued that synchronous hominid chorusing may have triggered a coupling of auditory-receptive and vocal-productive functions, which is a characteristic of vocal learning. The St. Thomas Choir study by Keller et al. (2017), mentioned in the previous section, is a good example for Merker's (2000) argument. In contrast to the increase in brightness in the bass singers with females present, the authors reported no significant change in brightness in the other voice parts (i.e., soprano, tenor, and alto). Their analysis demonstrates no significant difference in sound intensity, tempo, or note-onset timing between the male-only and female-added conditions; note-onset timing, specifically, was used as a measure of ensemble coordination. The authors took the absence of differences in these measurements between the two conditions as indicative of social cohesion.

Dissanayake (2000) investigated the effects of music on pair-bonding between mother and infant. The mother-infant dyad is the smallest unit and the most basic structure in all human societies. Focusing on early interaction in this dyad, Dissanayake (2000) noted that the attunement between mothers and infants engages multimodally processed, mutually improvised, and jointly constructed activities. Later, Dissanayake (2009a, 2009b) elaborated not only that the multimodal - audio-acoustic, kinesic, visual, and tactile - interactions in mother-infant dyads contribute to the evolution of an emotional bond between mother and infant, but also that neural circuits in the mother's brain are modulated through physiological and neuroendocrine effects when mothers communicate their feelings of love and attachment to their babies. Furthermore, she argued that the features of the mother-infant interaction would have been adaptive for maternal reproductive success and infant survival. Using dual electroencephalographic (EEG) recordings in adult-infant dyads, for example, Leong, Byrne, Clackson, Georgieva, Lam, & Wass (2017) reported an increase in neural coupling when infants interacted live with adults compared to when they watched their partners as screen images. Leong et al. (2017) also observed that infants vocalized more in live interactions, and these infants, compared to less vocal babies, showed stronger neural synchronization with the adults.

Trehub and her colleagues approached mother-infant bonding empirically from cross-cultural perspectives. They acknowledged that certain kinds of music, what we would call lullabies, are found across cultures, and argued that the primary function of lullabies is to pacify and soothe infants so as to induce sleep (Trehub, Unyk, & Trainor, 1993; Trehub & Trainor, 1998; Trehub, Becker, & Morley, 2015). According to them, lullabies with verbal and/or non-verbal texts tend to be melodically and rhythmically simple and are highly repetitive. Nakata & Trehub (2004) investigated the difference in infants' responsiveness between maternal singing and infant-directed speech (IDS), also known as baby-talk or "motherese" because of the distinctive manner in which we talk to infants. Nakata & Trehub (2004) reported that singing is more attractive to babies than IDS. In this study, six-month-old babies were exposed to audiovisual presentations of infant-directed singing or IDS while the authors measured the duration of the babies' visual fixation. The results indicate that repetitive maternal singing may modulate the babies' arousal level and hold their attention better than IDS.


Additionally, Trehub & Trainor (1998) noted that singing lullabies is often accompanied by rhythmic movements such as rocking, swaying, and patting. In contrast to non-human primates' duets, where vocalization is followed by motor movement, in lullabies singing occurs simultaneously with body movement. This is in line with Dissanayake's discussion of multimodal interaction in the mother-infant dyad. For the final project of Dr. Boone's seminar on music and emotion, I performed a movement analysis of a Korean lullaby called jajangga that was posted on YouTube by a mother whose username is "The Song Family" (see Appendix A for the text). The mother sang the lullaby while rocking her baby. The analysis yields 136 hypothetical pulses in the mother's singing and 135 pulses in her rocking. The rocking pulses consist of two cycles, labeled (a) and (b), defined by two different reference points on the screen: cycle (a) refers to the moment when the baby moves away from the mother, while cycle (b) refers to the moment when the baby is positioned closest to the mother. Kuiper's uniformity tests for cycles (a) (V = 2.29, p < 0.01) and (b) (V = 2.89, p < 0.01) were statistically significant. I then examined the phase relationship between singing and rocking pulses over time through the relative phase, that is, the latency between an event in the singing and its corresponding rocking movement (see fig. 9). Entrainment between singing and rocking movement was observed, with some fluctuation in the degree of synchrony.


Figure 9. Phase relationship between singing and rocking in The Song Family’s lullaby performance
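The two analysis steps just described, extracting relative phases and testing them for uniformity, can be sketched in a few lines of code. The following is a minimal illustration only, not the analysis script used for the lullaby study: the pulse onsets are hypothetical placeholders, and the scaling of Kuiper's V follows one common convention.

# A minimal sketch of relative-phase extraction and Kuiper's uniformity
# test. The pulse onsets below are hypothetical placeholders, not the
# lullaby data, and the V scaling follows one common convention.
import numpy as np

def relative_phase(event_times, reference_times):
    """Phase of each event within its enclosing reference cycle, in [0, 1)."""
    reference_times = np.asarray(reference_times)
    phases = []
    for t in event_times:
        # index of the reference pulse at or before the event
        i = np.searchsorted(reference_times, t, side="right") - 1
        if 0 <= i < len(reference_times) - 1:  # skip events outside the cycles
            cycle = reference_times[i + 1] - reference_times[i]
            phases.append((t - reference_times[i]) / cycle)
    return np.array(phases)

def kuiper_v(phases):
    """Kuiper's V statistic for uniformity of circular data in [0, 1)."""
    u = np.sort(phases)
    n = len(u)
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - u)
    d_minus = np.max(u - (i - 1) / n)
    return (d_plus + d_minus) * np.sqrt(n)

rng = np.random.default_rng(0)
rocking = np.arange(0.0, 10.0, 0.5)                     # regular 2 Hz pulse
singing = rocking[:-1] + 0.1 + rng.normal(0, 0.02, 19)  # entrained, phase-shifted

v = kuiper_v(relative_phase(singing, rocking))
print(f"V = {v:.2f}")  # a large V means the phases cluster: uniformity is rejected

A significant Kuiper test on the relative phases, as in the lullaby analysis, indicates that the phases are concentrated rather than uniformly distributed, which is the statistical signature of entrainment.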

Phillips-Silver & Trainor (2005) experimentally investigated how the experience of body movement contributes to babies' rhythm perception. In their study, babies listened to an ambiguous rhythm pattern (i.e., a hemiola) either passively or while being bounced on every second or every third beat. Bouncing on every second beat imposes a duple meter on the pattern, whereas bouncing on every third beat imposes a triple meter. Using the head-turn preference procedure, the authors found that babies preferred the auditory stimuli whose accented beats matched the beats on which they had been bounced. As Dissanayake indicated the importance of multimodal processing in the mother-infant dyad, this study suggests that coherent multisensory input and cultural influences play an important role in early human development.
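The bouncing manipulation described above is easy to picture: the same undifferentiated beat stream supports either parse, depending on which beats receive a movement-based accent. A minimal illustration with hypothetical beat indices:

# A minimal illustration of the bouncing manipulation: the same beat
# stream parses as duple or triple depending on which beats are accented.
beats = list(range(12))
duple = "".join("X" if b % 2 == 0 else "." for b in beats)   # bounce every 2nd beat
triple = "".join("X" if b % 3 == 0 else "." for b in beats)  # bounce every 3rd beat
print(duple)   # X.X.X.X.X.X.  -> heard as groups of two (duple meter)
print(triple)  # X..X..X..X..  -> heard as groups of three (triple meter)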

Furthermore, Cirelli, Einarson, & Trainor (2014) demonstrated the effects of interpersonal motor synchrony on infants' prosocial behavior. In their study, an experimenter's assistant held a baby and bounced to music while the baby watched the experimenter, who bounced either in or out of synchrony with the baby's own bouncing movement. After the bouncing treatment, each baby was given the opportunity to help pick up and hand over an object dropped by the experimenter, a measure of prosocial behavior. They found that the infants who had been bounced in synchrony with the experimenter tended to show more altruistic behavior by helping her than those who were bounced to the music asynchronously.

From a developmental and psychopathological perspective, Leclère et al. (2014) reviewed studies of the past few decades on synchrony in mother-child interactions and argued that synchrony should be considered a social signal per se. They first defined synchrony as the dynamic and reciprocal adaptation of the temporal structure of behavior between interactive partners. In the mother-infant interaction especially, synchrony involves a matching of behavior, emotional states, and biological rhythms within the dyad. At the micro level, oxytocin, as a bonding hormone, may enhance physiological and behavioral readiness for social engagement in the dyad. Using synchrony measurement methods, the authors demonstrated that better mother-infant synchrony, as well as more positive cognitive-behavioral outcomes among children, is found with familiar caregivers compared to strangers, with healthy mothers compared to pathological ones, and in typical development compared to psychopathological developments such as ADHD and autism.

In his book Keeping Together in Time, McNeill (1995) proposed the term 'muscular bonding' to characterize coordinated movements in a social setting. This seems to correspond to Merker's synchronous chorusing, but McNeill's (1995) primary concern is the evolution of dance and its relation to social cohesion. He proposed that the experience of shared emotion is elicited by moving together in time: synchronous movements such as dance, for example, may enhance cooperative survival strategies. Both Merker's synchronous chorusing and McNeill's muscular bonding assume that coordination in time, that is, entrainment, is a possible mechanism for social cohesion. In line with Merker's (2000) remark on the potential effects of synchronous chorusing on early humans' brain expansion via the coupling of auditory-receptive and vocal-productive functions, Trainor (2015) argued that coupled auditory-motor pathways allow humans to entrain their movements to music and consequently contribute to social cohesion.

Emotion

Music has been considered a language of emotion. For example, Spencer (1857) said that "the whole body of these vocal manifestations of emotion (e.g., grief, joy, affection, etc.) forms the root of music" (p. 427). Rousseau (1761) wrote that music originates from the expression of human passion. Both the competition and the cooperation accounts also point to the importance of emotion. In terms of the sexual selection hypothesis, Miller (2000) considered emotion triggered by music a psychological mechanism that plays an important role in mating. With regard to the social cohesion hypothesis, McNeill (1995) noted that shared emotions can be elicited by moving together in time, namely in music or dance. Arguing that music requires both cognitive and physical coordination, Mithen (2006) also noted the arousal of a shared emotional state among group members in collective music-making contexts. Trehub et al. (2015) argued that, although synchronous activities in music-making are not necessary to build or maintain social bonds, musical activities can strengthen social bonds among group members. They hypothesized that the connection between synchronous music-making and social bonding is mediated by heightened synchronous arousal. Given these discussions of the link between music and emotion, I will briefly discuss Panksepp's biological perspective on emotion in terms of music evolution as well as the mixed origins of music theory by Altenmüller, Kopiez, & Grewe (2013).

Panksepp (2009) approached the emotional power of music by tracing the evolution of the basic emotional systems of the brain in terms of neurochemistry. Suggesting that various social-emotional neurotransmitters, including oxytocin, prolactin, and endogenous opioids, may play an important role in mediating the dynamics of musical emotions, Panksepp (2009) speculated about a possible connection between social emotions and the evolution of music:

The few more subtle predictions that can be made about our musical nature can arise only from our capacity to peer more deeply into the functional structures of living animal and human brains and to identify the genetic progressions that led to the most foundational brain systems for the emergence of music, which I assume must be those that can generate social-affective experiences. Although not yet achieved, I assume many genes will be discovered that guide the construction of the basic subcortical emotional systems, and that is where the trickle of relevant evolutionary information could eventually widen into a river. (Panksepp & Panksepp, 2000 cited in Panksepp, 2009, p. 239)

Proposing the 'mixed origins of music (MOM)' theory, Altenmüller et al. (2013) speculated not only that music might originate from ancient affective signaling systems found in many social animals but also that music might have been associated with aesthetic emotions throughout human evolution. According to Altenmüller et al. (2013), everyday emotions (e.g., piloerection due to thermoregulation) are different from aesthetic emotions (e.g., goose bumps during music listening). They argued that there are two types of chill responses in terms of their evolutionary adaptive value, one negative and the other positive. The negative chill responses are associated with everyday emotions, whereas the positive chill responses are considered aesthetic emotions. The negative chill responses seem to be related to the affective signaling of alarm calls or pain shrieks; sounds associated with these negative responses generally cause animals to avoid the sound sources. The positive chill responses are more complex than the negative ones. Emotions induced by music, for example, involve activation of neural reward circuits, which implies that musical chills may have developed relatively recently in human evolution. Altenmüller et al. (2013) hypothesized that the development of human auditory memory contributed to the positive chill responses via auditory learning, and they argued that the positive chill responses played a role in the development of the human auditory system. Altenmüller et al. (2013) further speculated that this development was important for both language and music. According to the MOM model, language serves cognitive development and symbolic behavior, whereas music was adapted for social functions, increasing the chances of survival by better organizing society and by adding aesthetic emotions to daily life.

Mortillaro, Mehu, & Scherer (2013) argued that emotion is associated with biological, cognitive, and social determinants because these determinants influence how emotion is expressed and perceived. In contrast to Panksepp's and the MOM theories, which deal primarily with vocal affective expression, Mortillaro et al. (2013) investigated the evolutionary origin of emotional expression from a multimodal perspective and regarded multimodal synchronization as one of the main contributing features of musical emotions. In Mortillaro et al.'s (2013) experiment, participants were asked to rate whether two tenors' performances of a patter song by Rossini (1792-1868), La Danza (1835), were compatible with its emotional representation. One performance constituted a 'static' condition and the other a 'dynamic' condition, the static tenor performing less expressively and less actively than the 'dynamic' one. For each condition, stimuli were presented 1) audio only, 2) visual only, and 3) audio-visual. There was no difference in the participants' ratings between the static and dynamic conditions when stimuli were presented in a unisensory mode (i.e., audio only or visual only). However, there was a significant difference in ratings between the static and dynamic conditions when stimuli were delivered in the multimodal mode. They interpreted their results in terms of evolution: as the results imply, we perceive others' emotions easily and well when they are expressed multimodally. The authors argued not only that emotion evolved in humans to play an important role in the preparation of adaptive action (e.g., flight vs. freeze responses) and in the management of social relationships, but also that information inferred from emotional expressions lets people apply different strategies, which can be adaptive.

So far, I have looked at whether vocal vs. non-vocal animal communication systems have any implications for the two different origins of singing and instrumental music-making. In terms of vocal communication, I examined animal calls and animal songs. First, animal calls are prevalent in social animals and have a strong tie to emotion (e.g., alarm calls). Non-human primate calls, often integrated with gestural signals, are indicative of an emergence of human language with 'displacement' and 'productivity', which had been thought to be human-specific design features. Second, animal songs share acoustic features with music. Researchers have reported the cultural transmission of animal songs and the ability of vocal learners to extract and entrain to a beat. In terms of non-vocal communication, I discussed the ritualized locomotor activities of non-human primates, which have often been observed in combination with calls. Although some researchers (e.g., Fitch, 2011) have argued that these behaviors are suggestive of human instrumental music-making, non-human primates show a limited ability to entrain. In terms of timing mechanisms, non-human primates seem to use event timing rather than emergent timing, whereas humans make unrestricted use of both.

I also discussed two functions of animal sound communication in order to see their implications for the origins of music. First, competition, considered a function of animal song, has been elaborated in the sexual selection hypothesis. Second, cooperation is associated with the social cohesion hypothesis. The functions of animal calls are to keep a society safe and to consolidate bonding among its members, and animal calls are often associated with the expression of emotion. I discussed two types of bonding: 1) pair-bonding between sexual partners, and 2) mother-infant bonding. In Merker's synchronous chorusing and McNeill's muscular bonding, entrainment is considered a plausible mechanism for cooperation or social cohesion. Both the sexual selection and the social cohesion hypotheses claim an importance of emotion; emotion is a mediating factor for the two functions of competition and cooperation. Researchers have discussed musical emotions as biological, cognitive, and social adaptations.


Music vs. Language

For the past few decades, cognitive scientists have discussed the evolution of music and language and formulated a parallel between them. One of the main concerns in this discourse is whether music is a pure invention; in other words, researchers have questioned whether music has any adaptive benefit. In one perspective, musical behaviors have nothing to do with biological evolution and had no survival value for our ancestors. This is called the non-adaptationist theory. As the earliest non-adaptationist, James (1890) posited that music is a pure incident of having a hearing organ and has no "zoological utility" (vol. 2, p. 627). The most prominent figure among the non-adaptationists is Steven Pinker, with his famous metaphoric descriptions of music as 'auditory cheesecake' and 'a cocktail of recreational drugs' in his 1997 book How the Mind Works. For him, music is a pleasure technology piggybacking on pre-existing brain functions, and music is useless in terms of biological significance. He saw music as an evolutionary byproduct with no cognitive advantage and argued that music relies on 1) language, 2) auditory scene analysis, 3) emotional calls, 4) habitat selection, and 5) motor control. According to Pinker (1997), prosody is an important component of language; music borrows some mental machinery from language, particularly from prosody, whose music-like properties are rhythm and intonation. Next, the timbre of music is determined by juxtaposed frequency components, which are associated with the identification of different sound sources, an important element of auditory scene analysis. Music may evoke strong emotions because it contains features that resemble our species' emotional calls; when describing music, people often invoke these emotional calls metaphorically. Some sounds, like thunder or a growl, can also induce strong emotions because they signal danger in the surroundings. Finally, musical rhythm involves biologically important rhythmic motor patterns, including locomotion.

Another non-adaptationist, Sperber (1996) wrote that "humans have created a cultural domain, music, which is parasitic on a cognitive module the proper domain of which pre-existed music and had nothing to do with it" (p. 142), elaborating the Jamesian, hedonistic-parasitic perspective on music. In his book Explaining Culture, Sperber extended Fodor's modularity theory. The fundamental premise of this theory is that a cognitive system can be modular. Fodor proposed that modularity is confined to low-level systems such as sensory processing (e.g., perception). Sperber, a post-Fodorian theorist, argued that modularity applies to both low- and high-level processes (e.g., conceptual processes). In his modular model of the mind, Sperber argued that a cognitive module comprises two types of domains, the proper domain and the actual (cultural) domain. The proper domain is associated with biological processes, while the actual (cultural) domain is associated with conceptual processes (e.g., knowledge). He speculated that early human communicative sounds were associated with a much poorer articulatory ability, which implies an involvement of the proper domain. Over evolutionary time, some sounds became more detectable and more attractive than any sound of the proper domain, while another module arose for human vocal communication. This new module is characterized by more structured and more generative vocal signals in its proper domain. However, the new module did not replace the old module completely, because the old module had an association with pleasure; the old module changed its function by displacing its proper domain. Sperber's old module, associated with pleasure, becomes music.


However, other cognitive scientists have questioned the non-adaptationist perspectives, and a growing number of recent findings from various disciplines have led researchers to look at the origins and evolution of music with views that reconcile the adaptationist and non-adaptationist positions (van der Schyff, 2014). As discussed, Patel's TTM theory is the most representative of these reconciled views: musical behaviors make use of biological mechanisms that evolved for other functions, and in return music-making changes or reshapes those mechanisms. Criticizing Pinker's provocative metaphor of music as auditory cheesecake, adaptationists have argued that music might not have been derived from language but shares a common precursor with it (e.g., protolanguage, protomusic, musilanguage, etc.). In his monumental book The Singing Neanderthals, Mithen (2006) proposed "Hmmmmm", the acronym of his holistic, manipulative, multi-modal, musical, and mimetic model, and speculated about a prelinguistic musical mode of communication among Neanderthals. Brown (2000) likewise argued that modern music and language derive from a common precursor.

Then, what would a common ancestor of music and language look like? One reasonable method for characterizing it is comparison: comparing the features of modern music and language may show us which features are shared and which are not. For the comparison of music and language in the following section, I will use the basic design features of animal communication systems proposed by Charles Hockett (1960a, 1960b). I will also consider the design features of music by Fitch (2005, 2006), which were developed on the basis of Hockett's. In spite of recent arguments for a reconsideration of Hockett's design features because of their limited use for examining the cognitive abilities of language users (Wacewicz & Żywiczyński, 2014), I could not find a better alternative to Hockett's and Fitch's design features for comparing music and language in a systematic way. Hockett proposed 13 features: 1) the vocal-auditory channel, 2) broadcast transmission/direct reception, 3) rapid fading, 4) interchangeability, 5) total feedback, 6) specialization, 7) semanticity, 8) arbitrariness, 9) discreteness, 10) displacement, 11) productivity, 12) duality of patterning, and 13) tradition/cultural transmission. I already discussed two of these features, 'displacement' and 'productivity', in the animal call section. Fitch suggested eight more features as design features of music: 1) complexity, 2) discreteness in pitch, 3) isochronicity, 4) generativity, 5) transposability, 6) performative context, 7) repeatability, and 8) a-referential expressiveness.

Therefore, I examine all 21 features proposed by Hockett and Fitch. Earlier researchers did not make a clear distinction between vocal and non-vocal music; to the best of my knowledge, Fitch is the only researcher who distinguishes vocal from non-vocal music. None of these researchers has considered speech surrogates, the equivalent of non-vocal language. Therefore, I will first discuss speech surrogates and then systematically analyze vocal vs. non-vocal music as well as vocal vs. non-vocal language in terms of the design features, in order to obtain a better picture of the common origins of music and language.

Non-vocal language: Speech surrogate

Speech surrogacy is the conversion of human speech into equivalent (non-vocal) sounds for the transmission of messages in signaling systems (Stern, 1957). There are two types of speech surrogacy: whistled language and instrumental surrogates. Some might question whether whistling is truly non-vocal, since its production involves the movement of vocal organs such as the mouth. Compared to vocal sound production, however, whistled speech is often supported by additional means, for example a finger or a leaf (Busnel & Classe, 1976), and the use of the vocal folds is not compulsory in whistling (Meyer & Busnel, 2015). Given this, I will treat whistled speech as non-vocal in this writing.

Since antiquity, the use of speech surrogates has been reported in various cultures, including both tone and non-tone language cultures. With regard to whistled language, Aelian, a Greek historian of the second century, reported in his De Natura Animalium that the Kinoprosipi people in North Africa had no language but used acute whistling (Meyer & Busnel, 2015). Ethnographically, whistled speech surrogates have been reported worldwide. The most famous cases of whistled speech are the silbo of the Canary Islands and the whistled language of Kusköy in Turkey (Busnel & Classe, 1976). The Ka'aygua and Pirahã peoples in South America, among others, also have whistled forms of speech (Everett, 1985; Meyer, 2004; Meyer & Busnel, 2015). The earliest Chinese remark on xiao (i.e., whistled speech) appears in the Shijing poems (eleventh to fifth centuries BCE), where a female protagonist whistles while singing (Meyer, 2004; Meyer & Busnel, 2015; Su, 2006). Noting that "air forced outwards from the throat and low in key is termed speech; forced outward from the tongue and high in key is termed xiao (whistling)" (Edwards, 1957, p. 218), The Principles of Whistling (Xiaozhi) holds that the origin of xiao (whistling) is speech.

Meyer & Busnel (2015) argued that speech surrogates have developed through the interaction between environmental constraints (e.g., forests, savannas, mountains, etc.) and human activities (e.g., hunting, herding, etc.) for various purposes, including secrecy, courtship, alarm, and long-distance communication. Concerning how speech surrogacy works like speech, Sebeok & Umiker-Sebeok (1976) proposed that some speech surrogates map either the fundamental or the formant frequency contour of speech onto whistled and instrumental sounds. A study by Carreiras (cited in Meyer, 2004) shows that, when participants listened to sentences in silbo, Broca's and Wernicke's areas were activated in well-trained silbo users but not in the control group. Expanding the idea of Sebeok & Umiker-Sebeok (1976), some researchers have hypothesized that the pitch contour may allow speech surrogate users to access the mental lexicon. Using a priming paradigm, Will & Poss (2008) reported that vocal primes led to faster responses than instrumental ones when participants were asked to repeat target syllables. In line with this, Cheong, Will, & Lin's (2017) fMRI study shows a significant priming effect for vocally primed words in the bilateral Heschl's gyri, the left planum temporale, the right planum polare, and the right anterior superior temporal gyrus in tone-language-speaking non-musicians.
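As a toy illustration of Sebeok & Umiker-Sebeok's (1976) mapping idea, consider rescaling a spoken F0 contour into a whistle register while preserving the contour's shape in log frequency (i.e., preserving its melodic intervals). The contour values below are hypothetical, and the sketch illustrates the mapping idea only; real whistled languages such as silbo also encode segmental information, not just the pitch contour.

# A toy sketch of contour mapping: a hypothetical speech F0 contour is
# rescaled into a whistle register, preserving its shape in log frequency.
# Illustration only; assumes a non-flat contour.
import numpy as np

def map_contour_to_whistle(f0_hz, lo=1000.0, hi=3000.0):
    """Linearly rescale a log-frequency contour into the [lo, hi] band."""
    logf = np.log2(np.asarray(f0_hz, dtype=float))
    norm = (logf - logf.min()) / (logf.max() - logf.min())  # 0..1
    return 2.0 ** (np.log2(lo) + norm * (np.log2(hi) - np.log2(lo)))

spoken_f0 = [120, 180, 150, 95, 130]  # hypothetical F0 values (Hz)
whistled = map_contour_to_whistle(spoken_f0)
print(np.round(whistled))  # the lowest F0 maps to 1000 Hz, the highest to 3000 Hz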

Observing that "whistled forms of languages are eminently prosodic phenomena as they permit the transformation of human voice into simple prosodic signals" (p. 170), Meyer & Busnel (2015) urged the inclusion of whistled language and instrumental surrogates in discussions about the evolution of music and language, because many models addressing the origins of music and language (e.g., Darwin's musical protolanguage, Brown's musilanguage, Falk's motherese, and Fitch's prosodic protolanguage) point out the importance of the prosodic aspects of speech. Following this advice from Meyer & Busnel (2015), I will analyze the design features of vocal music, instrumental music, speech, and speech surrogates.

The design feature analysis of vocal vs. non-vocal music and speech vs. speech surrogate

Hockett's design features are prevalent in the various disciplines that ask questions about the origins of music and language. One of the most influential works related to this question was done by Fitch (2005, 2006, 2010), whose studies are grounded in Hockett's design features. However, none of them looked at the design features of speech surrogates. In the following discussion, I will examine the design features of language and music proposed by Hockett and Fitch respectively, integrating anthropological arguments on the evolution of language (e.g., Everett, 2017; Boroditsky & Gaby, 2010) and applying the features to vocal and instrumental music as well as to speech and speech surrogates.

Vocal-auditory channel

Hockett (1960a, 1960b) proposed the design features64 not only to compare various animal communication systems but also to characterize speech. He noted that the design features are not independent. The vocal-auditory channel refers to acoustic signals that are produced by movements of the respiratory and upper alimentary tract and received by the auditory system. Hockett (1960a) distinguished speech from vocal gesture; the criterion for this distinction is that the sounds of vocal gestures are produced by articulatory motions but are not part of speech. Hockett (1960a) also discussed vocal gestures as paralinguistic phenomena. For practical reasons, Fitch (2006) defined vocal music as 'speech minus meaning' (p. 177). Vocal music is all sound generated by the vocal apparatus without distinct words, and it requires certain movements of the organs that contribute to the production of formants, the resonant reinforcement of energy at certain absolute frequencies regardless of the fundamental frequency. In addition to vocal music, Fitch (2005, 2006) explained that innate human calls (e.g., laughter, crying, screaming, moaning, etc.) are additional sounds produced by articulatory motions.

64 Hockett continued to revise and develop his design feature model (Wacewicz & Żywiczyński, 2014; Fitch, 2010). Later, Hockett and his colleague Altmann (1968) replaced 'productivity' with 'openness' and added three more features: 1) prevarication, 2) reflexiveness, and 3) learnability. 'Prevarication' refers to the ability to lie or deceive; a linguistic message can be false or meaningless, depending on 'semanticity' and/or 'displacement'. With 'reflexiveness', we use language to discuss language; this feature is associated with 'openness'. According to Hockett, new meanings are easily added to new or old elements in an open code like language, so the system can potentially communicate about anything. 'Learnability' means that language is teachable and learnable, so speakers of one language can learn a new language. In the text, I take the earlier model in my discussion because Fitch's music model is based on the earlier one.

Hockett (1960a) claimed that a remarkable advantage of the vocal-auditory channel is that it leaves the other body parts free for other activities, such as playing an instrument, so that we can perform multiple tasks simultaneously, which is a human-specific behavior.

Previously I discussed how non-human primates can produce either vocal or non-vocal sounds but are not able to do both at the same time as we do; in these animals, non-vocal sound production sequentially follows vocalization. How can humans produce vocal and non-vocal sounds simultaneously? Although it is purely speculative, I suggest that Brown's (2008, 2019) laryngeal duplication and migration model may give us some hints. According to Brown's (2019) proposal, the duplicated dorsal laryngeal motor cortex (LMC) has established a novel pathway (see fig. 8). The duplicated LMC is localized on the dorsal part of the primary motor cortex, close to the neural representation of the limbs. In this pathway, the inferior frontal gyrus, as a control center, might contribute to the neural recruitment responsible for the production of both vocal and non-vocal sounds. Additionally, we are the only species that uses vocal and non-vocal sounds interchangeably in certain situations: in terms of language, speech surrogate users can speak, whistle, or play an instrument to deliver a verbal message. Both speech and vocal music use the vocal-auditory channel, while speech surrogates and instrumental music are not vocal but auditory.

Complexity

Complexity is one of the design features of music proposed by Fitch (2006). According to Fitch (2006), complexity refers to signals that are more complex than innate human calls. Therefore, both musical and linguistic signals can be characterized by complexity: compared to innate human calls, vocal music and speech have complex structures. Given not only that non-human primate calls are strongly connected to involuntary locomotor activities but also that Fitch's criterion for complexity is the innate call, it seems logical to me that the complexity of instrumental music may be achieved by a separation from vocal calls. In this regard, all four modes I analyze in this section, vocal music, instrumental music, speech, and speech surrogate, have complexity.


Rapid fading

Rapid fading and broadcast transmission/directional reception are features derived from the vocal-auditory channel. Rapid fading is associated with the physical nature of sound, that is, its transitoriness. Compared to signals in other sensory modalities like vision, acoustic signals fade instantaneously. If fast message delivery is an important aspect of communication, rapid fading can be considered an advantage. However, rapid fading is disadvantageous for durability. In order to overcome this immediate disappearance, Hockett (1960a) claimed the necessity of repetitive transmission. Rapid fading thus leads to another associated design feature, cultural transmission, especially for music and language. Since rapid fading is a general characteristic of acoustic communication systems, vocal music, instrumental music, speech, and speech surrogate are all characterized by this design feature.

Broadcast transmission

According to Hockett (1960a), acoustic transmission is broadcasting: any receiver within a certain range can detect the signal. In a natural setting, widening the broadcast range is sometimes important. An alarm call is an example; detecting a potential danger via an alarm signal produced by a conspecific relates to the survival of the organism, and in this scenario an increase in the signal-to-noise ratio is important. In terms of broadcast transmission, speech surrogates are better adapted to a natural environment than speech (Meyer & Busnel, 2015). Ecological conditions define the conditions of transmission. For example, the park of Gunma, a part of the Amazonian rainforest in Brazil, is full of sounds produced by animal activities, and the high humidity level also affects broadcast transmission; other animals' sounds and the humidity disturb the transmission of acoustic signals. Meyer & Busnel (2015) discussed the Gavião people, who use a speech surrogate and live in the park of Gunma. Reporting that the average intensity of the background noise is about 50 dB, the authors noted that the mean level of spoken Gavião is about 55 dB while that of the surrogate speech is about 77 dB. They also found that spoken Gavião words rapidly fade in this natural setting and disappear at a distance of 16 m, whereas the hand-whistled ones remain clearly audible at 20 m. Human ears have an anatomical structure, the pinna, that acts as a funneling device amplifying and localizing acoustic signals; the pinna also contributes to an increase in the signal-to-noise ratio. In sum, broadcast transmission is a general characteristic of sound-based communication systems. Therefore, this feature is common to vocal and instrumental music, speech, and speech surrogate.
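To see why the 22 dB level advantage matters, a back-of-the-envelope calculation helps. The sketch below assumes ideal free-field spherical spreading (about -6 dB per doubling of distance, the 20*log10 law) and a fixed masking noise floor; under real rainforest conditions (humidity, vegetation, scattering), attenuation is far steeper, which is why the measured distances (16 m vs. 20 m) lie much closer together than this idealization predicts.

# Back-of-the-envelope sketch, assuming ideal spherical spreading
# (-6 dB per doubling of distance) and a fixed masking noise floor.
import math

def range_multiplier(delta_db):
    """Factor by which the masking-limited range grows when the source
    is delta_db louder, under pure spherical spreading (20*log10 law)."""
    return 10 ** (delta_db / 20)

speech_db, whistle_db = 55, 77  # mean levels reported by Meyer & Busnel (2015)
advantage = whistle_db - speech_db
print(f"Level advantage: {advantage} dB")
print(f"Idealized range advantage: x{range_multiplier(advantage):.1f}")  # ~x12.6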

Interchangeability

Interchangeability means that one can (re)produce a signal that one can understand. Hockett (1960a) described humans as "transceivers": we are freely able to both transmit and receive signals. However, according to Hockett (1960a), there are three exceptional cases in human speech. One case is pathological, and the other two are culture-specific. First, speech-impaired people cannot produce speech sounds, while deaf people cannot receive them. Second, in certain societies there exists a difference in language between men and women, a phenomenon reported in ethnographic works. Everett (2005), who worked with the Pirahã, a tribe in the Amazonian forest, wrote, "The Pirahã people communicate almost as much by singing, whistling, and humming as they do using consonants and vowels" (p. 622). As a tonal language, Pirahã distinguishes two phonemic tones (O'Neill, 2014) and makes a clear gender distinction in speech and whistled speech. Pirahã women have a smaller phonemic inventory, seven consonants and three vowels, than the men, whose inventory includes eight consonants and three vowels (Everett, 2005). Everett (1985) reported that only men whistle in Pirahã society. Cowan (1976) reported the same observation that whistled speech belongs only to men and boys among the Mazateco Indians of Oaxaca, Mexico. Third, according to Hockett (1960a), virtuosity in speech performance is the last exceptional case of interchangeability. In music, interchangeability is even harder to uphold: in many cultures, music virtuosos and their works are highly valued. Maria Callas (1923-1977), for example, was famous for her dramatic and magnificent singing. In Mande society, professional musicians called jali (plural jalolu) or griots have played multiple roles as genealogists, praise singers, entertainers, instrumentalists, and singers (Jessup, 1983). In terms of the interchangeability of music, the jali tradition has an interesting aspect: bala jalolu are men whose main job is to play musical instruments, including the balafon and kora, whereas female jalis, called griottes or jali musoli, primarily sing (Hale, 1994). This separation of gender roles in music performance within a group of musicians is another exceptional case for interchangeability. Fitch (2005, 2006) argued for the violation of interchangeability in instrumental music performance because everyone can listen to music but not everyone can play an instrument.


In sum, Hockett and Fitch regarded speech as having interchangeability, with some exceptions. Ethnographic reports (e.g., on the Pirahã and Mazateco) show that interchangeability is lacking in some speech surrogates. Fitch expressed some uncertainty about vocal music's interchangeability, but he clearly concluded that instrumental music lacks this design feature. Fieldwork on West African music-making (e.g., bala jalolu vs. jali musoli) implies a violation of interchangeability in vocal music, too.

Total feedback

Total feedback means that the user receives the signal produced by him- or herself. Total feedback is bound up with both the vocal-auditory channel and interchangeability. In vocalization, total feedback means the auditory feedback from sounds produced by an animal's own vocal organs; in non-vocal sound production, it refers to the auditory feedback from sounds that an animal produces bimanually. In terms of the peripersonal space discussed in chapter 3, total feedback for vocal sounds relates to perihead space and probably peritrunk space, while total feedback for non-vocal sounds relies on an interaction between perihead and perihand spaces. Total feedback can also be seen as a particular type of interchangeability: for interchangeability, the signal producer differs from the receiver, whereas for total feedback, the signal producer is the receiver. This indicates that total feedback involves the internalization of communicative behaviors, which further contributes to the constitution of our thinking.

Total feedback occurs within an ensemble of various modalities. Hockett (1960a) pointed out an important aspect of auditory feedback: it is supplemented by kinesthetic and proprioceptive feedback. Auditory total feedback can thus be understood as auditory-motor coupling. In vocalization, movements of the vocal organs produce the acoustic signals; in non-vocal sound production, movements of body parts other than the vocal organs generate the sounds. Therefore, total feedback is shared across vocal music, instrumental music, speech, and speech surrogate.

Specialization

Specialization means that a bodily effort serves no function other than its teleology (Hockett, 1960a, 1960b). For example, dog panting is biologically essential for the regulation of body temperature; panting is therefore a behavior specialized for the control of body temperature. Panting incidentally produces sounds, but it is not specialized for sound production because the sound is a side effect. Specialization refers to the connection between dedicated biological activities and their purposes.

Speech and vocal music require movements of the vocal apparatus for sound production. Instrumental music and speech surrogates using musical instruments (e.g., the xiao in China, talking drums in West Africa, etc.) are associated with bimanual movements. Whistled speech involves the control of the movements of the lips, the tongue, the buccinator muscle, etc., and sometimes the use of one or both hands, depending on the technique. Hockett (1960a) noted "that a communicative act, or a whole communicative system, is specialized to the extent that its direct energetic consequences are biologically irrelevant. Obviously, language is a specialized communicative system" (p. 407). Given that music and language are two human communicative acts, vocal music, instrumental music, speech, and speech surrogate can all be characterized by specialization.

Semanticity

Semanticity denotes that the elements of a communicative system have associative ties with things and situations, or types of things and situations, in a certain environment. In language, the relationship between a word and its meaning can be characterized by semanticity. Hockett (1960a, 1960b) argued that speech delivers verbal messages built on semantic conventions. The acquisition of semanticity in language relies on total feedback because of its contribution to the internalization of communicative behaviors. Through this internalization process, we master semanticity and can finally communicate in a specific and interpretable manner. Hockett (1960a) selected bee dancing and the gibbon's call as examples of semanticity: for bees, dancing movements can indicate a possible source of nectar, and for gibbons, calls inform about possible dangers. In both cases, the communicative signals indicate the existence of something that may be a food source or a source of danger. Cissewski & Boesch (2016) also alluded to semanticity in great-ape communication.

In terms of Peircean semiology, Hockett's (1960a) description of semanticity can be interpreted as an index. Hockett's (1960a, 1960b) terms signal and meaning correspond to Peirce's sign and object respectively. According to Charles Sanders Peirce (1839-1914), an object and its sign are determined by constraints in the signification process: the nature of an object constrains the nature of its corresponding sign (Atkin, 2013). The constraint can be qualitative, existential, or arbitrary, and different qualitative features of objects are associated with their signs. Accordingly, there are three types of signs: icon, index, and symbol. For an icon, there is a physical resemblance between an object and its sign. An index involves an existential or physical connection between an object and its sign. For a symbol, an object is associated with its sign without any natural or necessary connection; here, an object and its sign have an arbitrary connection. In bee dancing and the gibbons' calls, the signals refer to either food or danger. This implies an existential constraint; therefore, the corresponding sign is an index. Then what about music and language? Do they have this feature? I will discuss the semanticity of music and language in connection with the next design feature, arbitrariness.

Arbitrariness

Hockett (1960b) also described two kinds of associative ties between signals and meanings, non-arbitrary and arbitrary. This binary distinction seems to be kept in his other writing (Hockett, 1960a), but under different names (i.e., iconic vs. arbitrary; analog vs. digital). However, unlike Peirce, he did not distinguish index from icon in either paper, and it seems that Hockett's semanticity encompasses both icon and index. Hockett's distinction between semanticity and arbitrariness indicates that his primary concern is the arbitrary associative tie.

Let me return to the examples of bee dancing and the gibbons' calls. The rate and the vertical angle of bee dancing give information about the distance and the direction of the food location respectively. The alarm calls of gibbons refer to the existence of potential dangers. Again, this suggests that semanticity is based on the index. However, the association between bee dancing movements and food location, like that between calls and danger, is arbitrary. Noting a connection between the degree of arbitrariness and communicative advantage, Hockett (1960a) pointed out that it is problematic to assume that arbitrariness itself contributes to the development of an arbitrary system.

With anthropological considerations, Everett (2017) attempted to explain the evolution of language in connection with Peircean semiology. In his book How Language Began: The Story of Humanity's Greatest Invention, he argued that, although Peirce's sign theory does not aim at describing the evolution of language, the three signs (i.e., index, icon, and symbol) are associated with a progression of language evolution (see fig. 10) because of possible correlations between an increase in the complexity of signs and the progression of language. Writing that "Language arises from interaction of meaning (semantics), conditions on how it is used (pragmatics), the physical properties of its inventory of sounds (phonetics), a grammar, phonology (its sound structure), morphology, and the organization of stories" (p. 105), Everett (2017) argued that language developed fully on the basis of a semiotic progression: language emerges from the index, and then the icon and the symbol appear successively along the continuum of the semiotic progression.


Figure 10. Semiotic progression model (Everett, 2017): Reprint permission granted by the author.

Everett (2017) also argued for a correlation between the semiotic progression and the advancement of early tools, which reminds me of Morgan et al.'s (2015) experiment on the transmission chain of stone tool production. According to Everett (2017), the pebble found in the Makapansgat cave in Africa resembles a human face, which indicates iconicity. He speculated that natural objects bearing visual icons might have been collected as early as by Australopithecus and Homo sapiens. Everett (2017) argued that the early tools are associated with symbols, though not completely arbitrary ones. Similar to Cissewski & Boesch's (2016) discussion of context dependency in non-human primates' 'displacement' and 'productivity', Everett (2017) considered that, through their connection to culture, tools might have signified human activities displaced from the actual context of tool use. For example, a Schöningen spear might have had the meaning of hunting for early human hunters, and the tool might have become an indexical sign signifying hunting.

From an archeological perspective, Shea (2017) compared the tools of early humans with those of non-human primates. Characterizing non-human primates' tools by homogeneity, that is, the same material being used to make an object or an aggregate of objects, Shea (2017) pointed out that early humans combined different materials with various mechanical properties to produce tools. The raw materials are assembled and used in a procedurally structured order to produce a tool, and the procedures retain a capacity for variation (d'Errico et al., 2003). Further speculating about a possible ecological connection between tools and language, Shea (2017) not only surmised that tool use and language involve complex and situationally variable combinations of action but also argued that the geographical environment shapes language and tool technology. Taking the interrelatedness of tool, language, and environment into account, he proposed the term 'behavioral variability', which refers to a statistically quantitative property of archeological evidence and human behavior. Shea (2017) hypothesized that early humans used stone tools as media for symbolically encoded social messages. Relating music, tools, and symbolic behaviors, d'Errico et al. (2003) argued that the archeological records of musical instruments are also indicative of the evolution of conscious thought in early humans, but that the way music has developed human cognition differs from the way language has.


With regard to the design features of semanticity and arbitrariness, both Hockett and Fitch concluded that speech has these two features but that they are absent in instrumental music. However, there are differences between Hockett's and Fitch's understandings of semanticity. For Fitch, who defined semanticity as "words associated with things" (p. 177), both semanticity and arbitrariness are tied to words, which marks a difference from Hockett's broader notion. I think that this discrepancy between Hockett and Fitch leads to a problematic conclusion about vocal music: when we consider musical onomatopoeia as vocal music, for example, we may reach a different conclusion about whether semanticity is a design feature of vocal music. It is a well-documented phenomenon that musicians in various cultures use musical onomatopoeia, primarily in societies where music is learned and transmitted orally. For example, the musical onomatopoeias Gueum and Kuchishoga, both literally meaning 'mouth sounds', are verbal recitation systems used by traditional musicians in Korea and Japan respectively. In India, onomatopoeias are used by musicians and even dancers.

Musical onomatopoeias are based on transforming instrumental sounds into vocal sounds. This transformation relies on the physical resemblance between instrumental and vocal sounds as well as on the learned association with how the sounds are produced and how the performer interacts with the instrument. First, onomatopoeias used for musical sounds are characterized by iconicity due to the acoustic similarity between the onomatopoeias and the sounds of the corresponding instruments. For instance, the different vowels of Korean Gueum signify different pitch levels: /i/ and /a/ are relatively higher than /o/ and /u/. Second, human beings are the acting agents in musical onomatopoeias, which indicates indexicality for human action. For the shamisen, a traditional Japanese three-stringed instrument, the /t/ and /tz/ of Kuchishoga syllables signify the performer's downward movement with the plectrum on the strings. I have proposed the term 'embodied sign' for this combination of iconicity and indexicality for human action.65 Another important aspect of musical onomatopoeia is that a bidirectional transformation occurs between vocal and instrumental sounds through the body. The embodied sign has the property of semanticity only if we accept Hockett's definition.

How about the semanticity of speech surrogacy? Sebeok & Umiker-Sebeok (1976) discussed the semiotic nature of speech surrogates in terms of Peircean semiology. According to them, many drum and whistle signs physically resemble their corresponding speech, and the resemblance can be found in verbal elements (e.g., pitch, accent, loudness, sentence contour, etc.) as well as at different levels (e.g., phonemic, morphemic, lexical, etc.). Furthermore, the authors wrote, "While drum and whistle signs are frequently iconic in several respects, then they may simultaneously be indexical and symbolic as well" (p. XVIII). Therefore, speech surrogates have the features of semanticity and arbitrariness.

Discreteness

Discreteness indicates the segregation of regions within a physical continuum. This feature is connected with categorical perception: in speech perception, a continuous acoustic dimension is perceived as two distinct categories (e.g., bin vs. pin). According to Hockett (1960a), humans tend to classify things into binary categories rather than along continuous scales, and language shows this tendency best. He argued that any language has a definite and finite stock of phonemes, that is, of basic signaling units. Hockett (1960a) wrote, "phonemes are not sounds, but ranges of sound quarried by quantization out of the whole multidimensional continuum of physiologically possible vocal sound" (p. 414). It is worth noting that some languages use discrete pitches or discriminable pitch contours to deliver different meanings (i.e., tone languages) and that others differentiate meanings by phonemic duration rather than by phonemes alone. My analysis of this will be discussed in connection with the following two design features, discrete pitches and isochronicity.

65 I proposed the idea of musical onomatopoeias as embodied signs, a combination of human-action-related indexicality and iconicity, in a presentation at the 44th International Council for Traditional Music World Conference, "Sound of action: musical onomatopoeia as embodied signs. Evidence from rhythm memorization experiments".

Discrete pitches

Fitch (2005, 2006) applied discreteness to two parameters, pitch and rhythm, for his design features of music. On the basis of Nettl's (2000) description that notes are chosen from a scale to build melodies, Fitch (2006) interpreted the scale as a set of discrete pitches and argued that song uses discrete pitches whereas speech is based on continuously variable pitch. To me, it seems that Nettl's note is not the same as Fitch's pitch. For example, Will (2004) observed that the melodies of Aboriginal Australian songs consist of specific basic contour units, and he argued that melodic contour units rather than stable pitches are the building blocks of song. In classical Indian music, a note called svara is quite different from a note in Western music, where each note has its own discrete frequency level based on tuning (e.g., A4 = 440 Hz). In spite of its theoretical pitch position, each svara has its own ways of execution, with a specific gamaka (ornament), lāg (way of taking), uccār (pronunciation), etc. (Powers & Widdess, 2001). Pearson's (2013) analysis of a Karnatic vocal lesson shows that, in practice, a svara is not discrete but continuously varying, in correlation with the movement of hand gestures (see fig. 11). In addition, one group of tone languages is based on a register system built on discrete pitches: Yoruba speakers, for example, make lexical distinctions between three level tones, and Navajo speakers differentiate high and low level tones (Maddieson, 2013). It therefore seems reasonable to question Fitch's (2006) treatment of discrete pitches as a design feature of music. Pointing out that the idea of discrete pitch levels was formulated on the basis of Western notation systems, Will (2011) proposed the melodic contour as an alternative to discrete pitches.

Figure 11. Varying pitch in a Karnatic vocal music rendition (Pearson, 2013): Reprint permission granted by the author.

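The contrast drawn above can be made concrete. In the Western equal-tempered system, a note name resolves to a single fixed frequency, f = 440 * 2^(n/12) for a pitch n semitones from A4, whereas a svara is realized as a continuously varying movement around a theoretical position. A minimal illustration of the 'discrete' side of this contrast:

# A minimal illustration of discrete pitch in the Western equal-tempered
# sense: each note name resolves to exactly one frequency.
def equal_tempered_hz(semitones_from_a4, a4=440.0):
    """Frequency of a pitch n equal-tempered semitones above (or below) A4."""
    return a4 * 2 ** (semitones_from_a4 / 12)

for name, n in [("A4", 0), ("C5", 3), ("E5", 7), ("A5", 12)]:
    print(f"{name}: {equal_tempered_hz(n):7.2f} Hz")  # 440.00, 523.25, 659.26, 880.00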

Isochronicity

Fitch (2005, 2006) proposed isochronicity to describe a temporal characteristic of music, discussing it as a regular periodic pulse that provides a reference framework. He pointed out that isochronicity is a feature relative to speech. More importantly, he noted not only that no music is perfectly isochronous but also that some musical genres are not isochronous (e.g., laments). Fitch (2005, 2006) concluded that, in addition to discrete pitches, discrete time (i.e., isochronicity) is the design feature that makes music acoustically distinguishable from language.

In this section, I discuss how language and music are related via the three features above: discreteness, discreteness in pitch, and isochronicity. In sum, Hockett concluded that speech has discreteness because of its phonemes and that instrumental music shows this feature only partly. However, Hockett's analysis of the discreteness of instrumental music is not clear; he might have had something like Fitch's discreteness in pitch and isochronicity in mind. Fitch argued that discreteness exists in both vocal and instrumental music, specifically in terms of pitch and rhythm, and accordingly proposed discrete pitches and isochronicity. As discussed above, it seems that Fitch's discreteness in pitch may be applicable only to Western music, while discrete pitch is found in some languages, for example tone languages. Speech surrogates may lack discreteness due to abridgement, which drops some components of the phonemes that Hockett considered a main ingredient of the discreteness of language. What, then, is abridgement? Stern (1957) described drum and whistle languages as abridging systems that employ sound signs, each of which exhibits a significant resemblance to a corresponding base speech sound. At the phonemic level, the speech sound may consist of phonemes (e.g., consonant, vowel, length, or tone) that are linearly arranged, plus suprasegmental features (e.g., intonation or stress). An abridging system keeps some acoustic resemblance to the base speech but only part of its phonemic qualities. In other words, speech surrogates frequently remove some components of the phonemes (e.g., consonants) of the corresponding base speech. A speech surrogate therefore does not retain all the components of a phoneme but may keep discreteness in pitch and/or in speech rhythm.

Displacement

Displacement means the capacity to refer to something remote in time and/or space.

Displacement implies the ability to discuss what happened in the past and what may occur in the future. Hockett (1960a) argued that only a few semantic communicative systems, including language, have this property. He discussed a close link between displacement and stored information: an increased capacity for internal storage leads to an expansion of the use of displacement, the development of language, and an enlargement of the size and complexity of the brain. Displacement has often been discussed in relation to past or perfect tense forms and the existence of history. Everett (2005) challenged displacement as a universal characteristic of human language. He observed that the Pirahã language lacks a perfect tense and that Pirahã history is limited to two generations. For the Pirahã, there is no past, or stored information, beyond the grandparent generation. Everett (2005) pointed out the significance of human experience in the kinship-based history of the Pirahã, what he called the principle of immediacy of experience. Pirahã history is totally dependent on the span

of human memory, without any external aids including writing systems. This raises a question about modes of information storage. The Pirahã may keep their knowledge in a purely auditory domain (i.e., orality & aurality) rather than in a visual domain (i.e., literacy).

Therefore, it may be worthwhile to rethink displacement in terms of human memory in the context of orality vs. literacy.

In regard to displacement, Hockett was not conclusive about instrumental music but considered that language has this feature. Fitch accepted Hockett's conclusion for language and argued that both vocal and instrumental music lack displacement.

Since the message of a speech surrogate corresponds to that of speech, displacement may or may not exist depending on the base language.

Productivity

Productivity refers to the ability to say and to understand something that has never been said or heard; each message is novel and unique. According to Hockett (1960a), elementary signaling units carry meanings (i.e., words, morphemes), and they create something new (e.g., fictions, myths) on the basis of a certain pattern or convention (i.e., grammar). Novel messages are combinations of familiar elements arranged by familiar patterns. Hockett (1960a) also noted that the combination of productivity and displacement maximizes novelty, and he gave myths and novels as examples.

Hockett’s (1960a) other notion of an open system shows its association of novelty. Another important aspect of productivity is efficiency of a system. The familiar pattern and

convention, namely grammar, enables a system to be efficient. In sum, productivity requires two components, novelty and efficiency, and the latter is associated with grammar.

Although both Hockett and Fitch concluded that language and music have productivity, some ethnographic reports do not seem to support them. Calling for a reformulation of the design features, Everett (2005) reported an absence of myth and fiction among the Pirahã.

In terms of efficiency, speech surrogates do not seem to have this component. Speech surrogates deliver the meanings of the corresponding base speech but sometimes become ambiguous due to abridgement. Suppose that three different words are drummed as the same sound although they should have three different meanings. In this case, a drummer employs a technique called enphrasing: an individual lexical unit is replaced by, or embedded in, a phrase (Sebeok & Umiker-Sebeok, 1976; Stern, 1957). In the Kele language, for example, the three words songe (the moon), kɔkɔ (the fowl), and fele (a kind of fish) have the same tone pattern, that is, two high pitches (Carrington, 1976). When these words are drummed, therefore, two high tones are produced. When they are spoken on the drum, they are always accompanied by a little phrase in order to minimize confusion between the words (see fig. 12). This satisfies convention, one of the components of Hockett's productivity, but enphrasing seems to violate the efficiency of the system.


the moon: the moon [looks down at the earth]
in Kele: songe [li tange la manga]
drummed pitch: H H [L H L L L L]

the fowl: the fowl, [the little one which says 'kiokio']
in Kele: kɔkɔ [olongo la bokiokio]
drummed pitch: H H [L H H L L H L H L]

the fish fele: all the fele fish [and all the mboku-fish]
in Kele: yafele [la yamboku]
drummed pitch: L H H [L L H L]

Figure 12. Three examples of the Kele drum language and the enphrasing technique (Adapted from Carrington, 1976); H = high tone, L = low tone; inserted phrases for enphrasing in square brackets.
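To make the relation between abridgement and enphrasing concrete, here is a minimal sketch in R (the language used for the analyses in chapter 6). The tone patterns follow fig. 12; the variable names are my own illustration, not part of Carrington's (1976) description.

# Abridgement keeps only the tone pattern of each word (H = high, L = low),
# so three different Kele words collapse onto the same drummed signal.
abridged <- c(songe = "H H", koko = "H H", fele = "H H")  # 'koko' stands in for kɔkɔ

names(abridged)[abridged == "H H"]   # "songe" "koko" "fele": the signal is ambiguous

# Enphrasing embeds each word in a conventional phrase; the full drummed
# patterns (word plus bracketed phrase in fig. 12) are now all distinct.
enphrased <- c(songe = "H H  L H L L L L",
               koko  = "H H  L H H L L H L H L",
               fele  = "L H H  L L H L")
any(duplicated(enphrased))  # FALSE: ambiguity resolved, at the cost of longer signals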

Oftentimes it has been discussed that grammar is associated with efficiency and that embedding is one of the principles for this. Briefly, embedding is putting a linguistic unit as part of another unit of the same type or level. Everett (2005) argued that he could not find evidence supporting embedding. I will discuss embedding in detail in connection with recursion in the following section.

Duality of patterning

Duality of patterning refers to the fact that elements, each of which is itself meaningless, are combined to produce a large number of meaningful elements. A primary difference between productivity and duality of patterning is that productivity concerns how grammar combines words to create novelty, whereas duality of patterning involves how phonemes are combined in order to create a word. So both design features share a similar principle, that is, efficiency. Hockett (1960a) posited that "the principle of duality of patterning is a source of efficiency and economy for any communicative system for which a large number of different meaningful signals is desired" (p. 421). In English, for example, the phoneme /b/ has no meaning of its own but distinguishes the meaning of words when it is combined with other phonemes (e.g., beat vs. meat, but vs. put, bin vs. pin). In sum, Hockett (1960a, 1960b) concluded that only language has this property and did not mention whether instrumental music has it or not. Fitch (2005, 2006) considered that neither vocal nor instrumental music has duality of patterning.

Although Hockett’s (1960a, 1960b) productivity and duality of patterning share efficiency as their components, they differ from each other depending whether they relates to combining words or phonemes respectively. Therefore, productivity, not the duality of patterning, relates to grammar. In his semiotic progression model (see fig. 10), Everett

(2017) used Hockett’s duality of patterning as a combination principle of all different levels of linguistic units including phoneme, morpheme, grammar, etc. In other words, the use of the term ‘duality of patterning’ by Everett (2017) encompasses both Hockett’s (1960a) productivity and duality of patterning. In the semiotic progression model, the complexity of signs had been progressing from index to icon, and then on to symbol. The author argued that the duality of patterning introduces different types and/or stages of grammar so humans can construct a sentence. Specifically, people in a certain culture can assemble meaningless sounds (i.e., a phoneme) in a conventionalized order. This placing of phonemes into a meaningful word is a duality of patterning. On the basis of the principle of duality of patterning, other elements of communication, such as body gestures and intonation, are synthesized with the sounds and then it concretizes a meaning of the signs. This is compositionality. Specifically, Everett (2017) wrote “prosody (pitch, loudness, length),

gesture and other markers of salience (body positioning, eyebrow raising, etc.) have the joint effect of beginning to decompose the utterance, breaking it down into parts according to their pitch and gestures. Once utterances are decomposed and only then, they can be

(re)composed (synthesized), to build additional utterances" (p. 213). Therefore, the acquisition of signs, duality of patterning, and compositionality are the prerequisites for a grammar of modern language. Similar to the progression of signs (i.e., index to icon, then to symbol), Everett (2017) argued that grammar has developed at three different levels.

G1 is associated with linearity whereas G2 is based on hierarchy. Linear grammar G1 and hierarchical grammar G2 are associated with syntagmatic and paradigmatic organizations respectively (see fig. 13). As Ferdinand de Saussure observed that syntagms, also called chains, are characterized by sequential signs, G1 is characterized by a horizontal organization of grammar. Everett (2017) emphasized the role of culture in determining the conventionalized order of words. Paradigms are built by the associative relations of words, so they determine whether one word can be substituted for another.

Therefore, the vertical organization of grammar (e.g., the possessive case) characterizes G2. Everett (2017) pointed out that G2 lacks recursion. Recursion is a type of embedding, and it is boundless. Next, Everett (2017) introduced G3, which consists of both hierarchy and recursion.


Figure 13. Extended duality of patterning: Linearity and hierarchy.

Then, what is recursion? In their discussion of the faculty of language, Hauser,

Chomsky, & Fitch (2002) claimed that recursion provides the capacity to generate an infinite range of expressions from a limited number of elements. Recursion is the embedding, at the edge or in the center of an action or object, of one of the same type (Parker,

2006). Hauser et al. (2002) asserted that recursion as universal grammar is a human-specific characteristic among animal communication systems.66 Arguing that language is a

cultural tool, Everett (2017) speculated that recursive thinking may have arisen early in human cognition only in communities where the ability to think recursively would be beneficial in certain situations (e.g., for hunters, community defenders, or tool makers).

66 The paper by Everett (2005) has led to an inextinguishable Pirahã recursion debate in the field of the evolution of language. Supporters of Chomsky, Nevins, Pesetsky, & Rodrigues (2009), strongly disagreed with Everett's (2005) conclusion that a cultural constraint established by the immediacy of experience affects grammar and language. First, Nevins et al. (2009) argued that Everett's argument on the cultural importance of grammatical and lexical properties of a language had already been acknowledged in cross-disciplinary studies of ethnosyntax. Second, Nevins et al. (2009) disputed Everett's (2005) report on the lack of recursion in the Pirahã language. Recursion is the most important mechanism of universal grammar (UG) because Chomskyan linguists argue that UG is a human-specific property. Everett (2009) responded to the criticisms by Nevins et al. (2009) while clarifying some errors in his earlier works from the 1980s and noting that they might have led Nevins et al. (2009) to misinterpret his data. With regard to the immediacy of experience, Everett (2009) mentioned that it should be considered a first proposal for describing unique properties of the Pirahã holistically, and he admitted that he was not clear about UG in his 2005 paper. He argued that Hauser et al. (2002) considered recursion not essential but optional for UG and pointed out an erroneous connection between genes and UG. According to Everett, this connection assumes a genetic component of human nature that guarantees a core of knowledge common to all humans. Defining phonology as human knowledge concerning the patterning of meaningless linguistic elements in language, Berent (2013) suggested that phonology may present a 'core knowledge' system that is constrained by UG. Supporting Chomsky's theory from the developmental perspective, Berent (2013) considered 'core knowledge' as innate knowledge and asserted that universal principles not only determine an understanding of the world at the early stage of development but also shape the acquisition of mature knowledge at later stages. Furthermore, she argued for the importance of algebraic rules in the phonological mind. Everett (2016) criticized the cognitive nativism of Berent (2013) behind the core knowledge hypothesis and emphasized his claim that the absence of recursion is constrained by culture rather than by a culture-dependent grammar. According to Everett (2016), Berent's (2013) phonological mind assumes innateness, namely, instinct in phonological knowledge. In her reply to Everett (2016), Berent (2016) stressed that her core knowledge hypothesis has been tested and argued that Everett's (2016) reading of her experimental results is superficial. Additionally, Berent (2016) pointed out that Everett (2016) misunderstood the relation between genotypes and phenotypes. She argued that core knowledge (e.g., UG) is associated with properties of the cognitive phenotype that can be represented in the brains of individual speakers and, above all, is not identical to the genotype of the individuals.

Parker’s (2006) discussion showed another aspect of recursion in the development of human cognition in terms of memory. According to her, recursion is associated with long- distance dependencies that require an ability to keep track or memory. For example, the subject noun of a long English sentence is hard to keep in mind. For successful retrieval of the subject noun without identical repetition of phrase, a better working memory is required.

Generativity

In connection with Hockett’s productivity and duality of patterning, generativity of

Fitch (2005, 2006) not only incorporates the idea of the combinatorics of elements but also assumes that there is a rule that governs the combinatorics, namely, grammar. According to Fitch (2005, 2006), there is one universal rule in music that governs how

combinations and permutations of a limited number of notes can be generated into an unlimited number of hierarchically structured signals. In other words, Fitch (2005, 2006) assumed two properties in the generativity of music: one is simple combinatorics of notes, and the other is hierarchy as the principle of the combinatorics. In contrast, a recent work on the effects of cultural factors on meter perception by Kung (2017) suggested that hierarchy, especially in the temporal domain of music, is a cognitive construct based on musical literacy. Kung's (2017) study thus implies the existence of musical grammars other than the hierarchical one. This reminds us of Everett's (2017) proposal of different grammars in language (see fig. 10). He mentioned the possibility of at least three types of grammar and discussed an association between the type of grammar and cultural experience in his reflection on the Pirahã people.

Since Fitch developed generativity from Hockett's productivity and duality of patterning, generativity seems to be a design feature shared by vocal and instrumental music as well as language. On the contrary, for Everett, generativity may not be universally applicable in language. With regard to whether speech surrogates have generativity, it is plausible that Everett (2017) would reach the same conclusion as for language, given a note by his colleague O'Neill (2014): "whistle speech (as with the other non-normal speech channels) is thus not considered a separate grammar but rather a systematic phonological adaptation within the standard grammar of Pirahã" (p. 368).


Transposability

Transposability (Fitch, 2005, 2006) refers to the fact that a melody is perceived as the same when it is performed on a different starting note. In music, the relationships between the notes in a melody, in other words its pitch structure or melodic contours, enable us to identify the melody. Not the absolute frequencies of the individual notes but the relative relations between the notes, the pattern of note movements, contribute to melody perception. As Deutsch (2013) noted, the ability to name or produce a note of a given pitch in the absence of a reference note, so-called absolute pitch, is rare. With regard to transposability, Fitch concluded that all vocal and instrumental music as well as language have this feature. In terms of speech surrogates, transposability can be understood as the underlying principle of the abridging system. Meyer & Busnel (2015) discussed two different types of whistle languages depending on whether they stem from a non-tonal or a tonal language. The former type is formant-based whereas the latter is pitch-based: the former follows pitch contours extracted primarily from the vowel timbre of articulated speech, while the latter transposes the fundamental frequencies of spoken language. Therefore, transposability is a design feature that is shared across vocal and non-vocal music as well as speech and speech surrogates.

Performative contexts

Performative context refers to the association of particular songs and music performances with specific social events and contexts, which vary from culture to culture.

Western classical music in general is appreciated in a concert hall while marching band

music is played in open places like a stadium before a football game or in parades. K'antu, a panpipe ensemble music of the Andes, for example, is performed at ceremonies and festivals that are associated with the annual agricultural cycles and the Christian calendar (Baumann,

1985). Moreover, most cultures have rituals to mark important events in the course of human life (e.g., birth, marriage, death), and these rituals are often accompanied by what we call music. With regard to performative context, Fitch concluded that music, not language, has this design feature. Meyer & Busnel's (2015) discussion of whistled languages may provide an answer concerning the performative context of speech surrogacy.

According to them, many whistle languages are used for outdoor activities, and whistled speech communication is associated with group activities like hunting, gathering, harvesting, or herding. As mentioned previously, for example, Pirahã men communicate with each other via whistle speech while hunting in the forest or playing a hunting or warring game (Everett, 1985; O'Neill, 2014). Therefore, music and speech surrogates share the property of performative context even under Fitch's narrow perspective on performance contexts.

Repeatability

Performative context leads to repeatability. This feature means that there are songs and music performances that are typically repeated in a proper context. Once a repertoire is established, a corresponding context becomes identifiable with musical pieces. Later, Fitch (2010) emphasized that repeatability demonstrates a contrast between music and language. According to him, a simple repetition of the same speech phrase

violates basic conversational conventions. In contrast, repetition is the norm in music, and repetition in music performance neither diminishes satisfaction nor induces boredom. Therefore,

Fitch concluded that only music has repeatability. Although performative context is important, speech surrogates seem to lack repeatability. The activities related to speech surrogates, like hunting, gathering, harvesting, or herding, do not have a fixed repertoire. These group activities aim at promoting coordination between members of a community in order to adapt to ever-changing environments. In contrast, ceremonial group activities, as Fitch explained for music, follow relatively fixed scenarios that are repeatable.

A-referentially expressive

Being a-referentially expressive is a design feature concerning musical meaning. Some researchers consider music a language of emotion and find its meaning in emotional expression (see the emotion section of this chapter). Others look at the meaning of music in comparison to linguistic meaning. One of them is Cross (2001a), who argued that meanings of music are quite different from meanings in language, yet music is not meaningless because of its expressive power. In terms of the evolution of musical meaning,

Cross (2001a, 2009) proposed ‘transposable aboutness’ or ‘floating intentionality’ to discuss music’s adaptive value in cognitive and cultural development based on

D’Andrade’s (2001) notion of culture. According to D’Andrade (2001), culture is a purely mental phenomenon; culture is a psychological phenomenon and hence constrained by the psychological processes of cognition and learning. Cross’ (2001a, 2009) two terms refer to the multiplicity of musical meaning. Cross (2001a) explained this in terms of human

development. For example, a child, during the socialization period, explores the multiple meanings that collective musical activities carry, in a risk-free environment, through the form of play. In other words, the child learns the multiple meanings of music from the contexts in which it happens. These musical meanings are collected through embodied experience.

Therefore, music can contribute to developing cognitive flexibility and social understanding. This reflects not only Blacking's (1977) humanly organized sound, which is structured through human activities in a socio-cultural context, but also Merriam's

(1964) tripartite model comprising sound, behavior, and conception. However, I am not convinced by Cross' argument on the relationship between music and the context-dependent meanings of signals in human development. First, children can explore multiple meanings of signals in activities other than music behaviors, given the discussion by Cissewski

& Boesch (2016) of non-human primates' productivity and displacement. Second, musical meanings can be specific to some people, especially when music is related to an autobiographical memory.

Fitch (2006) approached musical meanings cautiously and confined this design feature of music to gestural forms that include mood and movement expressions. This is in line with my proposal of an embodied sign. The best example of this feature in association with movements can be found in the Hindustani dance form Kathak. The rhythmic syllables, bols, are associated with a dancer's foot and arm movements. In August of 2018, I attended an Indian drumming workshop at the St. Stephen's Episcopal Church at OSU. The workshop attendees were allowed to see the rehearsal of the concert and dance performance that concluded the

workshop program. Some performers and dancers were professionals. The most impressive observation to me was an improvisation between Amisha Shah, a Kathak dancer, and

Abhinav Sharma, a drummer. They had not worked together before. However, they recited bols once or twice during the rehearsal and successfully finished their performance.

To conclude, Fitch held that only music has this design feature: language and speech surrogates deliver messages clearly, and they are non-gestural forms.

Cultural/traditional transmission

Cultural/traditional transmission is the only overlapping design feature between

Hockett and Fitch. To begin with, Hockett (1960a) argued that there are three basic mechanisms by which human beings maintain the basic continuity of behavior patterns. The first is the gradual change of the physico-chemical environment. The second is the genetic mechanism. The third is cultural/traditional transmission. The three mechanisms interact with each other. For example, a human cannot speak any language at birth; human infants acquire language during a certain period early in their development, and if they are deprived of the experience of language during that critical period, the consequences are irrecoverable. This implies the involvement of the first two mechanisms. Additionally, language is one of the human behaviors that is learned. Cultural transmission relates to behaviors that are transmitted by learning and teaching. For a certain communicative system, such as language, the properties of semanticity, arbitrariness, displacement, and other design features are transmitted via tradition. Hockett (1960a) argued that cultural

transmission has an obvious survival value in that it allows a species to learn through experience and to adapt to a new environment.

In terms of music, Fitch (2005, 2006) argued that musical styles are transmitted culturally. This is reminiscent of the term 'gharānā', which literally means household or lineage.

In Hindustani music, a group of musicians are connected by kinship and/or discipleship, and each group is identifiable by its own unique style. Widdess (2001) noted, "The musical repertory of a gharānā often includes special techniques, compositions or rāg known only to its members". With regard to speech surrogacy, Meyer & Busnel (2015) argued that the depreciation of traditional knowledge and changes in traditional ways of life threaten whistle speech. For example, the Gavião people of Rondônia in Brazil have been in contact with the national community since the 1940s. Although the Gavião language is still used, their traditional way of life has drastically changed due to deforestation and their encounters with missionaries. Researchers found that competency in whistle speech correlates with the intensity of practice during a user's youth, the ecological situation, and so on.

Therefore, both vocal and non-vocal types of music and language have the last design feature of cultural transmission.

Conclusion

In this section, I discussed the design features of language and music proposed by

Hockett and Fitch. I examined their proposals critically against four different categories of human acoustic communicative forms. This analysis (see Table 2 for a summary) suggests that some design features need to be clarified. I found that the interrelatedness of the

features (e.g., arbitrariness, productivity & displacement; productivity & duality of patterning, etc.) leads to different usages of a term among scholars. There were also misunderstandings of terms (e.g., Fitch's use of 'note'). Ethnographic reports do not seem to support some earlier interpretations of the design features. Therefore, for a generalization of the model, it is necessary to include more anthropological observations.

Nonetheless, several design features are useful for understanding the evolution of human cognition via music and language. As alluded to in Everett's semiotic progression model, the design features relating to meanings, that is, semanticity and arbitrariness, seem most interesting to me. On the basis of their association with Peircean sign theory, I proposed an embodied sign of musical onomatopoeia that allows us to transform between vocal and instrumental music. Above all, in music, this transformation is bi-directional. Additionally, the acquisition of this musical meaning is associated with cultural transmission. Compared to music, a switch between vocal and non-vocal modes of language seems to be more unidirectional due to its involvement in the delivery of a verbal message. Therefore, speech surrogacy seems to be an abridgment of speech.

I agree with some points made by Everett in terms of the evolution of language, except for his view on music. Everett considered that music has no power equivalent to language's in the development of human cognition and explained this in terms of the gaining of complexity in signs from index to icon, and then to symbol. His semiotic progression model demonstrates that early human tools are associated with the emergence of different sign types. First, however, my proposal of musical onomatopoeia as embodied signs involves a different way of semiotic progression via music: signs are not characteristics that only language

has. Second, musical instruments are important tools that humans invented, as shown in the debates over the first bone flutes (e.g., the Neanderthal bone flute vs. the flutes from the Hohle Fels).

Three design features including the vocal-auditory channel, total feedback, and specialization can be used to discuss potential differences in spatial experience between vocal and non-vocal communicative forms, which I discussed in chapter 3. In future work, these features could be used to investigate the distinctive transformative power of different modes of music-making.


Design feature: Language (H) & Music (F) | Music: vocal | Music: non-vocal (instrumental) | Language: vocal (speech) | Language: non-vocal (speech surrogates)
Vocal-auditory channel (H) | Yes (F) | No; auditory, not vocal (H) | Yes (H) | No (C)
Complexity (F) | Yes (F) | Yes (F) | Yes (F) | Yes (C)
Rapid fading/transitoriness (H) | Yes (F) | Yes (H) | Yes (H) | Yes (C)
Broadcast transmission/directional reception (H) | Yes (F) | Yes (H) | Yes (H) | Yes (C)
Interchangeability (H) | ? (F); No (C) | ? (H); No (F) | Yes (H); No (E) | No (E)
Total feedback (H) | Yes (F) | Yes (H) | Yes (H) | Yes (C)
Specialization (H) | Yes (F) | Yes (H) | Yes (H) | Yes (C)
Semanticity (H) | No (F); Yes (C) | No in general (H); No (F) | Yes (H) | Yes (C)
Arbitrariness (H) | No (F) | - (H); No (F) | Yes (H) | Yes (C)
Discreteness (H) | Yes (F) | Partly yes (H); Yes (F) | Yes (H) | No, due to abridgement (C)
Discreteness in pitch (F) | Yes (F); No (C) | Yes (F) | No (F); Yes only for level tone languages (C) | Yes (C) if the base language is a level tone language
Isochronicity (F) | Yes (F) | Yes (F) | No (F) | No (C)
Displacement (H) | No (F) | - (H); No (F) | Yes, often (H); Yes (F); No (E) | Maybe no (C)
Productivity (H) | Yes (F) | Yes (H) | Yes (H); Partly yes (E) | Partly yes (C): violation of efficiency
Duality of patterning (H) | No (F) | - (H); No (F) | Yes (H, E) | Yes (C)
Generativity (F) | Yes (F) | Yes (F) | Yes (F); No (E) | No (C)
Transposability (F) | Yes (F) | Yes (F) | Yes (F) | Yes (C)
Performative context (F) | Yes (F) | Yes (F) | No (F) | Yes (C)
Repeatability (F) | Yes (F) | Yes (F) | No (F) | No (C)
A-referentially expressive (F) | Yes (F) | Yes (F) | No (F) | No (C)
Traditional transmission (H/F) | Yes (F) | Yes (H) | Yes (H) | Yes (C)

Table 2. Design features of language and music; C=Cheong, E=Everett, F=Fitch, H=Hockett

Do two modes of music-making transform our experience of the world differently?

In this chapter I examined the evolutionary traces of human music-making in reverse chronological order. First, I reviewed cross-culturally how written documents looked at the origins of vocal and instrumental music in human history. These writings consistently distinguish vocal music from instrumental music. Second, I discussed whether a vocal vs. instrumental (i.e., non-vocal) distinction existed even in prehistory. Third, I examined two different modes of acoustic communication in other animal species.

The examination shows the consistency of such a distinction historically, prehistorically, and comparatively. Therefore, vocal and non-vocal modes of music-making may give us some clues as to how human language, music, and cognition evolved together. In terms of the origins of music, I examined basic design features of music and language. I focused on the vocal-auditory channel, total feedback, and specialization because they seem to differentiate the processing of space between vocal and non-vocal communicative forms. In connection with my discussion in chapter 3, vocal forms are associated with body space whilst non-vocal forms involve peripersonal space in addition to body space. For example,

Baily & Driver (1992) argued that a musical instrument, as a type of transducer, converts the patterns of body movement into acoustic patterns, and they emphasized a connection between auditory and spatiomotor modes in music cognition. In line with this, De Souza (2014) phenomenologically investigated the corporeal grounding of instrumental music-making in his Music at Hand and described musical instruments as transformational tools.

Therefore, for him, the acquisition of instrumental technique is a process of bodily

technicization. In other words, instrumental music-making shapes how we perceive, understand, and imagine the world.

My second focus was on the design features concerning the meanings of music and language (e.g., semanticity, arbitrariness, and a-referential expressiveness). Humans are the only species that uses vocal and non-vocal forms of communication with complex symbolic meanings simultaneously and interchangeably. In the context of music learning, we combine different signs (e.g., my proposal of embodied signs), which enables us to exchange a vocal sound with an instrumental one (e.g., musical onomatopoeia).

In some cultures, messages delivered by speech can likewise be delivered by speech surrogates.

Therefore, both vocal and non-vocal modes of music-making and language contribute to the development of our cognitive abilities in a complex way, but there seem to be differences depending on culture and environment. This dissertation investigates whether vocal and non-vocal sound production differ in how they shape our experience of the world and, if so, how they transform our experience of the world differently.

The answers to this question will explain our musical nirvana.


Chapter 6. Hear Your Touch: Experimental investigation of embodied time and space in music-making

…the body is a productive starting point for analyzing culture and self. … an analysis of perception (the preobjective) and practice (the habitus) grounded in the body leads to collapse of the conventional distinction between subject and object. From Csordas (1990, pp. 39-40)

We experience the world with our own bodies. We perceive time and space through our senses. It is mysterious that we experience time and space as integrated mental representations although each sensory input contributes information that is specific to its own modality. Therefore, it is of interest to know the underlying mechanisms by which information retrieved from different sensory organs is processed (i.e., perception) and integrated in our mind.67 On the basis of information that is conveyed multisensorially, we

(re)construct time and space in a meaningful way (i.e., cognition), which is a human-specific trait. In the previous chapters I have discussed the human experience of time and space mainly from theoretical perspectives. In chapters 2, 3, and 4, I looked at time and space as psychological and cultural constructs. In chapter 5, I not only discussed the transformative powers of music-making on our experience of time and space but also

speculated about an evolutionary implication of two different modes of music-making. Here in chapter 6, I report my experimental findings, which empirically support the arguments of the previous chapters, via the two experiments of the Hear Your Touch project. The aim of this project is to explore the relationship between different behaviors in music performance (i.e., singing vs. instrument playing) and our experience of time and space. In particular, I investigated the effects of cultural factors (e.g., being a non-musician vs. being a musician; being a singer vs. being an instrumentalist) on audio-tactile integration.

67 According to Pike & Edgar (2005), who distinguish sensation from perception, sensation is the "initial detection of signals" whereas perception is "the process of constructing a description of the surrounding world" (p. 73). In other words, sensation is a purely bottom-up process whereas perception involves both bottom-up and top-down processes.

To date, multisensory studies have been conducted primarily on visual pairings (for reviews, see Kitagawa & Spence, 2006; Spence, 2013). The earliest report on audio-tactile integration can be found in Hirsh & Sherrick's (1961) cross-modal study, where a temporal order judgment task was used to investigate the temporal aspects of multisensory perception.

After Hirsh & Sherrick’s (1961) classical study on multisensory integration, several research teams have paid attention to audio-tactile integration and few have investigated its role in spatiotemporal processing on top of it. Their results have demonstrated behavioral inconsistency depending on task type (e.g., simple reaction time vs. temporal order judgment), acoustic stimulus types (e.g., white noise vs. pure tone), location of tactile stimulation (e.g., hand vs. ear lobe), involvement of attention, etc. (Adelstein, Begault,

Anderson, & Wenzel, 2003; Murray et al., 2005; Zampini, Brown, Shore, Maravita, Röder,

& Spence, 2005; Kitagawa & Spence, 2006; Zampini, Torresan, Spence, & Murray, 2007;

Fujisaki & Nishida, 2009; Tajadura-Jiménez, Kitagawa, Valjamae, Zampini, Murray &

Spence, 2009; Occelli et al., 2010; Occelli, Spence, & Zampini,2011; Spence, 2013).


Using a simple reaction time (SRT) paradigm, Murray et al. (2005) and Zampini et al. (2007) investigated the effect of spatial factors (e.g., audio-tactile stimuli presented from the same vs. different locations) on the Redundant Signal Effect (RSE). The RSE is the facilitation of reaction time for a multisensory presentation compared to unisensory stimuli. These studies tested whether redundant spatial information modulates the RSE; however, none of them reported an effect of spatial factors. This contrasts with the spatial modulation of the RSE for paired audio-visual stimuli (e.g., Gondan, Niederhaus, Rösler, & Röder, 2005).

Zampini et al. (2005) introduced a spatial factor in their audio-tactile temporal order judgment (TOJ) study. As in the SRT experiments, they reported no effect of the spatial factor.

However, they observed a difference in response accuracy depending on the participants' level of experience with the task. TOJ experiments provide data for two theoretical parameters, the Point of Subjective Equality (PSE) and the Just Noticeable Difference (JND), on the basis of the psychometric function. The PSE, known as an absolute threshold, is the value at which a comparison stimulus is equally likely to be judged as higher or lower than a standard stimulus. The JND, known as a difference threshold or difference limen, is the smallest difference between two stimuli that can be consistently and accurately detected at least fifty percent of the time (APA Dictionary of Psychology). Zampini et al. (2005) found that experienced participants had lower JNDs than those who were inexperienced.
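As an illustration of how the PSE and JND are derived from a psychometric function, here is a minimal sketch in R using hypothetical TOJ data; the probit model and the 75%-point criterion for the JND are common choices, not necessarily those of the studies cited.

# Hypothetical TOJ data: 'soa' is the stimulus onset asynchrony in ms
# (negative = tactile first), 'k' is the number of "audio first" responses
# out of n trials per SOA.
soa <- c(-200, -90, -55, -30, -15, 15, 30, 55, 90, 200)
n   <- rep(40, length(soa))
k   <- c(2, 6, 10, 14, 18, 24, 28, 33, 37, 39)

# Fit a cumulative Gaussian (probit) psychometric function.
fit <- glm(cbind(k, n - k) ~ soa, family = binomial(link = "probit"))
b   <- coef(fit)

pse <- -b[1] / b[2]        # SOA at which both orders are equally likely (50% point)
jnd <- qnorm(0.75) / b[2]  # half the distance between the 25% and 75% points
c(PSE = unname(pse), JND = unname(jnd))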

The studies mentioned above concern external spatial factors. According to my discussion of body space in chapter 3, there is also the postural schema, which gives us spatial experience arising from inside the body. Therefore, an internal spatial factor was

considered in the Hear Your Touch project. How can an internal spatial factor be introduced in experiments? Yamamoto & Kitazawa (2001) showed how the postural and superficial schemata interact with each other by investigating the effects of different arm postures (e.g., crossed vs. uncrossed arms) on tactile TOJ. They found that crossed arms lead to incorrect judgments. This tendency toward reversed order for the crossed posture (i.e., incorrect responses in TOJ) indicates a failure in the processing of a tactile signal delivered from outside the body. This may be due to the remapping of touch in external space. Landry & Champoux

(2017) studied the effect of crossed arms on tactile TOJ in musicians and non-musicians and found that musicians were faster but less accurate than non-musicians.

To date, few studies have examined the effects of musical training on audio-tactile integration. A study by Kuchenbuch, Paraskevopoulos, Herholz, & Pantev (2014) investigated how musical training affects the interaction of musically related auditory and tactile cues. The authors reported that musicians are better at detecting incongruent audio-tactile stimuli than non-musicians. Noting that previous studies on the effects of musical training on audio-tactile integration have exclusively used music-related stimuli (e.g., positioning fingers on musical instruments; Brown & Goodale's (2013) near-tool effect),

Landry & Champoux (2017) explored whether musical training modulates audio-tactile integration at a behavioral level even when the experimental setting is not music-related.

However, their study did not include external or internal spatial factors. To the best of my knowledge, there is no study that explores the effects of different types of musical training (i.e., singing vs. playing an instrument) on spatiotemporal processing in terms of audio-tactile integration.


Therefore, I experimentally explored the potential differences in the spatiotemporal processing of three participant groups (i.e., vocalists, instrumentalists, and non-musicians as a control). The primary questions are: 1) do musicians respond to multisensory audio-tactile inputs differently compared to non-musicians, and 2) do instrumentalists respond to multisensory inputs differently from vocalists? I focus on two psychological building blocks of time: 1) event detection and 2) perception of temporal order. On the basis of my discussion of potential differences between singing and instrument-playing bodies (see Table 1), I focus on subcomponents of two features of peripersonal space, that is, audio-tactile coupling and hand-centered specificity, and their interaction with body space. With uni- or multisensory auditory and tactile stimulus presentation, the simple reaction time experiment explores the effects of musical training on event detection and spatial factors, whereas the temporal order judgment experiment investigates the effects of musical training on the perception of temporal order and spatial factors.

Simple Reaction Time (SRT) Experiment

Simple reaction time is the time taken by a participant to detect and respond to a stimulus without any cognitive demand. Given the Redundant Signal Effect phenomenon, that is, the facilitatory effect of multimodal inputs compared to unimodal ones, I hypothesize that if instrumental training contributes to a different audio-tactile representation of perihand space (i.e., peripersonal space near the hand), then instrumentalists will show a different RSE compared to non-musicians and vocalists.


Why the Redundant Signal Effect (RSE)? It is calculated as the difference in reaction time between unisensory and multisensory stimuli. In other words, any change in the response to unisensory and/or multisensory stimuli affects the RSE. Therefore, I performed an RSE analysis because it is indicative of any modulatory effect on unisensory and/or multisensory processing.
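A minimal sketch of the redundancy gain computation in R, using as example values the estimated mean RTs for the MODALITY factor reported later in this chapter; the variable names are my own.

# Mean RTs (ms) per condition; the values echo the MODALITY estimates below.
rt <- c(auditory = 289, tactile = 341, multisensory = 268)

# Redundancy gain: how much faster the multisensory response is than the
# faster of the two unisensory responses.
redundancy_gain <- min(rt["auditory"], rt["tactile"]) - rt["multisensory"]
unname(redundancy_gain)  # 21 ms for these example values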

Methods

Participants

Participants were classified as musicians if they satisfied all three of the following criteria:

1) self-identification as a musician, 2) at least seven years of formal musical training, and

3) ongoing active musical engagement at the time of experiment participation. Any participant who did not meet all three criteria was categorized as a non-musician.

Among musicians, singers were selected if they met either criterion: 1) self-identification as a singer or 2) singing as the primary musical activity at the time of participation, even for those who identified as both a singer and an instrumentalist.

Thirty-three participants took part in the SRT experiment, which took approximately 70 minutes to complete. The participants included 12 non-musicians (4 males; 11 right-handed; avg. age = 24.09 years, min. 27, max. 48), 12 instrumentalists (8 males; 11 right-handed; avg. age = 28.33 years, min. 19, max. 37), and 10 singers (4 males; 10 right-handed; avg. age = 22.5 years, min. 18, max. 31). Instrumentalists reported their primary instruments including (1), percussion (4), (1), piano (1), (1), (2), (1), and trumpet (1). The participants completed a self-reported questionnaire about their musical training. All participants were from the Columbus, Ohio area and reported normal hearing and touch. Having been provided with a written description of the experimental procedure, the participants gave verbal consent to proceed with the experiment. Participation was compensated with $10. This research was approved by The Ohio State University Institutional Review Board for Human Research.

Stimuli

Participants were presented with four unisensory and four multisensory stimuli: 1) left auditory only, 2) right auditory only, 3) left tactile only, 4) right tactile only, 5) simultaneous left audio-tactile (left aligned), 6) simultaneous right audio-tactile (right aligned), 7) simultaneous left auditory and right tactile (right misaligned), and 8) simultaneous right auditory and left tactile (left misaligned) (see fig. 14). The eight stimulus configurations were counterbalanced. Left- and right-sided stimuli were separated by 100° in azimuth.

Figure 14. Sixteen stimulus configurations of the SRT experiment

Equipment

Stimulus presentation was controlled by an Arduino Uno Rev 3 microcontroller (https://store.arduino.cc/usa/arduino-uno-rev3), and the participants' responses were registered by an additional microcontroller (see fig. 15). I used these because commercially available software and hardware, as well as open source applications, were found to contain serious timing errors in stimulus presentation and response registration. Each microcontroller was connected independently to one of two laptops. For stimulus presentation, Arduino IDE 1.8.5 ran on Windows 8.1 (Toshiba Satellite S55-B, i7, 2.5 GHz, 64 bit). For registration, Arduino IDE 1.8.5 ran on Mac OS High Sierra (MacBook Pro Intel Core Duo, 2.66 GHz).

Figure 15. Presentation and registration microcontrollers for SRT experiment

Auditory stimuli were 15 ms random noise bursts generated by the presentation microcontroller. Left and right sound stimuli were delivered independently through speakers (Bose Companion 3 Series II system). The amplitude of the speakers was adjusted to a comfortable level for each participant.

Fifteen-ms-long tactile stimuli were delivered to both of the participant's hands by two tactile stimulation transducers (BIOPAC TSD 190). The tactile stimulators were driven by

5 V TTL pulses generated by the presentation microcontroller. A BIOPAC TSD 190 delivers a touch pulse by moving a metal rod located in the middle of the pad (see fig. 16). Since this experimental set-up was new in the cognitive ethnomusicology lab, I had to test the equipment meticulously. When the metal rod of a TSD 190 was not completely covered and moved, it produced a clicking sound. I used this to test the timing of stimulus presentation: I recorded the sounds, both the random noise from the speakers and the clicking sound from a TSD 190, while presenting simultaneous sound-touch conditions. From this equipment testing, I found that the two TSD 190s delivered tactile pulses with temporal delays of 14.10 ms (SD = 0.92 ms) and 20.46 ms (SD = 1.85 ms). Therefore, I adjusted the stimulus presentation timing (i.e., presenting the TTL pulses for the tactile stimulators 14 ms or 20 ms prior to the onset of auditory stimuli) in order to achieve simultaneity between sound and touch in the multisensory conditions.
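A minimal sketch in R of the timing check described above, assuming onset times (in ms) measured from the test recordings for one TSD 190; the onset values here are hypothetical, and the actual correction was applied in the Arduino presentation code.

# Hypothetical onsets extracted from recordings of simultaneous sound-touch trials.
speaker_onset <- c(1000.0, 2750.4, 4502.1, 6249.8)  # noise burst onsets from the speaker
tactile_onset <- c(1014.2, 2764.9, 4515.6, 6263.3)  # click onsets from the TSD 190 rod

delay <- tactile_onset - speaker_onset
c(mean = mean(delay), sd = sd(delay))  # on the order of 14 ms for this stimulator

# The tactile TTL pulse is then triggered earlier by the mean delay (14 ms here,
# 20 ms for the second stimulator) so that sound and touch arrive simultaneously.
tactile_lead_ms <- round(mean(delay))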

Figure 16. BIOPAC tactile stimulator TSD 190


Procedure

Participants were seated comfortably in a dark room, and they either closed their eyes or wore a disposable eye mask to minimize the interference of other sensory inputs (e.g., light). They also wore ear plugs to cancel out potential noise produced by the rod movement of the TSD 190; the speaker volume was then adjusted to a level at which the auditory stimuli were detectable. Each subject completed one practice block and eight experimental blocks. A practice block had 40 trials. One experimental block consisted of 160 trials: each of the eight stimulus configurations was randomly presented 20 times per block. For four blocks, participants placed their arms in a normal, uncrossed position; for the other four blocks, participants crossed their arms. The arm postures were counterbalanced within each subject.

The inter-stimulus interval varied randomly between 750 ms and 2500 ms. The participants placed their hands near the speakers on two wrist-rest pillows. They were requested to respond as soon as possible after detecting any type of stimulus by pressing a response button on a BBTK response pad (https://www.blackboxtoolkit.com) with their dominant foot (see fig. 17). Timing information for both stimulus presentation and the participants' reactions was recorded by the registration microcontroller (see fig. 15). After the experiment, participants had a short debriefing interview.
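As a reconstruction of the block structure described above, here is a minimal sketch in R that generates one experimental block's randomized trial list with the 750-2500 ms inter-stimulus intervals; the actual presentation ran on the Arduino microcontroller, so this is only an illustration with names of my own choosing.

set.seed(1)  # for a reproducible example
configs <- c("aud_L", "aud_R", "tac_L", "tac_R",
             "multi_aligned_L", "multi_aligned_R",
             "multi_misaligned_L", "multi_misaligned_R")

# 8 configurations x 20 repetitions = 160 trials in random order, each followed
# by a random inter-stimulus interval between 750 and 2500 ms.
block <- data.frame(stimulus = sample(rep(configs, times = 20)),
                    isi_ms   = runif(160, min = 750, max = 2500))
head(block)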


Figure 17. Simple Reaction Time experiment setup

Results

Reaction times are the onset differences between presented stimuli and participants' responses. Reaction times (henceforth RT) below 75 ms and above 1000 ms were eliminated (Whelan, 2008). Two analyses were performed: 1) an ANOVA on means of individual RTs and 2) a General Linear Mixed Model (GLMM) analysis. ANOVA is a useful statistical tool when comparing three or more independent variables; however, when the data are not balanced (i.e., there are unequal numbers of responses for different conditions), the F-statistics become unreliable and, for a complex model like mine, difficult to interpret.

ANOVA also assumes a normal distribution of the dependent variable. In general, RT data are positively skewed and so do not satisfy the normality assumption, which may lead to an incorrect interpretation. Therefore, I performed a GLMM analysis on the raw RT data. The benefits of a GLMM analysis are 1) multi-level analysis, 2) no requirement of normality for the response data, and 3) modeling of random variability in individuals' responses. To characterize the

Redundant Signal Effect, I performed three additional analyses including the redundancy gain, the race model inequality test, and the cumulative probability analysis using 10 ms time bins.
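A minimal sketch of the RT preprocessing in R, assuming a data frame experiment1 with a column rt in ms (the data frame name follows footnote 68; the column name is my own):

# Drop implausibly fast and slow responses, following Whelan (2008).
experiment1 <- subset(experiment1, rt >= 75 & rt <= 1000)

# Raw RTs are typically right-skewed (mean > median), which is why the GLMM
# below is fitted to raw RTs with an inverse gaussian distribution rather
# than relying on ANOVA's normality assumption.
c(mean = mean(experiment1$rt), median = median(experiment1$rt))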

ANOVA

I performed a repeated measures ANOVA with within-subject factors of 1)

ARM (crossed vs. uncrossed), 2) LOCATION (left vs. right), and 3) MODALITY (auditory only, tactile only, spatially aligned multisensory audio-tactile, or spatially misaligned multisensory audio-tactile), and between-subject factors of 1) GENDER (female vs. male) and 2) STATUS (instrumentalist, vocalist, or non-musician).
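A minimal sketch of this repeated measures ANOVA in R, assuming a data frame d of per-condition mean RTs with one row per subject and condition; the data frame and column names are hypothetical:

# Within-subject factors: ARM, LOCATION, MODALITY; between-subject: GENDER, STATUS.
fit_aov <- aov(mean_rt ~ GENDER * STATUS * ARM * LOCATION * MODALITY +
                 Error(SUBJECT / (ARM * LOCATION * MODALITY)),
               data = d)
summary(fit_aov)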

The main factor MODALITY (F3,84 = 178.24, p < 0.0001, effect size ηp² = 0.98) and the interaction terms LOCATION:MODALITY (F3,84 = 8.71, p = 0.001, ηp² = 0.24) and STATUS:MODALITY (F6,84 = 2.21, p = 0.049, ηp² = 0.64) turned out to be significant at α = 0.05. For the full ANOVA results, see Appendix B.

Factor | F-value | p-value
MODALITY | F3,84 = 178.244 | < 0.0001
LOCATION:MODALITY | F3,84 = 8.715 | 0.0001
STATUS:MODALITY | F6,84 = 2.211 | 0.049

Table 3. Simple Reaction Time (SRT) experiment ANOVA summary, showing only the main factors and interaction terms significant at α = 0.05.

A post-hoc analysis was performed on the significant factors; the p values were adjusted with the Bonferroni correction. For the factor MODALITY, the estimated mean RTs for the spatially aligned and for the misaligned multisensory conditions were 268 ms [CI: 249-287 ms] and 268 ms [CI: 250-287 ms] respectively. The estimated mean RTs for the unisensory auditory and for the tactile conditions were 289 ms [CI: 270-308 ms] and 341 ms [CI: 323-360 ms] respectively. There were significant differences between the unisensory and multisensory conditions as well as between the two unisensory conditions. RT differences were significant between the aligned multisensory and the unisensory auditory conditions (t = -5.53, p < 0.0001) as well as between the misaligned multisensory and the unisensory auditory conditions (t = -5.42, p < 0.0001). Significant RT differences were also found between the aligned multisensory and the unisensory tactile conditions (t = -19.41, p < 0.0001) and between the misaligned multisensory and the unisensory tactile conditions (t = -19.30, p < 0.0001). The RT difference between the two unisensory conditions was 52.46 ms (t = -13.88, p < 0.0001).

Figure 18. Reaction time (RT) for the factor MODALITY of SRT experiment
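The pairwise comparisons above can be reproduced with, for example, the emmeans package; a minimal sketch, assuming the aov fit from the earlier sketch:

library(emmeans)

# Estimated marginal mean RTs per MODALITY, with Bonferroni-adjusted
# pairwise contrasts (multisensory vs. unisensory conditions, etc.).
emmeans(fit_aov, pairwise ~ MODALITY, adjust = "bonferroni")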


For the unisensory auditory condition, there was a significant LOCATION effect (t = 4.90, p < 0.0001). The estimated mean RTs for the left and right LOCATIONs were 293 ms [CI: 274-312 ms] and 285 ms [CI: 266-304 ms] respectively. For all other stimuli, LOCATION differences were not significant.

Figure 19. RT for LOCATION:MODALITY interaction of SRT experiment

General Linear Mixed Model Analysis

I performed a general linear mixed model (GLMM) analysis on raw RT data with an inverse gaussian distribution and an identity link function (Lo & Andrews, 2015) using the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (R Core Team, 2018).

The inverse gaussian distribution matches a unimodal, skewed distribution of continuous responses greater than 0 reasonably well, which is characteristic of raw RT data. I used the identity link function because it yields coefficient estimates on the original scale of the raw data; if I had chosen a link function that transforms the data (e.g., log), the results would be difficult to interpret. In order to simplify my model I did not include the factor

GENDER, which turned out not to be significant in the ANOVA. As fixed effects I included 1) ARM, 2) LOCATION, 3) MODALITY, and 4) STATUS, and as a random effect I used the subject variable nested in STATUS. The best parsimonious model is:68

Reaction Time = β0 + β1*STATUS + β2*LOCATION + β3*MODALITY + β4*STATUS:MODALITY + β5*ARM:MODALITY + β6*LOCATION:MODALITY + β7*STATUS:ARM:MODALITY + β8*STATUS:LOCATION:MODALITY + ε

In contrast to the ANOVA, where MODALITY, LOCATION:MODALITY, and STATUS:MODALITY were significant, the GLMM found three main factors (STATUS, LOCATION, and MODALITY), two two-way interaction terms (STATUS:MODALITY and LOCATION:MODALITY), and two three-way interaction terms (STATUS:ARM:MODALITY and STATUS:LOCATION:MODALITY) to be significant at α = 0.05. The significant coefficient estimates contributing to the model are shown in Table 4.

68 The R command used for the model was: MODEL = glmer(ReactionTime ~ STATUS + LOCATION + MODALITY + STATUS:MODALITY + ARM:MODALITY + LOCATION:MODALITY + STATUS:ARM:MODALITY + STATUS:LOCATION:MODALITY + (1|SUBJECT) + (1|SUBJECT:STATUS), family = inverse.gaussian(link = "identity"), data = experiment1)

Fixed effect | Estimate | t value | p value
Intercept (β0) | 292.585 | 108.405 | < 0.0001
STATUS vocalist | 27.347 | 7.248 | < 0.0001
LOCATION right | -3.597 | -3.202 | 0.001
MODALITY auditory | 21.160 | 12.578 | < 0.0001
MODALITY tactile | 58.745 | 34.178 | < 0.0001
STATUS vocalist : MODALITY auditory | -6.478 | -2.555 | 0.010
STATUS non-musician : MODALITY tactile | 15.310 | 5.976 | < 0.0001
STATUS vocalist : MODALITY tactile | 26.109 | 9.374 | < 0.0001
ARM crossed : MODALITY aligned_multi | -3.109 | -2.436 | 0.014
ARM crossed : MODALITY auditory | 4.200 | 2.991 | 0.002
LOCATION right : MODALITY auditory | -6.812 | -3.739 | < 0.001
STATUS non-musician : ARM crossed : MODALITY aligned_multi | 5.097 | 2.616 | 0.008
STATUS non-musician : ARM crossed : MODALITY auditory | -7.373 | -3.295 | < 0.001
STATUS vocalist : ARM crossed : MODALITY auditory | -5.864 | -2.912 | 0.003
STATUS non-musician : LOCATION right : MODALITY auditory | 5.563 | 2.402 | 0.016
STATUS non-musician : LOCATION right : MODALITY tactile | 9.358 | 3.530 | < 0.001
STATUS vocalist : LOCATION right : MODALITY tactile | 6.282 | 2.318 | 0.020

Table 4. Significant coefficient estimates of the parsimonious GLMM of SRT experiment

A post-hoc analysis was performed on the significant factors. For the factors STATUS and MODALITY, there were significant differences between vocalists and the two other participant groups for all multisensory and unisensory conditions. For spatially aligned and misaligned multisensory stimuli, vocalists showed the same estimated mean RT [318 ms, CI: 309-326 ms]. For spatially aligned multisensory stimuli, the estimated mean RTs of instrumentalists and non-musicians were 289 ms [CI: 284-295 ms] and 287 ms [CI: 276-297 ms] respectively. For spatially misaligned multisensory stimuli, the estimated mean RTs of instrumentalists and non-musicians were 290 ms [CI: 284-296 ms] and 287 ms [CI: 276-298 ms] respectively.


For unisensory auditory stimuli, the estimated mean RT of vocalists was 330 ms [CI:

322-339 ms]. The estimated mean RTs of instrumentalists and non-musicians were 311 ms

[CI: 305-317 ms] and 308 ms [CI: 297-319 ms] respectively.

For unisensory tactile stimuli, there were significant differences across the three participant groups. The estimated mean RT of instrumentalists was 349 ms [CI: 343-355 ms]. The estimated mean RTs of non-musicians and vocalists were 362 ms [CI: 351-373 ms] and 404 ms [CI: 395-413 ms] respectively.

Figure 20. RT for STATUS:MODALITY interaction of SRT experiment

For the two multisensory and the unisensory auditory conditions, there were significant differences between vocalists and the two other participant groups regardless of arm posture.

For the unisensory tactile condition, there were significant differences across the three participant groups only when participants crossed their arms. The estimated mean RT of instrumentalists was 350 ms [CI: 344-365 ms]. The estimated mean RTs of non-musicians and vocalists were 364 ms [CI: 353-375 ms] and 406 ms [CI: 397-415 ms] respectively.

When participants uncrossed their arms, significant differences were found between vocalists and the two other participant groups.

For the factor ARM, instrumentalists showed a significant difference between crossed and uncrossed arms for the spatially aligned multisensory condition (Z = 2.43, p =

0.01) and for the unisensory auditory condition (Z = -2.99, p < 0.01). Vocalists showed a significant difference between crossed and uncrossed ARMs for the spatially misaligned multisensory condition (Z = 2.42, p = 0.01). Non-musicians showed a significant difference between crossed and uncrossed arms (Z = 1.78, p = 0.03) for the unisensory tactile condition.

Figure 21. RT for STATUS:ARM:MODALITY interaction of SRT experiment


For the unisensory auditory condition, there was a significant difference between the left and right LOCATIONs (7.39 ms, Z = 7.45, p < 0.0001). Vocalists showed a significant difference between the two LOCATIONs only for the unisensory auditory condition (Z = 3.98, p < 0.001). Instrumentalists showed a significant difference between the two LOCATIONs for the spatially aligned multisensory condition (Z = 3.20, p < 0.01) and for the unisensory auditory condition (Z = 6.74, p < 0.0001). Non-musicians showed a significant difference for the unisensory auditory condition (Z = 2.73, p < 0.01) and for the unisensory tactile condition (Z = -2.88, p < 0.01).

Figure 22. RT for STATUS:LOCATION:MODALITY interaction of SRT experiment

Redundant Signal Effect

The Redundant Signal Effect (RSE) was examined in three ways: 1) redundancy gain, 2) the race model inequality, and 3) a cumulative probability analysis using 10 ms time bins. Redundancy gain is a simple difference in RT between unisensory and multisensory stimuli, so it provides only descriptive information about the RSE. In order to explain the RSE, two models have been developed: the race model and the co-activation model. Explaining the RSE as a statistical phenomenon, the former assumes that stimuli from different modalities are processed independently by separate channels. The latter posits some form of integrated processing that leads to the speed gain. Miller (1982) proposed the race model inequality (RMI) to test whether the RSE exceeds the facilitation predicted by probability summation. While the RMI informs us whether the race model is supported or violated, the cumulative probability analysis provides details about the time points at which the race model is violated and is useful for group comparison.
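Formally, the race model's prediction can be written as Miller's (1982) inequality over the cumulative RT distributions, here with A = auditory, T = tactile, and AT = audio-tactile (this notation is mine, added for clarity):

    P(RT_AT ≤ t) ≤ P(RT_A ≤ t) + P(RT_T ≤ t), for all t

Whenever the left-hand side exceeds the right-hand side at some time t, probability summation alone cannot account for the multisensory facilitation, which is taken as evidence for co-activation.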

Redundant Gain

Redundancy gain (RG) is the difference in RT between the audio-tactile multisensory condition and the fastest unisensory condition (see fig. 23). For unisensory stimulation, only one non-musician participant showed a faster RT for the tactile condition than for the auditory condition. The instrumentalists' average RTs for multisensory and unisensory stimuli were 264.71 ms (SD = 51.51 ms) and 304.82 ms (SD = 51.96 ms), respectively. Vocalists responded to multisensory stimuli at 272.60 ms (SD = 62.91 ms) and to unisensory stimuli at 321.66 ms (SD = 63.67 ms). For non-musicians, RTs for multisensory and unisensory stimuli were 269.32 ms (SD = 39.97 ms) and 317.05 ms (SD

= 39.08 ms). Average RGs were 21.55 ms (SD = 21.47 ms), 19.80 ms (SD =16.93 ms), and

15.29 ms (SD = 31.05 ms) for instrumentalists, non-musicians and vocalists respectively.

There was no significant difference in RG across groups.



Figure 23. Redundancy gain plot: RT for multisensory and unisensory conditions in three participant groups

Race Model Inequality

According to the race model proposed by Raab (1962), when multiple stimuli are presented, each stimulus is detected separately; the model is thus also known as a separate activation model. A response is triggered as soon as the first stimulus is detected. For unisensory stimuli, the RT is determined by the latency of a single detection process, whereas for multisensory stimuli it is determined by the faster of the two detection processes (Ulrich, Miller, & Schröder, 2007). In short, signals from different sensory modalities increase the likelihood of a faster motor response because the modalities race to meet the behavioral task demand. In contrast, the co-activation model assumes that signals from different modalities contribute jointly to a common pool of activation before the initiation of the motor response

(Miller, 1982).


Violations of the race model can be tested with the Race Model Inequality (RMI) proposed by Miller (1982). The RMI test rests on the fact that the race model makes a prediction about the probability distributions of RT. The RMI can therefore be tested by comparing the combined probability distribution of RTs for the unimodal conditions with the probability distribution of RTs for the multimodal condition. I tested violations of the race model using the algorithms provided by Ulrich et al. (2007). For this, individual RTs for the auditory and tactile conditions were combined and sorted in ascending order. Individual RTs for the multisensory audio-tactile condition were also arranged in ascending order. Then both RT distributions, 1) audio-tactile combined and 2) multisensory, were divided into the 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, and 0.95 percentiles. For each percentile bin, the combined values for audio-tactile stimuli were compared with the values for multisensory stimuli using t-tests. The RMI analysis revealed violations only for instrumentalists: significant violations of the inequality at the 0.05 (t11 = 2.84, p = 0.01) and 0.15 (t11 = 1.90, p = 0.08) percentiles at α = 0.1.
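The percentile step of this procedure can be sketched in R as follows; rt_a, rt_t, and rt_at are illustrative names for one participant's auditory, tactile, and multisensory RT vectors (a simplified sketch, not the Ulrich et al. (2007) implementation itself):

    # Percentiles at which the race model bound is evaluated.
    probs <- seq(0.05, 0.95, by = 0.10)   # 0.05, 0.15, ..., 0.95

    # Combine and sort the two unisensory RT distributions (the race model
    # bound), and sort the multisensory RTs.
    rt_combined <- sort(c(rt_a, rt_t))
    rt_multi    <- sort(rt_at)

    # Percentile values entering the per-bin comparison.
    q_combined <- quantile(rt_combined, probs)
    q_multi    <- quantile(rt_multi, probs)

    # At the group level, one t-test per percentile bin compares the
    # participants' multisensory percentile values against the combined
    # bound; a significantly faster multisensory value marks a violation.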

Figure 24. Predicted multisensory facilitation violation of the RMI for instrumentalists: significant violations (p < 0.1) marked by (*)

Cumulative probability analysis

Compared to the RMI test, which is based on percentile bins, this analysis is based on RT bins (Landry & Champoux, 2017). An advantage of this analysis is that I can specify not only which time bins show an RSE within a group but also which time windows show a difference in RSE between groups. Because each time bin provides an independent measurement, I could look at differences between the joint probability of unisensory RTs and the probability of multisensory RTs between groups. For the joint probability, I calculated each individual's cumulative probability of RT for the auditory and the tactile condition at each 10 ms time bin between 80 ms and 1000 ms. The audio-tactile joint probability is the sum of the probabilities for the auditory and the tactile condition minus their product (i.e., p(auditory) + p(tactile) − p(auditory) × p(tactile)). For the multisensory probability, I calculated each individual's cumulative probability of RT for the multisensory condition at each 10 ms time bin between 80 ms and 1000 ms. Then the joint probability was subtracted from the multisensory probability at each reaction time bin, the difference being indicative of the RSE.
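A minimal R sketch of this computation for a single participant, with rt_a, rt_t, and rt_at as illustrative names for the auditory, tactile, and multisensory RT vectors:

    bins <- seq(80, 1000, by = 10)   # 10 ms time bins from 80 ms to 1000 ms

    # Cumulative response probability: proportion of trials with RT <= t.
    cum_prob <- function(t, rt) mean(rt <= t)

    p_a  <- sapply(bins, cum_prob, rt = rt_a)
    p_t  <- sapply(bins, cum_prob, rt = rt_t)
    p_at <- sapply(bins, cum_prob, rt = rt_at)

    # Joint probability of the unisensory responses:
    # p(auditory) + p(tactile) - p(auditory) * p(tactile).
    p_joint <- p_a + p_t - p_a * p_t

    # Positive differences indicate an RSE at that time bin; negative
    # differences indicate an advantage of the joint unisensory responses.
    rse <- p_at - p_joint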

The mean cumulative probabilities of the joint and multisensory conditions for instrumentalists, vocalists, and non-musicians were calculated. Lastly, an independent sample t-test was performed between the response probability in the joint and multisensory conditions for each group at all 10 ms time bins in order to identify group differences.

There was no significant difference between the joint and multisensory probabilities for instrumentalists. However, I found significant differences between the joint and multisensory probabilities for vocalists at 80 ms (t = 2.29, p = 0.04), 90 ms (t = 2.38, p = 0.04), 100 ms (t = 2.45, p = 0.03), 110 ms (t = 2.49, p = 0.03), 120 ms (t = 2.49, p = 0.03), 130 ms (t = 2.46, p = 0.03), 140 ms (t = 2.38, p = 0.03), and 150 ms (t = 2.26, p = 0.04). This indicates significant effects in the joint probability of unisensory responses, rather than an RSE.

Figure 25. Difference in the joint and multisensory cumulative probability for vocalists with standard error bars. Significant probability (p < 0.05) for multisensory ( > 0) or unisensory (<0) responses.

For non-musicians, the significant differences between the joint and multisensory probabilities were at 100 ms (t = 2.19, p = 0.04), 110 ms (t = 2.28, p = 0.04), 120 ms (t = 2.37, p = 0.03), 130 ms (t = 2.46, p = 0.02), 140 ms (t = 2.54, p = 0.02), 150 ms (t = 2.25, p = 0.02), 160 ms (t = 2.57, p = 0.02), 170 ms (t = 2.47, p = 0.02), and 180 ms (t = 2.26, p = 0.03). This indicates significant effects in the joint probability of unisensory responses, rather than an RSE.


Figure 26. Difference in the joint and multisensory cumulative probability for non- musicians with standard error bars: Significant probability (p < 0.05) for multisensory ( > 0) or unisensory (< 0) responses.

In terms of group comparison, I found a significant difference between the joint and multisensory probabilities between instrumentalists and non-musicians at 200 ms (t = 2.36, p = 0.02) and 210 ms (t = 2.37, p = 0.02).

Figure 27. Group difference between instrumentalists and non-musicians (subtraction of the probability difference between the joint and multisensory responses for non-musicians vs. that for instrumentalists; significant differences marked by (*))

Discussion

The main aim of my simple reaction time experiment was to study the effect of specific musical training on participants' reaction times to unisensory (auditory vs. tactile) and multisensory stimuli near and on the hands while taking two external spatial factors (i.e., ALIGN and LOCATION) and one internal one (i.e., ARM) into consideration. I performed two different analyses on the data, ANOVA and GLMM, and the differences between them are visible in their outputs. Compared to the ANOVA, in which only MODALITY, LOCATION:MODALITY, and STATUS:MODALITY turned out to be significant, the GLMM not only showed significant effects of STATUS and LOCATION in addition to MODALITY but also revealed more interaction effects, including LOCATION:MODALITY, STATUS:MODALITY, ARM:MODALITY, STATUS:ARM:MODALITY, and STATUS:LOCATION:MODALITY. Several main factors and interaction terms that were significant in the GLMM were not significant in the ANOVA because ANOVA requires more assumptions to be satisfied, and I therefore had to use reaction time means. If I had had a larger sample size (e.g., about 30 participants in each group), the ANOVA would have given a more reliable output.

To summarize the results of the simple reaction time experiment, there was a significant MODALITY effect. A significant group difference was found for tactile stimulation: instrumentalists showed the fastest reaction times whereas vocalists were the slowest. Concerning the spatial factors, instrumentalists showed ARM and LOCATION effects for the unisensory auditory and the spatially aligned multisensory conditions while non-musicians showed ARM and LOCATION effects only for the unisensory tactile condition. However, there was no ALIGN effect. With regard to the Redundant Signal Effect, all participant groups showed a reaction time facilitation for multisensory stimulation, but the redundancy gain differences among the groups were not significant. Only instrumentalists showed significant violations of the race model predictions, for the first two percentile bins (i.e., the 0.05 and 0.15 bins). Additionally, the cumulative probability analysis revealed a significant difference in the Redundant Signal Effect between instrumentalists and non-musicians at the 200 ms and 210 ms bins.

I found a significant difference among the three groups for the tactile condition only.

This is in line with Landry & Champoux's (2017) finding. Using a comparable simple reaction time task, Landry & Champoux (2017) measured auditory, tactile, and audio-tactile multisensory reaction times in musicians and non-musicians. They reported significantly faster reactions in musicians for the unisensory auditory, unisensory tactile, and multisensory conditions. In my experiment, however, though instrumentalists showed the fastest mean reaction times, the differences among the three participant groups were not significant for the unisensory auditory and multisensory conditions. This may be due to experimental differences. In this study, I included three spatial factors that are lacking in the study of Landry & Champoux (2017). In their experiment, sounds were always presented bilaterally but tactile stimulation was presented to participants' left hands only. Further, participants were asked to respond to stimuli with their right hands. In my experiment, participants responded with their dominant foot. The spatial factors in my experiment require the involvement of additional processes to perform the task, which may yield different results for the unisensory auditory and multisensory conditions. In addition, most instrument training requires multisensory perception that is coupled with motor movements of the fingers and hands. The results of the musicians in the Landry & Champoux (2017) study may reflect this. In contrast, benefits from hand and finger motor involvement play no role in my experiment. At any rate, this study is the first report of modulatory effects of instrumental training, as opposed to vocal training, on tactile RT.

In terms of an interaction of the spatial factors, only instrumentalists showed a significant ARM effect for the spatially aligned multisensory and the auditory conditions.

This suggests that, when both auditory and tactile inputs are available, instrumentalists use spatial information arising from the body when locating stimulus sources. Additionally, the absence of tactile inputs does not seem to affect the instrumentalists' use of internal spatial information. Instrumental training seems to establish a strong association especially between audio-tactile perception and bodily movements. In an action-perception feedback loop, the instrumentalists may internalize the matching of external and internal spatial cues.

Kuchenbuch et al. (2014) also noted musical training effects in multisensory and auditory conditions in their study on audio-tactile integration. In their study, the participants were presented with five-tone melodies with synchronous stimulation of their fingertips. The tones were matched with specific fingers based on fingerings. There were four different conditions: 1) audio-tactile congruent (all five tones match the corresponding fingertips), 2) audio-tactile incongruent (one of the five tones does not match the corresponding fingertip), 3) tactile deviant (one of the five tactile stimulations is delivered to the second phalanx rather than the fingertip), and 4) audio deviant (one of the five tones is shifted to a sawtooth timbre). Using a magnetoencephalography (MEG) imaging technique, the authors found not only that, compared to non-musicians, musicians were more accurate in identifying incongruent audio-tactile stimuli but also that the musicians' greater neural response to incongruent stimuli is associated with the left uncus, the left premotor gyrus, and the left cerebellum. Kuchenbuch et al. (2014) found an auditory mismatch response in musicians in the left superior temporal gyrus and the left cuneus that extends to the left medial occipital gyrus. According to the authors, the influence of musical training in the audio-tactile and in the auditory conditions indicates enhanced higher-order processing. Kuchenbuch et al. (2014) further argued that musical training not only stabilizes bottom-up sensory processing but also modulates top-down processing, with a particular emphasis on multisensory integration.

Building on Kuchenbuch et al. (2014), my experiment provides behavioral evidence for modulatory effects of instrumental training on multisensory integration, with a specific emphasis on spatial processing. How, then, could instrumentalists recruit external and internal spatial information and use these inputs for source localization with auditory signals alone? This may be because kinesthetic imagery, a rehearsal strategy predominantly found in professional musicians to complement active training, plays a role in mentally deploying multisensory spatial cues for a specific arm posture. This accords with Lotze's (2013) argument that musicians' mental imagery is not necessarily specific to the auditory, tactile, visual, or motor aspects of imagery but rather integrates these different systems.

In terms of the facilitation effect of multisensory information, the instrumentalists' violation of the race model in my experiment indicates that their faster reaction times for multisensory stimuli may be due to co-activation. The cumulative probability analysis showed that instrumentalists have an advantage for multisensory stimuli compared to non-musicians in a time window between 200 ms and 220 ms. Using MEG, Schultz, Ross, & Pantev (2003) studied the effects of musical training on audio-tactile processing in trumpet players. They had five experimental conditions: three unimodal conditions (i.e., lower lip stimulation, index finger stimulation, or trumpet tone presentation) and two multisensory conditions (i.e., trumpet tone with lower lip stimulation vs. trumpet tone with index finger stimulation). In line with Kuchenbuch et al. (2014), Schultz et al. (2003) did not find a significant tactile effect but observed a significantly larger N100m component in trumpet players for the auditory stimuli. The N100m, a well-known neuromagnetic evoked response signal, is sensitive to fundamental properties of auditory stimuli such as pitch or timbre. More importantly, the authors reported that musicians show a pronounced cortical response to the multimodal tone and lip stimulation at a latency of 33 ms, which was not observed in non-musicians. This suggests that musicians' multisensory processing is different from non-musicians' and that integrative processing of audio-tactile signals starts as early as 33 ms at the cortical level, which is in line with my findings on the instrumentalists' violations of the race model via the

RMI analysis.

Cognitive scientists have noted the complex involvement of several sensory systems and the motor system in music playing and have expressed the necessity of investigating the effects of different types of training on the interaction between these systems in terms of neuroplasticity (Herholz & Zatorre, 2012). However, most research has focused mainly on the interaction between the auditory and motor systems. Therefore, how the somatosensory system interacts with the audio-motor systems is not known. Although the instrumentalists' faster reaction to tactile stimulation is difficult to explain, as Landry & Champoux (2017) noted, it seems that specific training and experiences may contribute to increased integration via co-activation between modalities.

Temporal Order Judgment (TOJ) experiment

Compared to the simple reaction time experiment, where I investigated the effects of specific musical training on unisensory and multisensory audio-tactile perception (e.g., event detection) and its interaction with spatial factors, the temporal order judgment (TOJ) task allows me to study the effects of specific musical training on more complex temporal processes (e.g., perception of temporal order) and their interaction with spatial factors. In chapter 2, I discussed two independent thresholds in temporal processing. One is the simultaneity threshold: when the spacing between events exceeds it, we can identify them as separate events. The other is the temporal order threshold: the amount of time required to determine the order of events. Determining the temporal order of events involves several processes including event detection, attention, etc. In a classical TOJ setting, two successive events are presented in different sensory modality pairings (e.g., audio-tactile pairings) with various stimulus onset asynchronies (SOAs), and participants are asked to judge the order of the stimuli. Here I look into how specific musical training modifies the processes involved in audio-tactile TOJ and spatial processing. I hypothesize that, if instrumental training contributes to a better audio-tactile representation of the peripersonal space near the hands (i.e., perihand space), instrumentalists will show the earliest Point of Subjective Equality (PSE) and the smallest Just Noticeable Difference (JND). The PSE is the SOA at which the two order judgments are equally likely, that is, at which performance is at chance level. The JND is the smallest difference between two stimuli that can be accurately detected at least fifty percent of the time.

Methods

Participants

The participants' STATUS was determined in the same way as in the simple reaction time (SRT) experiment. Thirty-five participants took part in the TOJ experiment, which took approximately 90 minutes to complete: thirteen non-musicians (6 males; 13 right-handed; avg. age = 31.50 years, min. 19, max. 62), 13 instrumentalists (6 males; 12 right-handed; avg. age = 32.23 years, min. 19, max. 70), and 9 singers (2 males; 9 right-handed; avg. age = 23.88 years, min. 18, max. 24). The instrumentalists' primary instruments included cello (1), flute (1), (1), percussion (3), piano (3), guqin (1), saxophone (2), and trumpet (1). The participants completed a self-reported questionnaire about their musical training. Fourteen participants also took part in the SRT experiment; however, their participation in the two experiments was at least two weeks apart. All participants were from the Columbus, Ohio area and reported normal hearing and touch. One participant who completed only four blocks was excluded from the analyses. Having been provided with a written experimental procedure, the participants gave verbal consent to proceed with the experiment. This research was approved by The Ohio State University Institutional Review Board for Human Research.

Stimuli

In the TOJ experiment, sound and touch were always presented as a pair, but the onsets of the two stimuli of a pair varied depending on five different stimulus onset asynchronies (SOAs: 20 ms, 30 ms, 55 ms, 90 ms, and 180 ms). There were two different conditions according to whether the touch or the sound came first (i.e., the factor MODALITY). Additional spatial factors were also considered: ALIGN (i.e., the stimulus positions for sound and touch are either the same or different), LOCATION (the location of the first stimulus), and ARM (i.e., crossed or uncrossed arms). In total, there were 80 possible conditions (5 SOAs x 2 MODALITY x 2 ALIGN x 2 LOCATION x 2 ARM) (see fig. 28). Left- and right-sided stimuli were separated by 100° in azimuth.
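The factorial structure of the design can be enumerated directly in R; the level labels below are illustrative:

    # Full crossing of the five factors: 5 x 2 x 2 x 2 x 2 = 80 configurations.
    design <- expand.grid(
      SOA      = c(20, 30, 55, 90, 180),             # ms
      MODALITY = c("touch_first", "sound_first"),
      ALIGN    = c("aligned", "misaligned"),
      LOCATION = c("left", "right"),                 # side of the first stimulus
      ARM      = c("uncrossed", "crossed")
    )
    nrow(design)   # 80; within a block ARM is fixed, leaving 40 conditions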

Figure 28. Eighty stimulus configurations of the TOJ experiment


Equipment

In principle, the equipment was the same as in the SRT experiment. An LED was added to the presentation microcontroller to signal the start of stimulus presentation, serving as a fixation point for the participant. For the registration of responses, two response buttons were connected to the registration microcontroller (see fig. 29).

Figure 29. Presentation and registration microcontrollers for TOJ experiment

Procedure

Participants were seated comfortably in a dark room and wore ear plugs. The fixation LED was illuminated at the beginning of each trial, and participants were asked to look at the light throughout the experiment. The first stimulus was presented 750 ms after the LED activation, and the second stimulus was then presented with the SOA specified for the trial.

Each subject completed one practice block and 12 experimental blocks. A practice block had 32 trials. The SOAs of the practice block were twice as long as in the experimental trials to facilitate the participants' acquisition of the task. Participants could ask the experimenter to repeat the practice block until they felt comfortable enough to perform the experimental task. The forty different conditions were presented twice within one block, so one experimental block consisted of 80 trials. For six blocks, participants placed their arms in an uncrossed, normal position; for the other six blocks, participants crossed their arms. The arm positions were counterbalanced within a subject. The inter-stimulus interval varied randomly between 750 ms and 2500 ms. The participants were asked to place their hands near the speakers on two wrist rest sponges and then to decide whether the first perceived stimulus was a sound or a touch. The participants were also asked to place their dominant foot in the middle of two separate response buttons and then move their big toe laterally in order to press a button of a BBTK response pad. For a sound-first response, the participants were asked to press the right green button whereas, for a touch-first response, they were asked to press the left red button (see fig. 30). In addition to the timing information of the stimulus presentation, the participants' RT and accuracy were recorded in the registration microcontroller (see fig. 29). After the experiment, participants had a short debriefing interview, which was recorded.


Figure 30. TOJ experiment setup

Results

Reaction Time (RT)

RTs for the temporal order judgment experiment were calculated as the difference between the onset of the second stimulus of each pair and the participant's response. RTs below 75 ms and above 2500 ms were eliminated (Whelan, 2008). Two analyses of RT were performed: 1) an ANOVA on the means of individual RTs and 2) a General Linear

Mixed Model (GLMM) analysis with raw RT data.
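The trimming step amounts to a simple filter; rt is an illustrative name for the raw RT column:

    # Remove anticipations (< 75 ms) and lapses (> 2500 ms) before the
    # mean-based ANOVA and the raw-data GLMM (cf. Whelan, 2008).
    rt_clean <- rt[rt >= 75 & rt <= 2500]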

ANOVA

I performed a repeated measures ANOVA on mean reaction time with the within-subject factors 1) ARM (crossed vs. uncrossed arms), 2) LOCATION (left vs. right location of the first stimulus), 3) SOA (20 ms, 30 ms, 55 ms, 90 ms, and 180 ms), 4) MODALITY (touch first vs. sound first), and 5) ALIGN (spatially aligned vs. spatially misaligned audio-tactile stimuli), and the between-subject factors 1) GENDER (female vs. male) and 2) STATUS (instrumentalist, vocalist, or non-musician).

Three main factors of STATUS (F2,29 = 4.38, p = 0.02, ηp² = 0.97), ALIGN (F1,29 = 5.95, p = 0.02, ηp² = 0.71), and SOA (F4,116 = 93.09, p < 0.0001, ηp² = 0.95) were found to be significant at α = 0.05.

Significant two-way interaction terms include STATUS:SOA (F8,116 = 2.48, p = 0.01, ηp² = 0.52), MODALITY:LOCATION (F1,29 = 22.91, p < 0.0001, ηp² = 0.22), SOA:LOCATION (F4,29 = 4.15, p < 0.01, ηp² = 0.09), and LOCATION:ALIGN (F1,29 = 19.35, p < 0.01, ηp² = 0.26).

Significant three-way interaction terms include STATUS:ARM:ALIGN (F2,29 = 3.61, p = 0.03, ηp² = 0.06), STATUS:GENDER:LOCATION (F2,29 = 3.651, p = 0.03, ηp² = 0.14), STATUS:GENDER:MODALITY (F2,29 = 6.05, p < 0.01, ηp² = 0.35), GENDER:SOA:ALIGN (F4,116 = 2.75, p = 0.03, ηp² = 0.06), MODALITY:SOA:LOCATION (F4,116 = 2.71, p = 0.03, ηp² = 0.12), and SOA:LOCATION:ALIGN (F4,116 = 3.77, p < 0.01, ηp² = 0.08).

Significant four-way interaction terms include STATUS:GENDER:MODALITY:LOCATION (F2,29 = 4.64, p = 0.01, ηp² = 0.10), STATUS:SOA:LOCATION:ALIGN (F8,116 = 2.44, p = 0.01, ηp² = 0.11), and GENDER:SOA:LOCATION:ALIGN (F4,116 = 3.77, p < 0.01, ηp² = 0.08).

The five-way interaction term STATUS:GENDER:MODALITY:LOCATION:ALIGN (F2,29 = 4.12, p = 0.02, ηp² = 0.12) was found to be significant at α = 0.05.

Factor | F-value | p-value
STATUS | F2,29 = 4.383 | 0.0217
ALIGN | F1,29 = 5.951 | 0.0211
SOA | F4,116 = 93.095 | <0.0001
STATUS:SOA | F8,116 = 2.486 | 0.015
MODALITY:LOCATION | F1,29 = 22.911 | <0.0001
SOA:LOCATION | F4,29 = 4.154 | 0.003
LOCATION:ALIGN | F1,29 = 19.35 | 0.0001
STATUS:ARM:ALIGN | F2,29 = 3.61 | 0.039
STATUS:GENDER:LOCATION | F2,29 = 3.651 | 0.038
STATUS:GENDER:MODALITY | F2,29 = 6.05 | 0.006
GENDER:SOA:ALIGN | F4,116 = 2.752 | 0.031
MODALITY:SOA:LOCATION | F4,116 = 2.715 | 0.033
SOA:LOCATION:ALIGN | F4,116 = 3.770 | 0.006
STATUS:SOA:LOCATION:ALIGN | F8,116 = 2.443 | 0.017
GENDER:SOA:LOCATION:ALIGN | F4,116 = 3.77 | 0.006
STATUS:GENDER:MODALITY:LOCATION | F2,29 = 4.645 | 0.017
STATUS:GENDER:MODALITY:LOCATION:ALIGN | F2,29 = 4.129 | 0.026

Table 5. Temporal Order Judgment (TOJ) experiment reaction time (RT) ANOVA summary, showing only the main factors and interaction terms significant at α = 0.05.

A post-hoc pairwise comparison was performed on the significant factors. The p values were adjusted with the Bonferroni correction. For the factor SOA, there were significant differences between multiple levels. The estimated mean RT for the 20 ms SOA was 811.25 ms [CI: 747.76-874.75 ms]. The estimated mean RTs for the 30 ms and 55 ms SOAs were 796.05 ms [CI: 732.56-859.55 ms] and 769.09 ms [CI: 705.60-832.59 ms] respectively. The estimated mean RTs for the 90 ms and 180 ms SOAs were 727.12 ms [CI: 663.63-790.61 ms] and 634.86 ms [CI: 571.36-698.35 ms] respectively.


Figure 31. RT for the factor SOA of TOJ experiment

The interaction results showed significant differences between the left and the right LOCATIONs for the sound-first (t = 2.70, p < 0.01) and the touch-first (t = 3.79, p < 0.001) conditions.

Figure 32. RT for LOCATION:MODALITY interaction of TOJ experiment


The interaction results showed significant differences in estimated mean RTs between the left and the right LOCATION for the aligned (t = 2.03, p =0.04) and the misaligned (t = -3.03, p < 0.01) conditions.

Figure 33. RT for LOCATION:ALIGN interaction of TOJ experiment

General Linear Mixed Model Analysis (GLMM)

I performed a GLMM analysis on RT data with an inverse gaussian distribution and an identity link function (Lo & Andrews, 2015). In order to simplify my model, I did not include the factors GENDER and LOCATION which turned out not to be significant in the

ANOVA. As fixed effects I included 1) ARM, 2) SOAs, 3) MODALITY, 4) ALIGN, and

5) STATUS, and as a random effect, I used the subject variable that is nested in STATUS.

The best parsimonious model is: 69

69 The R command used for the model was: MODEL = glmer(Reaction Time ~ STATUS + ARM + SOA + STATUS:SOA + STATUS:ALIGN:ARM + (1|SUBJECT) + (1|SUBJECT:STATUS), family = inverse.gaussian(link = "identity"), data = experiment2)

Reaction Time = β0 + β1*STATUS + β2*ARM + β3*SOA + β4*STATUS:SOA + β5*STATUS:ALIGN:ARM + ε

The GLMM found three main factors (STATUS, ARM, and SOA) and two interaction terms (STATUS:SOA and STATUS:ALIGN:ARM) to be significant at α = 0.05. Results of the significant coefficient estimates of fixed effect levels contributing to the model are shown in Table 6.

Fixed effect | estimate | t value | p value
Intercept (β0) | 721.355 | 172.913 | < 0.0001
STATUS non-musician | 190.271 | 30.209 | < 0.0001
STATUS vocalist | -143.952 | -21.488 | < 0.0001
ARM crossed | -18.167 | -6.942 | < 0.0001
SOA 20ms | 182.683 | 60.622 | < 0.0001
SOA 30ms | 172.164 | 54.531 | < 0.0001
SOA 55ms | 140.912 | 45.594 | < 0.0001
SOA 90ms | 92.527 | 31.184 | < 0.0001
STATUS non-musician : SOA 20ms | -54.304 | -10.918 | < 0.0001
STATUS vocalist : SOA 20ms | -71.032 | -17.598 | < 0.0001
STATUS non-musician : SOA 30ms | -54.235 | -8.380 | < 0.0001
STATUS vocalist : SOA 30ms | -73.087 | -16.439 | < 0.0001
STATUS non-musician : SOA 55ms | -28.669 | -5.584 | < 0.0001
STATUS vocalist : SOA 55ms | -59.770 | -12.789 | < 0.0001
STATUS non-musician : SOA 90ms | -22.162 | -4.001 | < 0.0001
STATUS vocalist : SOA 90ms | -32.941 | -7.047 | < 0.0001
STATUS instrumentalist : ALIGN aligned : ARM crossed | -25.307 | -5.504 | < 0.0001
STATUS non-musician : ALIGN aligned : ARM crossed | -51.526 | -8.869 | < 0.0001
STATUS vocalist : ALIGN aligned : ARM crossed | -10.825 | -2.999 | 0.002
STATUS non-musician : ALIGN aligned : ARM uncrossed | 5.166 | -7.021 | <0.0001
STATUS instrumentalist : ALIGN misaligned : ARM crossed | 4.473 | -2.414 | 0.015
STATUS non-musician : ALIGN misaligned : ARM crossed | 5.086 | -6.694 | <0.0001

Table 6. Significant coefficient estimates of the parsimonious GLMM of TOJ experiment RT


A post-hoc analysis was performed on significant factors. For the factor STATUS, there were significant differences across instrumentalists [821.68 ms, CI: 810.70-832.67 ms], non-musicians [957.86 ms, CI: 941.18-974.54 ms], and vocalists [634.57 ms, CI:

618.66-650.48 ms].

Figure 34. RT for STATUS of TOJ experiment

For the factor SOA, there were significant differences between all SOA levels except between the 20 ms and 30 ms SOAs. The estimated mean RT for the 20 ms SOA was 854.37 ms [CI: 842.75-865.98 ms]. The estimated mean RTs for the 30 ms and 55 ms SOAs were 843.19 ms [CI: 831.50-854.87 ms] and 824.89 ms [CI: 812.97-836.82 ms] respectively. The estimated mean RTs for the 90 ms and 180 ms SOAs were 787.62 ms [CI: 775.57-799.67 ms] and 713.56 ms [CI: 702.66-724.26 ms] respectively (see fig. 34). In addition, there were significant participant group differences at each SOA level. There were also significant participant group differences at each SOA level for both the spatially aligned and the misaligned conditions.


Figure 35. RT for STATUS:SOA interaction of TOJ experiment

Accuracy

ANOVA

I performed a repeated measurement ANOVA on % correct data with within- subject factors of 1) ARM, 2) LOCATION, 3) SOAs, 4) MODALITY, 5) ALIGN, and between-subject factors of 1) GENDER and 2) STATUS.

The main factors of MODALITY (F1,29 = 17.29, p < 0.01, ηp² = 0.85), SOA (F4,116 = 229.18, p < 0.0001, ηp² = 0.95), and LOCATION (F1,29 = 22.10, p < 0.01, ηp² = 0.55) were found to be significant at α = 0.05.

Two-way interaction terms of MODALITY:LOCATION (F1,29 = 29.76, p < 0.0001, ηp² = 0.70) and LOCATION:ALIGN (F1,29 = 24.18, p = 0.001, ηp² = 0.48) were found to be significant at α = 0.05. Significant three-way interaction terms include MODALITY:SOA:LOCATION (F4,116 = 4.62, p = 0.03, ηp² = 0.14) and MODALITY:LOCATION:ALIGN (F1,29 = 17.22, p < 0.01, ηp² = 0.51). One four-way interaction term of MODALITY:SOA:LOCATION:ALIGN (F4,116 = 6.6, p < 0.01, ηp² = 0.23) was found to be significant.

Factor | F-value | p-value
STATUS | F2,29 = 5.235 | 0.011
MODALITY | F1,29 = 17.291 | <0.001
SOA | F4,116 = 229.28 | <0.0001
LOCATION | F1,29 = 22.108 | <0.0001
STATUS:SOA | F8,116 = 3.186 | 0.002
ARM:LOCATION | F1,29 = 8.189 | 0.007
MODALITY:LOCATION | F1,29 = 29.763 | <0.0001
LOCATION:ALIGN | F1,29 = 24.185 | <0.0001
STATUS:GENDER:MODALITY | F2,29 = 3.602 | 0.040
STATUS:GENDER:ALIGN | F2,29 = 3.583 | 0.040
STATUS:MODALITY:SOA | F8,116 = 2.041 | 0.047
MODALITY:SOA:LOCATION | F4,116 = 5.621 | 0.001
MODALITY:LOCATION:ALIGN | F1,29 = 17.222 | 0.0002
ARM:LOCATION:ALIGN | F1,29 = 9.045 | 0.005
SOA:LOCATION:ALIGN | F4,116 = 4.281 | 0.002
STATUS:MODALITY:SOA:LOCATION | F8,116 = 2.054 | 0.045
STATUS:MODALITY:LOCATION:ALIGN | F2,29 = 3.496 | 0.043
ARM:MODALITY:LOCATION:ALIGN | F1,29 = 8.846 | 0.005
MODALITY:SOA:LOCATION:ALIGN | F4,116 = 6.6 | <0.0001
STATUS:ARM:SOA:LOCATION:ALIGN | F8,116 = 2.127 | 0.038
GENDER:ARM:MODALITY:SOA:LOCATION | F4,116 = 2.803 | 0.029

Table 7. TOJ experiment accuracy ANOVA summary, showing only the main factors and interaction terms significant at α = 0.05.


A post-hoc pairwise comparison was performed on the significant factors, with the p values adjusted with a Bonferroni correction. For the factor MODALITY, there was a significant difference between the sound-first and touch-first conditions (t = 2.99, p < 0.01).

Figure 36. % correct for the factor MODALITY of TOJ experiment

There was a significant difference between the left and right LOCATIONs (t = 4.67, p < 0.0001).

Figure 37. % correct for LOCATION of TOJ experiment


The interaction results showed a significant difference between the left and right

LOCATIONs only for the misaligned condition (t = 6.39, p < 0.0001). A difference between the spatially aligned and misaligned conditions turned out to be significant for the left (t = - 4.63, p < 0.0001) and right LOCATIONs (t = 3.18, p = 0.002) respectively.

Figure 38. % correct for ALIGN:LOCATION interaction of TOJ experiment

The analysis showed a significant difference between the sound-first and touch-first conditions only for the right LOCATION (t = 5.02, p < 0.0001), and a significant difference between the left and right LOCATIONs for the touch-first condition (t = 7.10, p < 0.0001).


Figure 39. % correct for LOCATION:MODALITY interaction of TOJ experiment

For the factor SOA, there were significant differences between multiple levels. As the SOA got longer, the participants showed improved performance at determining the temporal order of the stimuli. The estimated mean % correct at the 20 ms SOA was 56.56% [CI: 53.66-59.46%]. The estimated means in % correct at the 30 ms and 55 ms SOAs were 58.49% [CI: 55.59-61.39%] and 67.42% [CI: 64.52-70.32%] respectively. The estimated means in % correct at the 90 ms and 180 ms SOAs were 75.09% [CI: 72.96-77.99%] and 85.80% [CI: 82.90-88.70%] respectively.


Figure 40. % correct for the factor SOA of TOJ experiment

Only when sound was presented first were there significant differences between the spatially aligned and misaligned conditions, for both the left (t = -5.82, p < 0.0001) and right (t = 5.21, p < 0.0001) LOCATIONs.

Figure 41. % correct for LOCATION:ALIGN:MODALITY interaction of TOJ experiment


General Linear Mixed Model Analysis (GLMM)

I performed a GLMM analysis on the participants' correct vs. incorrect responses with a binomial error distribution and a probit link function. In order to simplify my model, I did not include the factors GENDER and ALIGN, which turned out not to be significant in the ANOVA. As fixed effects I included 1) ARM, 2) LOCATION, 3) SOA, 4) MODALITY, and 5) STATUS, and as a random effect, I used the subject variable nested in STATUS. The best parsimonious model is:70

ACCURACY = β0 + β1*STATUS + β2*MODALITY + β3*SOA + β4*STATUS:ARM + β5*STATUS:LOCATION + β6*STATUS:MODALITY + β7*STATUS:SOA + β8*LOCATION:MODALITY + β9*STATUS:LOCATION:MODALITY + β10*STATUS:MODALITY:SOA + β11*ARM:LOCATION:MODALITY + ε

Three main factors (STATUS, MODALITY, and SOA), five two-way interaction terms (STATUS:ARM, STATUS:LOCATION, STATUS:MODALITY, STATUS:SOA, and LOCATION:MODALITY), and three three-way interaction terms (STATUS:LOCATION:MODALITY, STATUS:MODALITY:SOA, and ARM:LOCATION:MODALITY) were found to be significant at α = 0.05.

70 The R command used for the model was: MODEL = glmer(ACCURACY ~ STATUS + MODALITY + SOA + STATUS:LOCATION + STATUS:ARM + STATUS:MODALITY + STATUS:SOA + MODALITY:LOCATION + STATUS:MODALITY:LOCATION + STATUS:MODALITY:SOA + MODALITY:LOCATION:ARM + (1|SUBJECT) + (1|SUBJECT:STATUS), family = binomial(link = "probit"), data = experiment2)


The results of the significant coefficient estimates of fixed effect levels contributing to the model are shown in Table 8.

Fixed effect | estimate | z value | p value
Intercept (β0) | 1.833 | 19.478 | < 0.0001
STATUS non-musician | -0.707 | -5.797 | < 0.0001
STATUS vocalist | -0.698 | -5.248 | < 0.0001
MODALITY touch | -0.479 | -5.438 | < 0.0001
SOA 20ms | -1.369 | -18.186 | < 0.0001
SOA 30ms | -1.241 | -16.394 | < 0.0001
SOA 55ms | -0.861 | -11.063 | < 0.0001
SOA 90ms | -0.576 | -7.140 | < 0.0001
STATUS instrumentalist : ARM crossed | -0.116 | -3.370 | < 0.001
STATUS non-musician : ARM crossed | -0.129 | -3.822 | < 0.001
STATUS vocalist : ARM crossed | -0.083 | -2.233 | 0.025
STATUS non-musician : LOCATION right | 0.145 | 3.569 | < 0.001
STATUS vocalist : LOCATION right | 0.132 | 2.851 | 0.004
STATUS non-musician : MODALITY touch | 0.409 | 3.791 | < 0.001
STATUS non-musician : SOA 20ms | 0.514 | 5.420 | < 0.0001
STATUS vocalist : SOA 20ms | 0.323 | 3.162 | 0.001
STATUS non-musician : SOA 30ms | 0.345 | 3.629 | < 0.001
STATUS vocalist : SOA 30ms | 0.339 | 3.299 | < 0.001
STATUS non-musician : SOA 55ms | 0.232 | 2.389 | 0.016
STATUS non-musician : SOA 90ms | 0.273 | 2.701 | 0.006
MODALITY touch : LOCATION right | -0.194 | -3.278 | 0.001
STATUS vocalist : MODALITY touch : SOA 20ms | 0.486 | 5.175 | < 0.0001
STATUS vocalist : MODALITY touch : SOA 30ms | 0.347 | 3.671 | < 0.001
STATUS vocalist : MODALITY touch : SOA 55ms | 0.353 | 3.693 | < 0.001
ARM crossed : LOCATION left : MODALITY touch | 0.214 | 5.270 | < 0.0001
STATUS non-musician : MODALITY touch : SOA 90ms | -0.214 | -2.619 | 0.008
STATUS non-musician : LOCATION right : MODALITY touch | -0.286 | -4.090 | < 0.0001
STATUS vocalist : LOCATION right : MODALITY touch | -0.177 | -2.320 | 0.020

Table 8. Significant coefficient estimates of the parsimonious GLMM of TOJ experiment accuracy


A post-hoc analysis showed significant differences between the levels of MODALITY and SOA, as in the ANOVA. Additionally, there were significant group differences for the factor STATUS. Instrumentalists showed 74.4% accuracy [CI: 62.1-86.7%]. Non-musicians and vocalists showed 44.6% [CI: 32.3-56.8%] and 45.4% [CI: 30.7-60.1%] respectively.

Figure 42. % correct for the factor STATUS of TOJ experiment

The analysis showed a significant difference between the left and right LOCATIONs for all three participant groups.

The analysis showed significant group differences for the sound-first conditions. The estimated mean accuracy of instrumentalists was 99.5% [CI: 86.8-112.2%].71 Non-musicians and vocalists showed 62.1% [CI: 49.6-74.6%] and 56.0% [CI: 41.0-71.0%] respectively.

For the factors SOA and STATUS, there were significant differences in estimated mean % correct at multiple levels. There was a significant difference between instrumentalists and vocalists (Z = 53.67, p = 0.02) for the 90 ms SOA. The analysis also showed significant differences between instrumentalists and non-musicians (Z = 3.71, p = 0.02) as well as between instrumentalists and vocalists (Z = 5.80, p < 0.001) for the 180 ms SOA.

71 The results are given not on the response scale but on the probit scale; therefore the value can exceed 100%.


Figure 43. % correct for STATUS:SOA interaction of TOJ experiment

Additionally, for the sound-first condition, instrumentalists showed more correct responses at all five SOA levels.

Figure 44. % correct for STATUS: MODALITY:SOA interaction of TOJ experiment


Point of Subjective Equivalence & Just Noticeable Difference

For the PSE and JND analysis, the accuracy data were recoded. The proportion of 'sound first' responses was converted to its equivalent Z-score assuming a cumulative normal distribution (Zampini et al., 2005). I combined the variables MODALITY and SOA into one and assigned values (i.e., -180 ms, -90 ms, -55 ms, -30 ms, -20 ms, +20 ms, +30 ms, +55 ms, +90 ms, and +180 ms) depending on whether the touch stimulus came first (negative SOAs) or the sound stimulus came first (positive SOAs). I modeled the response of the whole population by means of a GLMM with a binomial error distribution and a probit link function (Moscatelli, Mezzetti, & Lacquaniti, 2012). In order to estimate the PSE and JND of the three participant groups, I fitted the fixed effects STATUS and SOA and their interaction STATUS:SOA simultaneously with a random effect. As the random effect, I used the subject variable. The model is the following:72

ACCURACY = β0 + β1*STATUS + β2*SOA + β3*STATUS:SOA + ε
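The recoding that precedes this model can be sketched in R as follows, assuming illustrative column names MODALITY ('touch_first' vs. 'sound_first'), SOA in ms, and a binary soundFirst response variable:

    # Signed SOA: negative when touch led, positive when sound led.
    experiment2$signedSOA <- ifelse(experiment2$MODALITY == "touch_first",
                                    -experiment2$SOA, experiment2$SOA)

    # Probit GLMM over the signed SOA (cf. footnote 72); the fitted
    # intercept and slope per group then yield the PSE and JND below.
    library(lme4)
    model <- glmer(soundFirst ~ STATUS * signedSOA + (1 | SUBJECT),
                   family = binomial(link = "probit"), data = experiment2)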

From the parameters of the psychometric function (writing β0 for its intercept and β1 for its slope over SOA), the PSE is:

PSE = −β0/β1

72 The R command used was: MODEL = glmer(ACCURACY ~ STATUS + SOA + STATUS:SOA + (1|SUBJECT), family = binomial(link = "probit"), data = experiment2)

From the parameters of the psychometric function, the JND73 is an inverse function of the slope:

JND = 1/β1
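These two expressions follow from the probit form of the psychometric function; writing Φ for the standard normal cumulative distribution function, a sketch of the derivation (in the generic intercept-slope notation used above) is:

    p('sound first' | SOA) = Φ(β0 + β1 × SOA)

At the PSE the two responses are equally likely, so Φ(β0 + β1 × PSE) = 0.5; since Φ(0) = 0.5, this gives β0 + β1 × PSE = 0 and hence PSE = −β0/β1. The JND varies inversely with the slope β1: the steeper the psychometric function, the smaller the change in SOA needed to move the response probability by a fixed amount.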

Using the bootstrap method (B = 500), I estimated the PSEs and JNDs for instrumentalists, non-musicians, and vocalists. There were significant differences in PSE between instrumentalists [CI: 20.06-32.26 ms] and non-musicians [CI: 34.39-51.08 ms] as well as between instrumentalists and vocalists [CI: 35.30-57.67 ms]. Significant differences were also found in JND between instrumentalists [CI: 26.38-28.20 ms] and non-musicians [CI: 38.84-41.60 ms] as well as between instrumentalists and vocalists [CI: 39.31-42.46 ms].

STATUS | PSE estimate | PSE SE | JND estimate | JND SE
Instrumentalists | 25.895 | 2.996 | 27.227 | 0.470
Non-musicians | 42.782 | 4.274 | 40.284 | 0.684
Vocalists | 46.487 | 5.582 | 40.862 | 0.787

Table 9. PSE and JND estimates for instrumentalists, non-musicians and vocalists

73 Alternatively, the JND is defined as the smallest temporal interval at which participants can accurately discriminate the temporal order of the stimuli on 75% of the trials (Occelli, Spence, & Zampini, 2011).

Figure 45. % correct of the 'sound first' responses of instrumentalists, non-musicians, and vocalists; circle = PSE; square = PSE+JND.

In order to test the effects of the external (i.e., ALIGN and LOCATION) and internal (i.e., ARM) spatial factors, I modeled the responses for these factors by means of a GLMM for each group.74 Musicians showed no significant effects of any spatial factor. Non-musicians showed a significant difference between the crossed and uncrossed ARM conditions.

Spatial factor | PSE estimate | PSE SE | JND estimate | JND SE
ALIGN aligned | 26.005 | 2.889 | 26.427 | 0.659
ALIGN misaligned | 25.768 | 3.092 | 27.996 | 0.653
LOCATION left | 25.629 | 3.015 | 27.012 | 0.645
LOCATION right | 26.157 | 2.986 | 27.429 | 0.670
ARM crossed | 25.169 | 3.094 | 26.569 | 0.659
ARM uncrossed | 26.625 | 3.207 | 27.851 | 0.696

Table 10. PSE and JND estimates for spatial factors of ALIGN, LOCATION, and ARM in instrumentalists

74 The R command used was: MODEL = glmer(ACCURACY ~ SOA * spatial factor (i.e., ALIGN, LOCATION, or ARM) + (1|SUBJECT), family = binomial(link = "probit"), data = STATUS subset of experiment2)

Spatial factor | PSE estimate | PSE SE | JND estimate | JND SE
ALIGN aligned | 46.993 | 5.792 | 42.131 | 1.286
ALIGN misaligned | 45.937 | 5.407 | 39.530 | 1.204
LOCATION left | 49.114 | 5.627 | 41.369 | 1.302
LOCATION right | 43.894 | 5.368 | 40.310 | 1.264
ARM crossed | 45.793 | 5.826 | 41.065 | 1.215
ARM uncrossed | 47.170 | 5.830 | 40.643 | 1.163

Table 11. PSE and JND estimates for spatial factors of ALIGN, LOCATION, and ARM in vocalists

Spatial factor | PSE estimate | PSE SE | JND estimate | JND SE
ALIGN aligned | 43.688 | 4.285 | 40.975 | 0.992
ALIGN misaligned | 41.883 | 4.192 | 39.606 | 0.954
LOCATION left | 46.072 | 4.434 | 41.709 | 1.059
LOCATION right | 39.543 | 4.110 | 38.772 | 0.924
ARM crossed | 38.776 | 3.990 | 37.137 | 0.910
ARM uncrossed | 46.822 | 4.585 | 43.128 | 1.064

Table 12. PSE and JND estimates for spatial factors of ALIGN, LOCATION, and ARM in non-musicians

Figure 46. % correct of the ‘sound first’ responses for ARM in non-musicians; circle = PSE; square = PSE+JND.


Discussion

In this study, I examined the effects of musical training on unspeeded RTs and accuracy in a temporal order judgment (TOJ) task for audio-tactile stimuli while taking three spatial factors (i.e., ALIGN, LOCATION, and ARM) into consideration. For the same reasons discussed for the simple reaction time experiment, I performed both ANOVA and GLMM analyses on the reaction time and accuracy data of the TOJ experiment. The two reaction time analyses showed a significant effect of the main factor SOA. The GLMM showed significant effects of two additional main factors, STATUS and the internal spatial factor ARM. Two interaction terms (i.e., MODALITY:LOCATION and LOCATION:ALIGN) turned out to be significant in the ANOVA, while the GLMM had two significant interaction terms, STATUS:SOA and STATUS:ALIGN:ARM.

For accuracy, ANOVA and GLMM analyses showed significant effects for the main factors of MODALITY and SOA. An external spatial factor LOCATION was also significant in ANOVA. GLMM showed significant effects for the main factor STATUS.

ANOVA and GLMM showed a significant two-way interaction term of MODALITY:

LOCATION. Several interaction terms turned out to be significant only for ANOVA, and the main factor MODALITY played a role in the interactions (e.g., MODALITY:SOA:

LOCATION, MODALITY:LOCATION:ALIGN, and MODALITY:SOA:LOCATION:

ALIGN). Several interaction terms turned out to be significant only for the GLMM, and the main factor STATUS played a role in these interactions (e.g., STATUS:ARM, STATUS:LOCATION, STATUS:MODALITY, STATUS:SOA, etc.). As discussed earlier, the difference in results between ANOVA and GLMM may be due to the different distributions of the data: the ANOVA was based on normally distributed mean data while the GLMM was based on the raw reaction time data.

To summarize: the smaller the SOA, the slower the RTs and the more incorrect the responses. Musicians were significantly faster than non-musicians and, within the musician group, vocalists were faster than instrumentalists. Instrumentalists were more accurate than the other two participant groups; vocalists and non-musicians showed no significant difference in accuracy. In comparison with the other two groups, instrumentalists showed a lower PSE and JND in performing the audio-tactile TOJ task. Given the instrumentalists' higher accuracy for the sound-first condition (see fig. 42) at all five SOAs (see fig. 43), their low temporal order threshold, compared to the non-musicians and vocalists, may be associated with auditory processing.

With regard to the effects of the spatial factors, there was no significant difference between crossed and uncrossed arms for the two musician groups. Interestingly, only non-musicians showed a significant improvement in determining temporal order when crossing their arms (see fig. 46). When the first stimulus was delivered on the left side, there was no significant difference in reaction time between auditory and tactile presentation. However, when the first stimulus was tactile and was presented on the right side, the participants responded significantly more slowly and less correctly than when it was auditory. The participants also responded more correctly when sounds were presented first.

Zampini et al. (2005) pointed out significant differences in accuracy in TOJ tasks between experienced (e.g., the experimenters themselves) and inexperienced participants.

This implies an effect of training on performing TOJ experiments. In my experiment, all participants were inexperienced; nevertheless, there was a group difference in performing the TOJ tasks. Why would this be the case? We may find plausible explanations in cognitive science studies that investigate neuroplasticity in musicians. In his review, Schlaug

(2015) pointed out that long-term musical training is an excellent model for studying brain plasticity because music-making provides a multisensorially rich environment that is strongly coupled with motor movements. Wan & Schlaug (2010) argued that musical training not only modifies regions surrounding the intraparietal sulcus, known as multimodal integration regions, but also may alter performance in experimental tasks testing cognitive functions (e.g., language). It might therefore be possible that musical training affects performance in TOJ tasks. Using an electroencephalography (EEG) technique, Bernasconi, Grivel, Murray, & Spierer (2010) investigated the mechanisms underlying temporal perception and the effect of short-term training on an auditory TOJ paradigm in terms of short-term learning-induced plasticity. Behaviorally, Bernasconi et al. (2010) found a significant improvement in an auditory TOJ task after 30 minutes of training. The auditory evoked potential waveform showed a significant difference before and after training at a right fronto-central electrode (i.e., FC4) within the 43-75 ms post-stimulus interval. In addition, an early training period was associated with activity in the bilateral posterior sylvian regions whereas a late training period showed stronger activity in the left posterior sylvian regions. However, these studies on brain plasticity and musical training do not provide clear explanations of how musical training changes the brain regions for audio-tactile integration in ways that would affect TOJ tasks. Given not only that there were behavioral response differences between vocalists and instrumentalists in my experiment, but also that there seem to be differences in the musical training of the two groups depending on the involvement of touch, I propose that, at least at the cortical level, the vocalists' faster reactions may be due to a modified representation of sensory inputs in the posterior lateral sulcus whereas the instrumentalists' superior performance in audio-tactile TOJ is probably due to modified representations in both the intraparietal and posterior lateral sulci. This proposal seems worth further investigation.

The study by Zampini et al. (2005) was the first to integrate a spatial factor into an audio-tactile TOJ. As I noted earlier, however, they did not investigate the effect of an internal spatial factor (i.e., crossed vs. uncrossed arms). To my knowledge, there are several tactile TOJ experiments using different arm postures, but there is no audio-tactile TOJ experiment using crossed arms. Landry & Champoux (2017) investigated the effect of musical training in a tactile TOJ task using crossed arms. They reported that musicians reacted faster but less correctly when crossing their arms. This is quite different from my results, where musicians were faster than non-musicians, and instrumentalists were more correct than the two other groups, regardless of arm posture. Additionally, crossing the arms did not affect musicians but led to a significant improvement in non-musicians' accuracy. Some of the participants reported that the crossed-arm posture made them pay more attention to the task. At any rate, the different findings in Landry & Champoux's (2017) report and my experiment seem to be due to multisensory effects on TOJ.

In Zampini et al. (2005), the factor 'relative spatial position' is equivalent to the spatially aligned condition with uncrossed arms in my experiment. The authors observed no significant effect of relative position. According to Zampini et al. (2005), this null effect suggests that audio-tactile pairings are 'less spatial' than other multisensory pairings. In line with their argument, my accuracy data did not show a significant effect of ALIGN. However, it is important to note that Zampini and his colleagues did not report RTs. The RT data in my TOJ experiment suggest that ALIGN had a significant effect at least for non-musicians when they uncrossed their arms. Therefore, a different measurement (e.g., RT) may lead to a different interpretation of the same phenomenon.

In addition to the factor ALIGN, my experiment also considered the external spatial factor LOCATION. The participants gave significantly more correct responses when a touch stimulus was presented first on the left side compared to the right. Given Rizzolatti and his colleagues' findings that bimodal neurons respond both to visual and to tactile stimuli delivered to an animal's face or arms, and that visual receptive fields are closely linked with tactile receptive fields, similar perceptual bias patterns in the visual and tactile systems might offer a possible explanation. In vision research, a left-side underestimation phenomenon was found using the line bisection test, a simple visuospatial task developed to detect unilateral spatial neglect in patients. In this test, a participant is asked to mark the center of a series of straight, horizontal lines that are perceived visually or tactilely. It showed that healthy people tend to judge the center of a line to be to the left of the objective center. Both the visual and the tactile modality showed left-side underestimation. Another spatial bias was found in visual scanning tasks. The term 'scanning' refers to the ability to use vision efficiently and actively to look for information in the environment. Researchers found that some participants showed a left-to-right direction of scanning of visual stimuli while others showed right-to-left. They also argued that the scanning direction is related to reading direction: for instance, French readers showed a left-to-right scanning pattern and Israeli readers the opposite (for a review see Chokron, Kazandjian, & Agostini, 2009). Given these perceptual biases found by vision researchers and the fact that my participants are all English readers, it might be that my participants used a left-to-right tactile scanning strategy and thus showed better accuracy, at least for tactile stimuli.

General Discussion

This is the first study of different perihand representations in musicians and non-musicians as well as in instrumentalists and vocalists. Both the simple reaction time and the temporal order judgment experiments showed effects of musical training on audio-tactile perihand space. The first experiment showed effects of musical training on spatial processing and audio-tactile multisensory integration. The results of the simple reaction time experiment suggest not only that instrumental training shifts processing from a tactile perihand space (i.e., ARM and LOCATION effects for the unisensory tactile condition in non-musicians) to an auditory or audio-tactile one (i.e., ARM and LOCATION effects for both the unisensory auditory and the multisensory conditions in instrumentalists), but also that this shift may be due to multisensory co-activation at an early stage of perception. The second experiment demonstrated effects of musical training on temporal processing, in addition to spatial processing, in audio-tactile perception. The experimental findings showed that musical training alters not only several processes involved in the TOJ task but also multiple stages of perception. Furthermore, different types of musical training have different effects on perception (i.e., significantly different PSEs and JNDs between instrumentalists and vocalists).
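To make the last point concrete: in a TOJ experiment, the PSE (point of subjective equality) and the JND (just noticeable difference) are typically derived by fitting a psychometric function to the proportion of one response type across stimulus onset asynchronies (SOAs). The following is a minimal Python sketch of this standard procedure; the SOAs and response proportions are hypothetical, and the simple curve fit is illustrative rather than the analysis actually used in chapter 6.

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical SOAs (ms; negative = touch first) and the proportion of
# "sound first" responses observed at each SOA.
soa = np.array([-200.0, -90.0, -30.0, 0.0, 30.0, 90.0, 200.0])
p_sound_first = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.97])

def cum_gauss(x, mu, sigma):
    # Cumulative Gaussian psychometric function.
    return norm.cdf(x, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(cum_gauss, soa, p_sound_first, p0=[0.0, 50.0])

pse = mu                      # SOA at which both orders are reported equally often
jnd = sigma * norm.ppf(0.75)  # half of the 25%-75% interquantile range

print(f"PSE = {pse:.1f} ms, JND = {jnd:.1f} ms")

On this convention, a lower JND corresponds to a steeper psychometric function and hence finer temporal resolution, which is how the instrumentalists’ advantage reported above manifests itself.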

Musical training

One of the main discoveries in these two experiments is an effect of specific musical training on audio-tactile perception. Over the past decades, cognitive scientists have found that musical training leads to both structural and functional changes in the brain.

This is summarized in the ‘transformative technology of the mind’ theory, in which Patel (2008, 2010) pointed out that music can have a powerful effect on pre-existing brain functions. Structurally, musicians have a larger anterior corpus callosum than non-musicians (Schlaug, Jäncke, Huang, & Steinmetz, 1995), and there is a significant volume difference in gray matter between musicians and non-musicians (Gaser & Schlaug, 2003). In terms of training on a specific instrument, Bengtsson, Nagy, Skare, Forsman, Forssberg, & Ullén (2005) found that piano practicing leads to changes in white matter structure in the internal capsule, the isthmus and splenium of the corpus callosum, and the arcuate fasciculus. Functionally, Schneider, Scherg, Dosch, Specht, Gutschalk, & Rupp (2002) showed enhanced activation of Heschl’s gyrus in the auditory cortex of musicians. Pantev and his colleagues also found modified auditory and somatosensory representations in musicians (Elbert, Pantev, Wienbruch, Rockstroh, & Taub, 1995; Hund-Georgiadis & von Cramon, 1999; Pantev, Oostenveld, Engelien, Ross, Roberts, & Hoke, 1998; Pantev, Engelien, Candia, & Elbert, 2001). Using MEG, Lappe, Herholz, Trainor, & Pantev (2008) found differences in cortical responses, specifically in the mismatch negativity (MMN), between listening-based and instrument-playing-based musical training over two weeks. The MMN is a pre-attentive, frontocentral negative component of the auditory event-related potential or field, measured at latencies of 120-250 ms after stimulus onset. Both training groups showed a significant difference in MMN before and after training; however, the instrument-playing training produced a significantly larger MMN enlargement than the listening-based training. The authors concluded that sensorimotor-auditory musical training (i.e., instrument playing) enhances music representation in the auditory cortex significantly more than mere auditory training (i.e., listening). Promisingly, more and more researchers seem not only to focus on the effects of specific types of musical training rather than on musical training in general, but also to look at effects of musical training on multisensory rather than purely auditory perception. However, the neural substrates responsible for integrating audio-tactile inputs, and the effects of instrumental training on these neural correlates, are not yet clearly known.
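As a concrete illustration of the MMN measure described above, the following minimal Python sketch computes a deviant-minus-standard difference wave and averages it over the 120-250 ms window; the sampling rate and ERP arrays are hypothetical placeholders, not data from any of the studies cited.

import numpy as np

fs = 500                                       # sampling rate in Hz (hypothetical)
t = np.arange(-0.1, 0.5, 1 / fs)               # epoch from -100 to +500 ms

rng = np.random.default_rng(0)
erp_standard = rng.normal(0.0, 0.2, t.size)    # placeholder averaged ERP (microvolts)
erp_deviant = rng.normal(0.0, 0.2, t.size)
erp_deviant[(t > 0.12) & (t < 0.25)] -= 1.0    # inject a frontocentral negativity

mmn_wave = erp_deviant - erp_standard          # the difference wave
window = (t >= 0.12) & (t <= 0.25)             # 120-250 ms after stimulus onset
mmn_amplitude = mmn_wave[window].mean()

print(f"Mean MMN amplitude, 120-250 ms: {mmn_amplitude:.2f} microvolts")

A more negative mean amplitude after training, relative to before, is the kind of MMN enlargement that Lappe et al. (2008) reported for the instrument-playing group.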

Multisensory integration

The simple reaction time experiment suggests an effect of instrumental training on audio-tactile multisensory integration. Among the three participant groups, only instrumentalists showed a significant violation of the race model inequality. There were also significant group differences between instrumentalists and non-musicians in the cumulative probability analysis. This indicates that musical training affects the processes responsible for enhanced multisensory integration at various levels. Supporting Miller’s (1982) coactivation model at the neuronal level (see chapter 3), Stein & Stanford (2008) defined multisensory enhancement as “a statistically significant difference between the number of impulses evoked by a cross modal combination of stimuli and the number of evoked by the most effective of these stimuli individually” (p. 255). Stein and his colleagues reviewed studies investigating multisensory integration since the 1970s and noted the role of the superior colliculus in multisensory enhancement (Stein et al., 2014).

They further described the deep layers of the superior colliculus as the primary location where multisensory information first converges and noted that the visual, auditory, and somatosensory representations there overlap spatially with one another.
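Stein & Stanford’s (2008) definition quoted above can be expressed as a simple computation on spike counts. The Python sketch below, with hypothetical Poisson-distributed impulse counts, computes the percentage gain of the cross-modal response over the most effective unisensory response together with the significance test their definition requires; the counts and the t-test are illustrative assumptions, not their procedure verbatim.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
# Hypothetical impulse counts per trial for one neuron.
auditory = rng.poisson(6, 30)
tactile = rng.poisson(3, 30)
audio_tactile = rng.poisson(11, 30)   # cross-modal combination

# Most effective unisensory response.
best = auditory if auditory.mean() >= tactile.mean() else tactile

# Percentage enhancement of the cross-modal response over the best unisensory one.
enhancement = 100 * (audio_tactile.mean() - best.mean()) / best.mean()

# "Statistically significant difference" between the two response distributions.
t_stat, p_val = ttest_ind(audio_tactile, best)

print(f"Enhancement = {enhancement:.0f}%, p = {p_val:.4f}")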

Using EEG, Murray et al. (2005) conducted the same experiment as my simple reaction time experiment. The authors found electrophysiological evidence for the violation of the race model. They reported that neural responses in the audio-tactile multisensory condition differed significantly from the summed responses of the unisensory conditions at 50 ms and at 200 ms after stimulus presentation, and they observed neural activity in the posterior auditory cortex and the posterior superior temporal gyrus contralateral to the stimulated hand. These regions are thought to be homologues of the macaque caudomedial belt region surrounding primary auditory cortex. As discussed previously, musical training modifies representations in the auditory cortex. Given Murray et al. (2005), it is plausible that the different responses of my instrumentalist participants in the simple reaction time experiment are due to changes in the posterior auditory cortex and the posterior superior temporal gyrus. This should be investigated further.
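Since the violation of the race model inequality carries much of the argument in this section, here is a minimal Python sketch of Miller’s (1982) test as it applies to a redundant-signals RT experiment: the multisensory cumulative RT distribution is compared against the sum of the two unisensory ones, and any positive exceedance at the fast quantiles counts as a violation. The RT samples are hypothetical, not the dissertation’s data.

import numpy as np

def ecdf(rts, ts):
    # Empirical cumulative probability P(RT <= t) at each probe time t.
    rts = np.asarray(rts)
    return np.array([(rts <= t).mean() for t in np.atleast_1d(ts)])

rng = np.random.default_rng(2)
rt_a = rng.normal(280, 40, 200)     # unisensory auditory RTs (ms)
rt_t = rng.normal(300, 45, 200)     # unisensory tactile RTs (ms)
rt_at = rng.normal(240, 35, 200)    # multisensory audio-tactile RTs (ms)

# Probe the fast quantiles, where race model violations typically occur.
probe = np.percentile(np.concatenate([rt_a, rt_t, rt_at]), np.arange(5, 55, 5))

bound = np.minimum(ecdf(rt_a, probe) + ecdf(rt_t, probe), 1.0)  # race model bound
violation = ecdf(rt_at, probe) - bound   # positive values violate the inequality

for t, v in zip(probe, violation):
    print(f"t = {t:6.1f} ms: P_AT - [P_A + P_T] = {v:+.3f}" + ("  VIOLATION" if v > 0 else ""))

In this simulation the multisensory RTs are fast enough that the inequality is violated at the earliest quantiles, which is the pattern that, in the actual experiment, only the instrumentalists showed.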


Spatiotemporal processing

The temporal order judgment experiment suggests an effect of instrumental training on spatiotemporal processing. To my knowledge, there is no brain imaging study exploring the neural substrates of an audio-tactile temporal order judgment task while considering spatial factors. Recently, researchers have identified neural correlates of temporal processing using audio-visual stimuli, but without consideration of spatial factors. Using fMRI, Binder (2015) performed audiovisual temporal order judgment and, as a control, simultaneity judgment experiments. He reported that temporal order judgment showed additional significant activations in the posterior inferior and middle frontal gyri, the junction of the temporal and occipital lobes, and the left superior and inferior parietal lobules compared to the control experiment. Given that these substrates are known for, among other functions, encoding salience in space and time, Binder (2015) concluded that audiovisual temporal order judgment involves additional processes. In line with this, Love, Petrini, Pernet, Latinus, & Pollick (2018) also conducted an fMRI experiment with an audiovisual temporal order judgment task and a simultaneity judgment task as control. They found clear differences in BOLD signals between the two tasks. The temporal order judgment task showed transient but significantly increased activations in the left hemisphere, including the middle occipital cortex, the middle frontal cortex, the precuneus, and the superior medial frontal cortex, which suggests the involvement of additional processes due to task difficulty.

In terms of temporal order judgment and the effect of a spatial factor (crossed vs. uncrossed arms), Takahashi, Kansaku, Wada, Shibuya & Kitazawa (2013) conducted a tactile temporal order judgment experiment and, as a control, a numerosity judgment experiment for tactile stimulation in an fMRI study. They found not only stronger bilateral activations in the premotor cortices, the middle frontal gyri, the inferior parietal cortices and supramarginal gyri, and the posterior parts of the superior and middle temporal gyri for temporal order judgment, but also significant activation in the perisylvian areas for crossed arms. They proposed that an interaction of the perisylvian cortex with the prefrontal and parietal cortices combines tactile spatial representations. Putting Takahashi et al. (2013) together with the audiovisual temporal order studies (e.g., Binder, 2015; Love et al., 2018), temporal order judgment in general seems to require activation of neural substrates for selective attention. However, the additional spatial processing due to an internal spatial factor and/or tactile processing may be associated with activation of the perisylvian cortex. This recalls an EEG experiment by Bernasconi et al. (2010) in which short-term training on an auditory temporal order judgment task modified neural responses of the posterior perisylvian area.


Chapter 7. Conclusion and suggestions for future research

The overall goal of this dissertation is to work towards a cross-cultural model that can explain the human sense of time and space and the role of music in our spatiotemporal experience, in both its perceptual and conceptual aspects. To this end, I combined several different methodologies and provided both behavioral and textual evidence. For a more comprehensive understanding, this model should be further supported by future physiological (see the experimental discussion section of chapter 6) and philosophical (e.g., phenomenology, embodiment, etc.) studies.

In this dissertation, I looked at how different ways of music-making shape our bodies and further affect our experience of time and space. I also reviewed how time and space have been studied in the field of cognitive science. In chapter 2, I discussed how our temporal experience is shaped by 1) event detection, 2) perception of temporal order: simultaneity vs. succession, 3) duration perception, 4) duration estimation, 5) the psychological present, and 6) rhythm perception.

Chapter 3 dealt with two components of space pertaining to the body, namely, body space and peripersonal space (i.e., the space adjacent to the body). Body space consists of postural and superficial (i.e., body surface related) schemata, which are associated with proprioception and touch, respectively. Peripersonal space is characterized by various features including 1) multisensory integration, 2) body-part-centered specificity, 3) sensorimotor coupling, and 4) plasticity. Further, I discussed potential differences between singing and instrument-playing bodies and identified audio-tactile perception and perihand space as conspicuous differences between them.

In chapter 4, I explored how various cultures understand time and space differently and how specific experiences of the environment contribute differently to the conceptualization of time and space. My comparative analysis of ancient texts concerning the origins of time and space concepts showed an intimate connection between cultural factors and these cognitive constructs. Although the four selected cultures differed depending on how the ancient peoples experienced their environments, they shared similar understandings of the world (i.e., eternity, ephemerality, and cyclicity). Above all, these cultures pointed to the importance of human action via music-making in ritual contexts in order to recover cosmic order.

Although music-making includes many different human behaviors, here I have looked only at activities connected with sound production. In music performance, we sing and we play instruments. In chapter 5, I examined how these two modes of music performance transform our experience and traced their different evolutionary trajectories. I also discussed another human communication system that uses both vocal and non-vocal sounds, namely language. The use of two modes of acoustic communication (i.e., vocal vs. non-vocal) in two behaviors (i.e., music and language) is a human-specific phenomenon.


In the Hear Your Touch project (chapter 6), I focused on the first two constituents of temporal experience, that is, event detection and the perception of temporal order. They are the fundamentals that allow us to have an experience of time. As I discussed in chapter 2, stimulus features are perceived as changes happening around us. Once events are detected, our minds tend to organize them in terms of whether they occur at the same time or successively. In addition to these two psychological building blocks of time, I included the spatial factors that might reveal potential differences between the two behaviors in music performance and conducted two behavioral experiments. The simple reaction time experiment focuses on event detection, while the temporal order judgment experiment concerns the perception of the successive temporal order of events in terms of audio-tactile integration. The experiments yielded supporting evidence for differential spatiotemporal processing not only between musicians and non-musicians but also between vocalists and instrumentalists. This further implies the need to recognize behavioral diversity in music-making in the field of music research.

My experimental findings demonstrate how cultural factors such as training in, and the experience of, vocal vs. instrumental music influence our perceptions of time and space differently. Putting the cross-cultural analysis (chapter 4) and the experimental results together, I found that, although space and time are often seen as merely distinct conceptual entities, they are closely linked to the way we experience and interact with our environment. This recalls Will’s (2017) argument that cognitive constructs such as time and space need to be discussed and understood at the biological, experiential, and environmental levels.


While fascinated by witnessing the different transformative effects of distinctive musical behaviors, I noticed that ‘music’ often becomes an ambiguous word when we talk about it. Although there were historical traditions that made a clear distinction between singing and instrument playing (chapter 5), many contemporary music researchers do not make this distinction clearly in their scholarly works. Moreover, although I looked only at the sound-production aspect of music-making, there are also listening, dancing, ceremony (e.g., the Blackfoot Indians’ saapup), and more.

Even though this dissertation examined the transformative processes of music-making only at the level of the individual participant, we make music together in large and small groups, or even alone. We sing together, we play together, and we dance together (e.g., the Ewe agbekor). We also listen together (e.g., concert goers). Remarking on the complexity of the dynamics of the embodiment process, Laroche, Berardi & Brangier (2014) noted the emergence of a sense of ‘being-together-in-time’ from embodied interactions. For a better understanding of the relationship between embodiment and spatiotemporal experience, it is worthwhile to examine how individuals interact in music-making. Music-making as social behavior can be approached as joint action, that is, any form of social interaction in which individuals coordinate their actions in space and time to change the environment (Sebanz, Bekkering, & Knoblich, 2006). Although joint action as embodied interaction has been shown to modify our experience of time and space, music researchers have mainly studied its temporal aspect (Keller, Novembre, & Hove, 2014), while researchers investigating space have paid little attention to music-making. Studying multisensory interactions in space perception, Teneggi, Canzoneri, di Pellegrino, & Serino (2013) found that peripersonal space is sensitive to social modulation. Given that perception is critical for embodiment and that multisensory processing may aid the coordination demands of joint action (Vesper et al., 2017), it would be worthwhile to investigate in detail how multisensory perception affects joint music-making and how embodied interactions in music-making change our experience of time and space. Understanding the behavioral diversity of music-making helps us explore how mind and body work together and further shows how many worlds are established by different behaviors of music-making.


References

Adelstein, B. D., Begault, D. R., Anderson, M. R., & Wenzel, E. M. (2003). Sensitivity to haptic-audio asynchrony. Proceedings of the 5th International Conference on Multimodal Interfaces, pp. 73–76.

Aho, M. (2017). The Tangible in Music: The Tactile Learning of a Musical Instrument. New York, NY: Routledge.

Algra, K. (1995). Topos, Chôra, Kenon: Some case studies. In Concepts of Space in Greek Thought. (pp.31-71), Leiden: E.J. Brill.

Allman, M. J., & Mareschal, D. (2016). Possible evolutionary and developmental mechanisms of mental time travel (and implications for autism). Current Opinion in Behavioral Sciences, 8(2), 220-225. doi: 10.1016/j.cobeha.2016.02.018

Altenmüller, E., Kopiez, R. & Grewe, O. (2013). “A contribution to the evolutionary basis of music: lessons from the chill response,” In Altenmüller, E., Schmidt, S., & Zimmermann, E. (Eds.), Evolution of Emotional Communication: From Sounds in Nonhuman Mammals to Speech and Music in Man (pp.313-335). Oxford: Oxford University Press.

Anderson-Barnes, V. C., McAuliffe, C., Swanberg, K. M., & Tsao, J. W. (2009). Phantom limb pain-a phenomenon of proprioceptive memory?. Medical Hypotheses, 73(4), 555-558.

Arnold, K., & Zuberbühler, K. (2006). Semantic combinations in primate calls. Nature, 441, 7091. doi:10.1038/441303a

Atkin, A. (2013). Peirce's theory of signs. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy. Retrieved from https://plato.stanford.edu/archives/sum2013/entries/peirce-semiotics/

Baily, J., & Driver, P. (1992). Spatio-motor thinking in playing folk guitar. The World of Music, 34(3), 57-71.

Balslev, A. N. (2009). A Study of Time in Indian Philosophy. Delhi: Motilal Banarsidass.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss. v067.i01

Baumann, M. P. (1985). The Kantu ensemble of the Kallawaya at Charazani (Bolivia). Yearbook for Traditional Music, 17, 146-166.

Bautista, D. M., & Lumpkin, E. A. (2011). Perspectives on information and coding in mammalian sensory physiology: Probing mammalian touch transduction. Journal of General Physiology, 138(3), 291-301. doi: 10.1085/jgp.201110637

Belin, P., & Zatorre, R. J. (2003). Adaptation to speaker’s voice in right anterior temporal lobe. Neuroreport, 14(16), 2105-2109. doi:10.1097/00001756-200311140-00019

Belin, P., Zatorre, R. J., & Ahad, P. (2002). Human temporal-lobe response to vocal sounds. Cognitive Brain Research, 13(1), 17-26. doi:10.1016/s0926-6410(01)00084-2

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403, 6767, 309-312.

Berent, I. (2013). The phonological mind. Trends in Cognitive Sciences, 17(7), 319-27. doi: 10.1016/j.tics.2013.05.004

Berent, I. (2016). Commentary: “An evaluation of universal grammar and the phonological mind”-UG is still a viable hypothesis. Frontiers in Psychology, 7(1029). doi: 10.3389/fpsyg.2016.01029

Berlucchi, G., & Aglioti, S. M. (2010). The body in the brain revisited. Experimental Brain Research, 200(1), 25-35. doi: 10.1007/s00221-009-1970-7

Bernasconi, F., Grivel, J., Murray, M. M., & Spierer, L. (2010). Plastic brain mechanisms for attaining auditory temporal order judgment proficiency. NeuroImage,50(3), 1271-1279. doi:10.1016/j.neuroimage.2010.01.016

Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148-1150.

Bharata, M., & Ghosh, M. (1961). The Nāṭyasāstra: A Treatise on Ancient Indian Dramaturgy and Histrionics, Ascribed to Bharata Muni. Calcutta: Manisha Granthalaya.

Bidmead, J. (2014). The Akītu Festival: Religious Continuity and Royal Legitimation in Mesopotamia. Piscataway, NJ: Gorgias Press.

Binder, M. (2015). Neural correlates of audiovisual temporal processing – comparison of temporal order and simultaneity judgments. Neuroscience 300, 432–447. doi: 10.1016/j.neuroscience.2015.05.011

Blacker, C., Loewe, M., & Plumley, J. M. (1975). Ancient Cosmologies. London: Allen & Unwin.

Blacking, J. (1977). How Musical Is Man?. Seattle: University of Washington Press.

Blake, E. C., & Cross, I. (2008). Flint tools as portable sound-producing objects in the Upper Palaeolithic context: an experimental study. In P. Cunningham, J. Heep, & R. Paardekooper. (Eds.), Experiencing Archaeology by Experiment (pp. 1-19). Oxford: Oxbow.

Bodde, D. (1991). The orderings of space time and things. In Chinese Thought, Society, and Science: The Intellectual and Social Background of Science and Technology in Pre-modern China. (pp. 97-147), Honolulu: University of Hawaii Press.

Boroditsky, L., & Gaby, A. (2010). Remembrances of times east: Absolute Spatial Representations of Time in an Australian Aboriginal Community. Psychological Science,21(11), 1635-1639. doi:10.1177/0956797610386621

Bower, C. (2001). Boethius. In Grove Music Online. doi:10.1093/gmo/9781561592630.article.03386

Bregman, M.R., Iversen, J.R., Lichman, D., Reinhart, M. & Patel, A.D. (2013). A method for testing synchronization to a musical beat in domestic horses (Equus ferus caballus). Empirical Musicology Review, 7, 144-156.

Brenowitz, E. A., & Beecher, M. D. (2005). Song learning in birds: Diversity and plasticity, opportunities and challenges. Trends in Neurosciences, 28(3), 127-132. doi:10.1016/j.tins.2005.01.004

Brown, D. (2000). The cuneiform conception of celestial space and time. Cambridge Archaeological Journal, 10 (1), 103-122.

Brown, L. E., & Goodale, M. A. (2013). A brief review of the role of training in near-tool effects. Frontiers in Psychology, 4, 1-5. doi: 10.3389/fpsyg.2013.00576

Brown, S. (2000). The “Musilanguage” Model of Music Evolution. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 271-300). Cambridge, Mass: MIT Press.


Brown, S. (2019, February 6). Origins of the Vocal Brain in Humans. Lecture presented at Spring School - Language and Music in Cognition: Integrated Approaches to Cognitive Systems at University of Cologne, Germany.

Brown, S., Merker, B., & Wallin, N. L. (2000). An introduction to evolutionary musicology. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 3-24). Cambridge, Mass.: MIT Press.

Brown, S., Ngan, E., & Liotti, M. (2008). A larynx area in the human motor cortex. Cerebral Cortex,18(4), 837-845. doi:10.1093/cercor/bhm131

Brown, W. N. (1968). Agni, sun, sacrifice, and vāc: A sacerdotal ode by dīrghatamas (Rig Veda 1.164). Journal of the American Oriental Society, 88(2), 199-218.

Brozzoli, C., Makin, T. R., Cardinali, L., Holmes, N. P., & Farnè, A. (2012). Peripersonal space: a multisensory interface for body–object interactions. In Murray, M.M., Wallace, M.T. (Eds.), The Neural Bases of Multisensory Processes, Boca Raton, Taylor & Francis. https://www.ncbi.nlm.nih.gov/books/NBK92879/

Busnel, R.-G., & Classe, A. (1976). Whistled Languages. Springer-Verlag.

Cardinali, L., Brozzoli, C., & Farnè, A. (2009). Peripersonal space and body schema: Two labels for the same concept?. Brain Topography, 21(3-4). doi: 10.1007/s10548-009-0092-7

Carrington, J. F. (1976). Talking drums of Africa. In Sebeok, T. A., & In Umiker-Sebeok, D. J. (Eds.), Speech Surrogates: Drum and Whistle Systems (pp. 591-668). The Hague: Mouton.

Chan, W. (1963). A Source Book in Chinese Philosophy. Princeton, N.J: Princeton University Press.

Cheney, D. L. & Seyfarth, R. M. (1985). Vervet monkey alarm calls: Manipulation through shared information?. Behaviour, 94(1), 150-166. doi: 10.1163/156853985X00316

Cheong, Y. J., & Will, U. (2018). Music, space and body: the evolutionary history of vocal and instrumental music. Proceedings of 15th International Conference on Music Perception and Cognition 10th triennial conference of the European Society for the Cognitive Sciences of Music. Montréal, Canada: Concordia University

Cheong, Y. J., Will, U., & Lin, Y-Y. (2017). Do vocal and instrumental primes affect word processing differently: An fMRI study on the influence of melodic primes on word processing in Chinese musicians and non-musicians. Proceedings of 25th Anniversary Conference of the European Society for the Cognitive Sciences of Music, 35-39. Ghent, Belgium: University of Ghent

Chokron, S., Kazandjian, S., & De Agostini, M. (2009). Effects of reading direction on visuospatial organization: A critical review. In G. Aikaterini & K. Mylonas (Eds.), Quod Erat Demonstrandum: From Herodotus’ ethnographic journeys to cross- cultural research: Proceedings from the 18th International Congress of the International Association for Cross-Cultural Psychology.

Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science, 17(6), 1003-1011.

Cissewski, J., & Boesch, C. (2016). Communication without language: How great apes may cover crucial advantages of language without creating a system of symbolic communication. Gesture, 15 (2), 224-249. doi: 10.1075/gest.15.2.04cis

Clay, J. S. (2003). Hesiod's Cosmos. Cambridge, U.K: Cambridge University Press.

Clayton, M. (2000). Theoretical perspectives I. In Time in Indian music: Rhythm, Metre, and Form in North Indian Rāg Performance. (pp.10-26), Oxford: Oxford University Press.

Clayton, M., Sager, R., & Will, U. (2005). In time with the music: The concept of entrainment and its significance for ethnomusicology. European Meetings in Ethnomusicology, 11, 3-142.

Cléry, J., Guipponi, O., Wardak, C., & Hamed, B. S. (2015). Neuronal bases of peripersonal and extrapersonal spaces, their plasticity and their dynamics: Knowns and unknowns. Neuropsychologia, 70(5), 313-326. doi: 10.1016/j.neuropsychologia.2014.10.022

Coello, Y., Bourgeois, J., & Iachini, T. (2012). Embodied perception of reachable space: how do we manage threatening objects?. Cognitive Processing: International Quarterly of Cognitive Science, 13(1), 131-135. doi: 10.1007/s10339-012-0470-z

Cohen, Y. E., & Andersen, R. A. (2002). A common reference frame for movement plans in the posterior parietal cortex. Nature Reviews Neuroscience, 3(7), 553-562. doi: 10.1038/nrn873

Conard, N. J., Malina, M., & Münzel, S. C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature, 460 (7256), 738-740. doi:10.1038/nature08169


Cook, P., Rouse, A., Wilson, M., & Reichmuth, C. (2013). A California sea lion (Zalophus californianus) can keep the beat: motor entrainment to rhythmic auditory stimuli in a nonvocal mimic. Journal of Comparative Psychology, 127(4), 412-427.

Cook, S. (1995). "Yue Ji" 樂記: record of music: Introduction, translation, notes, and commentary. Asian Music, 26(2), 1-96.

Corish, D. (1986). The emergence of time. In Fraser, J. T., Lawrence, N. M., & Haber, F. C. (Eds.), Time, Science, and Society in China and the West. (pp.67-78), Amherst: University of Massachusetts Press.

Cornford, F.M. (1957). Pattern of Ionian cosmogony. In Munitz, M. K. (Ed.), Theories of the Universe: From Babylonian Myth to Modern Science. (pp.21-31), Glencoe, Ill: Free Press.

Cowan, G. M. (1976). Mazateco whistle speech. In Sebeok, T. A., & In Umiker-Sebeok, D. J. (Eds.), Speech Surrogates: Drum and Whistle Systems (pp. 1386-1393). The Hague: Mouton.

Cross, I. (2001a). Music, cognition, culture and evolution. Annals of the New York Academy of Sciences, 930, 28-42.

Cross, I. (2001b). Music, mind and evolution. Psychology of Music, 29(1), 95-102.

Cross, I. (2007). Music and cognitive evolution. In R. Dunbar & L. Barrett (Eds.), Oxford Handbook of Evolutionary Psychology (pp. 649-667). Oxford: Oxford University Press.

Cross, I., Zubrow, E. & Cowan, F. (2002) Musical behaviours and the archaeological record: a preliminary study. In J. Mathieu (Ed.), Experimental Archaeology. British Archaeological Reports International Series 1035, 25-34.

Csordas, T. J. (1990). Embodiment as a paradigm for anthropology. Ethos,18(1), 5-47. doi:10.1525/eth.1990.18.1.02a00010

Curd, P. (2016). Presocratic philosophy. In Zalta, E. N. (Ed)., The Stanford Encyclopedia of Philosophy. Retrieved from https://plato.stanford.edu/archives/win2016/entries/ presocratics/

Curd, P., & McKirahan, R. D. (2011). A Presocratics Reader: Selected Fragments and Testimonia. Indianapolis: Hackett Pub. Co.

Dalley, S. (2008). The epic of creation. In Myths from Mesopotamia: Creation, the flood, Gilgamesh, and Others. (pp.228-277). Oxford: Oxford University Press.

D’Andrade, R. (2001). A cognitivist’s view of the units debate in cultural anthropology. Cross-Cultural Research, 35(2), 242-257. doi:10.1177/106939710103500208

Danielson, D. R. (2000). The Book of the Cosmos. Cambridge, MA: Perseus Publishing.

Darwin, C. (1871). The Descent of Man, and Selection in Relation to Sex. Princeton, N.J: Princeton University Press.

De Souza, J. (2017). Music at Hand: Instruments, Bodies, and Cognition. Oxford: Oxford University Press.

DeWoskin, K. J. (1982). A Song for One or Two: Music and the Concept of Art in Early China. Ann Arbor: Center for Chinese Studies, University of Michigan.

DeWoskin, K. J. (1985). Philosophers on music in early China. The World of Music, 27(1), 33-47.

Deutsch, D. (2013). Absolute pitch. In D. Deutsch (Ed.), The Psychology of Music (3rd ed., pp. 141-182). Cambridge, MA: Academic Press. doi:10.1016/B978-0-12-381460-9.00005-5

Diedrich, C. G. (2015). 'Neanderthal bone flutes': simply products of Ice Age spotted hyena scavenging activities on cave bear cubs in European cave bear dens. Royal Society Open Science, 2(4), 140022. doi:10.1098/rsos.140022

Dissanayake, E. (2000). Antecedents of the temporal arts in early mother–infant interaction. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 389-410). Cambridge, MA: The MIT Press.

Dissanayake, E. (2009a). Root, leaf, blossom, or bole: Concerning the origin and adaptive function of music. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality: Exploring the basis of human companionship (pp. 17-30). Oxford: Oxford University Press.

Dissanayake, E. (2009b). Bodies swayed to music: The temporal arts as integral to ceremonial ritual. In S. Malloch & C. Trevarthen (Eds.), Communicative musicality: Exploring the basis of human companionship (pp. 533-544). Oxford: Oxford University Press.

Dijkerman, H. C. (2017) On feeling and reaching: Touch, action, and body space, In A. Postma, & I. J. M. van der Ham (Eds.), Neuropsychology of Space, (pp. 77-122), San Diego: Academic Press, doi:10.1016/B978-0-12-801638-1.00003-3.


Dufour, V., Poulin, N., Curé, C., & Sterck, E. H. (2015). Chimpanzee drumming: A spontaneous performance with characteristics of human musical drumming. Scientific Reports,5(1). doi:10.1038/srep11320

Dunn, J. C., & Smaers, J. B. (2018). Neural correlates of vocal repertoire in primates. Frontiers in Neuroscience,12. doi:10.3389/fnins.2018.00534

Edwards, E. D. (1957). 'Principles of whistling'-嘯旨 Hsiao Chih--Anonymous. Bulletin of the School of Oriental and African Studies, University of London, 20, 217-229.

Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., & Taub, E. (1995). Increased cortical representation of the fingers of the left hand in string players. Science,270(5234), 305-307. doi:10.1126/science.270.5234.305

Eliade, M. (1992). Time and eternity in Indian thought. In H.S. Prasad (Ed.), Time in Indian Philosophy: A Collection of Essays. (pp.97-124), Delhi: Prasad.

Everett, D. L. (1985). Syllable weight, sloppy phonemes, and channels in Pirahã discourse. In Proceedings of the Eleventh Annual Meeting of the Berkeley Linguistics Society (pp. 408-416).

Everett, D. L. (2005). Cultural constraints on grammar and cognition in Pirahã. Current Anthropology, 46(4), 621-646. doi: 10.1086/431525

Everett, D. L. (2016). An evaluation of universal grammar and the phonological mind. Frontiers in Psychology, 7(15). doi: 10.3389/fpsyg.2016.00015

Everett, D. L. (2017). How Language Began: The Story of Humanity's Greatest Invention. New York: Liveright.

d'Errico, F., Villa, P., Llona, A. C., & Idarraga, R. R. (1998). A Middle Palaeolithic origin of music?: Using cave-bear bone accumulations to assess the Divje Babe I bone ‘flute’. Antiquity, 72(275), 65-79. doi:10.1017/s0003598x00086282

d'Errico, F., Henshilwood, C., Lawson, G., Vanhaeren, M., Tillier, A.-M., Soressi, M., Bresson, F., Maureille, B., Nowell, A., Lakarra, J., Backwell, L., & Julien, M. (2003). Archaeological evidence for the emergence of language, symbolism, and music: An alternative multidisciplinary perspective. Journal of World Prehistory, 17(1), 1-70.

Fagg, B. (1956). The discovery of multiple rock gongs in Nigeria. African Music, 1(3), 6-9.


Feld, S. (1982). The boy who became a Muni bird. In Sound and Sentiment: Birds, Weeping, Poetics, and Song in Kaluli Expression. Philadelphia: University of Pennsylvania Press, (pp. 20-43). doi:10.1215/9780822395898

Fitch, W. T. (2005). The evolution of music in comparative perspective. Annals of the New York Academy of Sciences, 1060(1), 29-49. doi: 10.1196/annals.1360.004

Fitch, W. T. (2006). On the biology and evolution of music. Music Perception, 24(1), 85-88. doi:10.1525/mp.2006.24.1.85

Fitch, W. T. (2010). The Evolution of Language. Cambridge: Cambridge University Press.

Fitch, W. T. (2011). The biology and evolution of rhythm: unravelling a paradox. In Rebuschat, P., Rohmeier, M., Hawkins, J. A., & Cross, I. (Eds.), Language and Music as Cognitive Systems. (pp.73-95). Oxford: Oxford University Press. doi: 10.1093/acprof:oso/9780199553426.001.0001

Fitch, W. T. (2018). The biology and evolution of speech: A comparative analysis. Annual Review of Linguistics, 4(1), 255-279. doi:10.1146/annurev-linguistics-011817-045748

Fitch, W. T., Boer, B. D., Mathur, N., & Ghazanfar, A. A. (2016). Monkey vocal tracts are speech-ready. Science Advances,2(12). doi:10.1126/sciadv.1600723

Fitzroy, A., Lobdell, L., Norman, S., Bolognese, L., Patel, A., & Breen, M. (2018, July). Horses do not spontaneously engage in tempo-flexible synchronization to a musical beat. Poster session presented at 15th International Conference on Music Perception and Cognition 10th triennial conference of the European Society for the Cognitive Sciences of Music. Montréal, Canada: Concordia University

Fraisse, P. (1963). The Psychology of Time. New York: Harper & Row.

Fraisse, P. (1982). Rhythm and tempo. In D. Deutsch (Ed.), The Psychology of Music, (pp. 149–180). New York: Academic Press.

Fraisse, P. (1984). Perception and estimation of time. Annual Review of Psychology, 35(1), 1-36


Frayer, D. W. & Nicolay, C. (2000). Fossil evidence for the origins of speech sounds. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp.217-34). Cambridge, Mass: MIT Press.

Freeman, K. (1953). The pre-Socratic Philosophers. Oxford: B. Blackwell.

Friedman, W. J. (2000). Time in psychology. In P. J. N. Baert (Ed.), Time in Contemporary Intellectual Thought (pp. 295-314). Amsterdam: Elsevier.

Fujisaki, W., & Nishida, S. (2009). Audio–tactile superiority over visuo–tactile and audio–visual combinations in the temporal resolution of synchrony perception. Experimental Brain Research, 198, 245-259. doi:10.1007/s00221-009-1870-x

Fung, Y. (2010). On the very idea of correlative thinking. Philosophy Compass, 5(4), 296- 306.

Gabrielsson, A. (2011). Strong Experiences with Music. Oxford: Oxford University Press.

Gallagher, S. (1986). Body image and body schema: A conceptual clarification, Journal of Mind and Behavior, 7(4), 541–554.

Gallagher, S. (2005). Introduction. In How the Body Shapes the Mind (pp. 1-21). Oxford: Clarendon Press. doi: 10.1093/0199271941.003.0001

Gaser, C., & Schlaug, G. (2003). Gray matter differences between musicians and nonmusicians. Annals of the New York Academy of Sciences,999(1), 514-517. doi:10.1196/annals.1284.062

Geissmann, T. (2000). Gibbon songs and human music from an evolutionary perspective. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 103-124). Cambridge, Mass: The MIT Press.

Gescheider, G. A., Wright, J. H., & Verrillo, R. T. (2009). Information-processing channels in the tactile sensory system: A psychophysical and physiological analysis. Psychology Press.

Gibson, J. J. (1962). Observations on active touch. Psychological Review,69(6), 477-491. doi:10.1037/h0046962

Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston, MA: Houghton Mifflin Company.


Graham, D. W. (2008). Heraclitus: Flux, order, and knowledge. In Curd, P., & Graham, D. W. (Ed), The Oxford Handbook of Presocratic Philosophy. (pp. 169-188), Oxford: Oxford University Press.

Graham, D. W. (2010). The Texts of Early Greek Philosophy. Cambridge: Cambridge University Press.

Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893-906. doi:10.1162/jocn.2007.19.5.893

Grahn, J. A., & McAuley, J. D. (2009). Neural bases of individual differences in beat perception. NeuroImage, 47(4), 1894-1903. doi:10.1016/j.neuroimage.2009.04.039

Graydon, M. M., Linkenauger, S.A., Teachman, B. A., & Proffitt, D.R. (2012). Scared stiff: The influence of anxiety on the perception of action capabilities, Cognition and Emotion, 26(7), 1301-1315, doi: 10.1080/02699931.2012.667391

Griffith, R. T. H. (1896). The Rig Veda. Retrieved from http://www.sacred-texts.com/hin/rigveda/index.htm

Grondin, S. (2010). Timing and time perception: A review of recent behavioral and neuroscience findings and theoretical directions. Attention, Perception, & Psychophysics, 72(3), 561-582. doi: 10.3758/APP.72.3.561

Gunji, A., Koyama, S., Ishii, R., Levy, D., Okamoto, H., Kakigi, R., & Pantev, C. (2003). Magnetoencephalographic study of the cortical activity elicited by human voice. Neuroscience Letters,348(1), 13-16. doi:10.1016/s0304-3940(03)00640-2

Hagel, S. (2009). The evolution of ancient Greek music notation. In Ancient Greek music: A New Technical History. (pp. 1-51) Cambridge: Cambridge University Press.

Hale, T. (1994). Griottes: Female voices from West Africa. Research in African Literatures, 25(3), 71-91.

Harbsmeier, C. (1995). Some notions of time and of history in China and in the West: with a digression on the anthology of writing. In Huang, J., & Zürcher, E. (Ed), Time and Space in Chinese Culture. (pp. 49-71), Leiden: E.J. Brill

Harwood, D. L. (1976). Universals in music: a perspective from cognitive psychology. Ethnomusicology, 20(3), 521-533.


Hattori, Y., Tomonaga, M., & Matsuzawa, T. (2013). Spontaneous synchronized tapping to an auditory rhythm in a chimpanzee. Scientific Reports, 3, 1566. doi: 10.1038/srep01566

Hauser, M. D. (2000). The sound and the fury: Primate vocalization as reflections of emotion and thought. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 77–102). Cambridge, Mass: The MIT Press.

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve?. Science, 298(5598), 1569-1579.

Head, H., & Holmes, G. (1911-1912). Sensory disturbances from cerebral lesions. Brain, 34, 102-254. doi: 10.1093/brain/34.2-3.102

Hediger, H. (1968). The Psychology and Behaviour of Animals in Zoos and Circuses. New York: Dover Publications. (Original work published 1955)

Heidel, A. (1951). The Babylonian Genesis. Chicago: University of Chicago Press.

Herholz, S. C., & Zatorre, R. J. (2012). Musical training as a framework for brain plasticity: Behavior, function, and structure. Neuron, 76, 486-502. doi: 10.1016/j.neuron.2012.10.011

Hesiod, & Johnson, K. (2017). Theogony and Works and days. Evanston, IL: Northwestern University Press

Hirsh, I. J. (1959). Auditory perception of temporal order. The Journal of the Acoustical Society of America, 31(6), 759-767. doi: 10.1121/1.1907782

Hirsh, I. J., & Sherrick, C. E. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62(5), 423-432.

Hockett, C. F. (1960a). Logical considerations in the study of animal communication. In W. E. Lanyon & W. N. Tavolga (Eds.), Animal Sounds and Communication. (pp. 392-430) Washington, D.C.: American Institute of Biological Sciences.

Hockett, C. F. (1960b) The origin of speech, Scientific American, 203, 88-111.

Hockett, C. F. & Altmann, S. A. (1968). A note on design features. In Sebeok, T. A. (Ed.) Animal Communication: Techniques of Study and Results of Research (pp. 61-72). Bloomington: Indiana University Press.


Holmes, N. P., & Spence, C. (2004). The body schema and multisensory representation(s) of peripersonal space. Cognitive Processing: International Quarterly of Cognitive Science, 5(2), 94-105. doi:10.1007/s10339-004-0013-3

Honing, H., Bouwer, F. L., Prado, L., & Merchant, H. (2018). Rhesus monkeys (Macaca mulatta) sense isochrony in rhythm, but not the beat: additional support for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience,12. doi:10.3389/fnins.2018.00475

Hund-Georgiadis, M., & von Cramon, D. Y. (1999). Motor-learning-related changes in piano players and non-musicians revealed by functional magnetic-resonance signals. Experimental Brain Research, 125(4), 417-425. doi:10.1007/s002210050698

Hung, T. H. (2011). One music? Two musics? How many musics? Cognitive ethnomusicological, behavioral, and fMRI study on vocal and instrumental rhythm processing (Doctoral dissertation). The Ohio State University, Columbus OH. Retrieved from http://rave.ohiolink.edu/etdc/view?acc_num=osu1308317619

Hyvärinen, J., & Poranen, A. (1974). Function of the parietal associative area 7 as revealed from cellular discharges in alert monkeys. Brain, 97(4), 673-692.

Iriki, A., Tanaka, M., & Iwamura, Y. (1996). Coding of modified body schema during tool use by macaque postcentral neurones. Neuroreport, 7(14), 2325-2330.

Ivry, R. B., Spencer, R. M., Zelaznik, H. N., & Diedrichsen, J. (2002). The cerebellum and event timing. Annals of the New York Academy of Sciences, 978(1), 302-317. doi:10.1111/j.1749-6632.2002.tb07576.x

Jackson, G., Gartlan, J. S., & Posnansky, M. (1965). 31. Rock gongs and associated rock paintings on Lolui island, lake Victoria, Uganda: A preliminary note. Man, 65, 38-40.

Jacobsen, T. (1957). Enuma elish-“The Babylonian genesis”. In Munitz, M. K. (Ed.), Theories of the Universe: From Babylonian Myth to Modern Science. (pp.8-20), Glencoe, Ill: Free Press.

Jacobsen, T. (1968). The battle between Marduk and Tiamat. Journal of the American Oriental Society, 88(1), 104-108.

James, W. (1950). The perception of time. In The Principles of Psychology. vol.1. (pp.605-642). New York: Dover Publications. (Original work published 1890)


James, W. (1950). Memory. In The Principles of Psychology. vol.1 (pp.643-689). New York: Dover Publications. (Original work published 1890)

Jarvis, E. D. (2007). Neural systems for vocal learning in birds and humans: a synopsis. Journal of Ornithology, 148(1), 35-44. doi:10.1007/s10336-007-0243-0.

Jessup, L. (1983). The instrument. In The Mandinka Balafon: An Introduction with Notation for Teaching (pp.19-52). La Mesa, CA: Xylo Publications.

Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83(5), 323-355. doi:10.1037/0033-295x.83.5.323

Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96(3), 459–491. doi:10.1037/0033-295x.96.3.459

Kak, S. C. (1995). The astronomy of the age of geometric altars. Quarterly Journal of the Royal Astronomical Society, 36(4), 385-396.

Kak, S. C. (2009). Time, space and structure in ancient India. Retrieved from arXiv:0903.3252 [physics.hist-ph]

Keller, P. E., König, R., & Novembre, G. (2017). Simultaneous cooperation and competition in the evolution of musical behavior: Sex-related modulations of the singer's formant in human chorusing. Frontiers in Psychology, 8(1559). doi: 10.3389/fpsyg.2017.01559

Keller, P. E., Novembre, G., & Hove, M. J. (2014). Rhythm in joint action: psychological and neurophysiological mechanisms for real-time interpersonal coordination. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1658), 20130394. doi:10.1098/rstb.2013.0394

Kitagawa, N., & Spence, C. (2006). Audiotactile multisensory interactions in information processing. Japanese Psychological Research, 48, 158–173.

Kitagawa, N., Zampini, M., & Spence, C. (2005). Audiotactile interactions in near and far space. Experimental Brain Research, 166, 528–537.

Klyn, N. A., Will, U., Cheong, Y. J., & Allen, E. T. (2015). Differential short-term memorisation for vocal and instrumental rhythms. Memory, 24(6), 766-791. doi: 10.1080/09658211.2015.1050400


Kuchenbuch, A., Paraskevopoulos, E., Herholz, S. C., & Pantev, C. (2014). Audio-Tactile Integration and the Influence of Musical Training. PLoS ONE,9(1). doi:10.1371/journal.pone.0085743

Kung, H.-N. (2017). Cultural Influence on the Perception and Cognition of Musical Pulse and Meter. (Unpublished doctoral dissertation). The Ohio State University, Columbus, OH. Retrieved from https://etd.ohiolink.edu/

Kunej, D., & Turk, I. (2000). New perspectives on the beginning of music: Archaeological and musicological analysis of a Middle Paleolithic bone “flute”. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 235-268). Cambridge, Mass: MIT Press.

Lameira, A. R., Hardus, M. E., & Wich, S. A. (2011). Orangutan instrumental gesture-calls: Reconciling acoustic and gestural speech evolution models. Evolutionary Biology, 39(3), 415-418. doi:10.1007/s11692-011-9151-6

Landry, S. P., & Champoux, F. (2017). Musicians react faster and are better multisensory integrators. Brain Cognition, 111, 156–162.

Landry, S. P., & Champoux, F. (2017). Long-term musical training alters tactile temporal-order judgment. Multisensory Research, 31(5). doi:10.1163/22134808-00002575

Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience, 28(39), 9632-9639. doi:10.1523/jneurosci.2254-08.2008

Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119-159. doi:10.1037/0033-295x.106.1.119

Laroche, J., Berardi, A. M., & Brangier, E. (2014). Embodiment of intersubjective time: Relational dynamics as attractors in the temporal coordination of interpersonal behaviors and experiences. Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.01180

Lawson, F. (2014). Is music an adaptation or a technology? Ethnomusicological perspectives from the analysis of Chinese Shuochang. Ethnomusicology Forum,23(1), 3-26. doi:10.1080/17411912.2013.875786

Leclère, C., Viaux, S., Avril, M., Achard, C., Chetouani, M., Missonnier, S., & Cohen, D. (2014). Why synchrony matters during mother-child interactions: A systematic review. PLoS ONE, 9(12): e113571. doi:10.1371/journal.pone.0113571


Lee, Y. S., Peelle, J. E., Kraemer, D., Lloyd, S., & Granger, R. (2015). Multivariate sensitivity to voice during auditory categorization. Journal of Neurophysiology, 114(3), 1819-1826. doi:10.1152/jn.00407.2014

Legge, J. (1882). The Yi Ching. Retrieved from https://ctext.org/book-of-changes/yi-jing

Leong, V., Byrne, E., Clackson, K., Georgieva, S., Lam, S., & Wass, S. (2017). Speaker gaze increases information coupling between infant and adult brains. Proceedings of the National Academy of Sciences, 114(50). doi:10.1073/pnas.1702493114

Levy, D. A., Granot, R., & Bentin, S. (2001). Processing specificity for human voice stimuli: electrophysiological evidence. Neuroreport, 12, 2653-2657. doi:10.1097/00001756-200108280-00013

Levy, D. A., Granot, R., & Bentin, S. (2003). Neural sensitivity to human voices: ERP evidence of task and attentional influences. Psychophysiology, 40(2), 291-305. doi:10.1111/1469-8986.00031

Lewis, M. E. (2006). The Construction of Space in Early China. Albany, N.Y.: State University of New York Press.

Liang, D. M. (1985). Music of the Billion: An Introduction to Chinese Musical Culture. New York: Heinrichshofen.

Lin, L. (1995). The notions of time and position in the Book of Change and their development. In Huang, J., & Zürcher, E. (Ed), Time and Space in Chinese Culture. (pp. 89-113), Leiden: E.J. Brill

List, G. (1971). On the non-universality of musical perspectives. Ethnomusicology, 15(3), 344-401.

List, G. (1984). Concerning the concept of the universal and music. The World of Music, 26(2), 40-49.

Lo, S., & Andrews, S. (2015). To transform or not to transform: Using generalized linear mixed models to analyse reaction time data. Frontiers in Psychology,6. doi:10.3389/fpsyg.2015.01171

Longo, M. R., Azañón, E., & Haggard, P. (2010). More than skin deep: Body representation beyond primary somatosensory cortex. Neuropsychologia, 48(3), 655-668. doi: 10.1016/j.neuropsychologia.2009.08.022

Longo, M. R., Mancini, F., & Haggard, P. (2015). Implicit body representations and tactile spatial remapping. Acta Psychologica, 160, 77-87. doi: 10.1016/j.actpsy.2015.07.002

Lotze, M. (2013). Kinesthetic imagery of musical performance. Frontiers in Human Neuroscience,7. doi:10.3389/fnhum.2013.00280

Loui, P., Patterson, S., Sachs, M. E., Leung, Y., Zeng, T., & Przysinda, E. (2017). White Matter Correlates of Musical Anhedonia: Implications for Evolution of Music. Frontiers in Psychology,8. doi:10.3389/fpsyg.2017.01664

Lourenco, S. F., Longo, M. R., & Pathman, T. (2011). Near space and its relation to claustrophobic fear. Cognition, 119(3), 448-453. doi: 10.1016/j.cognition.2011.02.009

Love, S. A., Petrini, K., Pernet, C. R., Latinus, M., & Pollick, F. E. (2018). Overlapping but divergent neural correlates underpinning audiovisual synchrony and temporal order judgments. Frontiers in Human Neuroscience, 12. doi:10.3389/fnhum.2018.00274

MacDougall, H. G., & Moore, S. T. (2005). Marching to the beat of the same drummer: the spontaneous tempo of human locomotion. Journal of Applied Physiology, 99(3), 1164-1173. doi:10.1152/japplphysiol.00138.2005.

Maddieson, I. (2013). Tone. In M. S. Dryer & M. Haspelmath (Eds.), The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. Retrieved from http://wals.info/chapter/13.

Maravita, A. (2006). From “body in the brain” to “body in space”: Sensory and intentional components of body representation. In G. Knoblich, I. M. Thornton, M. Grosjean, & M. Shiffrar (Eds.), Human Body Perception from the Inside Out (pp. 65-88). Oxford: Oxford University Press.

Maravita, A., & Iriki, A. (2004). Tools for the body (schema). Trends in Cognitive Sciences, 8(2), 79-86. doi:10.1016/j.tics.2003.12.008

Maravita, A., Spence, C., & Driver, J. (2003). Multisensory integration and the body schema: close to hand and within reach. Current Biology, 13(13), 531-539. doi: 10.1016/S0960-9822(03)00449-4

Markowitz, W., Smart, J. J. C., & Toynbee, A. J. (n.d.). Time. In Encyclopædia Britannica. Retrieved from https://www.britannica.com/science/time

Marler, P., & Tamura, M. (1964). Culturally transmitted patterns of vocal behavior in sparrows. Science, 146(3650), 1483-1486. doi:10.1126/science.146.3650.1483

Marshall, A. (2018, January 25). Can you tell a lullaby from a love song?: Find out now. New York Times. Retrieved March 23, 2019, from https://www.nytimes.com/interactive/2018/01/25/arts/music/history-of-song.html

Maul, S. M. (2008). Walking backwards into the future: The conception of time in the ancient Near East. In T. Miller (Ed.), Given World and Time: Temporalities in Context (pp. 15-24). Budapest: Central European University Press.

McAllester, D. P. (1971). Some thoughts on "universals" in world music. Ethnomusicology, 15(3), 379-380. doi:10.2307/850637

McKirahan, R. D. (2008). Signs and arguments in Parmenides B8. In Curd, P., & Graham, D. W. (Eds.), The Oxford Handbook of Presocratic Philosophy. (pp. 189-292), Oxford: Oxford University Press.

McNeill, W. H. (1995). Keeping Together in Time: Dance and Drill in Human History. Cambridge, Mass: Harvard University Press.

Mehr, S. M., Singh, M, York, H., Glowacki, L, & Krasnow, M. M. (2018). Form and function in human song. Current Biology, 28, 1-13. doi: 10.1016/j.cub.2017.12.042

Merchant, H., & Honing, H. (2014). Are non-human primates capable of rhythmic entrainment? Evidence for the gradual audiomotor evolution hypothesis. Frontiers in Neuroscience, 7(274). doi: 10.3389/fnins.2013.00274

Merker, B. (2000). Synchronous chorusing and human origins. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 315–327). Cambridge, Mass: The MIT Press.

Merriam, A. P. (1964). The Anthropology of Music. Evanston, IL: Northwestern Univ. Press.

Meyer, J. (2004). Bioacoustics of human whistled languages: An alternative approach to the cognitive processes of language. Anais da Academia Brasileira de Ciências, 76(2), 406-412. doi:10.1590/s0001-37652004000200033

Meyer, J., & Busnel, R.-G. (2015). Whistled Languages: A Worldwide Inquiry on Human Whistled Speech. Berlin: Springer. doi: 10.1007/978-3-662-45837-2

Miller, G. (2000). Evolution of human music through sexual selection. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 329–60). Cambridge, Mass: The MIT Press.


Miller, J. O. (1982). Divided attention: Evidence for coactivation with redundant signals. Cognitive Psychology, 14, 247–279.

Mithen, S. J. (2006). The Singing Neanderthals: The Origins of Music, Language, Mind, and Body. Cambridge, MA: Harvard University Press

Mondragon, M., & Lopez, L. (2012). Space and time as containers of the "Physical Material World" with some conceptual and epistemological consequences in Modern Physics. Retrieved April 4, 2019, from arXiv:0711.4050v2.

Montagu, J. (2017) How music and instruments began: A brief overview of the origin and entire development of music, from its earliest stages. Frontiers in Sociology, 2(8), 1-12. doi: 10.3389/fsoc.2017.00008

Morgan, T. J. H., Uomini, N. T., Rendell, L. E., Chouinard-Thuly, L., Street, S. E., Lewis, H. M., Cross, C. P., Evans, C., Kearney, R., de la Torre, I., Whiten, A., & Laland, K. N. (2015). Experimental evidence for the co-evolution of hominin tool-making teaching and language. Nature Communications, 6(6029). doi:10.1038/ncomms7029

Morley, I. (2013). The Prehistory of Music: Human Evolution, Archaeology, and the Origins of Musicality. Oxford: Oxford University Press.

Mortillaro, M., Mehu, M., & Scherer, K. R. (2013). The evolutionary origin of multimodal synchronization and emotional expression. In Altenmüller, E., Schmidt, S., & Zimmermann, E. (Eds.), Evolution of Emotional Communication: From Sounds in Nonhuman Mammals to Speech and Music in Man (pp. 3-25). Oxford: Oxford University Press.

Moscatelli, A., Mezzetti, M., & Lacquaniti, F. (2012). Modeling psychophysical data at the population level: The generalized linear mixed model. Journal of Vision, 12(11). doi:10.1167/12.11.26

Murray, M. M., Molholm, S., Michel, C. M., Heslenfeld, D. J., Ritter, W., Javitt, D. C., Schroeder, C. E., & Foxe, J. J. (2005). Grabbing your ear: Auditory–somatosensory multisensory interactions in early sensory cortices are not constrained by stimulus alignment. Cerebral Cortex, 15(7), 963-974. doi:10.1093/cercor/bhh197

Nakajima, Y., ten Hoopen, G., & van der Wilk, R. (1991). A new illusion of time perception. Music Perception, 8(4), 431-448. doi: 10.2307/40285521

Nakata, T., & Trehub, S. E. (2004). Infants’ responsiveness to maternal speech and singing. Infant Behavior and Development, 27(4), 455-464. doi:10.1016/j.infbeh.2004.03.002

Needham, J. (1966). Time and knowledge in China and the West. In J. T. Fraser (Ed.), The Voices of Time: A Cooperative Survey of Man's Views of Time as Expressed by the Sciences and by the Humanities (pp. 92-135). New York: G. Braziller.

Needham, J., & Wang, L. (1962). Science and Civilisation in China: Physics and Physical Technology (Vol. 4, part 1). Cambridge: Cambridge University Press.

Nettl, B. (2000). An ethnomusicologist contemplates universals in musical sound and musical culture. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 463-472). Cambridge, Mass: MIT Press.

Neubauer, E., & Doubleday, V. (2001). Islamic religious music. In Grove Music Online. Retrieved from http://www.oxfordmusiconline.com.proxy.lib.ohio-state.edu/subscriber/article/grove/music/52787.

Nevins, A., Pesetsky, D. M., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language, 85(2), 355-404. doi: 10.1353/lan.0.0107

The New American Bible. (2005). Huntington, IN: Our Sunday Visitor.

Noad, M. J., Cato, D. H., Bryden, M. M., Jenner, M.-N., & Jenner, K. C. S. (2000). Cultural revolution in whale songs. Nature, 408(6812).

van Noorden, L., & Moelants, D. (1999). Resonance in the perception of musical pulse. Journal of New Music Research, 28(1), 43-66. doi: 10.1076/jnmr.28.1.43.3122

Nottebohm, F. (2005). The neural basis of birdsong. PLoS Biology, 3(5). doi: 10.1371/journal.pbio.0030164

Obayashi, S., Suhara, T., Kawabe, K., Okauchi, T., Maeda, J., Akine, Y., Onoe, H., & Iriki, A. (2001). Functional brain mapping of monkey tool use. NeuroImage, 14(4), 853-861. doi: 10.1006/nimg.2001.0878

Occelli, V., Spence, C., & Zampini, M. (2011). Audiotactile interactions in temporal perception. Psychonomic Bulletin & Review, 18, 429-454. doi: 10.3758/s13423-011-0070-4

Occelli, V., O’Brien, J. H., Spence, C., & Zampini, M. (2010). Assessing the audiotactile Colavita effect in near and rear space. Experimental Brain Research, 203(3), 517-532. doi: 10.1007/s00221-010-2255-x


O’Keefe, J., & Nadel, L. (1978). Introduction. In The Hippocampus as a Cognitive Map (pp. 1-61). Oxford: Clarendon Press.

O’Neill, G. (2014). Humming, whistling, singing, and yelling in Pirahã: Context and channels of communication in FDG. Pragmatics, 24(2), 349-375. doi: 10.1075/prag.24.2.08nei

Ong, W. J. (1982). Orality and Literacy: The Technologizing of the Word. London: Methuen.

Ornstein, R. E. (1969). On the Experience of Time. Harmondsworth: Penguin.

Paillard, J. (1987). Cognitive versus sensorimotor encoding of spatial information. In P. Ellen & C. Thinus-Blanc (Eds.), Cognitive Processes and Spatial Orientation in Animal and Man (pp. 43-77). Dordrecht: Martinus Nijhoff.

Paillard, J. (1991). Motor and representational framing of space. In J. Paillard (Ed.), Brain and Space (pp. 163-182). Oxford: Oxford University Press.

Paillard, J. (1999). Body schema and body image: A double dissociation in deafferented patients. In G. N. Gantchev, S. Mori, & J. Massion (Eds.), Motor Control, Today and Tomorrow (pp. 197-214). Sofia: Academic Publishing House.

Pankenier, D. W. (2004). Temporality and the fabric of space-time in early Chinese thought. In R. M. Rosen (Ed.), Time and Temporality in the Ancient World (pp. 129-146). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology.

Panksepp, J. (2009). The emotional antecedents to the evolution of music and language. Musicae Scientiae, 13, 229-259.

Pantev, C., Engelien, A., Candia, V., & Elbert, T. (2001). Representational cortex in musicians: Plastic alterations in response to musical practice. Annals of the New York Academy of Sciences, 930, 300-314.

Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392(6678), 811-814.

Parker, A. R. (2006). Evolving the narrow language faculty: Was recursion the pivotal step? In A. Cangelosi, A. Smith, & K. Smith (Eds.), Proceedings of the Sixth International Conference on the Evolution of Language (pp. 239-246). London: World Scientific Publishing.


Patel, A. D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, 24(1), 99-104. doi: 10.1525/mp.2006.24.1.99

Patel, A. D. (2008). Music, Language, and the Brain. Oxford: Oxford University Press.

Patel, A. D. (2010). Music, biological evolution, and the brain. In C. Levander & C. Henry (Eds.), Emerging Disciplines: Shaping New Fields of Scholarly Inquiry in and beyond the Humanities (pp. 91–144). Houston, TX: Rice University Press.

Patel, A. D., & Iversen, J. R. (2014). The evolutionary neuroscience of musical beat perception: The Action Simulation for Auditory Prediction (ASAP) hypothesis. Frontiers in Systems Neuroscience, 8(57). doi: 10.3389/fnsys.2014.00057

Patel, A. D., Iversen, J. R., Bregman, M. R., & Schulz, I. (2009). Experimental evidence for synchronization to a musical beat in a nonhuman animal. Current Biology, 19(10), 827-830.

Payne, K. (2000). The progressively changing songs of humpback whales: A window on the creative process in a wild animal. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (pp. 135-150). Cambridge, Mass: MIT Press.

Pearson, L. (2013). Gesture and the sonic event in Karnatak music. Empirical Musicology Review, 8(1), 2-14.

di Pellegrino, G., & Làdavas, E. (2015). Peripersonal space in the brain. Neuropsychologia, 66, 126-133. doi: 10.1016/j.neuropsychologia.2014.11.011

di Pellegrino, G., Làdavas, E., & Farnè, A. (1997). Seeing where your hands are. Nature, 388(6644), 730.

Peters, H. H. (2001). Tool use to modify calls by wild Orang-Utans. Folia Primatologica, 72(4), 242-244. doi: 10.1159/000049943

Phillips-Silver, J., & Trainor, L. J. (2005). Feeling the beat: Movement influences infant rhythm perception. Science, 308(5727), 1430.

Pickering, A. (2017). The ontological turn: Taking different worlds seriously. Social Analysis, 61(2), 134-150. doi: 10.3167/sa.2017.610209

Pike, G., & Edgar, G. (2005). Perception. In N. Braisby & A. Gellatly (Eds.), Cognitive Psychology (pp. 71-112). Oxford: Oxford University Press.

Pinker, S. (1997). The meaning of life. In How the Mind Works (pp. 521-565). New York: Norton.

Pöppel, E. (2009). Pre-semantically defined temporal windows for cognitive processing. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 364(1525), 1887-1896. doi: 10.1098/rstb.2009.0015

Powers, H. S., & Widdess, R. (2001). India, subcontinent of, III: Theory and practice of classical music. In Grove Music Online. doi: 10.1093/gmo/9781561592630.article.43272

Prasad, H. S. (1992). The problem of time in Indian philosophy: An introduction. In H. S. Prasad (Ed.), Time in Indian Philosophy: A Collection of Essays (pp. 1-20). Delhi: Sri Satguru.

Press, C., Berlot, E., Bird, G., Ivry, R., & Cook, R. (2014). Moving time: The influence of action on duration perception. Journal of Experimental Psychology: General, 143(5), 1787-1793. doi: 10.1037/a0037650

Purves, A. (2004). Topographies of time in Hesiod. In R. M. Rosen (Ed.), Time and Temporality in the Ancient World (pp. 147-168). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology.

Qing, T. (2016). The ancient Qin 琴, musical instrument of cultured Chinese gentlemen. (S. Davis, Trans.) Journal of Chinese Literature and Culture, 3(1), 108-136.

Raab, D. H. (1962). Statistical facilitation of simple reaction-times. Transactions of the New York Academy of Sciences, 24(5), 574-590. doi: 10.1111/j.2164-0947.1962.tb01433.x

R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. https://www.R-project.org

Redmond, G. (2017). The I Ching (Book of Changes): A Critical Translation of the Ancient Text. New York: Bloomsbury Academic.

Remedios, R., Logothetis, N. K., & Kayser, C. (2009). An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. Journal of Neuroscience, 29(4), 1034-1045. doi: 10.1523/jneurosci.4089-08.2009

Repp, B. H. (2005). Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin & Review, 12(6), 969-992. doi: 10.3758/BF03206433


Repp, B. H., & Su, Y. H. (2013). Sensorimotor synchronization: A review of recent research (2006-2012). Psychonomic Bulletin & Review, 20(3), 403-452. doi: 10.3758/s13423-012-0371-2

Rizzolatti, G., Fadiga, L., Fogassi, L., & Gallese, V. (1997). The space around us. Science, 277(5323), 190-191. http://www.jstor.org/stable/2893558

Rizzolatti, G., Scandolara, C., Matelli, M., & Gentilucci, M. (1981). Afferent properties of periarcuate neurons in macaque monkeys. II. Visual responses. Behavioural Brain Research, 2(2), 147-163. doi: 10.1016/0166-4328(81)90053-X

Robson, E. (2004). Scholarly conceptions and quantifications of time in Assyria and Babylonia, c. 750-250 BCE. In R. M. Rosen (Ed.), Time and Temporality in the Ancient World (pp. 45-90). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology.

Roeckelein, J. E. (2000). The Concept of Time in Psychology: A Resource Book and Annotated Bibliography. Westport, Conn: Greenwood Press.

Roeckelein, J. E. (2008). History of concepts and accounts of time and early time perception research. In S. Grondin (Ed.), Psychology of time. (pp. 1-50). Bingley, UK: Emerald.

Rosen, R. M. (2004). Ancient time across time. In R. M. Rosen (Ed.), Time and Temporality in the Ancient World (pp. 1-10). Philadelphia: University of Pennsylvania Museum of Archaeology and Anthropology.

Rousseau, J.-J., & Scott, J. T. (1998). Essay on the origin of languages. In Essay on the Origin of Languages and Writings Related to Music (pp. 289-332). Hanover, N. H.: University Press of New England. (Original work published 1761)

Rowell, L. E. (1992). Music and Musical Thought in Early India. Chicago: University of Chicago Press.

Rushton, J. (2002). Don Giovanni (ii). In Grove Music Online. doi: 10.1093/gmo/9781561592630.001.0001/omo-9781561592630-e-5000901351

Sacks, O. (2007). Brainworms, sticky music, and catchy tunes. In Musicophilia: Tales of Music and the Brain. (pp. 48-59). New York: Alfred A. Knopf.

Śārṅgadeva, Shringy, R. K., & Sharma, P. (1996). Saṅgīta-ratnākara of Śārṅgadeva: Sanskrit Text and English Translation with Comments and Notes. New Delhi: Munshiram Manaharlal.


Schäfer, T., Fachner, J., & Smukalla, M. (2013). Changes in the representation of space and time while listening to music. Frontiers in Psychology, 4. doi: 10.3389/fpsyg.2013.00508

Schimmel, O., & Kohlrausch, A. (2008). On the influence of interaural differences on temporal perception of noise bursts of different durations. The Journal of the Acoustical Society of America, 123(2), 986-997. doi: 10.1121/1.2821979

Schlaug, G. (2015). Musicians and music making as a model for the study of brain plasticity. In E. Altenmüller, S. Finger, & F. Boller (Eds.), Music, Neurology, and Neuroscience: Evolution, the Musical Brain, Medical Conditions and Therapies (Progress in Brain Research, Vol. 217, pp. 37-55). doi: 10.1016/bs.pbr.2014.11.020

Schlaug, G., Jäncke, L., Huang, Y., & Steinmetz, H. (1995). In vivo evidence of structural brain asymmetry in musicians. Science, 267(5198), 699-701.

Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white-matter tracts of patients with chronic Broca's aphasia undergoing intense intonation-based speech therapy. Annals of the New York Academy of Sciences, 1169(1), 385-394. doi: 10.1111/j.1749-6632.2009.04587.x

Schneider, A. (1991). Psychological theory and comparative musicology. In B. Nettl & P. V. Bohlman (Eds.), Comparative Musicology and Anthropology of Music: Essays on the History of Ethnomusicology (pp. 293-317). Chicago: The University of Chicago Press.

Schneider, P., Scherg, M., Dosch, H. G., Specht, H. J., Gutschalk, A., & Rupp, A. (2002). Morphology of Heschl's gyrus reflects enhanced activation in the auditory cortex of musicians. Nature Neuroscience, 5(7), 688-694. doi: 10.1038/nn871

van der Schyff, D. (2014). Music, culture and the evolution of the human mind: Looking beyond dichotomies. Hellenic Journal of Music, Education and Culture, 4(1).

Sebanz, N., Bekkering, H., & Knoblich, G. (2006). Joint action: Bodies and minds moving together. Trends in Cognitive Sciences, 10(2), 70-76. doi: 10.1016/j.tics.2005.12.009

Sebeok, T. A., & Umiker-Sebeok, D. J. (1976). Introduction. In T. A. Sebeok & D. J. Umiker-Sebeok (Eds.), Speech Surrogates: Drum and Whistle Systems (pp. xiii-xxiv). The Hague: Mouton.

Sedley, D. (1982). Two conceptions of vacuum. Phronesis, 27, 175-193.

Seeger, A. (1979). What can we learn when they sing? Vocal genres of the Suya Indians of central Brazil. Ethnomusicology, 23, 373-394.

Seeger, C. (1971). Reflections upon a given topic: Music in universal perspective. Ethnomusicology, 15(3), 385-398. doi: 10.2307/850639

Seeger, T. (Producer), Seeger, D., Seeger, P. (Producer), & Jackson, B. (Producer). (1966). Afro-American Work Songs in a Texas Prison [Film]. United States: Folklore Research Films, Inc. Retrieved from http://www.folkstreams.net/film-detail.php?id=122

Serino, A., Noel, J. P., Galli, G., Canzoneri, E., Marmaroli, P., Lissek, H., & Blanke, O. (2015). Body part-centered and full body-centered peripersonal space representations. Scientific Reports, 5, 1-14. doi: 10.1038/srep18603

Shea, J. J. (2017). Stone Tools in Human Evolution: Behavioral Differences among Technological Primates. Cambridge, United Kingdom: Cambridge University Press.

Siffre, M. (1964). Beyond Time (H. Briffault, Trans.). New York: McGraw-Hill.

Slater, P. J. B. (2000). Birdsong repertoires: Their origins and use. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 31-48). Cambridge, Mass: MIT Press.

Snowdon, C. T., Zimmermann, E., & Altenmüller, E. (2015). Music evolution and neuroscience. Progress in Brain Research, 217, 17-34. doi: 10.1016/bs.pbr.2014.11.019

The Song Family. (2012, August 6). Korean lullaby (자장자장). Retrieved from https://www.youtube.com/watch?v=H8TwhytQzlI.

Spence, C. (2013). Just how important is spatial coincidence to multisensory integration? Evaluating the spatial rule. Annals of the New York Academy of Sciences, 1296, 31-49. doi: 10.1111/nyas.12121

Spencer, H. (1891). The origin and function of music. In Essays: Scientific, Political, and Speculative (Vol. 2, pp. 400-451). London: Williams and Norgate. (Original work published 1857)

Sperber, D. (1996). Mental modularity and cultural diversity. In Explaining Culture: A Naturalistic Approach (pp. 119-150). Oxford, UK: Blackwell.

Stein, B. E., & Stanford, T. R. (2008). Multisensory integration: Current issues from the perspective of the single neuron. Nature Reviews Neuroscience, 9, 255–267.


Stein, B. E., Stanford, T. R., & Rowland, B. A. (2009). The neural basis of multisensory integration in the midbrain: Its organization and maturation. Hearing Research, 258, 4-15. doi: 10.1016/j.heares.2009.03.012

Stein, B. E., Stanford, T. R., & Rowland, B. A. (2014). Development of multisensory integration from the perspective of the individual neuron. Nature Reviews Neuroscience, 15(8), 520-535. doi: 10.1038/nrn3742

Stern, T. (1957). Drum and whistle "languages": An analysis of speech surrogates. American Anthropologist, 59(3), 487-506.

Su, J. (2006). Whistling and its magico-religious tradition: A comparative perspective. Lingnan Journal of Chinese Studies, 3, 14-44. Retrieved from http://commons.ln.edu.hk/ljcs_1999/vol3/iss1/3

Tajadura-Jiménez, A., Kitagawa, N., Väljamäe, A., Zampini, M., Murray, M. M., & Spence, C. (2009). Auditory–somatosensory multisensory interactions are spatially modulated by stimulated body surface and acoustic spectra. Neuropsychologia, 47(1), 195-203. doi:10.1016/j.neuropsychologia.2008.07.025

Takahashi, T., Kansaku, K., Wada, M., Shibuya, S., & Kitazawa, S. (2013). Neural correlates of tactile temporal-order judgment in humans: An fMRI study. Cerebral Cortex, 23(8), 1952-1964. doi: 10.1093/cercor/bhs179

Teki, S., Grube, M., Kumar, S., & Griffiths, T. D. (2011). Distinct neural substrates of duration-based and beat-based auditory timing. Journal of Neuroscience, 31(10), 3805-3812. doi: 10.1523/jneurosci.5561-10.2011

Teki, S., Grube, M., & Griffiths, T. D. (2012). A unified model of time perception accounts for duration-based and beat-based timing mechanisms. Frontiers in Integrative Neuroscience, 5. doi:10.3389/fnint.2011.00090

Teneggi, C., Canzoneri, E., di Pellegrino, G., & Serino, A. (2013). Social modulation of peripersonal space boundaries. Current Biology, 23(5), 406-411.

Time. (n.d.). In Wikipedia. Retrieved May 22, 2017, from https://en.wikipedia.org/wiki/Time.

Tomasello, M. (1999). The Cultural Origins of Human Cognition. Cambridge, MA: Harvard University Press.

Tomasello, M. (2008). Origins of Human Communication. Cambridge, MA: MIT Press.


Townsend, S. W., Rasmussen, M., Clutton-Brock, T., & Manser, M. B. (2012). Flexible alarm calling in meerkats: The role of the social environment and predation urgency. Behavioral Ecology, 23(6), 1360-1364. doi: 10.1093/beheco/ars129

Tracey, A. (1970). The matepe mbira music of Rhodesia. African Music, 4(4), 37-61.

Trainor, L. J. (2015). The origins of music in auditory scene analysis and the roles of evolution and culture in musical creation. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1664). doi: 10.1098/rstb.2014.0089

Trehub, S. E. (2013). Communication, music, and language in infancy. In M. A. Arbib (Ed.), Language, Music, and the Brain: A Mysterious Relationship (pp. 463-480). Cambridge, Mass: MIT Press.

Trehub, S. E., & Trainor, L. J. (1998). Singing to infants: Lullabies and play songs. Advances in Infancy Research, 12, 43-77.

Trehub, S. E., Becker, J., & Morley, I. (2015). Cross-cultural perspectives on music and musicality. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1664). doi: 10.1098/rstb.2014.0096

Trehub, S. E., Unyk, A. M., & Trainor, L. J. (1993). Maternal singing in cross-cultural perspective. Infant Behavior and Development, 16(3), 285-295. doi: 10.1016/0163-6383(93)80036-8

Turetzky, P. (1998). Greek thoughts before Aristotle. In Time (pp. 5-17). London: Routledge.

Turner, V. W. (1977). The Ritual Process: Structure and Anti-structure. Ithaca, N.Y: Cornell University Press.

Ulrich, R., Miller, J., & Schröter, H. (2007). Testing the race model inequality: An algorithm and computer programs. Behavior Research Methods, 39, 291-302.

Unverricht, H., & Eisen, C. (2002). Serenade. In Grove Music Online. doi: 10.1093/gmo/9781561592630.article.25454

VandenBos, G. R. (2015). APA Dictionary of Psychology. Washington: American Psychological Association.

Vermeersch, S. (2012). Biography of state preceptor Wonhyo 元曉國師傳 (A. C. Muller, J. Y. Park, & S. Vermeersch, Trans.). In A. C. Muller (Ed.), Wonhyo: Selected Works, Collected Works of Korean Buddhism (Vol. 1, pp. 304-308). Seoul, Korea: The Jogye Order of Korean Buddhism.

Vesper, C., Abramova, E., Bütepage, J., Ciardo, F., Crossey, B., Effenberg, A., Hristova, D., Karlinsky, A., McEllin, L., Nijssen, S. R. R., Schumitz, L., & Wahn, B. (2017). Joint action: Mental representations, shared information and general mechanisms for coordinating with others. Frontiers in Psychology, 7.

de Vignemont, F., & Iannetti, G. D. (2015). How many peripersonal spaces? Neuropsychologia, 70(1), 327-334. doi: 10.1016/j.neuropsychologia.2014.11.018

Wacewicz, S., & Żywiczyński, P. (2014). Language evolution: Why Hockett’s design features are a non-starter. Biosemiotics, 8(1), 29-46. doi: 10.1007/s12304-014-9203-2

Wachsmann, K. P. (1971). Universal perspectives in music. Ethnomusicology, 15(3), 381-384.

Wan, C. Y., & Schlaug, G. (2010). Music making as a tool for promoting brain plasticity across the life span. The Neuroscientist, 16(5), 566-577. doi: 10.1177/1073858410377805

Wearden, J. H., Norton, R., Martin, S., & Montford-Bebb, O. (2007). Internal clock processes and the filled-duration illusion. Journal of Experimental Psychology: Human Perception and Performance, 33(3), 716-729. doi: 10.1037/0096-1523.33.3.716

Whaling, C. (2000). What’s behind a song? The neural basis of song learning in birds. In N. L. Wallin, B. Merker, & S. Brown. (Eds.), The Origins of Music (pp. 65-76). Cambridge, Mass: MIT Press.

Whelan, R. (2008). Effective analysis of reaction time data. The Psychological Record, 58(3), 475-482. doi: 10.1007/bf03395630

White, S. A. (2008). Milesian measures: Time, space, and matter. In P. Curd & D. W. Graham (Eds.), The Oxford Handbook of Presocratic Philosophy (pp. 89-133). Oxford: Oxford University Press.

Whitrow, G. J. (2004). Time in History. New York: Barnes & Noble Books.

Wich, S. A., Krützen, M., Lameira, A. R., Nater, A., Arora, N., Bastian, M. L., Meulman, E., Morrogh-Bernard, H. C., Atmoko, S. S. U., Pamungkas, P., Perwitasari-Farajallah, D., Hardus, M. E., van Noordwijk, M., & van Schaik, C. P. (2012). Call cultures in Orang-Utans? PLoS ONE, 7(5), e36180. doi: 10.1371/journal.pone.0036180


Widdess, R. (2001). Gharānā. In Grove Music Online. doi: 10.1093/gmo/9781561592630.article.48146

Wilhelm, R., & Baynes, C. F. (1967). The I ching: Or, Book of Changes. Princeton, N.J.: Princeton University Press.

Will, U. (2004). Oral memory in Australian song performance and the Parry-Kirk debate: A cognitive ethnomusicological perspective. In E. Hickmann & R. Eichmann (Eds.), Studies in Music-archaeology IV: The 3rd Symposium of the International Study Group on Music Archeology (pp. 161-180). Rahden: Leidorf.

Will, U. (2011). Perspectives of a reorientation in cognitive ethnomusicology (R. Wischkoski, Trans.). In W. Steinbeck (Ed.), Selbstreflexion in der Musik/Wissenschaft. Kölner Beiträge zur Musikwissenschaft, 16 (pp. 193-211). Kassel: Bosse Verlag.

Will, U. (2014). Rhythm, time experience, and the body: Re-thinking musical time. Insight.

Will, U. (2017). Cultural factors in responses to rhythmic stimuli. In J. R. Evans & R. Turner (Eds.), Rhythmic Stimulation Procedures in Neuromodulation (pp. 279-306). doi:10.1016/B978-0-12-803726-3.00009-2

Will, U. (2018, February). Chinese music and concept of time. In Institute for Chinese Studies. Re-Imagining China’s Past and Present Lecture Series. Lecture conducted from The Ohio State University, Columbus, OH.

Will, U., & Poss, N. (2008). The role of pitch contours in tonal language processing. Paper presented at The 4th International Conference on Speech Prosody, Campinas, Brazil, 6-9 May (pp. 309-312).

Wilson, S., & Moore, C. (2015). S1 somatotopic maps. Scholarpedia, 10(4), 8574.

Witzleben, L. J. (2001). A profile of East Asian musics and cultures. In R. C. Provine, Y. Tokumaru, & L. J. Witzleben (Eds.), Garland Encyclopedia of World Music, Vol. 7: East Asia: China, Japan, and Korea (pp. 41-46). Routledge.

Wu, K.-M. (1995). Spatiotemporal interpretation in Chinese thinking. In J. Huang & E. Zürcher (Eds.), Time and Space in Chinese Culture (pp. 17-44). Leiden: E. J. Brill.

Yamamoto, S., & Kitazawa, S. (2001). Reversal of subjective temporal order due to arm crossing. Nature Neuroscience, 4(7), 759-765.


Yanchevskaya, N., & Witzel, M. (2017). Time and space in ancient India: Pre-philosophical period. In S. Wuppuluri & G. Ghirardi (Eds.), Space, Time and the Limits of Human Understanding (The Frontiers Collection, pp. 23-42). Cham: Springer. doi: 10.1007/978-3-319-44418-5_3

Yuan, J. (2006). The role of time in the structure of Chinese. Philosophy East and West, 56(1), 136-152.

Yung, B. (1987). Historical interdependency of music: A case study of the Chinese seven-string zither. Journal of the American Musicological Society, 40(1), 82-91.

Yung, B. (1997). Celestial Airs of Antiquity: Music of the Seven-string Zither of China. Madison, WI: A-R Ed.

Zampini, M., Brown, T., Shore, D. I., Maravita, A., Röder, B., & Spence, C. (2005). Audiotactile temporal order judgments. Acta Psychologica, 118, 277–291.

Zampini, M., Torresan, D., Spence, C., & Murray, M. M. (2007). Audiotactile multisensory interactions in front and rear space. Neuropsychologia, 45, 1869–1877.

Zarco, W., Merchant, H., Prado, L., & Mendez, J. C. (2009). Subsecond timing in primates: Comparison of interval production between human subjects and rhesus monkeys. Journal of Neurophysiology, 102(6), 3191-3202.


Appendix A. Jajangga text with transliteration and translation

Phrase | Korean text | Transliteration | English Translation
1 | 자장자장 우리 아가 | Jajang, jajang woori agha | Beddy-bye, beddy-bye, my baby
2 | 잘도잔다 우리 아가 | Jaldo janda woori agha | My baby sleeps well
3 | 자장자장 우리 화음이 | Jajang, jajang woori Whaeumee | Beddy-bye, beddy-bye, my Whaeum
4 | 잘도잔다 우리 화음이 | Jaldo janda woori Whaeumee | My Whaeum sleeps well
5 | 우리 화음이 잠자고 있으면 | Woori Whaeumee jamjago itsumyon | While my Whaeum sleeps
6 | 엄마가 할일이 무지 많아요 | Eummaga halilee muji manahyo | Mommy has lots of work to do
7 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
8 | 미뤄논 설거지도 얼른 하고 | Miruhnon seolgujido eulreun hago | I do the dishes quickly
9 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
10 | 엄마는 빨래도 얼른 돌리고 | Uhmmaneun palraedo eulreun dolreego | Mommy does the laundry quickly
11 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
12 | 돌려논 빨래도 널어야 되고 | Dolryonon palraedo nuluhya daego | I hang the wash
13 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
14 | 할머니 널어놓고간 빨래도 걷고 | Halmuhnee nuleonotgogan palraedo guetgo | I fold the laundry your grandma washed
15 | 걷어논 빨래도 개서 정리하고 | Guetuhnon palraedo gaeso jungreehago | I put the clothes in the closet
16 | 안방에 청소기도 한번 돌리고 | Ahnbangeh chungsogido hanbun dolreego | I vacuum the bedroom
17 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
18 | 꾸질꾸질한 엄마도 가서좀 씻고 | Kujilkujilhan uhmmado gaseo jom sitgo | Dirty mommy takes a shower
19 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
20 | 재밌는개콘도 한편 보고 | Jaemitneun Gagcon*do hanpyon bogo | I watch the funny Gagcon*
21 | 우리화음이 자는 동안에 | Woori Whaeumee jamjaneun donganhe | While my Whaeum sleeps
22 | 엄마가 할일이 꽤 많구나 | Uhmmaga haleeli koh mankuna | Mommy has to do lots of work
23 | 엄마가 할일을 마치고 나면 | Uhmmaga halleeleul machigo nameon | When mommy finishes all her work
24 | 화음이 앞으로 돌아 올게요 | Whaeumee ahpuro dolah olgyeyo | Mommy will be back next to you
25 | 걱정을 말고 자고 있어요 | Geukjeongeul malgo jago itseoyo | Do not worry and sleep well
26 | 예쁜딸 화음이 자고 있어요 | Yepntal Whaeumee jago itseoyo | Pretty daughter Whaeum, sleep well
27 | 자장자장 우리 아가 | Jajang, jajang woori agha | Beddy-bye, beddy-bye, my baby
28 | 잘도잔다 우리 아가 | Jaldo janda woori agha | My baby sleeps well
29 | 눕혀도 될까 우리 아가 | Nupheyodo daelka woori agha | May I put you down on the bed, my baby?
30 | 눕히면 깰까 우리 아가 | Nupheemyeon kaelka woori agha | If I do that, will you wake up, my baby?
31 | 자장 자장 우리 화음이 | Jajang, jajang woori Whaeumee | Beddy-bye, beddy-bye, my Whaeum
32 | 딥슬립 하거라 우리 아가 | Deep sleep hageurah woori agha | Take a deep sleep, my baby
33 | 어디 보자 우리 아가 | Uhdee boja woori agha | Let me see, my baby
34 | 눕혀도 되겠다 우리 아가 | Nupheyodo daegetda woori agha | It seems ok to put you on the bed, my baby

*acronym of the famous Korean comedy show called ‘Gag Concert’


Appendix B. Simple Reaction Time (SRT) experiment mean reaction time ANOVA table

Factor | F-value | P-value
STATUS | F(2,28) = 0.165 | 0.849
GENDER | F(1,28) = 2.256 | 0.144
ARM | F(1,28) = 0.174 | 0.680
LOCATION | F(1,28) = 3.696 | 0.064
MODALITY | F(3,84) = 178.244 | <0.0001
STATUS : GENDER | F(2,28) = 0.514 | 0.604
STATUS : ARM | F(2,28) = 0.761 | 0.477
STATUS : LOCATION | F(2,28) = 1.668 | 0.206
STATUS : MODALITY | F(6,84) = 2.211 | 0.049
GENDER : ARM | F(1,28) = 2.354 | 0.136
GENDER : LOCATION | F(1,28) = 0.463 | 0.501
GENDER : MODALITY | F(3,84) = 0.729 | 0.537
ARM : LOCATION | F(1,28) = 1.911 | 0.178
ARM : MODALITY | F(3,84) = 1.435 | 0.238
LOCATION : MODALITY | F(3,84) = 8.715 | 0.0001
STATUS : GENDER : ARM | F(2,28) = 1.247 | 0.303
STATUS : GENDER : LOCATION | F(2,28) = 1.581 | 0.223
STATUS : GENDER : MODALITY | F(6,84) = 0.662 | 0.680
STATUS : ARM : LOCATION | F(2,28) = 0.455 | 0.639
STATUS : ARM : MODALITY | F(6,84) = 0.967 | 0.452
STATUS : LOCATION : MODALITY | F(6,84) = 0.309 | 0.931
GENDER : ARM : LOCATION | F(1,28) = 0.392 | 0.536
GENDER : ARM : MODALITY | F(3,84) = 0.759 | 0.520
GENDER : LOCATION : MODALITY | F(3,84) = 1.154 | 0.332
ARM : LOCATION : MODALITY | F(3,84) = 0.862 | 0.464
STATUS : GENDER : ARM : LOCATION | F(2,28) = 1.518 | 0.237
STATUS : GENDER : ARM : MODALITY | F(6,84) = 1.655 | 0.142
STATUS : GENDER : LOCATION : MODALITY | F(6,84) = 1.000 | 0.431
STATUS : ARM : LOCATION : MODALITY | F(6,84) = 0.819 | 0.558
GENDER : ARM : LOCATION : MODALITY | F(3,84) = 0.649 | 0.691
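The table above can be read alongside the following minimal sketch in R, the analysis environment cited in the references (R Core Team, 2018). This is an illustrative reconstruction, not the author's original analysis script: the data frame srt, its column names, the input file name, and the assignment of ARM, LOCATION, and MODALITY to within-subject error strata are assumptions made only for the example.

    # Hypothetical long-format data: one row per subject x condition cell,
    # with columns subject, status, gender, arm, location, modality, mean_rt.
    srt <- read.csv("srt_means.csv")   # assumed input file
    srt$subject <- factor(srt$subject)

    # Mixed repeated-measures ANOVA matching the factor structure above:
    # status and gender vary between subjects; arm, location, and modality
    # are treated as repeated measures on each subject.
    fit <- aov(mean_rt ~ status * gender * arm * location * modality +
                 Error(subject / (arm * location * modality)),
               data = srt)
    summary(fit)   # prints F- and p-values by error stratum

Under these assumptions, the denominator degrees of freedom in the table (e.g., 28 for STATUS and GENDER, 84 for terms involving MODALITY) are consistent with this kind of stratified error structure.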

