
The Role of Melodic Contour in Linguistic Processing

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Yun Wang

Graduate Program in Music

The Ohio State University

2017

Dissertation Committee

Udo Will, Advisor

Graeme Boone

Marjorie K.M. Chan


Copyrighted by

Yun Wang

2017


Abstract

Melodic contour is one of the bridges connecting language and music. It is considered one of the basic aspects of music and is more easily perceived than intervals (e.g. Edworthy, 1985; Patel, 2008). Some musical phenomena, such as speech surrogates, imply that tonal information may help listeners decode messages. In this study, we investigate how pitch contours affect linguistic processing using a lexical decision task and a speech shadowing task.

The first experiment involved a lexical decision task, in which participants were required to judge whether the syllables they heard were words or non-words. The results show that musicians responded with shorter reaction times, suggesting that extended musical training makes contour processing more efficient. Females outperformed males in speed, indicating a possible gender difference. The results also indicate that it is easier to make a decision for non-words, as they have no entry in the mental lexicon, and harder to do so for pseudo-words, as they are very similar to words. For non-words, the fastest responses occurred when the tones did not match, suggesting that when melodic contours and non-word syllables differ in tone, the decision process may be speeded up. Interestingly, a facilitatory effect was found for words only under the match condition. This helps us understand how musical speech surrogates work, because many musical speech surrogates map lexical tones onto whistling or instruments: as tonal information is part of the lexical entry, melodic pitch contours lead to pre-activation of the lexicon, resulting in a larger facilitation effect.

The second experiment involved a speech shadowing task, in which participants were asked to repeat the target items. The results support the idea that tonal information facilitates linguistic processing. Musicians outperformed non-musicians in both speed and accuracy, confirming their enhanced ability and sensitivity in contour processing. One of the most important results is that timbre was a significant main factor in the speech shadowing task: vocal primes generated faster responses than instrumental and noise primes, supporting the view that the special role played by the voice in human communication has contributed to an increased sensitivity to the human voice. Vocal primes seem to activate the speech motor system, which is engaged in planning vocal tract movements, leading to faster responses. In the shadowing task, pseudo-words were responded to as fast as non-words. We suggest that this is because of the different task requirements: there was no need to identify the lexical status; rather, participants needed to hold the input phonological form in mind and then activate the motor system to reproduce the sounds.

In both experiments, we found that different contours did not have the same priming effects, and we suggest that it would be interesting to explore what caused these differences in future studies.

This dissertation demonstrates the possibility of bridging music and linguistics.

This interdisciplinary study gives us some new insights into the role and importance of melodic contours in linguistic processing, offering evidence for and a better understanding of certain musical phenomena, e.g. how musical speech surrogates can carry messages with pitch and contour only.


Dedication

To my family


Acknowledgments

Special thanks go out to Dr. Will. I am very grateful for your encouragement and understanding. You not only offered important feedback and insights on my dissertation, but also empathized with me when I was down. I am so lucky to have had you as my advisor, and I will never forget those chats with you.

I would also like to thank Dr. Chan and Dr. Boone for your patience with my questions and for offering helpful advice during classes, the candidacy exam, and the dissertation writing. Thank you very much, Ma Laoshi, for inviting me to attend the ICS graduate forum, which was a really good experience.

I would also like to thank my family members, especially little Raina, who smiled broadly every time I came back home from the library, and whose frequent waking up did help me get up early to write my dissertation (smile).

And finally, I offer special gratitude to the Mixter family for unselfishly providing the Professor Keith E. Mixter Scholarship for Music History to support my research. I am truly honored to have been selected as a recipient of this scholarship.

Thank you all for your support!


Vita

2010...... B.A. Recording Arts, Capital Normal University

2013...... M.A. Musicology, Capital Normal University

2013 to 2016 ...... Graduate Teaching Associate, The Ohio State University

Fields of Study

Major Field: Music


Table of Contents

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures
Chapter 1. Music and language
1.1 Music and language: from the origin
1.2 The similarities and differences between speech and music
1.3 Mechanism for language processing and music processing
Chapter 2. Pitch contour as a starting point
2.1 The importance of pitch contour
2.2 Why people use musical speech surrogates
2.3 Musical speech surrogate types
2.4 How musical speech surrogates work: starting from exploring how pitch contour affects mental lexicon access
Chapter 3. Sounds from instrumental and vocal sources
3.1 Introduction
3.2 Preference to vocal sounds
3.3 Previous studies regarding vocal and instrumental sounds
Chapter 4. Music training—a cultural factor affects pitch contour
4.1 The enhanced contour processing ability in musicians
4.2 Previous studies regarding the interaction between melodic contour and lexical processing in terms of musicianship
Chapter 5. Pitch contour priming experiments
5.1 Introduction
5.2 Materials
5.3 Experimental Variables
5.4 Experiment 1
5.5 Experiment 2
Chapter 6. Conclusion
6.1 Findings Summary
6.2 Limitations of current study and possibilities for future studies
6.3 A possible new direction regarding pitch and contour influence: music text-setting
6.4 Conclusion
Bibliography


List of Tables

Table 2.1 Classification of phrases
Table 2.2 Classification of phrases (second method)
Table 5.1 The list of word, nonword and pseudoword syllables
Table 5.2 Basic version 1
Table 5.3 Basic version 2
Table 5.4 Post-hoc test for the LEXSTATUS*RELATIONSHIP (expt 1)
Table 5.5 Post-hoc test for the STATUS*LEXSTATUS*GENDER (expt 1)
Table 5.6 Post-hoc test for the PRIMETONE*TONE (expt 1)
Table 5.7 Post-hoc test for the PRIMETONE*TONE*STATUS (expt 1)
Table 5.8 Post-hoc test for the STATUS*LEXSTATUS (expt 1)
Table 5.9 Post-hoc test for the STATUS*LEXSTATUS (expt 1)
Table 5.10 Post-hoc test for the PRIMETONE*TONE (expt 2)
Table 5.11 Post-hoc test for the PRIMETONE*TONE*STATUS (expt 2)


List of Figures

Figure 1.1 The evolutionary tree of music and language
Figure 1.2 A model of music processing module
Figure 2.1 Some examples with Eastern music core
Figure 2.2 Areas where ascending and descending contour types lead to different activation patterns
Figure 2.3 The spoken Gavião word tatia and corresponding bilabial whistling
Figure 2.4 Spectrograms of vowels (i, e, a, o, and u)
Figure 2.5 The auditory dual-route model
Figure 4.1 Musicians/non-musicians' performance in four tones under three different syllable types
Figure 4.2 The overall reaction time for the four subject groups
Figure 4.3 The facilitation effect across the four groups
Figure 4.4 The facilitation effect for different kinds of primes across the four groups
Figure 5.1 Instrumental and vocal primes in the four standard Mandarin tones
Figure 5.2 Response time in lexical decision task
Figure 5.3 The mean reaction time for non-words, pseudo-words and words (expt 1)
Figure 5.4 The LEXSTATUS*RELATIONSHIP interaction in terms of reaction time (expt 1)
Figure 5.5 The STATUS*GENDER interaction in terms of reaction time (expt 1)
Figure 5.6 Reaction time for the STATUS*LEXSTATUS*GENDER interaction (expt 1)
Figure 5.7 The PRTONE*TONE interaction in terms of reaction time (expt 1)
Figure 5.8 The STATUS*PRTONE*TONE interaction in terms of reaction time (expt 1)
Figure 5.9 The correct rate for words, non-words and pseudo-words (expt 1)
Figure 5.10 The correct rate for the musician group and non-musician group (expt 1)
Figure 5.11 The STATUS*LEXSTATUS interaction in terms of accuracy (expt 1)
Figure 5.12 The STATUS*LEXSTATUS*GENDER interaction in terms of accuracy (expt 1)
Figure 5.13 The main effect TIMBRE in terms of reaction time (expt 2)
Figure 5.14 The reaction times to different lexical categories (expt 2)
Figure 5.15 The STATUS*GENDER interaction in terms of reaction time (expt 2)
Figure 5.16 The STATUS*LEXSTATUS*GENDER interaction in terms of reaction time (expt 2)
Figure 5.17 The PRTONE*TONE interaction in terms of reaction time (expt 2)
Figure 5.18 The PRTONE*TONE*STATUS interaction in terms of reaction time (expt 2)
Figure 5.19 The main effect of LEXSTATUS in terms of accuracy (expt 2)
Figure 5.20 The STATUS*GENDER interaction in terms of accuracy (expt 2)
Figure 6.1 An example of confusion from song Lubinghua


Chapter 1. Music and language

1.1 Music and language: from the origin

Music seems to exist in all human societies. From the most developed regions to isolated islands, we can find something resembling music. Music has a long history; based on current archeological research, prehistoric flutes can be dated back more than 35,000 years (Conard, Malina, & Münzel, 2009). What is the origin of music? This question attracts not only musicians, but also linguists, economists and many scholars from other fields, who devote themselves to this riddle of the Sphinx. Darwin's (1871) evolutionary theory suggested that music could be linked to sexual selection, that is, humans make use of music to attract the opposite sex, resulting in an advantage in reproduction. Scholars who agree with Darwin's adaptive mechanism claim that sexual selection, emotional expression, social cohesion, mother-infant communication, etc. form important driving forces in the evolution of music (Charlton, 2014; Dissanayake, 2000; Balter, 2004; Mithen, 2005).

Mithen discussed the origin of music and language from the perspective of archaeology. Archeological evidence showed that the hyoid bone in the Kebara I specimen was "effectively identical in form to that of a modern human", and that "the Neanderthal vocal tract was also essentially the same as that which you or I possess" (2005, p.226).

Further, the hypoglossal canal and vertebral canal support the view that Neanderthals would have already gained the ability for enhanced communication.1 But there are no indications that Neanderthals already possessed language. Rather, in Mithen's view, the communication system of the Neanderthals was hmmmmm, an abbreviation for holistic, manipulative, multi-modal, musical and mimetic. He proposed that hmmmmm is the shared ancestor of language and music. He believed that the precursor of language was holistic rather than compositional, that is, there were no discrete words in pre-modern communication: utterances were used to express something that could not be further divided into smaller components, and this lack of components made the formation of new utterances impossible. This is because, as Mithen indicated, the Neanderthals lived in small communities, so he speculated that they may not have had the need to create new utterances.2 The hmmmmm communication system developed under increasing selective pressure, which came not only from attracting females, but also from social activities such as caring for babies and their development, cooperating in hunting, and exchanging information about challenging environments.

1 Enhanced communication is in comparison to the ancestors of the Neanderthals. They must have had enhanced communication, as archeological evidence shows that they successfully survived on earth for nearly 250,000 years, even as the pressure to survive in an ever-changing environment dramatically increased.

2 Because the Neanderthals lived in small communities and had relatively limited communication with other Neanderthals, Mithen argued that they "didn't have much to say that had not been said many times before" (p.228). In addition, at that time the arbitrary association between phonetic segments and objects had not yet been established, which means the Neanderthals were not ready to create new utterances in a language-like fashion.

Mithen emphasized the importance of hmmmmm. Firstly, he pointed out the function of hmmmmm in expressing emotions. Herbert Spencer (1857) believed that music has its origin in vocal production. He reckoned that it is produced by certain muscles (e.g. the larynx, muscles of the abdomen, etc.) under certain emotional stimuli. In other words, music is emotional speech, since we need to change intonations during vocal emotional expression. Mithen further discussed the need to express more complex emotions through nuances of utterances in hmmmmm, for example to express anxiety when hunting large animals.

Secondly, he mentioned how Neanderthals cooperated with other hunters via hmmmmm. Mithen argued that the large collections of animal bones in open sites are evidence of the cooperation of many Neanderthal hunters. To make a hunting plan, they must have been able to communicate via mimesis (i.e. communicating with others about the size and species of animals by mimicking their features). In addition, he also pointed out the essential role of communal music-like sound making, which "will mould their own minds and bodies into a shared emotional state, and with that will come a loss of self-identity and a concomitant increase in the ability to cooperate with others" (p.215).

Thirdly, hmmmmm was important for infant care and development. This was not only because the Neanderthals faced a challenging environment, which meant they had to sing to comfort their babies, but also because of the infants' relatively faster growth rate, which pushed them to learn the utterances. That mother-infant communication played an important role is also favored by other scholars (Trehub, 2003; Dissanayake, 2000). In the pre-linguistic stage, a mother (and other relatives) will use a kind of speech featuring exaggerated intonations, called motherese, to communicate with the baby, and the baby in turn learns to access the meaning by "reading" the intonations repeatedly. In addition, it is believed that mothers' singing comforts babies (Trehub & Trainor, 1998; Trehub, 2001), which suggests the existence of a link between music and mother-infant communication.

Fourthly, Mithen argued that sexual selection may not have weighed that much for the Neanderthals, which means that music-making at that time was perhaps not for attracting females, but for "advertising and consolidating pair-bonding" (2005, p.240).

Language and music separated because each became efficient at different tasks, as shown in Figure 1.1. Language evolved when segmentation was established. Mithen pointed out that once the arbitrary links between phonetic components and objects were realized, it became possible to make new utterances in a way different from the previous holistic fashion. Thus, language evolved as a more effective system for communication.

Mithen proposed that when language evolved, the hmmmmm system at that time was "almost entirely concerned with the expression of emotion and the forging of group identities" (p.266). Along this line, the previous hmmmmm communication system developed into the music system. Mithen argued that though hmmmmm is not in use today, traces of the hmmmmm communication system can still be observed. He gave the example of parents communicating with babies through infant-directed speech (motherese), and further pointed out that the process by which toddlers around a certain age begin to learn referential associations "mirrors the evolutionary process of the differentiation of 'hmmmmm'" (p.276).


Figure 1.1 The evolutionary tree of music and language (Mithen, 2005, p.276)

Besides the theories mentioned above, Stumpf (1911) put forward the idea that keeping vowels sustained would make vocal communication over long distances more efficient. Considering the importance of communication for human survival, sending messages over long distances became a problem that had to be solved. In various human societies, not only the voice but also instruments are used to disperse information over a distance. The problem of how to express linguistic messages without using explicit speech will be discussed in the next chapter.

1.2 The similarities and differences between speech and music

Language is a communication tool for humans to understand each other. Like music, language is a sound system unique to humans (Patel, 2008). Noam Chomsky indicated, "when we study human language, we are approaching what some might call the 'human essence,' the distinctive qualities of mind that are… unique to man" (2006, p.88).

However, there are considerable differences between music and language. The most pronounced difference is that one can express precise semantic meaning through language, while this is not possible in music. In addition, as Patel (2008) indicated, the bases of these two sound systems differ: music is mainly built on pitch, while language is based on timbral contrast.

Besides the differences, there are many similarities between music and language, and these have drawn increasing attention in recent years.

Firstly, they share acoustic elements, namely pitch, timbre, intensity and duration. Pitch in language not only carries linguistic content (e.g., tones in tonal languages), but also expresses emotional information. Music in most cases uses pitch as the basis for organizing sounds, though in some cases timbral contrasts may also function as the foundation for sound organization (Patel, 2008). In terms of timbre, in music we learn to distinguish instruments by their different sound qualities, while in language, timbral contrasts lay the foundation for the production and perception of different phonemes. In both the language and music domains, intensity offers listeners cues to emotional status, and duration is the basis of temporal organization, helping, for example, to distinguish phonemes and to express emotions.

Secondly, music and speech are both primarily processed by the auditory system, though other sensory modalities, like the visual or somatosensory systems, may also be involved. That means they are perceived via the same pathways. Sound is transmitted in the form of sound waves that first arrive at our outer and middle ear, and are then converted from air-pressure vibrations into fluid waves in the inner ear. Inner hair cells are tuned to different sound frequencies and send electrical signals to the auditory nerve. These signals are then transmitted to the auditory cortex, where they are thought to be further processed for sound perception.

Thirdly, as Patel indicated, music and language are both something we learn. The word "learn" points to the importance of cultural factors in sound perception: the way we learned to organize sound produces and reinforces "a mental framework of sound categories for our native language or music" (2008, p.9).

Fourthly, Kraus and Slater proposed that "both music and language are complex communication systems, in which basic components are combined into higher-order structures in accordance with certain rules" (2015, p.207). Patel agreed with this, and pointed out that both systems use "similar interpretive feats, converting complex acoustic sequences into perceptually discrete elements (such as words or chords) organized into hierarchical structures that convey rich meanings" (2008, p.3).


Besides, there are many overlapping concerns in music and language studies. For example:

1. Whether there exists a critical period in both music and language learning. Many studies suggest that music training has a critical period, at least for absolute pitch acquisition, approximately from 3 to 6 years of age (Lenhoff, Perales, & Hickok, 2001; Levitin & Zatorre, 2003). In language learning, recent studies support the view that the critical period for oral language is around 1 to 3 years of age, and that for written language is about 3 to 5 years of age (Wang, Yang, & Li, 2016, p.49).

2. Whether language learning and music training produce a positive influence on individual cognitive abilities. Many studies regarding the interaction between language learning and individual cognitive development focus on the effect of being bilingual. Results have demonstrated that such training enhances other cognitive abilities, such as working memory, visual-spatial skills, inductive reasoning, etc. And for participants with a tonal language background, an improved ability in pitch discrimination and musical ability has been reported (Bidelman, Hutka, & Moreno, 2013; Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Pfordresher & Brown, 2009). In terms of the effects of musical training, there is growing evidence that music training can also enhance cognitive processing in non-musical domains, and various studies show differences between musicians and non-musicians in areas other than music. For example, music training not only enhances auditory working memory (Kraus, Strait, & Parbery-Clark, 2012), but also strengthens the ability to extract meaningful signals, which facilitates the perception of speech in noise (Kraus & Chandrasekaran, 2010; Strait, Parbery-Clark, Hittner, & Kraus, 2012). In addition, compared to musically untrained people, musicians who play instruments have better motor control abilities (Kincaid, Duncan, & Scott, 2002). Researchers have also found a predictive relationship between music expertise and performance in mathematics (Gouzouasis, Guhn, & Kishor, 2007).

3. Whether music and language share a common brain processing mechanism is a topic that is not yet fully understood, but it is attracting more and more attention. We will review this topic in the next section.

1.3 Mechanism for language processing and music processing

Music and language share many similarities, yet they also differ in many respects. Researchers have made several speculations about the relationship between music processing and language processing.

Though language and music share sound features like pitch and timbre, there are two different opinions on how our brain processes linguistic and musical information. One opinion proposes that our brain processing is based on modularity (Peretz & Coltheart, 2003). This opinion is built on findings that patients with brain lesions can suffer from selective impairment of their music or language ability.3 Thus, it would be impossible for music and language processing to share exactly the same processing system; otherwise a damaged ability in one domain (e.g. music) would predict an affected ability in the other domain (e.g. language).

3 Some patients may have their music ability damaged while their language ability remains unaffected (Mattei, Rodriguez, & Bassuner, 2013; Dalla Bella & Peretz, 1999), and vice versa (Peretz & Coltheart, 2003).

Importantly, Peretz and Coltheart pointed out that the music and language modules can both be divided into smaller modules, each dealing with a certain part of the acoustic input, e.g. contour processing. They illustrate the music processing module, which is mainly based on clinical studies, in Figure 1.2. Some of the sub-modules are specific to music, and some deal with features shared by both language and music. However, there is a contradictory statement regarding the contour analysis component: Figure 1.2 shows contour processing as specific to music, while the authors pointed out that "the 'contour processing' component, which abstract the pitch trajectories could conceivably be involved in processing speech intonation as well as music" (p.689). This is quite interesting, since they seemed to treat speech intonation as a musical rather than a linguistic aspect. And this gives us the impetus to investigate whether vocal contour processing and non-vocal contour processing are domain-general or not. In fact, several studies support the view that vocal and instrumental stimuli produce significantly different responses in both behavioral and neuro-imaging studies. Hung (2011) conducted an fMRI study and showed that vocal rhythms triggered greater activation in the brain. Other researchers reported that the involvement of the articulatory loop was only found in vocal rhythm memorization but not for instrumental rhythms (Klyn, Will, Cheong, & Allen, 2016). Will (2018) proposed that producing vocal and instrumental sounds involves different neural circuits, as different parts of the body are engaged in the process. We will discuss the topic of timbre further in Chapter 3, Sounds from instrumental and vocal sources.


Figure 1.2 A model of the music processing module. Boxes and arrows indicate processing components and information flow. Components and information flow specific to music are indicated in green, otherwise in blue. Components in italics are those whose specificity to music is currently unknown (Peretz & Coltheart, 2003, p.690).

Ivry and Robertson (1998) proposed the theory of hemispheric specialization (asymmetries) in auditory perception. They argue that hemispheric asymmetries are caused by different tasks, and that "task-based differences emerge as a function of whether the critical information (in/of the task) is contained in the relatively higher or lower frequencies" (p.52). They pointed out that the left hemisphere deals with high frequencies, processing local features, while the right hemisphere favors low frequencies and processes global features. Many neuroimaging studies have shown that the left hemisphere is heavily involved in processing phonemic information, while the right hemisphere is more sensitive to contour changes (Zatorre & Belin, 2001; Johnsrude et al., 2000). An important point to note is that these asymmetries can be modulated by one's cultural background. Studies have shown that for musical contour, the right superior temporal sulcus, the anterior cingulate cortex and the left inferior parietal lobule are all activated during processing (Lee, 2011). By contrast, left-hemisphere regions (e.g. the left premotor cortex, pars opercularis, etc.) are activated for contour processing of speech in participants who have knowledge of that specific language (Hsieh, Gandour, Wong, & Hutchins, 2001; Zatorre & Gandour, 2008), though in non-tonal language speakers these activations are absent.

Another view, proposed by Patel, emphasizes that music and language processing "share neural resources" (2008, p.268).4 Patel mentioned that "a clear conceptual distinction must be made between the end products of development, which may be domain specific, and developmental processes, which may be domain general" (p.72). He hypothesized that the mechanisms for music and language sound processing may overlap in certain parts. According to him, if such an overlap exists, "it is conceivable that exercising these mechanisms with sounds from one domain could enhance the ability...in the other domain" (p.79). This transfer effect has been found in many studies (Moreno et al., 2009; Wong, Skoe, Russo, Dees, & Kraus, 2007; Schön, Magne, & Besson, 2004). In addition, some studies with amusic patients5 support a shared mechanism for language and music. Researchers found that amusic subjects performed similarly in the music and speech domains for certain contour and rhythm tasks (Patel, Peretz, Tramo, & Labreque, 1998),6 which led them to suggest that there are shared neural resources for language and music processing. Tillmann et al. (2010) reported that patients with congenital amusia not only suffered from a musical deficit,7 but also had difficulty discriminating linguistic contours.

Until now, this has remained a debated topic without a definite answer. Rather than regarding these two views as incompatible, we argue that we may gain a deeper understanding by taking both of them as evidence for modularity, but with different focuses. Patel's and many other neuroimaging studies are probably better used as evidence for topographical neural overlap rather than as counter-examples to modularity, as researchers have pointed out that "neural separability between music and speech may occur in overlapping brain regions" (Peretz, Vuvan, Lagrois, & Armony, 2015, p.1).

4 Patel mentioned resource-sharing in syntactic processing (SSIRH). He argued that music and language "have distinct and domain-specific syntactic presentations (e.g., chords vs. words), but that they share neural resources for activating and integrating these representations during syntactic processing" (p.268).

5 Amusic patients are those who suffer from amusia, which can be either acquired or congenital. They show a disability in pitch processing, e.g. difficulty in recognizing pitch relationships.

6 Their experiment involved prosodic and musical discrimination tasks, in which stimulus pairs differed either in pitch contour or in rhythm. Participants were asked to decide whether the stimulus pairs were identical or different. Results showed that subjects' performance did not differ significantly between the two task domains.

7 They applied the MBEA to qualify amusic participants. The MBEA includes a pitch discrimination test, and the authors found that participants' performance in the pitch discrimination test was proportional to their performance in a lexical tone discrimination task.

Chapter 2. Pitch contour as a starting point

2.1 The importance of pitch contour

Pitch contour perception, a low-level auditory process that occurs at an early stage, is a fundamental aspect of sound processing, and it serves as an ideal research object for bridging music and language. Contour is so important that even if a tune is transposed to a different key, we can still recognize it. Even infants, before forming more sophisticated high-level auditory processing abilities, are able to distinguish pitch contours: research has shown that 6-month-old infants are already able to recognize a transposed melody after listening to the original version for 7 days (Plantinga & Trainor, 2005). And since Dowling (1999) has shown that infants at about this age still cannot distinguish specific intervals, these reports support the claim that contour is more easily perceived than intervals.

In addition, from a musical point of view, we find many interesting things regarding pitch contour. The first is that different contour types may not enjoy equal status in musical practice. Huron (1996) performed a statistical analysis of phrase contour classification in Western folksongs using the Essen Folksong Collection. He used two methods to count contour types. The first calculates the average pitch of all notes except the first and final notes, and then compares this middle-note average with the first and the final pitch. The second splits a phrase into three equal parts and compares the average pitches of the three parts. The results are listed below:

Table 2.1 Classification of phrases (Huron, 1996, p.9)

contour type            number of phrases    percent
ascending                6,983               19.4%
descending              10,376               28.8%
concave                  3,496                9.7%
convex                  13,926               38.6%
horizontal-ascending       181                0.5%
horizontal-descending      439                1.2%
ascending-horizontal       307                0.9%
descending-horizontal      174                0.5%
horizontal                 193                0.5%
TOTAL                   36,075              100%

Table 2.2 Classification of phrases, second method (Huron, 1996, p.10)

contour type            number of phrases    percent
ascending                6,873               19.1%
descending               9,195               25.4%
concave                  4,844               13.4%
convex                  12,568               34.8%
horizontal-ascending       516                1.4%
horizontal-descending      778                2.2%
ascending-horizontal       563                1.6%
descending-horizontal      527                1.5%
horizontal                 211                0.6%
TOTAL                   36,075              100%
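To make the second classification method concrete, the following Python sketch splits a phrase into three equal segments and labels its contour from the relationships among the segments' average pitches. It is only a minimal illustration under simplifying assumptions (MIDI pitch numbers, naive handling of remainders and ties); it is not Huron's actual procedure or code.

```python
# Minimal sketch of a three-segment contour classifier, loosely following the
# second method described above: split the phrase into three equal parts and
# compare their average pitches. Pitches are MIDI note numbers; the handling
# of remainders and exact ties is a simplifying assumption.

def classify_contour(pitches):
    """Label a phrase as ascending, descending, convex, concave, etc."""
    third = len(pitches) // 3
    first = sum(pitches[:third]) / third
    middle = sum(pitches[third:2 * third]) / third
    last = sum(pitches[2 * third:]) / (len(pitches) - 2 * third)

    def direction(a, b):
        return "up" if b > a else "down" if b < a else "flat"

    shape = (direction(first, middle), direction(middle, last))
    labels = {
        ("up", "up"): "ascending",
        ("down", "down"): "descending",
        ("up", "down"): "convex",
        ("down", "up"): "concave",
        ("flat", "up"): "horizontal-ascending",
        ("flat", "down"): "horizontal-descending",
        ("up", "flat"): "ascending-horizontal",
        ("down", "flat"): "descending-horizontal",
        ("flat", "flat"): "horizontal",
    }
    return labels[shape]

print(classify_contour([60, 62, 64, 67, 65, 64, 62, 60, 59]))  # -> "convex"
```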


A similar study by Juhász and Sipos (2009) analyzed the melodic contours of 16 folksong collections. They found that in Eastern music the descending contour type is the core feature.

Figure 2.1 Some examples with Eastern music core (Juhász & Sipos, 2009, p.11)


Lee, Janata, Frost, Hanke, and Granger (2011) applied whole-brain multivariate pattern analysis to fMRI data from 12 non-musicians, finding that ascending and descending contours lead to different activation patterns in three brain regions: the right superior temporal sulcus, the left inferior parietal lobule, and the anterior cingulate cortex. This is quite interesting, as the results give us the impetus to zoom in and examine specific melodic contours rather than taking contour as a whole.

Figure 2.2 Areas where ascending and descending contour types lead to different activation patterns (Lee et al., 2011, p.5)


The other interesting observation is that melodic contour may facilitate access to the mental lexicon (e.g. Will, Poss, & Hung, submitted; Poss, 2012; Cheong, Will, & Lin, 2017), which may help us understand how musical speech surrogates work.8 We will first discuss what musical speech surrogates are and then discuss the influence of melodic contour on linguistic processing.

2.2 Why people use musical speech surrogates

Many musical speech surrogates were developed in environments where terrain and/or background noise limits verbal communication. Users employ musical speech surrogates to convey messages by imitating some of the linguistic content of speech. In other words, musical speech surrogates make use of the mapping between linguistic cues and musical cues. Drums are used for long-distance communication: they can send sounds over distances of 4 to 5 miles (Oreskovich, 2016), and the sounds travel much faster than a messenger on horseback. Meyer (2015) conducted an experiment to test how efficient a speech surrogate (whistle speech) is under natural conditions. The results showed that, compared to natural speech and shouted voice, whistle speech has the strongest signal-to-noise ratio because it is "characterized by strong energy, a narrow general range of frequencies but a rather large band for each whistled syllable" (p.97). Thus, musical speech surrogates are very suitable for long-distance communication. However, an important point, as Meyer indicated, is that a common misunderstanding is that people use surrogates to substitute for verbal communication. In fact, they are not independent speech systems, but a complementary part of local language communication.

8 Previous works may use the term speech surrogate, but as Poss (2012) indicated, it is not an unproblematic term for describing this phenomenon in the music domain, because it also covers other signal systems like Morse code. Moreover, the widely used term "talking musical instruments" (Meyer, 2004) does not cover all the possibilities, since it only refers to surrogates performed on non-vocal instruments, so that whistle speech would not be included. Thus, in this thesis, we adopt Feld and Fox's (1994) term "musical speech surrogates" to describe this phenomenon.

2.3 Musical speech surrogate types

There are basically two kinds of musical speech surrogates: instrumental surrogates and whistle speech. The former refers to applications known as talking musical instruments. The most famous talking instruments are perhaps the talking drums: drums whose sounds carry over a distance, and which have attracted scholars' attention for several decades. There are also many other instruments with the ability to talk, such as the flute among the Hmong, the mouth harp among the Yi, and the trumpet in Ghana. The musical speech surrogates mentioned here share an important feature: they are used in tonal language areas.

Besides instrumental speech surrogates, there is another kind of speech surrogate called whistle language. This form of speech surrogate can be found all over the world: from the Gavião to the Canary Islands, from the Pyrenees to the Chepang, we can observe this amazing phenomenon. Though most whistle languages are used in tonal language contexts, some are also found in non-tonal language contexts.

From the perspective of principle and application, there are two types of musical speech surrogates: one works by maintaining pitch information, the other by maintaining formant contours (Sebeok & Umiker-Sebeok, 1976). The division depends mainly on whether the underlying language is tonal or non-tonal (Moore & Meyer, 2014, p.623).


For tonal languages, whistlers and drummers imitate tonal information (pitch). That is, some instrumental surrogate languages outline the pitch information of speech to convey messages over a distance. Importantly, a musical speech surrogate can be understood not just because there is a pitch mapping between speech and the surrogate, but because users employ routine sequences with contextual information. For example, "'moon' would be played as 'the Moon looks towards earth'" (Matthew, 2017, para. 6). That means the speech surrogates are patterned signals, providing context so that the surrogates make sense to their users.

The lunas are an example of musical speech surrogates used in a tonal language context. They are talking drums of West Africa. The two heads of the drum are stretched by a cord, and players hold the drum under one arm. When they press the cord with different amounts of pressure, the tension of the drum membrane, and hence its pitch, changes. By doing so, players are able to imitate the fundamental pitches of an utterance. Another example can be found in the Hmong instrumental speech surrogate. The high tones of spoken words are mapped onto high pitches on a flute called the raj (Poss, 2012), and this offers clues for understanding the messages encoded in the instrumental sounds. In addition, Poss indicated that some other phonetic information may also be encoded (realized on the raj through either a smooth or a tongued articulation type). In his study, he conducted an experiment with skilled listeners as participants and found that they "relied mainly on the established relationships between musical pitch and lexical tone to infer meaning" (p.iii). Besides instrumental surrogates, whistle speech in tonal languages, such as Gavião, also imitates lexical tones. Figure 2.3 shows the pitch-based whistle speech for one word in Gavião.

Figure 2.3 The spoken Gavião word tatia and corresponding bilabial whistling (Meyer, 2015, p.125). The blue lines indicate the pitch in the spoken word.

The other type of musical speech surrogate is achieved by emulating formant information in non-tonal language environments. Caughley found a "generally higher average pitch with the high front vowel /i/, lower with the low back vowel /o/" (1976, p.968). Rialland (2005) has shown that, in Silbo Gomero, a whistle language, the second formants of the spoken vowels /i/, /e/, /a/, /o/ and /u/ transpose to the f0 of the same whistled vowels, where the vowel /i/ has the highest pitch.

Figure 2.4 Spectrograms of vowels (i, e, a, o, and u). The top shows the spoken vowels and the bottom the whistled vowels (Rialland, 2005, p.4)

2.4 How musical speech surrogates work: starting from exploring how pitch contour affects mental lexicon access

Carreiras, Lopez, Rivero, and Corina (2005) have shown that "temporal regions of the left hemisphere that are usually associated with spoken-language function are engaged during the processing of Silbo" in whistlers, suggesting that "the language-processing regions of the human brain can adapt to a surprisingly wide range of signaling forms" (p.31).

Though researchers are still not entirely clear on how musical speech surrogates work, studies have implied that melodic contour may facilitate access to the mental lexicon (e.g. Will et al., submitted; Poss, 2012; Cheong et al., 2017). Thus, exploring the interaction between melodic contour and linguistic processing may further help us understand why pitch information is important in musical speech surrogates.

Before we move on to previous studies on how melodic contour affects linguistic processing, we will first introduce some models of speech recognition and the dual-route model of speech reproduction.

There are several models of speech perception. The Logogen9 model was proposed by Morton (1970). He proposed that each logogen has a threshold, and once the auditory input reaches that threshold, the target word is accessed. The Cohort10 model (Marslen-Wilson, 1987) is also an activation model. Marslen-Wilson proposed three stages of lexical access: activation, competition and selection. In the first stage, the initial speech sounds activate multiple candidates that begin with the same sound; in the competition and selection stages, as the speech unfolds, more and more segments are presented, ruling out incompatible candidates. The TRACE model, proposed by McClelland and Elman (1986), is a connectionist model. It emphasizes the connectivity and interactive activation of processing layers (input layer, feature layer, phoneme layer and word layer), and the recognized word is the product of interactions and competition. However, these models did not explain the role of tonal information or at what stage it affects speech recognition.

9 Logogen means word birth, referring to the word recognition unit. Each lexical entry has a logogen.

10 A cohort refers to a set of words that share the first few phonemes.
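As a concrete illustration of the incremental narrowing that the Cohort model describes, the sketch below prunes a toy candidate set phoneme by phoneme. The mini-lexicon and phoneme spellings are invented for illustration; this is not an implementation of Marslen-Wilson's model, and, like the models above, it says nothing about tonal information.

```python
# Toy illustration of Cohort-style lexical access: words sharing the initial
# phonemes stay active and are pruned as more of the input unfolds.
# The mini-lexicon and phoneme spellings are invented for illustration only.

MINI_LEXICON = {
    "candle": ["k", "ae", "n", "d", "l"],
    "candy":  ["k", "ae", "n", "d", "i"],
    "camera": ["k", "ae", "m", "r", "a"],
    "table":  ["t", "ei", "b", "l"],
}

def cohort(input_phonemes):
    """Return the surviving candidate set after each incoming phoneme."""
    candidates = set(MINI_LEXICON)          # activation: every word starts as a candidate
    history = []
    for i, phoneme in enumerate(input_phonemes):
        # competition/selection: drop words whose i-th phoneme does not match
        candidates = {w for w in candidates
                      if i < len(MINI_LEXICON[w]) and MINI_LEXICON[w][i] == phoneme}
        history.append((phoneme, sorted(candidates)))
    return history

if __name__ == "__main__":
    for phoneme, remaining in cohort(["k", "ae", "n", "d", "i"]):
        print(f"after /{phoneme}/: {remaining}")
```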

Besides these models, there are two pathways in speech reproduction, a lexical pathway and a non-lexical pathway, as indicated by Patterson and Shewell (1987). Studies have shown that the lexical pathway is preferred over the non-lexical pathway. One may wonder why the non-lexical pathway is not preferred, since, as the following figure illustrates, speech travelling through the non-lexical pathway passes through fewer steps than speech travelling through the lexical route. The preference arises not only because reaching a lexical entry makes the processing more reliable, as the lexical entry contains the meaning of the input, but also because of efficiency: the auditory dual-route model suggests that speech is processed more efficiently through the lexical route than through the non-lexical route.


Figure 2.5 The auditory dual-route model proposed by Patterson and Shewell (as cited in Coltheart, Rastle, Perry, Longdon, & Ziegler, 2001, p.211)

Priming effect and paradigm

To investigate how tonal information affects access to the mental lexicon, experiments typically employ a priming paradigm. The phenomenon whereby a preceding stimulus facilitates the perception of a following stimulus is called the priming effect. It is an effect of implicit memory (unconscious memory11). The first stimulus (the prime) can be related to the later stimulus (the target) through a similar form (perceptual priming) or a similar meaning (cognitive priming). As we are more interested in how perceptual similarity affects priming in our study, we will focus on perceptual priming.

11 The counterpart of implicit memory is explicit memory, a conscious memory.

Reisberg (2013) noted that the finding that "participants are able to identify a stimulus more quickly after a recent exposure" is "frequently interpreted as occurring because the previous encounter with the stimulus raised the activation of its lexical representation above resting level and closer to the threshold level necessary to invoke an identification response" (p.247). For example, once we are presented with a sequence of words that includes the word "image", we have a higher chance of filling in "image" in the blank following the letters im_____ . This is an example of form priming (i.e. priming of the word form), where the word image activates candidates with similar representations, which may lead to a faster response to the target im_____.

The priming paradigm is widely used in lexical decision and other linguistic experiments. We can manipulate the stimuli so that only the lexical tone of the primes is changed. By doing so, we can analyze whether pitched primes affect access to the mental lexicon.
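To make this kind of analysis concrete, the following sketch codes hypothetical prime-target trials by prime type and tone relationship, then computes mean reaction times and a simple facilitation score (control reaction time minus primed reaction time). All trial data, condition labels and the match definition here are invented for illustration; they are not the design or results of the experiments reported later.

```python
# Hypothetical sketch: compute mean reaction times per prime condition and a
# simple facilitation score (control RT minus primed RT). All trial data and
# condition labels below are invented for illustration.
from collections import defaultdict
from statistics import mean

trials = [
    # (prime_type, prime_tone, target_tone, reaction_time_ms)
    ("vocal", 1, 1, 612), ("vocal", 1, 4, 655),
    ("vocal", 2, 2, 598), ("instrumental", 3, 3, 640),
    ("instrumental", 4, 1, 671), ("noise", None, 2, 701),
    ("noise", None, 3, 694), ("noise", None, 1, 688),
]

rts = defaultdict(list)
for prime_type, prime_tone, target_tone, rt in trials:
    if prime_type == "noise":
        condition = "control"
    elif prime_tone == target_tone:
        condition = f"{prime_type}-match"
    else:
        condition = f"{prime_type}-nonmatch"
    rts[condition].append(rt)

control_rt = mean(rts["control"])
for condition, values in sorted(rts.items()):
    facilitation = control_rt - mean(values)   # positive = faster than control
    print(f"{condition:22s} mean RT {mean(values):6.1f} ms  facilitation {facilitation:+6.1f} ms")
```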

Will and Poss (2008) investigated the role of pitch contours in lexical processing by removing segmental information from the primes. Through both a shadowing experiment and a lexical decision task, they demonstrated some essential principles underlying speech surrogates. They found that responses were speeded up by pitched primes. The shadowing experiment showed that primes carrying matched contour information facilitated participants' vocal responses significantly more than unpitched white noise. In subsequent shadowing experiments (Will et al., submitted), they confirmed that tonal information plays an important role in target repetition. They reported that both the match and the non-match conditions (the contour relationship between prime and target) primed reaction times, and that the non-match condition produced the greatest advantage. In a later study, Poss (2012) used a priming paradigm to investigate whether instrumental sounds facilitate participants' responses to Hmong words and pseudo-words. He found that primes with tonal contour information significantly primed participants' responses compared to the control condition, which is consistent with the previous results. Cheong et al. (2017) conducted an fMRI study and found that "melodic contour primes change the BOLD activation during word processing" (p.38).

To better understand musical phenomena such as speech surrogates, we can start by exploring how pitch contour affects linguistic processing. That is the main entry point for this dissertation. Along this line, we can further investigate how different contour types are involved in the process.


Chapter 3. Sounds from instrumental and vocal sources

3.1 Introduction

"Timbre is operationally defined as the attribute that distinguishes sounds of equal pitch, loudness, location and duration" (Town & Bizley, 2013, p.1). Research has shown that even 38-week-old fetuses are able to respond to different timbres. Kisilevsky et al. (2003) found that when a poem recorded by the fetus's own mother was played over the mother's abdomen, the fetal heart rate increased; when the same poem recorded by another pregnant woman was presented, a decreased heart rate was observed. From the perspective of ecological significance, timbre plays an important role in human survival. Since timbre is a multidimensional acoustic cue carrying plenty of information, it helps human beings figure out the size of a sound source, the gender of a speaker, and even other key features, such as identity and emotional status.

In the vocal domain, timbre is the main contrastive aspect in organizing speech (Patel, 2008). How we recognize vowels and consonants depends on timbre, since different phonemes have different timbral attributes. Humans produce different timbres by varying the width of the mouth opening, the place and manner of articulation, the shape of the resonant cavities, etc., which means that timbre in natural speech changes very fast. Thus, while "speech can be analyzed as a succession of phonemes, it can also be considered a succession of timbres" (p.60). Beyond the phonetic level, environmental factors and individual physical conditions (e.g. the length of the vocal tract) may also affect the timbre of the speaker and hence our perception of it.

In the instrumental domain, we are able to distinguish different instruments because spectral and temporal features help us recognize their timbre. As we know, the energy distributions across the harmonics of different instruments have different characteristics; for example, an oboe never quite resembles the sound of a dizi, because energy is distributed across the harmonics differently. The distribution of energy over time can be described by the envelope shape, which typically includes attack, decay, sustain and release phases, providing listeners with important information about tone color.
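As a rough illustration of such an envelope, the sketch below generates a simple linear attack-decay-sustain-release curve. The phase durations and sustain level are arbitrary example values, not measurements of any real instrument.

```python
# Simple linear ADSR (attack-decay-sustain-release) amplitude envelope.
# Durations (in seconds) and the sustain level are arbitrary example values.

def adsr(attack, decay, sustain_level, sustain_time, release, sample_rate=100):
    """Return a list of amplitude values between 0 and 1."""
    def ramp(start, end, seconds):
        n = max(2, int(seconds * sample_rate))
        return [start + (end - start) * i / (n - 1) for i in range(n)]

    return (ramp(0.0, 1.0, attack)                              # attack: rise to peak
            + ramp(1.0, sustain_level, decay)                   # decay: fall to sustain level
            + ramp(sustain_level, sustain_level, sustain_time)  # sustain: hold
            + ramp(sustain_level, 0.0, release))                # release: fade to silence

envelope = adsr(attack=0.05, decay=0.1, sustain_level=0.7, sustain_time=0.5, release=0.3)
print(len(envelope), max(envelope), envelope[-1])
```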

3.2 Preference to vocal sounds

As to the origin of music

Mithen (2005) argued that the Neanderthals' hmmmmm communication system already possessed musical characteristics, and we may speculate that these musical characteristics were realized mainly through vocalization, since at that time the Neanderthals were unable to make instruments because of their lack of cognitive fluidity.12

Some literary works support the view that vocal music was the first type of music. The author of Lü's Commentaries on History (Anonymous, n.d.) wrote:

"Kongjia… made the song PoFu, which is the beginning of eastern music… the woman sang a song named HouRenXiQi, which marks the beginning of southern music… Two women sang, and their ending words 'swallows fly' are the origin of northern music13 (孔甲…乃作为”破斧”之歌,实始为东音…女乃作歌,歌曰:”候人兮猗。”实始作为南音…二女作歌,一终曰:“燕燕往飞。”实始作为北音)" (Ji Xia Ji section, para. 3).

12 Mithen proposed cognitive fluidity in contrast to the domain-specific nature of Neanderthal cognition. He argued that the Neanderthals had isolated cognitive domains (e.g. social, technical, etc.) but were unable to connect these cognitive domains.

In Indian classical music, raga is an important term we should know. It refers to the melodic aspect and literally means coloring. Raga has been traced back to "the chanting of the Vedas" (Ruckert & Widdess, 1998, p.92), suggesting that the vocal form is the basis of Indian classical music.

It is believed that in the beginning music making was probably purely vocal, and that the first kind of instrument humans used was perhaps percussion, so that vocal expression could be accompanied by percussion, which was relatively easy to achieve by drumming on objects. Later on, our ancestors found that they could manipulate objects to produce sounds similar to the voice, giving rise to instruments that can play melodies.14

That is to say, instrumental playing did not evolve as an independent music type (instrumental music) at a very early stage; rather, its main function was to accompany vocal music. Many musical examples show that instruments are used as accompaniment to vocal music. For example, in Africa, Mande musicians (griots) sing out history with the Kora and Balaphone as accompanying instruments. Among forest people in central Africa, claps and other simple instruments are used to keep the timeline for a special kind of singing, yodelling. And for American Indians, instruments such as the rattle and drum are relatively simple, with keeping the rhythmic pattern as their main function.

13 Though Lü did not give a clear description of the form of the earliest western music, one may suspect that western music, like the other three regional musics, comes from a vocal form.

14 The oldest undisputed instrument, a bone flute, dates back to the Upper Paleolithic and to Homo sapiens. However, percussion instruments are believed to have co-existed at least as early as, or even earlier than, that melodic instrument (even if our ancestors treated objects as percussion, we can hardly tell they were percussion instruments, as there is no archeological evidence). Fitch (2006) implied that the drumming behaviors of chimpanzees and our other nearest relatives (e.g. drumming on objects to show aggression) may be an evolutionary homologue of human instrumental music.

Significance of voice in ritual and ceremony

The voice has always been valued in rituals and ceremonies. We can find clues to the importance of singing in documents such as The Rites of Zhou (Anonymous, n.d.). It says:

"For the Great Ritual, striking the Fu announces the start of singing, and the greatest master (who is blind) first sings on the stage. Then, after the singing, striking the Yin marks the start of instrument playing, and the instrumentalists below the stage begin to play… The same procedure should be applied in the Daxiang Ritual (a ritual to welcome kings from vassal states). For the Dashe Ritual (an archery ritual to choose priests), the Master's singing sets the rhythm of the archery (大祭祀, 帅瞽登歌, 令奏击拊, 下管播乐器, 令奏鼓朄. 大飨, 亦如之. 大射, 帅瞽而歌射节)" (Chun Guan Zong Bo section, para. 108).

These statements explicitly demonstrate the priority of vocal music: the importance of the voice is manifested through the performers' positions. The singer stands on the stage, while the instrumentalists are off the stage. In addition, the leader is the greatest master, and his responsibility is to sing rather than to play instruments.

In South America, shamans sing or chant to gain access to supernatural powers in order to heal illness or communicate with spirits. And in Taiwan, in native peoples' sacrificial festivals, such as the Tsou's Mayasvi, vocal music is used to welcome and see off the God of War. But why not instruments? Why would vocal music be the perfect medium for communicating with supernatural powers or the gods? The reason is probably that the voice is believed to be the best medium for expressing ourselves. Darwin (1872/1998) wrote, "with many kinds of animals, man included, the vocal organs are efficient in the highest degree as a means of expression" (p.88).

Imitating/Fusing vocal sounds in instrument playing/learning

The strong preference for vocal timbre is reflected in the phenomenon of vocal-like timbres being imitated on instruments. Such priority of vocal sounds is reflected in Duan Anjie's comments (n.d.), where he wrote, "vocal music is the highest form of music. Thus, silk instrumental music is not as good as bamboo instrumental music, and the latter is not valued as highly as vocal music (歌者,乐之声也。故丝不如竹,竹不如肉,迥居诸乐之上)" (Singing section, para. 1). Furthermore, one may notice that many writers frequently describe instrumental playing in terms of vocal timbre. For example, Su Shi (n.d.) wrote, "there is one guest, who accompanied my singing by blowing his Xiao according to the rhythm. How sad, as if someone is weeping, and how yearning, as if he is narrating (客有吹洞箫者,倚歌而和之。其声呜呜然,如怨如慕,如泣如诉)" (para. 2).

In traditional Chinese silk-instrument music learning, masters require students first to sing the music with its fine ornamentations, and then to imitate those treatments and timbres on their instruments. In India, drum rhythms are learned through a set of verbal labels. In South Indian classical music (Carnatic), the voice occupies an important position. Not only are musical works supposed to be sung, but instrumentalists are also expected to be able to sing. In addition, instruments perform in gayaki style, a singing style. Many instruments in Carnatic music imitate the vocal timbre by playing within the register of the voice and playing ornamentations with a vocal flavor. Vocalists enjoy the highest place, and Nettl described the Carnatic musical system as "quintessentially vocal" (1995, p.60).

To summarize, vocal music evolved earlier than instrumental music, which at first was used to accompany vocal music. The importance of vocal sounds has not only been recorded in literature, but can also be seen in many musical phenomena today. Yet many questions need to be asked: cognitively, how do these different timbres affect our sound processing? Do they have the same influence on our cognitive processing? If not, which one plays the more important role, and why?

3.3 Previous studies regarding vocal and instrumental sounds

The strong preference for vocal sounds leads us to ask why humans have such a preference for vocal timbre and how timbre affects our cognitive operations.

Many fMRI and electroencephalogram studies have shown that instrumental and vocal sounds cause partly non-overlapping activations and have different effects on brain activity. Belin, Zatorre, Lafaille, Ahad, and Pike (2000) reported that vocal sounds (both speech and non-speech) triggered greater activation levels than non-vocal sounds in the STS, suggesting the existence of voice-selective regions in our brain. Levy, Granot, and Bentin (2001) applied ERPs and found a component peaking at 320 ms that was elicited only by vocal tones and not by instrumental tones. They suggested that this component may "indicate differential pre-phonological processing of human voices" (p.2656).

Vocal timbre is biologically and socially much more significant than instrumental timbres, analogous to the primacy of human faces over other objects. The Neanderthals and their ancestors used vocal sounds to exchange information, convey emotions, signal dangers, and so on, and all of this was based on the premise that one can quickly recognize vocal sounds. During evolution, vocal sounds were so important that they may automatically catch our attention. This can be understood as a cognitive adaptation and can be seen in many studies. For example, Salame and Baddeley (1989) found that with vocal music playing, participants showed lower accuracy in a short-term memory test than when doing the same task under an instrumental music condition. In line with this, Crawford and Strapp (1994) showed that, compared to instrumental music, vocal music had a more pronounced negative influence on participants' performance in a verbal logical reasoning task and a scanning speed task. Boyle and Coltheart (1996) showed that vocal music had a significant disruptive influence on serial recall, while the same effect was not found in the instrumental music condition. Importantly, Cairns and Butterfield (1975) found that new-born babies are capable of distinguishing vocal from instrumental sounds. They reported that when vocal sounds were presented, new-born infants first stopped sucking with "a pronounced burst-pause pattern—as if waiting for more human signals" (p.59) and then restarted. These studies implied that vocal sounds are more distracting due to the human preference for human voices. We find these results quite interesting, as they may imply that voice-specific responses have evolved over millions of years.

In addition, involving the voice in instrument learning becomes explainable, as researchers have found that vocal sounds have a more positive effect on music memory. Weiss, Trehub, and Schellenberg (2012) reported that participants performed better at recognizing vocal melodies than instrumental ones in a melody recall task. This is probably because we are more attuned to vocal sounds than to instrumental sounds. Klyn et al. (2016) reported that the involvement of the articulatory loop, which helps people store auditory information in working memory through rehearsal, was found only in vocal rhythm memorization and not in instrumental memorization. These findings offer a very reasonable explanation for learning instruments with the assistance of the voice, as "verbal labels are better memorized than the original instrumental sound because they can be better maintained in working memory" (Will, 2018, p.8).

In terms of emotional arousal, Loui, Bachorik, Li, and Schlaug (2013) conducted an experiment. They used 16 song sections, each with a vocal version (3 of the songs also partially included non-verbal vocals) and an instrumental version, to investigate the influence of timbre on arousal ratings. The results showed that a stronger arousal-enhancing effect was observed for the vocal versions, and "songs that contained non-verbal vocal portions showed a similar effect size as songs containing verbal vocals" (p.54). Based on these results, the authors claimed that it is the timbre rather than the language content that increases the arousal ratings.


Another interesting and very important area is the influence of timbre on linguistic processing and music cognition; however, until now there have been only a few studies.

Will and Poss (2007) indicated that stronger facilitation by vocal primes than by instrumental ones exists only in tonal language speakers. In a later study (2008), they investigated how pitch contours influence lexical processing by removing segmental information from the primes. In terms of timbre, they found that subjects performed differently with vocal and instrumental primes. Interestingly, the results revealed a double priming effect for the vocal ones. Hung (2011) conducted a behavioral and an fMRI study and found that participants performed significantly differently when presented with vocal and instrumental rhythms. Her experiment involved vocal and instrumental rhythm stimuli, and participants were presented with trials containing both types of timbre. Hung reported that vocal rhythms elicited higher activation in the brain than instrumental ones. In addition, she suggested that timbre processing may occur at early stages. In the behavioral study, the results showed that response times were significantly shorter for vocal rhythms. A recent study has shown that vocal and instrumental rhythms lead to different short-term memorization, suggesting that sounds from different sources are processed differently (Klyn et al., 2016). They showed that, compared to vocal rhythms, responses to instrumental rhythms (played on clapsticks) were significantly more accurate, though vocal rhythms elicited faster responses.

These studies suggest that we should not take sound as a single undifferentiated acoustic phenomenon, but should investigate sounds from the perspective of timbre, such as vocal versus non-vocal. In addition, these experiments showed that cultural factors play an important role in the interaction between timbre and sound processing (e.g. Will & Poss, 2007), which leads us to ask, for example, whether music training has a strong influence on both within-domain and cross-domain processing.


Chapter 4. Music training: a cultural factor affecting pitch contour perception

The importance of cultural factors is widely recognized. Will mentioned that "cognition is shaped and formed by an intricate interplay of biological, environmental and experiential, i.e. cultural factors" (2016, p.279). Economic status, language background, and other such cultural factors can play an essential role in our cognition.

Among these cultural factors, being a musician, that is, someone who spends a great amount of time in intensive musical practice, is worthy of attention. As Nina Kraus and Karen Banai indicated, "auditory processing is not a rigid, encapsulated process; rather … it [could be] affected by experience, environmental influences… and active training" (2007, p.105). Musicians engage in regular music practice, and their neural connections and pathways change during training. By involving musicians and non-musicians in our study, we may be able to understand how this cultural factor affects low-level pitch contour processing.

4.1 The enhanced contour processing ability in musicians

Music training has a great impact on melodic contour perception. Fujioka, Trainor, Ross, Kakigi, and Pantev (2004) conducted an experiment to explore how music expertise affects melodic contour processing. They used ascending melodic contours as standard stimuli and set corresponding melodic contours with a descending final note as deviant stimuli. They found that the MMN15 was significantly higher in the musician group than in non-musicians, suggesting an enhanced ability to detect contour structure in the music domain. Bidelman, Hutka, and Moreno (2013) conducted an experiment in which they employed two tones in either an ascending or a descending direction, and participants with different language and music training backgrounds were required to identify the contour.

Reaction time results suggested that musicians, even those without a tonal language background, were better at melodic contour processing than English- and Cantonese-speaking non-musicians. Schön, Magne, and Besson (2004) manipulated the final note of musical phrases and found that musicians performed better than non-musicians at detecting subtle melodic contour changes; this enhanced contour tracking ability was also found in musically trained children in a later related study conducted by Magne, Schön, and Besson (2006). This supports the idea that 3-4 years of music training already improves melodic contour detection. Trainor, Desjardins, and Rockel (1999) examined the relationship between musicianship and melodic contour processing ability. The EEG data they collected showed that larger and earlier P3a and P3b components were observed in the musician group than in the non-musician group when the contour was presented with small changes, though there was no significant difference between the groups for strong contour deviations.

Pitch contours exist not only in music but also in language, and the latter seems to attract more attention, as the majority of studies regarding the influence of musicianship involved linguistic stimuli. Even with linguistic stimuli, musicians were found to outperform non-musicians in detecting and distinguishing pitch contours in most studies.

15 MMN refers to mismatch negativity. "MMN is an ERP associated with automatic change detection in a repetitive auditory signal... It is typically studied in an 'oddball' paradigm in which a standard sound or sound pattern is presented repeatedly, with an occasional deviant introduced in a pseudorandom fashion" (Patel, 2008, p.27).

These studies show a transfer effect, namely the transfer of acquired abilities to a different domain. They help us better understand the interaction between music and language processing, as they suggest that overlapping brain areas are involved in the two processing systems.

Musacchia (2008) indicated that “musicians have more robust representations of pitch periodicity and faster neural timing to sound onset when listening to sounds” (p.34).

Terry Gottfried (2004) used Mandarin stimuli and found that musicians outperformed non-musicians in a linguistic pitch contour categorical discrimination and imitation task. However, his subject recruitment was problematic. He mentioned that his subjects were college students, mainly in their first and second years. Importantly, he noted that some non-musicians had taken music classes and participated in ensembles in high school and even during college. In addition, he mentioned that an important difference between the groups is that musicians had learned music theory. However, since he did not clearly identify other music background information, we cannot tease apart other influences, for example, the duration of training or the onset age of training.

Chao-yang Lee and Tsun-Hui Hung (2008) recruited musicians and non-musicians with stricter requirements. In their study, none of the non-musicians reported having received formal music training. They conducted an experiment to examine musicians' and non-musicians' performance in four-tone identification with three different syllable types.


The three syllable types were intact syllables, silent-center syllables with limited f0 information, and stimuli containing only syllable onsets.

Lee and Hung found (results are shown in Figure 4.1) that in the intact syllable condition, musicians had shorter reaction times and higher accuracy than non-musicians. In the silent-center syllable condition, musicians performed significantly better on accuracy but not on reaction time. For the stimuli that contained only syllable onsets, musicians did not show any significantly better performance on reaction time; in addition, identification accuracy for both musicians and non-musicians fell to chance level. That is to say, for intact syllables, music training has an effect on linguistic pitch contour processing. Even for syllables lacking part of the pitch information, musicians can reconstruct the f0 pitch movement more accurately based on limited information. And as syllable onsets have no defined pitch, the performance for stimuli containing only syllable onsets showed that musical training offers no advantage if there is no pitch information to process.


Figure 4.1 Musicians/non-musicians’ performance in four tones under three different syllable types (Lee and Hung, 2008, p.3239-3241)

Besides the shorter reaction times and higher accuracy recorded in behavioral tasks, musicians' improved linguistic pitch contour processing ability has also been demonstrated by electrophysiological data. Daniele Schön and colleagues (2004) conducted an experiment in which they changed the fundamental frequency of final notes and words16 and recorded French subjects' (musicians/non-musicians) behavioral responses and event-related potential (ERP) data17. Schön et al. found that only musicians showed a pattern in which the stronger the violation, the shorter the reaction time. This was compatible with their ERP data, which showed that the onset of the positive component (200-850 ms) in the musician group under the strong incongruity condition was significantly earlier than under the weak incongruity condition (by 100 ms and 50 ms for notes and words, respectively). That is to say, after the onset of the sound, musicians detected the changes in linguistic pitch contour faster than non-musicians.

16 They manipulated the fundamental frequency using the WinPitch software. There were two types of incongruities: a weak one (increasing the f0 of final words/notes by 35%) and a strong one (by 120%).
17 The ERP recordings started 150 ms before the stimulus onsets. Participants listened to the sound stimuli through headphones while wearing a 28-electrode EEG cap for registration of brain waves. Their article focused on the positive component between 200 ms and 850 ms.

Céline Marie's ERP data (2011) showed that in a linguistic pitch contour discrimination task, musicians showed an increased N2/N3 potential that occurred about 100 ms earlier than in non-musicians. In detail, the N2/N3 component reflects stimulus discrimination. The increased amplitude of the N2/N3 may be due to the "increased efficiency of neural networks" (Moreno et al., 2009, p.721) or to a higher level of difficulty. However, the shorter latency excludes the latter possibility; otherwise, we should have observed a longer latency. In addition, musicians demonstrated enhanced P3b potentials to linguistic pitch pattern variations in the 600-800 ms range, and the latency of the P3b was 200 ms earlier than in non-musicians. The P3b is a component "associated with categorization and decision processes" (Marie et al., 2011, p.2708), reflecting musicians' higher level of confidence in deciding whether consecutive words are the same or different.

Detailed ERP data like these shed light on music expertise's influence on linguistic pitch contour processing over the time course of processing. Such ERP recordings are of great value since they can serve as complementary data for future explorations and may offer clues about the processing stages at which music training enhances efficiency.

Patrick Wong examined brainstem encoding of linguistic pitch and found that musicians show "more robust and faithful encoding" (2007, p.421) than non-musicians. The measurement of the frequency following response (FFR) was employed in that paper, which "encodes the energy of the stimulus fundamental frequency (f0) with high fidelity" (2007, p.420). In the experiment, ten amateur musicians and ten non-musicians were recruited, and the subjects were asked to listen to three kinds of Mandarin stimuli with different contours. The results showed that musicians had a more robust representation of the f0 contours. In addition, musicians showed a stronger f0 amplitude of the FFR, which means an enhanced signal-to-noise ratio at the subcortical level. That is important because it "represents the average amount of spectral energy devoted to encoding the changing f0" (2007, p.421). Thus, by comparing the subject groups (musicians/non-musicians), the authors arrived at the conclusion that music expertise has a significant positive correlation with pitch tracking ability. The importance of Wong's finding is that the evidence shows an enhancement of linguistic pitch features at the subcortical level, which challenges the idea that "speech-specific operations probably do not begin until the signal reaches the cerebral cortex" (Scott & Johnsrude, 2003, p.100).

A longitudinal study showed that for musically untrained children, a six-month music training program can increase performance in pitch contour-related tasks to a great extent (Moreno et al., 2009). More than 30 musically untrained children were pseudorandomly assigned to receive training in music or painting for 6 months. Before training, Moreno et al. assessed the verbal and spatial abilities of all participants and did not find significant differences between the groups. After training, children who received music training demonstrated better abilities in detecting small variations in linguistic pitch contour, while those in the painting group did not show such an enhancement. Their results were in line with other studies showing the influence of music training on pitch contour processing ability.


Though there are numerous studies regarding the relationship between musicianship and enhanced contour tracking ability in both music and language, the majority of them tested transfer effects from the music domain to the linguistic domain. In the current study, we are more interested in investigating the influence of musical training on how melodic contour affects linguistic processing; thus, we review previous studies regarding this topic in the next section.

4.2 Previous studies regarding the interaction between melodic contour and lexical processing in terms of musicianship

Will and Poss (2008) showed that melodic primes whose contour matched that of the target facilitated syllable repetition. In his study, Poss (2012) designed an experiment adopting a priming paradigm and found that pitch contour facilitated linguistic processing. However, to our knowledge, only a few researchers have tapped into this interesting area in terms of musicianship. Will et al. (submitted) recruited four groups of participants with different language and music training backgrounds. They used a priming paradigm in which melodic contour primes corresponded to the pitch contours of the four Mandarin tones and preceded either word or pseudo-word targets. A comparison across the four groups showed a significant difference in verbal response times. The Chinese musician group, as the only group with the dual advantage of both tonal language and music expertise, reacted significantly faster than the other three subject groups in overall reaction time. The Chinese musicians responded 219 ms faster than the Chinese non-musicians.


Figure 4.2 The overall reaction time for the four subject groups. CM: Chinese musicians, CN: Chinese non-musicians, EM: English musicians, EN: English non-musicians (Will et al., submitted).

In addition, they calculated the facilitation effects across the four groups. By comparing the difference between RTmatch and RTcontrol, they found that the greatest facilitation was in the Chinese non-musician group, while for the Chinese musicians the facilitation size was the smallest.


Figure 4.3 The facilitation effect across the four groups (Will et al., submitted)

By combining the factors of timbre, language background and musicianship, they found that, compared to Chinese non-musicians, the facilitation size for both vocal and instrumental primes was smaller in Chinese musicians (Figure 4.4). This may be because their long-term music practice enabled them to clearly distinguish the two types of primes and to be less affected by the different timbres. In addition, since the Chinese musicians always made the fastest responses across groups, the smaller facilitation may also reflect a floor effect. Chinese non-musicians showed the greatest facilitation for both vocal and instrumental primes. And with both instrumental and vocal primes, English musicians showed a greater facilitation size than English non-musicians. Cheong et al. (2017) explored the priming effect caused by vocal and instrumental stimuli in musicians and non-musicians with an fMRI study. They found that in an active listening task, musicians showed the strongest brain activation for instrumental primes, while non-musicians showed the highest activation for vocal primes. They reported that "differences in effects of vocal and instrumental primes in musicians and non-musicians largely correspond to differential activation changes in the right superior temporal gyrus" (p.35).

Figure 4.4 The facilitation effect for different kinds of primes across the four groups. i: instrumental primes, v: vocal primes, M: musicians, N: non-musicians, C: Chinese, E: English (Will et al., submitted).

The studies we reviewed show that rather than treating music and language as unassociated, independent systems, we should probably be more open to the interactions between these two domains. Of course, music training's influence on linguistic processing is only one of these associations and can be seen as a starting point toward the broader picture of the interactions between music and language.

In addition, as we have seen, many studies have shown an interaction between music learning and linguistic processing, so one may expect a close relationship between the music and language sound systems. Further, as these studies show, many factors are at work in this complex issue. In sum, although the interactions between these two domains are still not well understood in this relatively young field, new questions and new technologies may open new lines for future study. By answering questions about how cultural factors, e.g. musicianship, influence the interactions between music and language, we may be able to understand the deeper connections between these two domains.


Chapter 5. Pitch contour priming experiments

5.1 Introduction

Melodic contour, as Edworthy (1983) defined it, refers to "the sequence of ups and downs independent of the precise pitch relationships" (p.263). It is considered one of the basic aspects of music and is more easily perceptible than intervals (e.g. Edworthy, 1985; Patel, 2008).

Edworthy (1985) conducted several experiments, where participants were required to detect violations in interval and contour tasks. She pointed out that “contour information is immediately available to the listener regardless of novelty, familiarity, transposition, or nontransposition” (p.172), while interval processing is more dependent on tonality.

Unlike many other musical abilities, the ability to process contour is widespread not only among musicians but also among non-musicians, and not only in adults but also in infants (e.g. Plantinga & Trainor, 2005; Dowling, 1999; Trehub, Bull, & Thorpe, 1984). Currently, research concerning how contour information affects linguistic processing is gradually building a bridge between linguistics and musicology, which may further offer new evidence for explaining musical phenomena such as musical speech surrogates (e.g. Poss, Hung, & Will, 2008; Will & Poss, 2008; Poss, 2012; Cheong et al., 2017).

Recent work regarding the role of contour in linguistic processing has adopted different tasks and arrived at conflicting results. However, many interpretations are complicated by the fact that the stimuli used also contained segmental information. We have previously introduced the use of contour stimuli without segmental information, which allows us to focus specifically on the effect of tonal information. Results have shown that pitched contours facilitated responses to both words and pseudo-words compared to non-pitched noise (e.g. Will & Poss, 2008; Poss, 2012). In addition, many factors were found to be important in how melodic contours influence responses to target words. For example, effects of timbre and lexical category were reported (e.g. Will et al., submitted); an inhibitory effect was found when prime and target shared tone information only when primes contained segmental information unrelated to the target (Will & Poss, 2008). In addition, cultural factors, such as musical training and language background, were found to be important in the process (Will et al., submitted).

However, a more refined lexical category classification needs to be made, as previous studies did not distinguish between pseudo-words and non-words and hardly classified syllables with a focus on tonal information. In addition, though a general match/non-match relationship can give a lot of information on how tonal information affects linguistic processing, we need to further examine specific melodic contours and how they interact with other factors. As indicated in the previous chapter, ascending and descending contour types may have different roles in folksongs. By investigating specific contours, we may answer some interesting questions, such as whether musicians deal with different contour shapes in the same way as non-musicians do. What is more, the interesting factor of gender was ignored in previous studies. Do males differ significantly from females? How does this factor interact with other factors? These questions will be addressed in the current study.


Most of our previous studies used a shadowing task, and only one involved a lexical decision task. In this study, we conducted two experiments, the first involving a lexical decision task and the second involving a shadowing task. The main reason for using two tasks is to test at which level of cognitive processing the prime and target contours interact. Unlike a shadowing task, in which participants are required to repeat the words they hear, a lexical decision task requires them to retrieve additional information from the mental lexicon and then make a decision based on the search result. In addition, and importantly, one motivation for conducting a shadowing task is that in the lexical decision task, where participants respond via computer keys, we cannot tell whether a wrong response is due to a wrongly processed tone or to inaccurate phoneme coding. Furthermore, some studies have suggested that a lexical decision task may mask the priming effect, which prompted us to test it in a different task (Balota & Lorch, 1986).

First, as indicated in previous chapters, because musicians showed better performance than non-musicians in contour discrimination and recognition, we hypothesized that in both experiments the musician group would outperform the non-musician group. Second, due to the biological and social significance of human voices, it was hypothesized that vocal primes would prime more strongly than instrumental ones. Third, as preceding stimuli may raise the activation of subsequent candidates if they share a similar feature, we hypothesized that primes and targets with matching tones would show the greatest priming effect. Fourth, many studies have shown that speech reproduction through the lexical pathway is more reliable and faster than through the non-lexical pathway; thus we hypothesized that responses to non-words and pseudo-words would be slower than to words. In addition, due to the higher-level lexical decision process that is involved only in the lexical decision task, it was also hypothesized that responses in the first experiment would be slower than those in the second experiment.

5.2 Materials

Primes

We used two types of primes, one vocal and the other instrumental. All primes had a melodic contour corresponding to one of the four Mandarin speech tones. Vocal contour primes were created by extracting the contour information from spoken syllables recorded by a native female speaker; they were then resynthesized with a formant spectrum corresponding to the resonances of the human vocal tract. Thus, there was no segmental information in the vocal primes. The fundamental frequency of the vocal primes for tone 1 is about 246 Hz; for tone 2 it rises from 184 to 222 Hz; for tone 3 the F0 first drops from 179 to 151 Hz and then rises to 223 Hz; and for tone 4 the F0 goes down from 284 to 178 Hz. The instrumental contour primes were produced by playing a traditional Chinese instrument, the Xiao, a vertical aerophone. Inspired by Rialland's study (2005), the instrumental prime pitch was manipulated so that the fundamental frequency of the instrumental primes corresponded to the second formant of the vocal contour primes, which made them perceptually similar. The fundamental frequency of the instrumental contour primes for tone 1 is about 442 Hz; for tone 2 it rises from 335 to 400 Hz; for tone 3 the F0 first drops from 332 to 295 Hz and then rises to 366 Hz; and for tone 4 the F0 goes down from 441 to 372 Hz. Besides these two types of primes with timbral contrast, a control prime was constructed by convolving a spoken syllable's amplitude envelope with white noise. Thus, the control prime did not offer any segmental or tonal information. All primes (instrumental, vocal and control) were set to the same duration of 420 ms, and their RMS power (both average and total) was similar, around -23 dB.
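To make the construction of the noise control prime more concrete, the following sketch (Python with numpy and scipy; the file names, the Hilbert-envelope method and the smoothing window are our own illustrative assumptions, not the exact procedure used for the stimuli) shows one way to impose a spoken syllable's amplitude envelope on white noise and to match the 420 ms duration and the roughly -23 dB RMS level:

import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, resample

def make_noise_prime(syllable_path, out_path, target_dur=0.420, target_rms_db=-23.0):
    # Sketch: impose a spoken syllable's amplitude envelope on white noise.
    sr, x = wavfile.read(syllable_path)                # mono recording assumed
    x = x.astype(np.float64) / np.max(np.abs(x))
    env = np.abs(hilbert(x))                           # amplitude envelope via Hilbert transform
    win = max(1, int(0.010 * sr))                      # ~10 ms moving-average smoothing
    env = np.convolve(env, np.ones(win) / win, mode="same")
    noise = np.random.randn(len(env)) * env            # envelope-modulated white noise
    noise = resample(noise, int(target_dur * sr))      # bring to the common 420 ms duration
    rms = np.sqrt(np.mean(noise ** 2))                 # scale to roughly -23 dB RMS
    noise *= 10 ** (target_rms_db / 20.0) / rms
    wavfile.write(out_path, sr, (noise * 32767).astype(np.int16))

# e.g. make_noise_prime("ma1.wav", "control_prime.wav")

A prime built this way preserves the temporal energy profile of the spoken syllable while carrying neither segmental nor tonal information.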

Figure 5.1 Instrumental (left) and vocal (right) primes in the four standard Mandarin tones (1, 2, 3, and 4; from left to right). Top: waveforms; bottom: spectrograms with pitch (blue) and intensity (yellow) curves.

Targets

In this study, the targets were taken from Chinese, a tonal language in which lexical tones provide pitch contrasts that mark word meanings. There are four tones, marking high-level, rising, low-dipping and falling pitch, respectively. For example, mā (tone 1) can mean mother, má (tone 2) hemp, mǎ (tone 3) horse, and mà (tone 4) to scold. In both experiments, 112 spoken syllables were equally distributed over the four tones; that is, for each tone there were 28 target syllables. All target syllables were recorded by a native female speaker at a sampling rate of 44.1 kHz. The stimuli are listed in Table 5.1, where the numbers in brackets indicate how many phonemes each syllable contains.

Table 5.1 The list of word, non-word and pseudo-word syllables

Non-word list
Tone 1  Tone 2  Tone 3  Tone 4
fi (2)  hi (2)  fi (2)  fi (2)
gi (2)  ki (2)  ki (2)  hi (2)
ten (3)  fiu (3)  lua (3)  fra (3)
dua (3)  gia (3)  kia (3)  gin (3)
tua (3)  mia (3)  fia (3)  kin (3)
luai (4)  muai (4)  fuan (4)  nuai (4)
fuai (4)  buan (4)  prai (4)  luai (4)
fian (4)  puan (4)  buai (4)  fiao (4)

Pseudo-word list
Tone 1  Tone 2  Tone 3  Tone 4
er (2)  ce (2)  ri (2)  ka (2)
nu (2)  ku (2)  le (2)  za (2)
ran (3)  ban (3)  zei (3)  lia (3)
rao (3)  jin (3)  ruo (3)  nin (3)
cou (3)  lie (3)  qun (3)  qiu (3)
mian (4)  seng (3)  shei (3)  mang (3)
niao (4)  shuo (3)  hang (3)  nuan (4)
long (3)  suan (4)  huai (4)  keng (3)

Word list
Tone 1  Tone 2  Tone 3  Tone 4
ha (2)  pa (2)  ba (2)  sa (2)
la (2)  ma (2)  ke (2)  he (2)
ka (2)  da (2)  ge (2)  ne (2)
ca (2)  de (2)  na (2)  ce (2)
pai (3)  bai (3)  kai (3)  pan (3)
gan (3)  han (3)  kan (3)  man (3)
san (3)  can (3)  dai (3)  dan (3)
gai (3)  hai (3)  lan (3)  ban (3)
geng (3)  neng (3)  deng (3)  beng (3)
peng (3)  teng (3)  meng (3)  mian (4)
guan (4)  heng (3)  leng (3)  duan (4)
pian (4)  ceng (3)  bian (4)  cuan (4)


In addition, the targets were divided into three lexical categories: words (n=48), non-words (n=32; phoneme combinations that do not occur in Mandarin) and pronounceable pseudo-words (n=32). Non-words are pronounceable phoneme combinations (syllables) that are not used in Mandarin, for example /fiān/. Pseudo-words are phoneme combinations that are used as words in Mandarin, but not in combination with the specific tone(s); for example /chuǐ/ (though /chui/ with tone 3 does not exist in Mandarin Chinese, /chui/ with tone 1 and tone 2 is used).

All syllables were edited with Audacity, with the silent sections before and after each spoken syllable being cut. In addition, the target syllables fell into three phoneme-number sets: syllables with 2 phonemes (8 items for non-words, 8 for pseudo-words, 16 for words), with 3 phonemes (12 items for non-words, 19 for pseudo-words, 26 for words), and with 4 phonemes (12 items for non-words, 5 for pseudo-words, 6 for words).

There were three types of prime conditions (defined by the relationship between prime and target): match, non-match and control. If the prime shares the same contour as the target, the pair is in the match condition; if the prime differs from the target in contour, it is in the non-match condition. In the control condition, the prime is a control prime (white noise) with no tonal contour.
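As a minimal illustration of these three conditions (Python; the tone-to-contour labels and the function name are ours, used only for illustration), a prime-target pair can be classified as follows:

TONE_CONTOUR = {1: "flat", 2: "rising", 3: "low-dipping", 4: "falling"}   # Mandarin tone shapes

def prime_condition(prime_tone, target_tone):
    # Returns 'control' for the white-noise prime (no contour),
    # 'match' if prime and target follow the same tone contour, else 'non-match'.
    if prime_tone is None:
        return "control"
    return "match" if TONE_CONTOUR[prime_tone] == TONE_CONTOUR[target_tone] else "non-match"

# A rising (tone 2) prime before a tone 2 target is a match; before a tone 4 target it is a non-match.
assert prime_condition(2, 2) == "match"
assert prime_condition(2, 4) == "non-match"
assert prime_condition(None, 3) == "control"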

5.3 Experimental Variables

In both experiments, there are two kinds of experimental variables (listed in capital letters): one consists of fixed factors (the main variables), and the other of random factors. We set the following parameters as fixed factors. 'TIMBRE' refers to the prime type and has three levels: white noise, vocal, and instrumental primes. By taking this factor as a main variable, we can detect whether timbre has an influence on subjects' performance. 'STATUS' refers to musical training, namely the classification of the subjects as either musician or non-musician. This variable allows us to investigate whether differences resulting from musical training affect response time and accuracy. 'RELATIONSHIP' refers to the contour relationship between prime and target items. There are three types of relationships: match, non-match, and control. 'LEXSTATUS' refers to the lexical category of the targets. There are three types of targets: words, non-words, and pseudo-words. 'GENDER' refers to the sex of the participants. By taking gender into consideration, we can see whether it causes any differences. Together, these factors make it possible to explore how melodic contour and cultural factors affect linguistic processing. We set variation between subjects and variation between target syllables as random factors, since we want to generalize the conclusions to the whole population of speakers and of target syllables.

5.4 Experiment 1

The first experiment involved a lexical decision task. Participants were asked to classify, as quickly and accurately as possible, whether the auditory target items were words of the Chinese language or not by pressing corresponding buttons.

Participants for experiment 1

We recruited all participants in China. Since we are interested in the effects of contour primes on tone language speakers, we chose Chinese participants and Chinese stimuli for this study. To qualify as Chinese musicians, subjects needed to satisfy all of the following criteria: 1) at least 5 years of formal musical training18, 2) ongoing active musical engagement at the time of participation, 3) self-identification as a musician, and 4) being a native Chinese speaker.

Those participants who were native Chinese speakers but did not meet the other three criteria were categorized as Chinese non-musicians.

There was no specific requirement regarding their major, since the influence of their specific musical expertise (singing, instruments) is not a factor in this study (though we still collected this information in the questionnaire). All musicians had passed a competitive entrance examination before entering their program and were required to take ear training and sight singing classes in the first two years. Participants in this lexical decision task consisted of 40 adults (mean age 20 ± 4 years for non-musicians, 21 ± 3 years for musicians; 20 musicians and 20 non-musicians), with gender-matched groups, i.e. 10 males and 10 females in each group. Musicians had at least 5 years of formal music training, and none of the non-musicians had any formal music training. All participants were right-handed based on an adapted version of Stanley Coren's questionnaire (Anonymous, n.d.). None of the participants reported abnormal hearing or motor disability.

All participants gave their informed consent before participating in this study in accordance with the Ohio State University Institutional Review Board regulations, and all of them completed the task.

18 Formal musical training in our study mainly refers to the context in which the participants acquired systematic musical training. If a participant received professional music training in a conservatoire or music department, we refer to him/her as someone with formal musical training.

Experimental design and procedure for experiment 1

In this study, we used a priming paradigm, where stimuli were presented in pairs: each trial consisted of a prime and a target syllable. The stimulus onset asynchrony (SOA, the time interval between the end of the prime and the start of the target syllable) was set to

250 ms.

For each subject, the trials were divided into 3 sections, so that participants would have some rest time during the experiment. We set 3000 ms as the time limit for participants' responses. The total number of prime and target pairs presented in each run was 336. Each target syllable appeared three times in each run, once in each of the match, non-match and control conditions (112 pairs in the match condition, 112 in the non-match condition and 112 in the control condition). A target syllable would not appear more than once in one section. In addition, every syllable appeared with the control prime, but half of the syllables appeared with vocal primes and the other half with instrumental primes. To investigate the influence of timbre, we assigned the same target syllable different prime types (vocal/instrumental) in counterpart script versions, thus producing two versions that we call basic versions 1 and 2. Simplified examples are shown below. For example, in basic version 1, if fian1 appears with a vocal prime under the non-match condition in section 1, then it will be presented with a vocal prime under the match condition in another section, and with the white noise prime in the remaining section. In the counterbalanced basic version 2, fian1 will be associated with an instrumental prime under a match condition and a non-match condition, and will also be presented with the control prime.


Table 5.2 Basic version 1

Target syllables | Lexical category (Word / Non-word / Pseudo-word) | Prime type (Instrumental / Vocal / control) | Prime-target contour relationship (match / non-match / control)
Section 1: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  luai4 ╳ ╳  mian1 ╳ ╳
Section 2: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  luai4 ╳ ╳  mian1 ╳ ╳
Section 3: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  Luai ╳ ╳  mian1 ╳ ╳


Table 5.3 Basic version 2

Target syllables | Lexical category (Word / Non-word / Pseudo-word) | Prime type (Instrumental / Vocal / control) | Prime-target contour relationship (match / non-match / control)
Section 1: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  luai4 ╳ ╳  mian1 ╳ ╳
Section 2: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  luai4 ╳ ╳  mian1 ╳ ╳
Section 3: da2 ╳ ╳  fian1 ╳ ╳  lia4 ╳ ╳  kan3 ╳ ╳  luai4 ╳ ╳  mian1 ╳ ╳

Then, in order to avoid effects resulting from a fixed section order, which might confound the experimental results, we designed another 4 variant versions. The only difference among versions was the presentation order of the sections. Thus, we had 6 different scripts in total.
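A rough sketch of this counterbalancing scheme is given below (Python; the data structures, the random seeding and the half-split of syllables are illustrative assumptions rather than the script actually used to drive DMDX). Each target occurs once per section, once in each of the match, non-match and control conditions, with its melodic prime timbre swapped between the two basic versions:

import random

def build_trials(syllables, version=1, seed=0):
    # syllables: list of (syllable, tone) pairs, e.g. [("fian", 1), ("da", 2), ...]
    rng = random.Random(seed)
    sections = {1: [], 2: [], 3: []}
    for i, (syl, tone) in enumerate(syllables):
        # half of the syllables receive vocal melodic primes, half instrumental;
        # the assignment is swapped in the counterbalanced basic version 2
        timbre = "vocal" if (i % 2 == 0) == (version == 1) else "instrumental"
        other_tone = rng.choice([t for t in (1, 2, 3, 4) if t != tone])
        conditions = [("match", timbre, tone),
                      ("non-match", timbre, other_tone),
                      ("control", "noise", None)]
        rng.shuffle(conditions)                        # one condition per section
        for section, (rel, tim, prime_tone) in zip((1, 2, 3), conditions):
            sections[section].append({"target": syl, "target_tone": tone,
                                      "prime_timbre": tim, "prime_tone": prime_tone,
                                      "relationship": rel})
    for section in sections:                           # randomize trial order within a section
        rng.shuffle(sections[section])
    return sections

# With 112 target syllables this yields 3 x 112 = 336 prime-target pairs per run.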

In this experiment, participants were asked to classify the auditory target items as either words or non-words as quickly and accurately as possible. The lexical decision experiment was conducted on a Sony Vaio laptop, and all stimuli were controlled and presented by the DMDX software (Forster & Forster, 2003). Two keys were assigned for the responses: the left control button was labeled 'Not Real Words' and the right control button 'Real Words'. Response time and accuracy were recorded for each target syllable. Response times were automatically measured from the start of the target syllable to the onset of the button response. All stimuli were played through headphones connected to the laptop.

Before the experiment, subjects completed a practice trial. During this informal practice session, participants were able to adjust the volume of the headphones and ask questions about the procedure. The practice consisted of 4 pairs of primes and targets (not part of the experimental trials), with targets in all four tones, presented with instrumental, vocal or control primes in match or non-match conditions. Participants were instructed to ignore the first sound of a sequence and to pay attention only to the spoken syllable (the second sound of the sequence), deciding whether it was a real word or not by pressing the registered buttons as quickly and accurately as possible. The experiment took about 35 minutes in total, including the practice session.

Results for experiment 1

We removed responses made less than 200 ms after the onset of the target (0.02%) and responses made more than 2000 ms after the onset of the target (1.56%), because responses beyond these thresholds are likely to reflect guesses or anticipation.


The response time was automatically calculated from the beginning of the target; however, there is a potential problem with this setting, because target syllables do not have identical lengths. For example, in Figure 5.2, three target syllables with durations of 400 ms, 600 ms and 800 ms were all responded to 1000 ms after the start of the target. Since different syllables have quite different lengths, if we set the start of the target as the zero point, we would probably miss a lot of information. To make our analysis more reasonable, we set the end of the target as the zero point. Hence, the reaction times for the three targets become 600 ms, 400 ms, and 200 ms, respectively. This shows that if we were to calculate response times from target onsets, different target lengths could affect the results significantly.

One point that needs to be clarified is that subjects sometimes responded before the end of the target item (in which case the reaction time is negative). However, this does not mean the response is invalid: such a response was not necessarily taken as anticipation, because identification of lexical entries can happen before a syllable is fully presented (the "lexical search" seems to happen in parallel with the processing of the input). Thus, negative reaction times only demonstrate that participants successfully used their background knowledge or other sources of information to make the lexical decision before the target offset.
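The preprocessing just described amounts to a threshold on onset-referenced times followed by a simple re-referencing; a minimal sketch (Python with pandas; column names are assumed) is:

import pandas as pd

def preprocess_rt(df):
    # df columns assumed: rt_from_onset (ms), target_dur (ms)
    # 1) drop implausibly fast or slow responses measured from target onset
    df = df[(df["rt_from_onset"] >= 200) & (df["rt_from_onset"] <= 2000)].copy()
    # 2) re-reference reaction times to the target offset; negative values mean the
    #    subject answered before the target ended, which is still a valid response
    df["rt_from_offset"] = df["rt_from_onset"] - df["target_dur"]
    return df

# Example: a response 1000 ms after the onset of an 800 ms target becomes 200 ms
# when referenced to the target offset.
example = pd.DataFrame({"rt_from_onset": [1000], "target_dur": [800]})
print(preprocess_rt(example)["rt_from_offset"].iloc[0])   # -> 200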


Figure 5.2 Response time in lexical decision task

Response Time:

Response times were analyzed based on correct responses only, so we removed the incorrect answers, which accounted for 11.3% of the items. We used SAS software (University Edition) with the MIXED procedure to examine how the fixed factors affect lexical decision processing. Due to the complexity of the experimental design, which used a couple of nesting options in order to keep the experimental sessions from running too long, we did not apply a single mixed random effects model taking both SUBJECT (participants) and ITEM (target syllables) as random factors. Rather, mixed effect models that included random intercepts for subjects (F1)19 and for items (F2)20 were constructed with reaction time as the dependent variable. In the experimental design, subjects were nested within STATUS and GENDER, and target syllables were nested within LEXSTATUS and TONE (target tone).

We arrived at the minimal model that best described our data with the following procedure:
a) First, list all main factors and all interactions and run the model.
b) Then remove one main factor or one interaction and compare the two models (the previous and the reduced one) with a likelihood ratio test. If there is a significant difference in the likelihood ratios, put the factor back; otherwise proceed with eliminating the next non-significant factor.

Factors STATUS, GENDER, LEXSTATUS, TIMBRE, RELATIONSHIP and all the 2- and 3-way interactions between and among these factors were in the initial full model.
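The elimination step can be illustrated with a likelihood-ratio comparison between nested models. The sketch below (Python with statsmodels, using only a by-subject random intercept as a rough stand-in for the SAS models given next; the formula and column names are assumptions) shows how one term would be tested. Note that models differing in their fixed effects must be fitted with maximum likelihood (reml=False) for the comparison to be valid:

import statsmodels.formula.api as smf
from scipy import stats

def lr_test(full, reduced):
    # p-value of a likelihood-ratio test between two nested mixed models
    lr = 2.0 * (full.llf - reduced.llf)
    df_diff = full.model.exog.shape[1] - reduced.model.exog.shape[1]
    return stats.chi2.sf(lr, df_diff)

def test_term(data, full_formula, reduced_formula, alpha=0.05):
    # Fit both models with ML and decide whether the dropped term has to stay
    full = smf.mixedlm(full_formula, data, groups=data["subject"]).fit(reml=False)
    reduced = smf.mixedlm(reduced_formula, data, groups=data["subject"]).fit(reml=False)
    p = lr_test(full, reduced)
    return ("keep term" if p < alpha else "drop term"), p

# e.g. testing whether the LEXSTATUS*RELATIONSHIP interaction is needed:
# test_term(rt_data,
#           "rt ~ lexstatus * relationship + gender",
#           "rt ~ lexstatus + relationship + gender")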

The following statements in PROC MIXED indicate the final minimal model:

Proc mixed21;
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP SUBJECT;
Model realresp=LEXSTATUS GENDER LEXSTATUS*RELATIONSHIP STATUS*LEXSTATUS*GENDER /ddfm=satterthwaite solution;
Random SUBJECT SUBJECT(GENDER) SUBJECT(STATUS);
Run;

19 This model is analogous to a by-subject analysis.
20 This model is analogous to a by-item analysis.
21 The by-subject analysis.

Proc mixed22;
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP TONE ITEM;
Model realresp=STATUS LEXSTATUS GENDER STATUS*GENDER LEXSTATUS*RELATIONSHIP STATUS*LEXSTATUS*GENDER /ddfm=satterthwaite solution;
Random ITEM ITEM(TONE) ITEM(LEXSTATUS);
Run;

22 The by-item analysis.

Our results showed that the effect of LEXSTATUS reached significance in both the by-subject (F1) and by-item (F2) analyses (F1(2,39)=373.4, p˂0.0001; F2(2,111)=18.08, p˂0.0001). Non-words were responded to most quickly, with a mean response time of 374.75 ms; words were in the middle, taking on average 450.06 ms; and pseudo-words required the longest time (525.64 ms) for a button decision. Figure 5.3 shows the mean reaction times for the different lexical categories.


Figure 5.3 The mean reaction time for non-words, pseudo-words and words (expt 1)

The main effect of GENDER reached significance in both the by-subject (F1) and by-item (F2) analyses (F1(1,39)=10.53, p=0.0025; F2(1,111)=623.63, p˂0.0001). Female participants took on average 392.19 ms to respond, and male participants 504.25 ms. There was a main effect of STATUS only in the by-item analysis (F2(1,111)=187.52, p˂0.0001). Musicians and non-musicians showed mean reaction times of 416.26 ms and 478.85 ms, respectively, but this difference was not significant in the by-subject analysis.

There was also a 2-way interaction of LEXSTATUS*RELATIONSHIP in both the by-subject (F1) and by-item (F2) analyses (F1(4,39)=2.65, p=0.0145; F2(4,111)=2.37, p=0.0275).



Figure 5.4 The LEXSTATUS*RELATIONSHIP interaction in terms of reaction time (expt 1)

Non-words were responded to fastest under all conditions. Under the match condition, a facilitatory effect was found only for words (-20.58 ms). Under the non-match condition, a facilitatory effect was found for non-words (-19.36 ms) and words (-6.19 ms). The results of the post-hoc tests (lsmeans) for the LEXSTATUS*RELATIONSHIP interaction (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.4 Post-hoc test for the LEXSTATUS*RELATIONSHIP (expt 1)

Interaction Level A  Interaction Level B  p-value
non-word: control (380.71 ms)  pseudo-word: control (515.55 ms)  ˂0.0001
non-word: control (380.71 ms)  word: control (459 ms)  ˂0.0001
non-word: match (382.34 ms)  pseudo-word: match (526.54 ms)  ˂0.0001
non-word: match (382.34 ms)  word: match (438.42 ms)  ˂0.0001
non-word: non-match (361.35 ms)  pseudo-word: non-match (534.53 ms)  ˂0.0001
non-word: non-match (361.35 ms)  word: non-match (452.81 ms)  ˂0.0001
pseudo-word: control (515.55 ms)  word: control (459 ms)  ˂0.0001
pseudo-word: match (526.54 ms)  word: match (438.42 ms)  ˂0.0001
pseudo-word: non-match (534.53 ms)  word: non-match (452.81 ms)  ˂0.0001

A 2-way interaction of STATUS*GENDER emerged only in the by-item analysis (F2(1,111)=70.92, p˂0.0001). Female musicians had the shortest reaction times (376.07 ms), and the difference between female and male responses was larger in non-musicians (142.92 ms).



Figure 5.5 The STATUS*GENDER interaction in terms of reaction time (expt 1)

A 3-way interaction of STATUS*LEXSTATUS*GENDER reached significance in both the by-subject (F1) and by-item (F2) analyses (F1(2,39)=7.83, p˂0.0001; F2(2,111)=10.23, p˂0.0001).



Figure 5.6 Reaction time for the STATUS:LEXSTATUS:GENDER interaction. M represents musicians, and nm represents non-musicians (expt 1)

The largest difference between females and males was in the non-musician group with pseudo-word targets. The results of the post-hoc tests (lsmeans) for the STATUS*LEXSTATUS*GENDER interaction (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.5 Post-hoc test for the STATUS*LEXSTATUS*GENDER (expt 1)

Interaction Level A  Interaction Level B  p-value
musician: non-word: female (316.24 ms)  musician: pseudo-word: female (468.97 ms)  ˂0.0001
musician: non-word: female (316.24 ms)  musician: word: female (363.86 ms)  0.0021
musician: non-word: male (371.28 ms)  musician: pseudo-word: male (519.42 ms)  ˂0.0001
musician: non-word: male (371.28 ms)  musician: word: male (477.53 ms)  ˂0.0001
musician: pseudo-word: female (468.97 ms)  musician: word: female (363.86 ms)  ˂0.0001
musician: pseudo-word: male (519.42 ms)  musician: word: male (477.53 ms)  0.0005
non-musician: non-word: female (332.53 ms)  non-musician: pseudo-word: female (462.85 ms)  ˂0.0001
non-musician: non-word: female (332.53 ms)  non-musician: word: female (424.01 ms)  ˂0.0001
non-musician: non-word: male (479.85 ms)  non-musician: pseudo-word: male (651.62 ms)  ˂0.0001
non-musician: non-word: male (479.85 ms)  non-musician: word: male (535.82 ms)  ˂0.0001
non-musician: pseudo-word: female (332.53 ms)  non-musician: pseudo-word: male (651.62 ms)  0.0071
non-musician: pseudo-word: female (332.53 ms)  non-musician: word: female (424.01 ms)  0.0062
non-musician: pseudo-word: male (651.62 ms)  non-musician: word: male (535.82 ms)  ˂0.0001

The factor RELATIONSHIP did not reach significance, so we zoomed in to see whether and how specific prime contours affect participants' performance. We defined a new variable, 'PRIMETONE', to test our hypothesis. It refers to the contour shape of the prime: horizontal (flat), ascending (rising), low-dipping, or descending (falling). By considering prime tone and target tone together, we found that the interaction between PRIMETONE and TONE reached significance (F=4.38, p˂0.0001).


Figure 5.7 The interaction PRTONE*TONE in terms of reaction time (expt 1)

Tone 3 targets had the shortest reaction times, and targets with a falling tone were responded to more slowly than the other targets. All rising primes facilitated responses. The results of the post-hoc tests (lsmeans) for the PRIMETONE*TONE interaction (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.6 Post-hoc test for the PRIMETONE*TONE (expt 1)

Interaction Level A  Interaction Level B  p-value
control: T1 (445.57 ms)  control: T4 (560.23 ms)  0.0027
control: T2 (441.14 ms)  control: T4 (560.23 ms)  0.0006
control: T3 (359.89 ms)  control: T4 (560.23 ms)  ˂0.0001

Pfall: T1 (433.63 ms) Pfall: T4 (559.66 ms) 0.0021

Pfall: T2 (431.97 ms) Pfall: T4 (559.66 ms) 0.0238

Pfall: T3 (355.94 ms) Pfall: T4 (559.66 ms) ˂0.0001

Pflat: T1 (433.96 ms) Pflat: T4 (596.98 ms) 0.0004

Pflat: T2 (483.23 ms) Pflat: T4 (596.98 ms) 0.0022

Pflat: T3 (366.24 ms) Pflat: T4 (596.98 ms) ˂0.0001

Plow-dipping: T1 (461.9 ms) Plow-dipping: T4 (560.34 ms) 0.0264

Plow-dipping: T2 (373.66 ms) Plow-dipping: T4 (560.34 ms) 0.0004

Plow-dipping: T3 (362.75 ms) Plow-dipping: T4 (560.34 ms) ˂0.0001

Prising: T1 (406.18 ms) Prising: T4 (493.87 ms) 0.0397

Prising: T3 (354.44 ms) Prising: T4 (493.87 ms) 0.0002

And the interaction PRIMETONE*TONE*STATUS reached significance (F=2.76, p˂0.0001).



Figure 5.8 The STATUS:PRTONE:TONE interactions in terms of reaction time (expt 1)

Basically, we obtained the same results as for the two-way interaction PRIMETONE*TONE, but the size of this two-way interaction depended on status: the difference between musicians and non-musicians was greater for Tone 2 and Tone 4 targets than for the other target tones. The results of the post-hoc tests (lsmeans) for the PRIMETONE*TONE*STATUS interaction (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.7 Post-hoc test for the PRIMETONE*TONE*STATUS (expt 1)

Interaction Level A  Interaction Level B  p-value
control: T1: musician (415.74 ms)  control: T4: musician (533.84 ms)  0.0156
control: T1: non-musician (473.86 ms)  control: T4: non-musician (584.97 ms)  0.013
control: T2: musician (397.84 ms)  control: T4: musician (533.84 ms)  0.0005
control: T3: musician (315.15 ms)  control: T4: musician (533.84 ms)  ˂0.0001
control: T3: non-musician (402.9 ms)  control: T4: non-musician (584.97 ms)  ˂0.0001

Pfall: T1: non-musician (459.57 ms) Pfall: T4: non-musician (597.53 ms) 0.0045

Pfall: T3: musician (324.95 ms) Pfall: T4: musician (520.61 ms) ˂0.0001

Pfall: T3: non-musician (385.21 ms) Pfall: T4: non-musician (597.53 ms) ˂0.0001

Pflat: T1: non-musician (456.05 ms) Pflat: T4: non-musician (637.87) 0.0006

Pflat: T2: non-musician (506.21 ms) Pflat: T4: non-musician (637.87) 0.0058

Pflat: T3: musician (365.37 ms) Pflat: T4: musician (555.62 ms) 0.0009

Pflat: T3: non-musician (367.11 ms) Pflat: T4: non-musician (637.87) ˂0.0001

Plow-dipping: T2: musician (341.78 ms) Plow-dipping: T4: musician (538.2 ms) 0.0259

Plow-dipping: T3: musician (338.71 ms) Plow-dipping: T4: musician (538.2 ms) ˂0.0001

Plow-dipping: T3: non-musician (386.18 ms)  Plow-dipping: T4: non-musician (582.2 ms)  ˂0.0001

Prising: T3: musician (327.23 ms) Prising: T4: musician (461.97 ms) 0.0192

Prising: T3: non-musician (379.94 ms) Prising: T4: non-musician (524.99 ms) 0.0129

Accuracy

As accuracy is a binary variable (a right or wrong decision), we applied PROC GLIMMIX with a binomial distribution and a logit link function to analyze accuracy. We arrived at the following minimal model in the same way as described above:

Proc glimmix23;
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP SUBJECT;
Model correct=STATUS LEXSTATUS STATUS*LEXSTATUS STATUS*LEXSTATUS*GENDER /distribution=binomial ddfm=satterthwaite solution;
Random SUBJECT SUBJECT(GENDER) SUBJECT(STATUS);
Run;

23 This is the by-subject analysis.

Proc glimmix24;
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP TONE ITEM;
Model correct=LEXSTATUS STATUS*LEXSTATUS STATUS*LEXSTATUS*GENDER /distribution=binomial ddfm=satterthwaite solution;
Random ITEM ITEM(TONE) ITEM(LEXSTATUS);
Run;

24 This is the by-item analysis.

The results showed that the main effect of LEXSTATUS was significant in both the by-subject (F1) and by-item (F2) analyses (F1(2,39)=69.43, p˂0.0001; F2(2,111)=13.49, p˂0.0001). On average, participants reached a 95.35% correct rate for non-words, 88.27% for pseudo-words, and 87.86% for words.


Figure 5.9 The correct rate for words, non-words and pseudo-words (expt 1)

The effect of STATUS reached significance only in the by-subject analysis (F1(1,39)=5.89, p=0.0198). Non-musicians (91.82%) had a higher correct rate than musicians (88.44%).



Figure 5.10 The correct rate for musician group and non-musician group (expt 1)

The 2-way interaction STATUS*LEXSTATUS reached significance in both by-subject

(F1) and by-item (F2) analysis (F1(2,39)=11.91, p˂0.0001, F2(2,111)=22.32, p˂0.0001).



Figure 5.11 The STATUS*LEXSTATUS interaction in terms of accuracy (expt 1)

Musicians and non-musicians responded to non-words more uniformly, and the difference between musicians and non-musicians was larger for pseudo-word targets than for the other lexical categories. The results of the post-hoc tests (lsmeans) for the STATUS*LEXSTATUS interaction (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.8 Post-hoc test for the STATUS*LEXSTATUS (expt 1)

Interaction Level A  Interaction Level B  p-value
musician: non-word (95.01%)  musician: pseudo-word (84.16%)  ˂0.0001
musician: non-word (95.01%)  musician: word (86.89%)  ˂0.0001
musician: pseudo-word (84.16%)  non-musician: pseudo-word (92.44%)  ˂0.0001
non-musician: non-word (95.69%)  non-musician: pseudo-word (92.44%)  0.0007
non-musician: non-word (95.69%)  non-musician: word (88.83%)  ˂0.0001
non-musician: pseudo-word (92.44%)  non-musician: word (88.83%)  0.0019

The 3-way interaction STATUS*LEXSTATUS*GENDER was significant in both the by-subject (F1) and by-item (F2) analyses (F1(2,39)=10.73, p˂0.0001; F2(2,111)=13.64, p˂0.0001).


Figure 5.12 The interaction STATUS*LEXSTATUS*GENDER in terms of accuracy (expt 1)


The way STATUS and LEXSTATUS interacted depended on gender. The difference between female and male musicians was largest for pseudo-words; female musicians and male non-musicians were better on non-words and pseudo-words, whereas the pattern reversed for words. Results of the post-hoc test (lsmeans) for STATUS*LEXSTATUS*GENDER (with Bonferroni correction for multiple comparisons) are listed below.

Table 5.9 Post-hoc test for the STATUS*LEXSTATUS*GENDER (expt 1)

Interaction Level A Interaction level B p-value
musician: non-word: female (94.35%) musician: pseudo-word: female (80.21%) ˂0.0001
musician: non-word: female (94.35%) musician: word: female (89.87%) 0.012
musician: non-word: male (95.68%) musician: pseudo-word: male (88.06%) ˂0.0001
musician: non-word: male (95.68%) musician: word: male (83.87%) ˂0.0001
musician: pseudo-word: female (80.21%) musician: word: female (89.87%) ˂0.0001
non-musician: non-word: female (95.97%) non-musician: word: female (86.23%) ˂0.0001
non-musician: non-word: male (95.65%) non-musician: pseudo-word: male (91.23%) 0.02
non-musician: non-word: male (95.65%) non-musician: word: male (91.47%) 0.012
non-musician: pseudo-word: female (93.59%) non-musician: word: female (86.23%) ˂0.0001


Discussion for experiment 1

Words, Non-words, and Pseudo-words

The results showed that non-words were responded to much faster than pseudo-words. On average, responses to non-words took 374.75 ms, responses to words 450.06 ms, and responses to pseudo-words the longest (525.64 ms). This indicates that in the lexical decision task it is easier to make a decision when a syllable is not a word (as no entry for it exists in the mental lexicon). For pseudo-words, a prolonged search was required because they are very similar to real words, and making a decision therefore took more time.

In addition, the interaction between LEXSTATUS and RELATIONSHIP suggested that responses to the different lexical categories were strongly affected by the prime-target contour relationship. We compared the simple effect sizes for each lexical category (i.e. the RT difference between the control condition and the match or non-match condition, respectively). We found a large difference in facilitation size across lexical categories: for RTmatch-RTcontrol, only words showed facilitation (-20.58 ms), while non-words (1.63 ms) and pseudo-words (10.99 ms) did not. For RTnon-match-RTcontrol, pseudo-words produced an inhibitory effect (18.98 ms), while a facilitatory effect was found for non-words (-19.36 ms) and words (-6.19 ms). It is telling that for non-words the fastest responses came when the tones did not match, suggesting that when melodic contours and non-word syllables differ in tone, the decision process is speeded up. In addition, under the match condition a facilitatory effect was found only for words. This helps us understand how musical speech surrogates work, because many musical speech surrogates map lexical tones onto whistling or instruments: as tonal information is part of the lexicon, melodic pitch contours prime the pre-activation of lexical entries, resulting in a larger facilitation size.
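For concreteness, the facilitation sizes above are simple differences of condition mean RTs. A minimal sketch of this computation is given below; the dataset and variable names (RTDATA, LEXSTATUS, RELATIONSHIP, CORRECT, REALRESP) and the condition labels are assumptions for illustration, not the code actually used:

Proc sql;  /* mean RT per lexical category and condition, then match/non-match minus control */
create table facilitation as
select LEXSTATUS,
       mean(case when RELATIONSHIP = 'match'     then REALRESP end)
         - mean(case when RELATIONSHIP = 'control' then REALRESP end) as fac_match,
       mean(case when RELATIONSHIP = 'non-match' then REALRESP end)
         - mean(case when RELATIONSHIP = 'control' then REALRESP end) as fac_nonmatch
from rtdata
where CORRECT = 1   /* correct responses only */
group by LEXSTATUS;
Quit;

In this convention a negative value indicates facilitation relative to the control primes and a positive value indicates inhibition.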

Pseudo-word processing seems to involve an extended lexical search. As Stone and Van Orden (1989, 1993) indicated, the more a non-word resembles a word, the longer it takes to be rejected. In addition, Wiener and Turnbull (2016) reported that in a word reconstruction task, where participants were required to alter non-words to words by changing one of their elements (1: vowel; 2: consonant; 3: tone; 4: free option), Chinese speakers showed a lower error rate and faster responses in the tone-altering and free-option conditions. Within the free-option condition, tonal change was the first choice, accounting for 59.5% of the cases. In line with Wiener and Turnbull (2016), our results concerning the reaction times for pseudo-words and non-words support the information load hypothesis, in which tones carry the least information compared with consonants and vowels. Thus, it would be easier for native Chinese speakers to reject non-words with illegal phoneme combinations, because these combinations carry a large amount of information that can constrain lexical access. For example, the non-word fian could be seen as a non-word syllable either with a wrong consonant (as it may turn into a word by altering the consonant, e.g. qian) or with a wrong vowel (as it may also turn into the word fan by removing a vowel). Interestingly, our study suggests that non-word processing may be primarily based on phonological discrimination, but that tonal information also exerts an influence. This is supported by the finding that the decision process was speeded up when melodic contour primes and non-word syllables differed in tone.


Another interesting and surprising result is that non-words were responded to significantly faster than real words. Based on the cohort model, once the initial phoneme is activated, all phonotactically legal combinations in the cohort are activated immediately, and candidates are excluded as soon as one of their components fails to match the auditory stimulus (though this does not mean listeners are intolerant of any mismatch). Chinese words, however, require an additional recognition unit: tone. For example, the non-word /fian1/ would be rejected as soon as the second phoneme is reached, whereas the real word /suan1/ and the pseudo-word /suan2/ continue to be processed until the recognition point, since many candidates remain after the second phoneme. Thus, the absence of a lexical entry and the detection of phonotactic illegality at a relatively early stage make the decision for non-words faster than for the other two lexical categories.

Specific contour shape

Our study suggests that different contours had different effects on syllable processing. The data showed that when melodic primes and target syllables shared contour information, rising primes had the largest facilitation size (-11.75 ms) and the highest correct rate (91.16%), and all prime tones except tone 3 primed responses relative to unpitched white noise. Thus, our data are in line with previous studies suggesting that tone 3 may be one of the most confusing tones (Shen & Lin, 1991; Fromkin, 2014). When primes and syllables differed in tone, the low-dipping contour with tone 2 target syllables generated the greatest facilitation size (67.48 ms), followed by primes carrying a rising tone with tone 4 targets (66.36 ms). The difference between these two is very small, and in all other cases rising primes with falling targets generally produced the greatest facilitation size.

When we took the interaction of STATUS, TONE and PRTONE into consideration, we found that when melodic primes and target syllables shared contour information, musicians had the greatest facilitation size when primes carried a falling tone (-13.23 ms), followed by tone 2 primes (-6.95 ms), and when primes carried a low-dipping tone, an inhibitory effect was found (23.56 ms). Crucially, these observations were absent in non-musicians. For non-musicians, tone 4 primes were the only ones exerting an inhibitory effect (12.56 ms), while tone 1 and tone 2 primes generated approximately the same facilitation size (-17.81 ms and -17.72 ms). When primes and syllables differed in tone, for musicians, primes carrying a rising tone preceding a falling-tone target had the greatest facilitation size (-71.87 ms). For non-musicians, however, tone 3 primes preceding tone 2 targets had the greatest facilitation size (-78.82 ms). Our results imply that long-term training experience may adjust participants' sensitivity to contour information.

Musicians vs. Non-musicians and Gender Role

That social-cultural differences lead to different behavioral performance has been confirmed in many studies. In the current study, subjects with a music training background responded 51.63 ms faster than non-musicians. Our experiments are in line with previous studies indicating that musical training may facilitate participants' responses. However, in terms of accuracy, non-musicians had a correct rate 2.7 percentage points higher than musicians; this is probably because musicians adopted a different listening strategy, in which they weighted speed over accuracy even though they were instructed to respond as accurately and as quickly as possible.

Meanwhile, GENDER as a factor significantly affected participants' responses, and females had faster reaction times than males in this experiment. As Roebuck and Wilding (1993) suggested, female and male listeners perform better when they hear a voice of their own sex. That could be the reason why female subjects performed significantly better, since we used a female voice for the recording of the stimuli in our experiment. Though we are not able to say that females are better at auditory processing, this experiment may suggest that sex differences play a role in pitch contour processing.

5.5 Experiment 2

The second experiment was a shadowing task. Participants were asked to repeat what they heard as quickly and accurately as possible.

Experimental design and procedure for experiment 2

In the second experiment, a syllable reproduction task, we again used a priming paradigm with a stimulus onset asynchrony of 250 ms.

For the syllable reproduction task, we used the same prime-target pairs as in the first experiment. Participants had 1 minute to rest after finishing each section. Different from experiment 1, we allowed the DMDX software to scramble the prime & target pairs within each section automatically. Thus, technically, each participant received a different presentation sequence of prime & target pairs. We set 2500 ms as the time limit for participants' responses.

In this experiment, participants were required to orally repeat what they heard right after hearing it. The syllable reproduction experiment was conducted on a Sony Vaio laptop, and all stimuli were controlled and presented by DMDX software (Forster & Forster,

2003). A headset with an attached microphone was connected to the computer, and participants' vocal responses were recorded for each target item.

Before the experiment, subjects completed a practice block. It consisted of 9 pairs of primes and targets, with targets in all four tones, presented with instrumental, vocal, or control primes in match or non-match condition. Participants were instructed to ignore the first sound of a sequence (the prime) and to repeat the second sound of the sequence (the target syllable) as quickly and accurately as possible. The experiment took about 30 minutes in total, including the practice session. After the experiment, response times of all target items were checked and corrected with CheckVocal (Protopapas, 2007). Accuracy was checked by the author, with two aspects taken into consideration: the syllable and the speech tone. There were 631 phoneme errors, accounting for 95.61% of all wrong answers, and 29 tone errors, accounting for 4.39%.
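The error tallies reported here could, for instance, be obtained from a table of wrong answers; the dataset ERRORS and the variable ERRTYPE below are hypothetical names used only for this sketch:

Proc freq data=errors;  /* one row per wrong answer, ERRTYPE = 'phoneme' or 'tone' */
tables ERRTYPE / nocum;  /* counts and percentages of the two error types */
Run;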


Participants for experiment 2

In the syllable reproduction experiment, we again recruited Chinese musicians and

Chinese non-musicians as our subjects. The criteria for Chinese musicians and non-musicians were the same as in our previous lexical decision experiment.

All musicians had passed a competitive entrance examination before they entered their program and were required to take ear-training and sight-singing classes in the first two years. Participants in this syllable reproduction task consisted of 28 adults (mean age for musicians = 25.6 ± 8 years, mean age for non-musicians = 22 ± 4 years). There were 14 musician participants, 11 of them female, and 14 non-musician participants, 7 of them female. Musicians had on average 17 years of formal music training, while none of the non-musicians had any formal music training. All of the participants were right-handed, as established by an adapted version of Stanley Coren's handedness questionnaire (Anonymous, n.d.). None of the participants reported abnormal hearing or motor disability. All participants gave their informed consent before participating in this study in accordance with the Ohio State University Institutional

Review Board regulations.

Results for experiment 2

We removed responses made less than 200 ms after the onset of the target (0.17%) and responses made more than 2000 ms after the onset of the target (0.04%), because responses beyond these thresholds likely reflect anticipation or guessing.
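A minimal sketch of this trimming rule, assuming a dataset SHADOW with the response time in a variable REALRESP (ms from target onset):

Data shadow_trimmed;
set shadow;
if 200 <= REALRESP <= 2000;  /* drop anticipations (<200 ms) and overly late responses (>2000 ms) */
Run;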


Response Time:

Response times were analyzed based on correct responses only, so we removed incorrect answers, which account for 7.03% of the data. We applied SAS software (University Edition) with the MIXED procedure to examine how the fixed factors affected the shadowing responses. For the same reason as in the first experiment, by-subject (F1) and by-item (F2) analyses were conducted separately. The factors STATUS, GENDER, LEXSTATUS,

TIMBRE, RELATIONSHIP and all the 2- and 3-way interactions between and among these factors were in the initial full model. Subjects were nested within STATUS and

GENDER, and items (target syllables) were nested within LEXSTATUS and TONES.

We arrived at the minimal model in the same way as we did for the first experiment. The following statements in PROC MIXED indicate the minimal model:

Proc mixed;  /* by-subject analysis */
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP SUBJECT;
Model realresp = STATUS LEXSTATUS TIMBRE / ddfm=satterthwaite solution;
Random SUBJECT SUBJECT(GENDER) SUBJECT(STATUS);
Run;

Proc mixed;  /* by-item analysis */
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP TONE ITEM;
Model realresp = STATUS TIMBRE STATUS*GENDER STATUS*LEXSTATUS*GENDER / ddfm=satterthwaite solution;
Random ITEM ITEM(TONE) ITEM(LEXSTATUS);
Run;


The results showed that the effect of TIMBRE was significant in both the by-subject (F1) and by-item (F2) analyses (F1(2,27)=6.26, p=0.0019; F2(2,111)=7.76, p=0.0004). Both instrumental and vocal primes facilitated participants' reaction times: the mean reaction time was 238.21 ms with vocal primes and 247.95 ms with instrumental primes, while white-noise primes yielded the longest reaction times (252.98 ms). Post-hoc tests showed that the comparisons between control and vocal primes (p=0.0003) and between instrumental and vocal primes (p=0.0425) reached significance (in the by-item analysis).


Figure 5.13 The main effect TIMBRE in terms of reaction time (expt 2)

To directly compare the facilitation effects of the different timbres, we used RTMV-RTCL and RTMI-RTCL (the difference between response times under the match condition with vocal or instrumental primes and response times under the control condition) and RTNMV-RTCL and RTNMI-RTCL (the corresponding differences under the non-match condition). To have a complete set of responses for each item, we removed a participant's responses to an item whenever the participant answered any part of the combination wrongly. For example, in the RTmatch-RTcontrol combination, if he/she answered the item wrongly under either the match or the control condition, we excluded his/her responses to that item when calculating the facilitation effect size. The differences between vocal and instrumental primes, in the match and in the non-match condition, did not produce significant results. Taking musicianship into consideration, we found a significant difference between the facilitation sizes for vocal and instrumental primes in the non-match condition in the musician group (-5.82 ms vs. -21.79 ms, p=0.0468).
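The complete-set rule can be illustrated with the following sketch for the match vs. control comparison; dataset and variable names (SHADOW, SUBJECT, ITEM, RELATIONSHIP, CORRECT, REALRESP) and the condition labels are assumptions, not the original code:

Proc sql;
create table match_pairs as
select SUBJECT, ITEM,
       mean(case when RELATIONSHIP = 'match'   then REALRESP end) as rt_match,
       mean(case when RELATIONSHIP = 'control' then REALRESP end) as rt_control
from shadow
where RELATIONSHIP in ('match', 'control')
group by SUBJECT, ITEM
having min(CORRECT) = 1;  /* drop the subject-item pair if either response was wrong */
Quit;

Data match_pairs;
set match_pairs;
facilitation = rt_match - rt_control;  /* negative = facilitation relative to control */
Run;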

There was a main effect of LEXSTATUS only in the by-subject analysis (F1(2,27)=64.05, p˂0.0001). The post-hoc test (lsmeans) showed that the comparisons between words (267.54 ms) and pseudo-words (228.63 ms) and between words and non-words (231.65 ms) reached significance (all p-values less than 0.0001).



Figure 5.14 The reaction times to different lexical categories (expt 2)

The effect of STATUS reached significance in both by-subject and by-item analysis

(F1(1,111)=4.62, p=0.041; F2(1,111)=343.3, p˂0.0001). On average, musicians and non-musicians took 206.61 ms and 286.91 ms to reproduce the target, respectively.

No interaction was significant in the by-subject analysis. The 2-way interaction STATUS*GENDER was significant only in the by-item analysis (F2(1,111)=101.95, p˂0.0001). Female musicians responded fastest (194.47 ms), and the difference between musicians and non-musicians was larger in females (115.4 ms).



Figure 5.15 The STATUS*GENDER interaction in terms of reaction time (expt 2)

The 3-way interaction STATUS*LEXSTATUS*GENDER reached significance in the by-item analysis (F2(2,111)=2.37, p=0.0167), but not in the by-subject analysis.



Figure 5.16 The STATUS*LEXSTATUS*GENDER interaction in terms of reaction time (expt 2)

Zooming in on how specific contour shapes affected participants' performance, we found that the interaction PRIMETONE*TONE reached significance (F=17.99, p˂0.0001).


Figure 5.17 The PRTONE*TONE interaction in terms of reaction time (expt 2): reaction time (ms) for t1-t4 targets plotted by prime type (control, flat, rising, low-dipping, and falling primes)

Tone 3 targets had the shortest reaction times, while targets with a falling tone were responded to with longer reaction times than targets with the other tones. Results of the post-hoc test (lsmeans) for PRTONE*TONE (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.10 Post-hoc test for the PRIMETONE*TONE (expt 2)

Interaction Level A Interaction level B p-value
control: T1 (210.17 ms) control: T4 (404.58 ms) ˂0.0001
control: T2 (234.08 ms) control: T3 (164.49 ms) 0.0009
control: T2 (234.08 ms) control: T4 (404.58 ms) ˂0.0001
control: T3 (164.49 ms) control: T4 (404.58 ms) ˂0.0001

Pfall: T1 (175.31 ms) Pfall: T4 (388.62 ms) ˂0.0001

Pfall: T2 (226.7 ms) Pfall: T3 (136.78 ms) 0.0339

Pfall: T2 (226.7 ms) Pfall: T4 (388.62 ms) ˂0.0001

Pfall: T3 (136.78 ms) Pfall: T4 (388.62 ms) ˂0.0001

Pflat: T1 (211.98 ms) Pflat: T4 (395.32 ms) ˂0.0001

Pflat: T2 (226.48 ms) Pflat: T3 (178.5 ms) 0.0437

Pflat: T2 (226.48 ms) Pflat: T4 (395.32 ms) ˂0.0001

Pflat: T3 (178.5 ms) Pflat: T4 (395.32 ms) ˂0.0001

Plow-dipping: T1 (218.07 ms) Plow-dipping: T4 (436.69 ms) ˂0.0001

Plow-dipping: T2 (201.33 ms) Plow-dipping: T4 (436.69 ms) ˂0.0001

Plow-dipping: T3 (156.39 ms) Plow-dipping: T4 (436.69 ms) ˂0.0001

Prising: T1 (212.42 ms) Prising: T4 (355.74 ms) ˂0.0001

Prising: T2 (221.92 ms) Prising: T3 (145.9 ms) 0.0025

Prising: T2 (221.92 ms) Prising: T4 (355.74 ms) ˂0.0001

Prising: T3 (145.9 ms) Prising: T4 (355.74 ms) ˂0.0001
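The PRTONE*TONE and PRTONE*TONE*STATUS comparisons reported here and below are not part of the minimal models listed earlier; purely as an illustration, such an analysis could be specified along the following lines (SHADOW is a hypothetical dataset name, and PRTONE and TONE are the prime- and target-tone factors referred to in the text):

Proc mixed data=shadow;
Class STATUS PRTONE TONE SUBJECT;
Model realresp = PRTONE|TONE|STATUS / ddfm=satterthwaite solution;  /* all main effects and interactions */
Random SUBJECT SUBJECT(STATUS);
Lsmeans PRTONE*TONE / diff adjust=bon;  /* comparisons as in Table 5.10 */
Lsmeans PRTONE*TONE*STATUS / diff adjust=bon;  /* comparisons as in Table 5.11 */
Run;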

In addition, the interaction PRIMETONE*TONE*STATUS reached significance

(F=10.27, p˂0.0001).



Figure 5.18 The PRTONE*TONE*STATUS interaction in terms of reaction time (expt 2)

Basically, we obtained the same results as for the two-way interaction PRTONE*TONE, but the size of this two-way interaction depended on status: the difference between musicians and non-musicians was greater for tone 4 than for the other target tones. Results of the post-hoc test (lsmeans) for PRTONE*TONE*STATUS (with Bonferroni correction for multiple comparisons) are listed below.


Table 5.11 Post-hoc test for the PRIMETONE*TONE*STATUS (expt 2)

Interaction Level A Interaction level B p-value
control: T1: musician (178.6 ms) control: T4: musician (352.28 ms) ˂0.0001
control: T1: non-musician (242.99 ms) control: T4: non-musician (457.62 ms) ˂0.0001
control: T2: musician (197.33 ms) control: T3: musician (129.78 ms) 0.0147
control: T2: musician (197.33 ms) control: T4: musician (352.28 ms) ˂0.0001
control: T2: non-musician (270.83 ms) control: T3: non-musician (200.64 ms) 0.0181
control: T2: non-musician (270.83 ms) control: T4: non-musician (457.62 ms) ˂0.0001
control: T3: musician (129.78 ms) control: T4: musician (352.28 ms) ˂0.0001
control: T3: non-musician (200.64 ms) control: T4: non-musician (457.62 ms) ˂0.0001
Pfall: T1: musician (145.5 ms) Pfall: T4: non-musician (441.84 ms) ˂0.0001
Pfall: T2: musician (189.01 ms) Pfall: T4: musician (336.69 ms) ˂0.0001
Pfall: T2: non-musician (264.4 ms) Pfall: T4: non-musician (441.84 ms) ˂0.0001
Pfall: T3: musician (102.78 ms) Pfall: T4: musician (336.69 ms) ˂0.0001
Pfall: T3: non-musician (172.88 ms) Pfall: T4: non-musician (441.84 ms) ˂0.0001
Pflat: T1: musician (173.92 ms) Pflat: T4: musician (340.71 ms) ˂0.0001
Pflat: T1: non-musician (250.25 ms) Pflat: T4: non-musician (451.89 ms) ˂0.0001
Pflat: T2: musician (187.9 ms) Pflat: T4: musician (340.71 ms) ˂0.0001
Pflat: T2: non-musician (264.81 ms) Pflat: T4: non-musician (451.89 ms) ˂0.0001
Pflat: T3: musician (144.01 ms) Pflat: T4: musician (340.71 ms) ˂0.0001
Pflat: T3: non-musician (214.02 ms) Pflat: T4: non-musician (451.89 ms) ˂0.0001
Plow-dipping: T1: musician (185.38 ms) Plow-dipping: T4: musician (376.41 ms) ˂0.0001
Plow-dipping: T1: non-musician (252.53 ms) Plow-dipping: T4: non-musician (496.98 ms) ˂0.0001
Plow-dipping: T2: musician (164.58 ms) Plow-dipping: T4: musician (376.41 ms) ˂0.0001
Plow-dipping: T2: non-musician (237.54 ms) Plow-dipping: T4: non-musician (496.98 ms) ˂0.0001
Plow-dipping: T3: musician (124.38 ms) Plow-dipping: T4: musician (376.41 ms) ˂0.0001
Plow-dipping: T3: non-musician (189.2 ms) Plow-dipping: T4: non-musician (496.98 ms) ˂0.0001
Prising: T1: musician (171.76 ms) Prising: T4: musician (296.51 ms) ˂0.0001
Prising: T1: non-musician (252.07 ms) Prising: T4: non-musician (414.98 ms) ˂0.0001
Prising: T2: musician (191.47 ms) Prising: T3: musician (89.03 ms) 0.0007
Prising: T2: musician (191.47 ms) Prising: T4: musician (296.51 ms) ˂0.0001
Prising: T2: non-musician (252.37 ms) Prising: T4: non-musician (414.98 ms) ˂0.0001
Prising: T3: musician (89.03 ms) Prising: T4: musician (296.51 ms) ˂0.0001
Prising: T3: non-musician (203.64 ms) Prising: T4: non-musician (414.98 ms) ˂0.0001


Under the match condition, musicians had the greatest facilitation size when primes carried a falling tone (-15.59 ms), followed by tone 2 primes (-5.86 ms). For non-musicians, primes with a rising tone generated the greatest facilitation size (-18.47 ms), and tone 1 primes were the only ones exerting an inhibitory effect (7.26 ms). Under the non-match condition, primes with a rising tone preceding targets with a falling tone showed the greatest facilitation effect in both musicians (-55.77 ms) and non-musicians (-42.64 ms).

Accuracy

As accuracy is a binary variable (right or wrong), we applied PROC GLIMMIX with a binomial distribution and logit link function to analyze it. We arrived at the following minimal model in the same way as described above:

Proc glimmix;  /* by-subject analysis */
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP SUBJECT;
Model correct = LEXSTATUS / distribution=binomial ddfm=satterthwaite solution;
Random SUBJECT SUBJECT(GENDER) SUBJECT(STATUS);
Run;

Proc glimmix;  /* by-item analysis */
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP TONE ITEM;
Model correct = STATUS LEXSTATUS GENDER STATUS*GENDER / distribution=binomial ddfm=satterthwaite solution;
Random ITEM ITEM(TONE) ITEM(LEXSTATUS);
Run;


The results showed that the main effect of LEXSTATUS was significant in both the by-subject (F1) and by-item (F2) analyses (F1(2,27)=61.05, p˂0.0001; F2(2,111)=3.28, p=0.0417). The post-hoc test (lsmeans) showed that the comparisons between non-words (87.64%) and words (94.23%) and between pseudo-words (95.62%) and non-words reached significance.


Figure 5.19 The main effect of LEXSTATUS in terms of accuracy (expt 2)

The main effect of STATUS emerged only in the by-item analysis (F2(1,111)=33.04, p˂0.0001). The data showed that the correct rate was higher in musicians (93.79%) than in non-musicians (91.74%), but this difference was not significant in the by-subject analysis. There was an effect of GENDER in the by-item analysis but not in the by-subject analysis (F2(1,111)=12.06, p=0.0005): male participants had a higher correct rate (93.15%) than females (92.88%).

The 2-way interaction STATUS*GENDER reached significance in the by-item analysis (F2(1,111)=23.92, p˂0.0001), but not in the by-subject analysis. Male musicians had the highest correct rate (97.02%), and male non-musicians had the lowest correct rate (91.48%).


Figure 5.20 The interaction STATUS*GENDER in terms of accuracy (expt 2)


Discussion for experiment 2

Vocal vs. Instrumental Timbre

Nick Poss (2012) focused on Hmong speech surrogates and, based on two experiments he performed, concluded that pitch information indeed helps participants decode and understand the "lexical meaning" carried by instruments. In our experiments, we found that pitched primes, regardless of timbre, led to faster responses than non-pitched stimuli, thus confirming his finding.

Not only vocal primes, but also instrumental primes facilitated participants’ responses.

Though there was no clear effect of TIMBRE in the first experiment, the second experiment demonstrated that TIMBRE did affect responses to targets, suggesting that our brain may deal with timbres differently. As Fitch (2006) proposed, researchers may investigate the origin of music by exploring two different paths determined by timbre, namely vocal and non-vocal. Our results support the idea that vocal and non-vocal sounds are processed differently, in line with previous studies arguing for a clear distinction between timbres (Poss, 2012; Hung, 2011). In addition, humans have different physiological reactions when exposed to vocal sounds than to non-vocal sounds (Belin, Zatorre, & Ahad, 2002; Loui et al., 2013), supporting the idea that we show greater excitability when listening to a specific timbre. Since humans are social animals living in association with others mainly through speech, vocal sounds have acquired a biological and social significance in human life over our long evolutionary history. In the speech shadowing task, vocal primes speeded up responses more than instrumental primes, supporting the view that the special role played by the voice in human communication has contributed to an increased sensibility and attention to the human voice. Speech motor theory suggests that we identify articulatory gestures when hearing speech, and that vocal tract muscles as well as pre-motor and motor cortices are activated. Though there is no vocal articulatory gesture in our contour stimuli, the vocal contours did have a melodic gesture, which is clearly part of the vocal gestures we make when producing speech. Thus, we argue that vocal contours may activate the speech motor system, which is engaged in planning vocal tract movements, leading to faster responses.

From the perspective of facilitation effect size, musicians treated vocal and instrumental primes differently in the non-match condition: they responded to targets with instrumental primes faster than to those with vocal primes (a 15.97 ms difference). This may indicate that musicians are attuned and more sensitive to changes of timbre. It is in line with Cheong et al.'s fMRI study (2017), in which, in an active listening task, musicians showed the strongest activation for instrumental primes, while non-musicians showed the highest activation for vocal primes.

Specific contour shape

The speech shadowing task confirmed that different contours may have different priming effects. Interestingly, the PRIMETONE*TONE analysis yielded similar results in experiments 1 and 2. In addition, in both experiments, for musicians, falling prime contours produced the greatest facilitation size under the match condition, and primes with a rising tone preceding targets with falling tones produced the greatest facilitation under the non-match condition. This pattern was absent in the non-musician group, which may suggest that it was a product of long-term musical training. As to why different contours have different priming effects, we currently find this difficult to explain, and it needs to be examined in future studies.

Non-words, Words and Pseudo-words

Though there was almost no difference between the response times for non-words and pseudo-words, there was an obvious difference between those for pseudo-words and words.

In addition, interestingly, in the speech reproduction task non-words were still responded to faster than words. That is, whatever lexical search might be initiated, it seems to end much sooner than for the other two syllable categories. Different from the results of experiment 1, pseudo-words were responded to as fast as non-words in the current experiment, but words were consistently responded to more slowly than non-words in both tasks. Though different task requirements cannot be taken as the complete explanation for our results, we argue that they might at least explain why pseudo-words were responded to as fast as non-words only in the syllable repetition task. In the first experiment, pseudo-words needed longer because they have legal phoneme combinations, so an extended lexical search was involved until no entry with that lemma information was found; in the current task, however, a full-scale lexical search is not required. In contrast to previous studies reporting that words had an advantage over non-words in lexical decision and repetition tasks (e.g., Poss et al., 2008; Will et al., submitted), longer reaction times were found for words in both tasks. We currently have no good explanation, but this may be caused by the fact that some 2-phoneme syllables recurred with different tones only in non-words.

Another possibility could be that the differences, at least partially, came from the word lists used in our experiment. That is because there was no clear distinction between non-words and pseudo-words in our previous studies; rather, there were only two sub-levels of lexical category, words and pseudo-words, and the latter was actually a mix of non-words and pseudo-words.

Musicianship and gender role

In line with previous studies and our lexical decision task, the results showed that musicianship indeed played an important role. Our results, together with many other studies, support the view that musicians with long-term training experience may develop a sharpened ability to process auditory signals.

Interestingly, against our initial hypothesis, participants made more errors in phoneme pronunciation than in tone. However, though most errors were wrong syllable productions, we noticed a difference between musicians and non-musicians in wrong tone production. Results showed that, among the error items, wrong tones occurred only 4 times in musicians (0.61% of all errors), whereas they occurred 25 times in non-musicians (3.79%). There was also a large difference in phoneme errors between musicians (255 errors, 38.64%) and non-musicians (376 errors, 56.97%). This implies that musicians are more sensitive to tonal information and also have higher phonological awareness than non-musicians.


Chapter 6. Conclusion

6.1 Findings Summary

This dissertation started with a review of the relationship between music and language. As our previous studies indicated, pitch contour may be important in accessing the mental lexicon, and this may further help us understand how musical speech surrogates work. Therefore, we conducted two experiments to clarify (1) whether and how melodic pitch contours affect linguistic processing, (2) if tonal information influences linguistic processing, whether musical training plays an essential role in it, and (3) whether melodic contours affect linguistic processing via different pathways.

Our results showed that, firstly, musical training may enhance participants’ performance due to their extended experience with pitch and contour processing. We propose that musicians may have advanced melodic abilities, which help them process the contour information more efficiently, though we cannot exclude that differences in some other cognitive processes, e.g. attention, may also play a role in it. In addition, the factor

GENDER played an important role in how melodic contours affect linguistic processing.

Results showed that females were better than males in the current tasks, in which the target stimuli were produced by a female speaker. Whether the advantage for females comes from the fact that these participants and the speaker of the stimuli are of the same gender is an open question, which could be tested by replicating the experiments with a male voice for the target stimuli.

Secondly, as for the factor TIMBRE, the syllable repetition task was in line with previous studies, indicating the “humanness bias” (Lévêque & Schön, 2013), that is, vocal sounds enjoy superiority over non-vocal sounds. Though we did not observe such superiority in the lexical decision task, we cannot conclude that results of these two tasks are contradictory, because they have quite different task requirements. In the first experiment

(lexical decision task), a full-scale lexical search is required, as participants had to decide whether the target they heard was a word or not. To identify a word, participants needed all of its features (not just phonetic information), so they waited and checked until all information, such as timbre, duration, and meaning, was activated by the phonological sound. Lexical activation could also happen in the shadowing task, but, different from the decision task, it is automatic processing and the information that needs to be retrieved is also different: in the shadowing experiment there was no need to identify the lexical category; rather, participants needed to store the input sound in mind and then activate the motor system to reproduce it, so only phonological information was needed. Thus, we suggest it is the task requirements that made the priming effect not noticeable in the decision task.

Thirdly, the most interesting but also most surprising observations concern the lexical categories. Comparing the facilitation sizes (i.e. the RT differences between the control and the match/non-match conditions, respectively) in the lexical decision task, words were the only lexical category that was facilitated by the melodic pitch contour. This is important, as it helps us understand how musical speech surrogates work: as tonal information is a part of the lexicon, melodic pitch contours lead to the pre-activation of the lexicon, resulting in response facilitation. In accordance with the auditory dual-route model, we assumed that processing through the lexical route is more efficient than through the non-lexical route.

However, we found that for Chinese monosyllables as targets, non-word syllables were responded to fastest and pseudo-words slowest in the lexical decision task. We suggested that in the lexical decision task non-words can be rejected faster because the phoneme sequences of non-words are not in use in Chinese, so they probably do not even start a lexical search, which leads to a quick decision. Pseudo-words need more time because they have legal phoneme combinations, so they lead to a search that stops only when no entry with that lemma information is found.

Fourthly, in both experiments, for musicians, falling prime contours produced the greatest facilitation size under the match condition, and primes with a rising tone preceding targets with falling tones produced the greatest facilitation under the non-match condition. This pattern was absent in the non-musician group, which may suggest that it was a product of long-term musical training.

6.2 Limitations of current study and possibilities for future studies

There are several limitations that may interfere with our interpretation. However, we argue that these limitations may also offer new paths for future studies.

1) In data analysis we did not apply a mixed model with crossed random factors (as

subjects and items are crossed in our experimental design), instead, we did two


separate mixed models, one with the random factor 'SUBJECT' and one with the random factor 'ITEM' (a sketch of the crossed-random-effects alternative follows this list). This is because our experimental design is relatively complex, having

a complicated nested structure of the random factors (e.g. items are nested within

TIMBRE levels, but this nesting only holds for contour primes, because control

primes are crossed with items). This design was made in order to keep the time

for the experimental sessions within acceptable limits.

The fundamental frequencies of the instrumental contours were higher than those of

vocal contours. We created the instrumental contours with reference to Rialland’s

study (2005) regarding musical speech surrogate, where the second harmonic of

the vocal contours were mapped onto the fundamental frequencies of the

instrumental contours. By doing so, vocal contours and instrumental contour were

perceptually similar. Surely, if pitch differences between instrumental sounds and

vocal sounds were very different, this could significantly affect results. In one of

our previous studies we found that only if the prime pitch is transposed out of the

range of the fundamental of the human voice is there a change in the priming

effect. This suggests that the higher pitch range for our instrumental contours may

not affect the results much, as it is well within the range of the voice fundamental

in our study. This interpretation could be tested by applying vocal and

instrumental contours that have similar fundamental frequency ranges in future

studies.

3) Gender is not balanced in the musician group of the speech shadowing task.

Though the statistical model applied in our analysis does not require balanced


design, the fact that male musicians only counted around 20% of total musicians

may limit the interpretation of the result if the participated male musicians were

not very representative.

4) Phoneme length was not balanced in different lexical categories across different

tones. This may not be a fundamental issue here, as we used exactly the same sets of

syllables for the levels of many experimental variables. For instance, musicians

and non-musicians listened to the same set of syllables; and in match and control

condition, the same set of syllables was assigned. However, in some situations this may limit the interpretation of some parts of the data analysis, e.g. the

facilitation effect size of specific contour in non-match condition. This is because

for match and control condition, the syllables were exactly same, while for non-

match condition, every syllable theoretically could be assigned with three non-

match options with specific contours (e.g., for a tone 4 syllable, it could be either

with t1, t2, or t3 prime). As we compared the facilitation size of different prime

contours within each categorical target contour (e.g. the facilitation of T4 primes

under match condition would be calculated by RTp4-t4-RTcontrol-t4), the syllables

used in non-match condition with regards to specific contour would cover

approximately 1/3 of syllables used in control condition. Thus, the effect size may

be affected by syllable characteristics, i.e. phoneme number.

5) Instrumental contours in our study were produced on a xiao, and it was difficult to glide completely through the pitches, so the actual sounds are a bit of a mix of fixed tones and gliding tones (see fig. 5.1). By contrast, the vocal contours are


continuous. However, based on our knowledge, there is no study that directly

compares whether the extraction of melodic contours differs for discrete tone sequences and for continuous pitch contours. Thus, it would be interesting to

explore this by using continuous instrumental contours, e.g. contours produced on

a silk-string instrument, in future studies.

6) We didn’t take the specific major of musicians into consideration. In our study,

we were more interested in the influence of general music training experience,

however, there is a possibility that different music playing experience (e.g.

vocalists vs. instrumentalists, which may be further divided, e.g. based on whether

it is a melodic instrument or not) may affect contour processing. It would also be

interesting to further explore the influence of their specific music playing

experience in future studies.
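For illustration only, the crossed-random-effects alternative mentioned in limitation 1 could be written as a single model with random intercepts for subjects and for items. This is a sketch under the assumption of a dataset named RT, not the analysis actually run:

Proc mixed data=rt;
Class STATUS GENDER LEXSTATUS TIMBRE RELATIONSHIP SUBJECT ITEM;
Model realresp = STATUS LEXSTATUS TIMBRE / ddfm=satterthwaite solution;
Random intercept / subject=SUBJECT;  /* by-subject random intercept */
Random intercept / subject=ITEM;     /* by-item random intercept, crossed with subjects */
Run;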

6.3 A possible new direction regarding pitch and contour influence: music text-setting.

A possible application of the results of the current study concerns music text-setting in a tonal-language context, where the realization of lexical tones is constrained by the pitch and contour of the melodies. Taking the tonal characteristics into consideration together with the melody is necessary and important; otherwise the intelligibility of the lyrics may be sacrificed for musicality, or vice versa. In this section, we use Chinese popular music as an example.

In the realm of Chinese popular music, there are many cases where the tone sequences of the word syllables are disturbed by the melodic direction, for instance,


Figure 6.1 An example of confusion from the song 鲁冰花 (Lubinghua)

In this case, "夜夜" (yè yè, 'every night') might confuse the audience, as it could be heard as "爷爷" (yéyé, 'grandfather'), because there is a mismatch, i.e. a discordance in the relationship between the word tones and the melody. Note that not all mismatches cause misunderstanding. In pragmatic situations, a wrong tone in a song may pose no problem for proper understanding, because the context offers cues so that only one or a few related interpretations are possible. But it is important to avoid such disturbances when there is not enough contextual information, and the first step is for lyricists and composers to do music text-setting very carefully.

There are two theoretical music text-setting methods for avoiding mismatches: one is based on the onset pitch of a tone, and the other is based on pitch contour. Sun Congyin proposed a scheme to coordinate the relationship between word tones and melody. His scheme is based on the relationship between the pitch onsets of the tones of neighboring syllables. He put forward the scheme as follows25 (a sketch implementing these rules is given after the list):

25 Sun's method is based on linguist Zhao Yuanren's five-degree tone-marking method (五度标记法). One can use the Arabic numerals 1 to 5 to indicate five relative pitch levels, namely low, secondary low, middle, secondary high, and high.

1) For two adjacent syllables with tone 1 and tone 2 respectively, melodic pitch for

tone 2 should be lower than that for tone 1.

2) For two adjacent syllables with tone 2 and tone 3 respectively, melodic pitch for

tone 3 should be lower than that for tone 2.

3) For two adjacent syllables with tone 1 and tone 4 respectively, melodic pitch for

tone 4 should be lower than that for tone 1.

4) For two adjacent syllables with tone 2 and tone 4 respectively, melodic pitch for

tone 2 should be lower than that for tone 4.

5) There is no specific requirement for the melodic pitches if they are for two adjacent syllables of the same tone category (1988, p. 103-112).
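Sun's rules are simple enough to be checked automatically. The following data step is a hypothetical sketch only: SONGLINE, TONE (the lexical tone 1-4 of each syllable) and PITCH (e.g. a pitch number for the first note of each syllable) are assumed names, and each rule is read as applying to two adjacent syllables in the order stated above.

Data mismatches;
set songline;
prev_tone  = lag(TONE);
prev_pitch = lag(PITCH);
mismatch = 0;
if _n_ > 1 then do;
    if prev_tone = 1 and TONE = 2 and PITCH >= prev_pitch then mismatch = 1;  /* rule 1 */
    if prev_tone = 2 and TONE = 3 and PITCH >= prev_pitch then mismatch = 1;  /* rule 2 */
    if prev_tone = 1 and TONE = 4 and PITCH >= prev_pitch then mismatch = 1;  /* rule 3 */
    if prev_tone = 2 and TONE = 4 and prev_pitch >= PITCH then mismatch = 1;  /* rule 4 */
    /* rule 5: same tone category - no constraint */
end;
if mismatch then output;  /* keep only the syllable pairs that violate a rule */
Run;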

Besides Sun’s scheme, there is also another scheme which is more popular. In 2006,

Wang Yaohua proposed his scheme dealing with the relationship between tone and melody. He illustrated that, "in general, the direction of tone and the melodic direction26 should be in accord; otherwise, confusions will occur frequently" (p.120). Wang’s scheme actually has some shortcomings. First, according to Wang, syllables with tone 3 cannot be assigned with just one note, which contradicts with the fact that Chinese popular songs are pervasively syllabic. Second, based on his scheme, actually there is a disjuncture between melody and syllable tones in syllabic settings, because the melody moves at a point where the syllable has already ended, and this also holds for Sun’s scheme. This alignment issue probably is caused by the fact that Chinese and music text- setting schemes for Chinese songs were not developed in the same context. Specifically,

26 Melodic direction refers to the direction from the note(s) of a certain syllable to the first note of the next syllable.

Chinese is a language featuring dynamic tones. By contrast, fixed pitch setting is a western concept, which is less suitable for singing Chinese.

In a preliminary study concerning the rationality of these two schemes, Wang (2016) checked the mismatch ratios in ten songs. We found that the ratio was higher when applying Wang's pitch-contour scheme, which may suggest that this scheme is stricter. In addition, we proposed that Sun's and Wang's schemes could be used in different situations: one may be better utilized in syllabic text-setting, and the other is perhaps more suitable for melisma, where each syllable is set with multiple melodic notes.

Though the text-setting study dealt with a different phenomenon and asked different questions than our current study, it is complementary to the current pitch contour priming study, as both deal with the more general question of how linguistic communication in music works: the former is analytical and examines artifacts (music, texts), but it does not explore why the concordance of musical pitches and speech tones is considered important; the latter is experimental and examines possible influences of tonal contours on lexical processing.

6.4 Conclusion

We set understanding how melodic pitch contours affect linguistic processing as our basic research question, and by exploring this topic we hoped to further our understanding of how musical speech surrogates work. In our experiments we used a lexical decision and a speech shadowing task.


Comparing the facilitation sizes in the lexical decision task, non-words were facilitated in the non-match condition, suggesting that when melodic contours and non-word syllables differ in tone, the decision process may be speeded up. Words were the only lexical category that was facilitated by the melodic pitch contour in the match condition. We argue that, since tonal information is part of the lexicon, melodic pitch contours lead to pre-activation of the lexicon, resulting in the observed facilitation.

In addition, musicians outperformed non-musicians in both experiments, and we propose that this could be because musical training enhances listeners' pitch contour processing ability, though based on the current study we are not able to eliminate other possible influences, e.g. attention. In the meantime, vocal sounds produced a greater facilitation size than instrumental sounds, suggesting an enhanced sensibility and attention to the human voice.

As already demonstrated by Hung (2011) and Poss (2012), methodology from cognitive science can be an essential aid in explaining certain aspects of musical phenomena, because merely using ethnomusicological methodology, such as interviewing listeners and users of musical speech surrogates, may not help us understand the underlying processes. Our study, together with theirs, suggests the possibility and value of involving methodology from cognitive science for a deeper understanding of musical phenomena.


Bibliography

Anonymous. (n.d.). Handedness Questionnaire. Retrieved from https://faculty.washington.edu/chudler/rltablen.html

Anonymous. (n.d.). Lü’s commentaries on History. Retrieved from http://so.gushiwen.org/guwen/bookv_4071.aspx

Anonymous. (n.d.). The Rites of Zhou. Retrieved from http://ctext.org/rites-of-zhou/chun-guan- zong-bo.

Balota, D. A., & Lorch, R. F. (1986). Depth of automatic spreading activation: Mediated priming effects in pronunciation but not in lexical decision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12(3), 336-345.

Balter, M. (2004). Seeking the key to music. Science, 306(5699), 1120-1122.

Belin, P., Zatorre, R. J., & Ahad, P. (2002). Human temporal-lobe response to vocal sounds. Cognitive Brain Research, 13(1), 17-26.

Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309-312.

Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: evidence for bidirectionality between the domains of language and music. PloS One, 8(4), 1-11.

Boyle, R. (1996). Effects of irrelevant sounds on phonological coding in reading comprehension and short term memory. The Quarterly Journal of Experimental Psychology: Section A, 49(2), 398-416.

Cairns, G. F., & Butterfield, E. C. (1975). Assessing infants’ auditory functioning. Exceptional Infant, 3, 84-108.

Carreiras, M., Lopez, J., Rivero, F., & Corina, D. (2005). Linguistic perception: neural processing of a whistled language. Nature, 433(7021), 31-32.

Caughley, R. (1976). Chepang whistle talk. In T. Sebeok, & D. J. Umiker-Sebeok (Eds.), Speech Surrogates: Drum and Whistle Systems (966−992). The Hague: Mouton.


Charlton, B. D. (2014). Menstrual cycle phase alters women's sexual preferences for composers of more complex music. In Proceedings of the Royal Society B, 281(1784), 1-6.

Cheong, Y.J., Will, U., and Lin, Y.Y (2017) Do vocal and instrumental primes affect word processing differently? An fMRI study on the influence of melodic primes on word processing in Chinese musicians and non-musicians. Proceedings of the 25th Anniversary Conference of the European Society for the Cognitive Sciences of Music, Ghent, Belgium, 35-39.

Chomsky, N. (2006). Language and Mind. Cambridge: Cambridge University Press.

Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: a dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204-256.

Conard, N. J., Malina, M., & Münzel, S. C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature, 460(7256), 737-740.

Crawford, H. J., & Strapp, C. M. (1994). Effects of vocal and instrumental music on visuospatial and verbal performance as moderated by studying preference and personality. Personality and Individual Differences, 16(2), 237-245.

Dalla Bella, S., & Peretz, I. (1999). Music agnosias: selective impairments of music recognition after brain damage. Journal of New Music Research, 28(3), 209–216.

Darwin, C. R. (1871). The Descent of Man, and Selection in Relation to Sex. London: John Murray.

Darwin, C., & Prodger, P. (1998). The Expression of the Emotions in Man and Animals. Oxford: Oxford University Press.

Dissanayake, E. (2000). Antecedents of the temporal arts in early mother-infant interaction. In N. Wallin, B. Merker, & S. Brown (Eds.), The Origins of Music (389-410). Cambridge: MIT Press.

Dowling, W. J. (1999). The development of music perception and cognition. In D. Deutsch (Ed.), The Psychology of Music (603-625). San Diego: Academic Press.

Duan, A.J. (n.d.). Yue Fu Za Lu. Retrieved from http://ctext.org/wiki.pl?if=gb&res=370127&remap=gb

Edworthy, J. (1983). Towards a contour-pitch continuum theory of memory for melodies. In D. Rodgers, & J. Sloboda (Eds.), The Acquisition of Symbolic Skills (263-271). New York: Plenum.

Edworthy, J. (1985). Interval and contour in melody processing. Music Perception: An Interdisciplinary Journal, 2(3), 375-388.


Fitch, W.T. (2006). The biology and evolution of music: a comparative perspective. Cognition, 100(1), 173-215.

Forster, K. I., & J. C. Forster. (2003). DMDX. University of Arizona.

Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience, 16(6), 1010-1021.

Giuliano, R. J., Pfordresher, P. Q., Stanley, E. M., Narayana, S., & Wicha, N. Y. Y. (2011). Native Experience with a Tone Language Enhances Pitch Discrimination and the Timing of Neural Responses to Pitch Change. Frontiers in Psychology, 2, 146.

Gottfried, T. L., Staby, A. M., & Ziemer, C. J. (2004). Musical experience and Mandarin tone discrimination and imitation. The Journal of the Acoustical Society of America, 115(5), 2545.

Gouzouasis, P., Guhn, M., & Kishor, N. (2007). The predictive relationship between achievement and participation in music and achievement in core grade 12 academic subjects. Music Education Research, 9(1), 81-92.

Hsieh, L., Gandour, J., Wong, D., & Hutchins, G. D. (2001). Functional heterogeneity of inferior frontal gyrus is shaped by linguistic experience. Brain and Language, 76(3), 227-252.

Hung, T.-H. (2011). One Music? Two Musics? How Many Musics? Cognitive Ethnomusicological, Behavioral, and fMRI Study on Vocal and Instrumental Rhythm Processing (Doctoral dissertation). The Ohio State University, Columbus OH.

Huron, D. (1996). The melodic arch in Western folksongs. Computing in Musicology, 10, 3-23.

Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. (2000). Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123(1), 155-163.

Juhász, Z., & Sipos, J. (2010). A Comparative Analysis of Eurasian Folksong Corpora, using Self Organising Maps. Journal of Interdisciplinary Music Studies, 4(1), 1-16.

Kincaid, A. E., Duncan, S., & Scott, S. A. (2002). Assessment of fine motor skill in musicians and nonmusicians: differences in timing versus sequence accuracy in a bimanual fingering task. Perceptual and Motor Skills, 95(1), 245-257.

Kisilevsky, B. S., Hains, S. M., Lee, K., Xie, X., Huang, H., Ye, H. H., … Wang, Z. (2003). Effects of experience on fetal voice recognition. Psychological Science, 14(3), 220-224.

Klyn, N. A., Will, U., Cheong, Y. J., & Allen, E. T. (2016). Differential short-term memorisation for vocal and instrumental rhythms. Memory, 24(6), 766-791.

Kraus, N., & Banai, K. (2007). Auditory-Processing Malleability: Focus on Language and Music. Current Directions in Psychological Science, 16(2), 105-110.

Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nature Reviews Neuroscience, 11(8), 599-605.

Kraus, N., & Slater, J. (2015). Music and language: relations and disconnections. The Human Auditory System: Fundamental Organization and Clinical Disorders. Handbook of Clinical Neurology, 129, 207-222.

Kraus, N., Strait, D. L., & Parbery‐Clark, A. (2012). Cognitive factors shape brain networks for auditory skills: spotlight on auditory working memory. Annals of the New York Academy of Sciences, 1252(1), 100-107.

Lee, C.-Y., & Hung, T.-H. (2008). Identification of Mandarin tones by English-speaking musicians and non-musicians. Journal of the Acoustical Society of America, 124(5), 3235-3248.

Lee, Y. S., Janata, P., Frost, C., Hanke, M., & Granger, R. (2011). Investigation of melodic contour processing in the brain using multivariate pattern-based fMRI. Neuroimage, 57(1), 293- 300.

Lenhoff, H. M., Perales, O., & Hickok, G. (2001). Absolute pitch in Williams syndrome. Music Perception: An Interdisciplinary Journal, 18(4), 491-503.

Lévêque, Y., & Schön, D. (2013). Listening to the human voice alters sensorimotor brain rhythms. PloS One, 8(11), 1-9.

Levitin, D. J., & Zatorre, R. J. (2003). On the nature of early music training and absolute pitch: A reply to Brown, Sachs, Cammuso, and Folstein. Music Perception: An Interdisciplinary Journal, 21(1), 105-110.

Levy, D. A., Granot, R., & Bentin, S. (2001). Processing specificity for human voice stimuli: electrophysiological evidence. Neuroreport, 12(12), 2653-2657.

Loui, P., Bachorik, J. P., Li, H. C., & Schlaug, G. (2013). Effects of voice on emotional arousal. Frontiers in Psychology, 4, 675.

Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience, 18(2), 199-211.

Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of Cognitive Neuroscience, 23(10), 2701-2715.

Marslen-Wilson, W. D. (1987). Functional parallelism in spoken word-recognition. Cognition, 25(1), 71-102.

Mattei, T. A., Rodriguez, A. H., & Bassuner, J. (2013). Selective impairment of emotion recognition through music in Parkinson's disease: does it suggest the existence of different networks for music and speech prosody processing? Frontiers in Neuroscience, 7, 161.

Matthew, A. (2017). Can the talking drum really talk? Retrieved from Africanstylesandculture.com/2017/03/07/can-the-talking-drum-really-talk/

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18(1), 1-86.

Meyer, J. (2015). Whistled languages. Heidelberg: Springer Verlag.

Mithen, S. (2005). The Singing Neanderthals: The Origins of Music, Language, Mind and Body. London: Weidenfeld & Nicolson.

Moore, D., & Meyer, J. (2014). The study of tone and related phenomena in an Amazonian language, Gavião of Rondônia. Language Documentation and Conservation, 8, 613–636.

Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity. Cerebral Cortex, 19(3), 712-723.

Morton, J. (1970). A functional model of memory. In D. A. Norman (Ed.), Models of human memory (203–260). New York: Academic Press.

Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hearing Research, 241(1-2), 34–42.

Nettl, B. (1995). Heartland Excursions: Ethnomusicological Reflections on Schools of Music. Urbana: University of Illinois Press.

Oreskovich, K. (2016). 5 ways drums are used to communicate. Retrieved from https://www.omahaschoolofmusicanddance.com/5-ways-drums-are-used-to-communicate/

Patel, A. D. (2008). Music, Language, and the Brain. New York: Oxford University Press.

Patel, A. D., Peretz, I., Tramo, M., & Labreque, R. (1998). Processing prosodic and musical patterns: a neuropsychological investigation. Brain and Language, 61(1), 123-144.

Patterson, K. E., & Shewell, C. (1987). Speak and spell: Dissociations and word class effects. In M. Coltheart, S. Sartori, & R. Job (Eds.), The Cognitive Neuropsychology of Language (273-295). London: Erlbaum.

Peretz, I., & Coltheart, M. (2003). Modularity of music processing. Nature Neuroscience, 6(7), 688-691.

Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philosophical Transactions of the Royal Society B, 370(1664), 1-8.

Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone language speakers. Attention, Perception, & Psychophysics, 71(6), 1385-1398.

Plantinga, J., & Trainor, L. J. (2005). Memory for melody: Infants use a relative pitch code. Cognition, 98(1), 1–11.

Poss, N., Hung, T.-H., & Will, U. (2008). The Effects of Tonal Information on Lexical Activation in Mandarin Speakers. Proceedings of the 20th North American Conference on Chinese Linguistics, Columbus, OH, 205-211.

Poss, N., & Will, U. (2007). The role of pitch in the processing of tonal languages: new evidence on the connection between music and language. Paper presented at the SEM symposium ‘New Directions in Cognitive Ethnomusicology’, Columbus, OH.

Poss, N. F. (2012). Hmong music and language cognition: An interdisciplinary investigation (Doctoral dissertation). The Ohio State University, Columbus OH.

Protopapas, A. (2007). CheckVocal: A program to facilitate checking the accuracy and response time of vocal responses from DMDX. Behavior Research Methods, 39(4), 859-862.

Reisberg, D. (Ed.). (2013). The Oxford handbook of cognitive psychology. New York: Oxford University Press.

Rialland, A. (2005). Phonological and phonetic aspects of whistled languages. Phonology, 22, 237–271.

Roebuck, R., & Wilding, J. (1993). Effects of vowel variety and sample length on identification of a speaker in a line-up. Applied Cognitive Psychology, 7, 475-481.

Ruckert, G., & Widdess, R. (1998). Hindustani raga. In A. Arnold (Ed.), The Garland Encyclopedia of World Music: Volume 5. South Asia: The Indian Subcontinent (90-114). New York: Garland.

Salamé, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. The Quarterly Journal of Experimental Psychology, 41(1), 107-122.

Schön, D., Magne, C., & Besson, M. (2004). The music of speech: Music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341-349.

Scott, S. K., & Johnsrude, I. S. (2003). The neuroanatomical and functional organization of speech perception. Trends in Neurosciences, 26(2), 100-107.

Sebeok, T. A., & Umiker-Sebeok, D. J. (1976). Speech surrogates: drum and whistle systems. The Hague: Mouton.

Spencer, H. (1857). The origin and function of music. Fraser’s Magazine, 56, 396-408.

Stone, G. O., & Van Orden, G. C. (1989). Are words represented by nodes? Memory and Cognition, 17, 511–524.

Stone, G. O., & Van Orden, G. C. (1993). Strategic control of processing in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 19, 744–774.

Strait, D. L., Parbery-Clark, A., Hittner, E., & Kraus, N. (2012). Musical training during early childhood enhances the neural encoding of speech in noise. Brain and Language, 123(3), 191-201.

Stumpf, C. (1911). Die Anfänge der Musik. Leipzig: Verlag von Johann Ambrosius Barth.

Su, S. (n.d.). My First Visit to the Chibi Cliff. Retrieved from http://www.slkj.org/b/21634.html

Sun, C. Y. (1988). The relationship between tonal information and melody in Chinese opera. In Y. L. Yang (Ed.), Language and Music (97-120). Taipei: Danqing Press.

Tillmann, B., Burnham, D., Nguyen, S., Grimault, N., Gosselin, N., & Peretz, I. (2010). Congenital amusia (or tone-deafness) interferes with pitch processing in tone languages. The Relationship between Music and Language, 2, 1-15.

Town, S. M., & Bizley, J. K. (2013). Neural and behavioral investigations into timbre perception. Frontiers in Systems Neuroscience, 7, 88.

Trainor, L. J., Desjardins, R. N., & Rockel, C. (1999). A Comparison of Contour and Interval Processing in Musicians and Nonmusicians Using Event‐Related Potentials. Australian Journal of Psychology, 51(3), 147-153.

Trehub, S. (2003). Musical predispositions in infancy: an update. In I. Peretz & R. Zatorre (Eds.), The Cognitive Neuroscience of Music (3–20). Oxford: Oxford University Press.

Trehub, S. E. (2011). Music lessons from infants. In S. Hallam, I. Cross, & M. Thaut (Eds.), Oxford handbook of music psychology (229-234). Oxford: Oxford University Press.

Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The role of melodic contour. Child Development, 55, 821-830.

Trehub, S. E., & Trainor, L. (1998). Singing to infants: Lullabies and play songs. Advances in Infancy Research, 12, 43-78.

Wang, R. M., Yang, J., & Li, L. (2016). Second Language Learning. Shanghai: East China Normal University Press.

Wang, Y. (2016). Mismatches in Fang’s and Lin’s lyrics. Candidacy exam paper. The Ohio State University.

Wang, Y. H. (2006). A study on traditional Chinese score. Fuzhou: Fujian Educational Press.

Weiss, M. W., Trehub, S. E., & Schellenberg, E. G. (2012). Something in the way she sings: Enhanced memory for vocal melodies. Psychological Science, 23(10), 1074-1078.

Wiener, S., & Turnbull, R. (2016). Constraints of tones, vowels and consonants on lexical selection in Mandarin Chinese. Language and Speech, 59(1), 59-82.

Will, U. (2017). Cultural Factors in Response to Rhythmic Stimuli. In J. Evans, & R. Turner (Eds.), Rhythmic Stimulation Procedures in Neuromodulation (279-306). San Diego: Academic Press.

Will, U. (2018). Temporal Processing and the Experience of Rhythm: A Neuro-psychological Approach. In A. Hamilton & M. Paddison (Eds.), The Nature of Rhythm: Aesthetics, Music, Dance and Poetics (in press). Oxford: Oxford University Press.

Will, U., & Poss, N. (2008). The role of pitch contours in tonal languages processing. Proceedings of The Fourth International Conference on Speech Prosody, Campinas, Brazil, 309-312.

Will, U., Poss, N., & Hung, T.-H. (submitted). Melodic contours prime speech processing in tone-language and non-tone-language speakers.

Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10(4), 420-422.

Zatorre, R. J., & Belin, P. (2001). Spectral and temporal processing in human auditory cortex. Cerebral Cortex, 11(10), 946-953.

Zatorre, R. J., & Gandour, J. T. (2008). Neural specializations for speech and pitch: moving beyond the dichotomies. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 363(1493), 1087-1104.
