Lexical Tone Perception in Chinese Mandarin

Anna Björklund

University of Florida


ABSTRACT

Perceptual asymmetry, where Stimulus A is more often confused for B than B is for A or vice versa, has been observed in multiple lexical contexts (e.g., Polka & Bohn, 2003; Dar, Mariam & Keren-Portnoy, 2018). Because perceptual space was historically assumed to be Euclidean, perception of stimuli was in turn assumed to be symmetrical, and observations of asymmetry were explained as simple response bias (Polka & Bohn, 2003). However, further examination of such biases has suggested that they are a much more fundamental occurrence. This study examines the effects of memory load and native language on perceptual bias in lexical tone perception of Mandarin Chinese for native speakers of Mandarin and English. Native speakers of both languages were given an AX categorical discrimination task with each combination of the four lexical Mandarin tones. To examine the effects of training on this bias, they were then trained on one of two tones (1 and 4) and given the same sequence of discrimination tasks again. Each pre- and post-training discrimination task was run at both a 250 ms and a 1000 ms interstimulus interval (ISI).


0. INTRODUCTION TO PERCEPTUAL ASYMMETRY

It is often assumed that perception is symmetrical: that is, when presented with two stimuli, A and B, it is equally easy to discriminate going from stimulus A to B as going from B to A. Although there had long been data that conflicted with such a model, the aberrant data were explained as response bias (Polka and Bohn, 2003) and subsequently ignored. It was not until the mid-1970s, when research began challenging the assumption of perception as ‘Euclidean’ (that is, straightforwardly one-to-one; Polka and Bohn, 2003), that new models of perception were considered and previously anomalous data re-contextualized within this new framework. Whereas the ‘Euclidean’ model assumed perception to be symmetrical, this new model of perception posited that perception was asymmetrical. In this paradigm, given stimuli A and B, it is not assumed that distinguishing stimulus A from B is as easy as distinguishing B from A. That is, participants may more accurately categorize stimuli when A is the target stimulus than when B is.

Although this paper is concerned with asymmetrical perception in the auditory domain (specifically, within speech perception), perceptual asymmetry has been reported across many domains, with the best documented being perhaps the visual one. Biases in the visual system have been noted as far back as the 1960s, when Campbell and Kulikowski observed that the visual system is better at horizontal and vertical discrimination than it is at oblique discrimination; that is, objects are more easily discriminable at vertical and horizontal angles than they are at angles of, for example, 45 degrees (Campbell and Kulikowski, 1966). Likewise, perceptual asymmetries have been noted for phenomena such as a preference for foveal (frontal) stimuli over peripheral stimuli (Karim and Kojima, 2010). Even within the strict domain of horizontal/vertical visual perception, studies have shown that contrast sensitivity and spatial resolution are better along the horizontal midline (Carrasco, Talgar, and Cameron, 2001).

Visual asymmetries have also been demonstrated in the perception of texture (it is easier to find a ‘broken’ circle in an array of closed circles than to find a closed circle in an array of ‘broken’ circles; Williams and Julesz, 1992) as well as in the recognition of faces. In a 1973 study by Gilbert and Bakan, multiple volunteers’ faces were photographed and then ‘reassembled’ into two new photographs, such that each new photo was composed of only one half of the original face (a ‘left’ face would, for instance, be composed of the normal left half of the original photograph and a reversed left half to serve as the ‘right’ side of the face). When participants were asked to compare these new photos to the original photo and choose which one best resembled the original, participants overwhelmingly chose the photo made up of the right half of the volunteers’ faces (Gilbert and Bakan, 1973).

The robustness of the literature on visual asymmetrical perception also provides important speculation as to why such asymmetry exists in the first place. Notably, the above literature primarily cites biology as the driving force. For example, foveal vision is believed to be favored over peripheral vision due to the lower density of cones in the periphery of the retina (Karim and Kojima, 2010). Perhaps even more helpful to the present project is Williams and Julesz’s (1992) discussion of the asymmetrical perception of visual texture, in which they state that their “results suggest that visual processing of textures is not limited to simply the collective responses of isolated elements. Instead, it appears that an active visual process encodes the stimulus without necessarily keeping the individual elements intact. The completion of local boundaries into global structures (subjective contours) is fundamental to the process of texture perception” (emphasis mine, p. 6533). That is, rather than encoding every feature of every stimulus, the visual system ‘sorts’ stimuli into broader super-perceptual categories.


1. PROTOTYPE THEORY AND THE PERCEPTUAL MAGNET EFFECT

Before prototype theory was introduced in the 1970s, a category was traditionally defined as, essentially, the set of items that matched a certain list of defining features. An item was either a member of a category or it was not, with little room for gradation (Rosch, 1973). However, beginning in the 1970s, Eleanor Rosch and other researchers began to suggest a new categorization paradigm: that of prototypicality. In this paradigm, there is not necessarily a strict in-/out-of-category binary. Rather, “categories [are] typically composed of distributions of attributes, and some instances of categories [are] designed to be more ‘typical’ members of the category than others” (Rosch, 1973, p. 329). In such a paradigm, category members are not defined as merely having a certain list of features. Rather, they may individually possess those features to a greater or lesser degree. Therefore, some category members may be considered more “prototypical” than others, even though they all are de facto members of the category.

One of the most well-known theories of perceptual categorization and applications of prototype theory is the perceptual magnet effect, introduced by Patricia Kuhl in a 1991 study. The study consisted of four experiments. In Experiment 1, Kuhl asked adult English-speaking participants to “rank” a series of vowels from “best” to “worst” in a given category. She discovered that these rankings were not random, but rather that the “best” example of a category was consistently accorded to a specific area of the vowel space. Experiment 2 used the results of this survey to create an AX study where participants were presented with a once-per-second stream of sounds that alternated randomly between a “referent” speech sound and a “comparison” speech sound. Participants were then asked to press a button when they heard a change in the “referent” sound, that is, when the sound had changed to the other, “comparison” sound. The study found that participants were more willing to overgeneralize (and thus give inaccurate button-pushes) when the “prototype” sounds from Experiment 1 were used as the “referent” sound. Experiment 3 replicated the conditions of Experiment 2, except with infants conditioned for the head-turn procedure. These infants were presented with the same “prototype” and “non-prototype” vowels as the adults, and their results were highly correlated with the adults’ as well.

In light of these data, Kuhl (1991) proposes the presence of a “perceptual magnet effect”: not only are certain stimuli within a category considered “better” or “more prototypical” than others, but these prototypical stimuli also have a tendency to “assimilate” the other, less prototypical stimuli. As Kuhl (1991) puts it: “Surrounding members of the category are perceptually assimilated to it to a greater degree than would be expected on the basis of real psychophysical distance. Relative to a non-prototype of the category, the distance between the prototype and surrounding members is effectively decreased; in other words, the perceptual space appears to be ‘warped,’ effectively shrunk around the prototype” (p. 99).

In other words, prototypes hold greater perceptual weight than non-prototypes in a speaker’s mind. Thus, although two vowels may differ equally with respect to a given stimulus, they may not be perceived by the listener as being equally different if one of these vowels is a prototype. Listeners are more willing to over-generalize non-prototypes and categorize them as variants of the prototypical category than they are willing to sort prototypes into non-prototypical categories. Prototypes could then be said to exhibit a sort of masking effect on non-prototypes, making them harder to distinguish when compared to the prototype. This is a fundamental theory used in explaining perceptual asymmetry, which will be explored later in this paper.


Another fundamental observation noted in Kuhl (1991) is that prototypes do not merely exist within one person’s perception. Rather, they appear to have a measure of universality. The adults in Experiment 1 consistently and collectively grouped certain stimuli as “better” than others, and these “better,” more prototypical stimuli were then demonstrated to be easier to distinguish against non-prototypes than the reverse, across multiple adults and infants.

2. PERCEPTUAL ASYMMETRY OF VOWELS

The study of asymmetrical vowel perception began ostensibly with Polka and Werker’s (1994) study of infant vowel perception across native/non-native vowel contrasts. The study’s initial purpose was not related to asymmetrical perception at all, but rather to examining if and when infants’ phonetic perception of vowels and consonants began to be affected by native language sound inventories. The study examined English-learning infants’ ability to discriminate the German /u/ - /y/ and /ʊ/ - /ʏ/ contrasts, embedded in the carrier word dVt.

However, examining the results of the study, Polka and Bohn also noticed that in conducting their initial screening of the infants to determine which could reliably be conditioned for the head-turn procedure used in the experiment, only 35-40% of infants reached the discrimination criterion. This was markedly lower than the typical 80-90% average success rate in screening infants for similar studies (Polka and Bohn, 1994).

In trying to understand this phenomenon, the researchers turned to the perceptual magnet effect first mentioned by Patricia Kuhl in 1991, wherein some members of a category are considered more “prototypical” of that category than others, and demonstrate a “magnetic” effect on perception wherein the prototype assimilates neighboring members of the category.

Polka and Bohn examined the infants who had passed the discrimination criterion for the non-native contrast and noticed that almost all of them had been trained on the vowel pair in the same order. This order was such that NP → P (that is, the pair began with a “non-prototypical” sound and moved towards a “prototypical” sound). Likewise, most of the infants who did not pass the discrimination criterion were tested on vowel pairs such that P → NP.

To further investigate this phenomenon, Polka and Bohn (1996) examined vowel discrimination among native and non-native vowel pairs in English- and German-learning infants. In this study, infants of both languages were tested on one English vowel contrast (/ɛ/ - /æ/) and one German vowel contrast (/u/ - /y/). The hypothesis was that perceptual asymmetries would be observed in infants’ respective non-native contrasts, but not native ones, because native speakers of a language typically perform near-perfectly across discrimination tasks involving native phonemes that are necessary for word meaning. However, it was the opposite of this prediction that was borne out: infants from both languages demonstrated the same asymmetrical bias across English and German vowel pairs. That is, for the German /u/ - /y/ contrast, both groups of infants found /y/ → /u/ to be more discriminable than vice versa; for the English /æ/ - /ɛ/ contrast, both groups found /ɛ/ → /æ/ to be more discriminable. (In terms of the perceptual magnet effect theory, we would therefore say that within the /y/ - /u/ pair, /u/ is more “prototypical” than /y/, and that within the /ɛ/ - /æ/ pair, /æ/ is more prototypical.)

Discussion of specific prototypes aside, what is significant about the Polka and Bohn 1996 study is its discovery that prototypes can exist across languages, irrespective of native language bias. This result was unexpected precisely because previous theories largely operated under the assumption that perceptual biases were the result of language-specific higher-level processes, such that a speaker would typically only display biases for information that is “new” and not automatically sorted into existing native-language categories. However, the Polka and Bohn 1996 study instead suggests a biological or otherwise physical component to the phenomenon of asymmetrical perception: something that infants across languages would share as part of their biological hardware and/or because of some inherent property of the speech signal. This conception of bias as having an important biological component dovetails neatly with the studies in asymmetrical visual perception discussed in the previous section.

Polka and Bohn went on to conceptualize a framework for the biases observed in the previous literature. This framework, called the Natural Vowel Referent (NVR) framework, posits that prototypical vowels are universal due to their “peripheral” position in the vowel space (Polka and Bohn 2010). Within this framework, the prototypicality of vowels is therefore decided by inherent properties of the speech signal, most notably F1 and F2 values.

The NVR framework was created after examining data from three experiments. The first looked at Danish adult perception of the English /ɒ/ - /ʌ/ vowel contrast. /ɒ/ is a back open rounded vowel, and /ʌ/ is a back open-mid unrounded vowel. The study found that /ʌ/ → /ɒ/ was preferred over /ɒ/ → /ʌ/; that is, the more prototypical vowel was the one situated further to the periphery of the vowel space. In the second experiment, Polka and Bohn examined adult English perception of German vowels /u/ - /y/ and /ʊ/ - /ʏ/. /u/ is a close back rounded vowel, and /y/ a close front rounded vowel; /ʊ/ is a near-close back rounded vowel, and /ʏ/ is a near-close front rounded vowel. In accordance with the NVR hypothesis, the study found that English adults made significantly fewer errors when going from less peripheral vowels (/ʊ/ - /ʏ/) to more peripheral (/u/ - /y/). The study also tested two groups of Cantonese-speaking adults on the same German contrasts; these Cantonese speakers also demonstrated the same asymmetry as the English speakers. The only data that did not strictly bear out the NVR framework came from examining Danish infants’ discrimination of the Danish /e/ - /ø/ contrast. In this experiment, the infants found it easier to discriminate the contrast going in the direction of /e/ → /ø/. However, it should be noted that /e/ and /ø/ are both close-mid vowels, differing in rounding.


3. INTRODUCTION TO MANDARIN TONE

Lexical tone is defined as the use of pitch to distinguish lexical items (Maddieson, 2013). It is considered a suprasegmental feature that differs from intonation. While intonation conveys information such as sentence type and other information related to information structure, lexical tone is merely used to differentiate lexical items. Tonal languages are found throughout the world, including Southeast Asia, East Asia, Africa, and the Americas. In a survey of 526 of the world’s languages conducted by the World Atlas of Language Structures, 220 (41.8%) of the languages surveyed were tonal, an estimate that is likely lower than the true proportion (Maddieson, 2013). Tones are broadly understood to exist within two categories: level and contour (Maddieson, 2013). Level tones maintain the same pitch throughout their duration; contour tones do not.

Chinese Mandarin is a Sino-Tibetan language typically analyzed as having four tones. The F0 (pitch) contour of each tone determines its category (Tsao, 2008). Traditionally, these tones are numbered as Tone 1, 2, 3, and 4. Tones may also commonly be annotated using a different numerical system that seeks to roughly approximate the contour of the tone itself, using numbers on a scale of 1 (lowest pitch) to 5 (highest pitch). In such a system, Tone 1 is represented as [55], Tone 2 as [35], Tone 3 as [214], and Tone 4 as [51] (Wang and Zhu 2015). Consequently, Tone 1 is typically described as a “high-level” tone, Tone 2 as a “rising” tone, Tone 3 as a “low-rising” tone, and Tone 4 as a “falling” tone. When tone is romanized in Pinyin, Tone 1 is represented with a macron (mā), Tone 2 by an acute (má), Tone 3 by the caron (mǎ), and Tone 4 by a grave (mà) accent.

A notable phenomenon in Mandarin that is particularly important for our purposes is that of tone sandhi. “Sandhi” refers to a phonological process whereby one underlying tone is produced as another in certain environments. In the case of Mandarin, the most common is the 2-3 tone sandhi rule, whereby:

[Tone 3] → [Tone 2] / __ [Tone 3]

That is, Tone 3 changes to Tone 2 when it precedes another Tone 3. We observe this sandhi pattern in common sentences such as 你好 ‘Hello’, where the underlying structure Nǐ hǎo becomes Ní hǎo.
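The rule above can be sketched in a few lines of code. This is a minimal illustration and not part of the study's materials: the function name `apply_third_tone_sandhi` is hypothetical, tones are represented simply as the integers 1 to 4, and the sketch assumes that the rule inspects only the underlying tone of the immediately following syllable, ignoring the prosodic grouping that governs longer Tone 3 sequences in real speech.

```python
def apply_third_tone_sandhi(tones):
    """Return surface tones after [Tone 3] -> [Tone 2] / __ [Tone 3].

    Simplifying assumption: only the underlying tone of the next syllable
    is checked; prosodic structure is ignored.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        # A Tone 3 directly before an underlying Tone 3 surfaces as Tone 2.
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


# Underlying Nǐ hǎo (3, 3) surfaces as Ní hǎo (2, 3).
print(apply_third_tone_sandhi([3, 3]))  # [2, 3]
```

A sequence such as Tone 3 followed by Tone 4 is returned unchanged, since the conditioning environment is absent.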

Another common sandhi pattern applies to the numeral 一 (yī) ‘one’, whereby:

[Tone 1] → [Tone 2] / __ [Tone 4]

As well as:

[Tone 1] → [Tone 4] / __ [Tone 1].

And, lastly, the Tone 4 word 不 (bù) ‘not’ becomes a second tone when followed by another fourth tone.

Tone sandhi is particularly relevant to questions of lexical tone perception because it is a phonological process which nonetheless may be at work when observing perception of AX combinations of tones. If we observe dramatic differences in particular native and non-native tone contrasts, it may be fruitful to consider the effects of sandhi on native perception, and therefore also consider at which level the perception is occurring—phonetic or phonological.

4. PERCEPTUAL ASYMMETRY OF TONE

Tsao (2008) noted that around 12 months of age, infants begin to demonstrate signs of language-specific speech perception. That is, infants begin to increase sensitivity to phonetic contrasts used in their native language, while their sensitivity to such contrasts in non-native languages decreases. Presumably, this is intended to decrease the functional load required of a speaker to perceive all of the necessary phonemes for making meaningful contrasts in their L1.

However, this state of affairs is not strictly permanent. Tsao (2008) also notes that exposure to non-native phonemes, even temporarily, increases a speaker’s ability to discriminate those non-native phonemes. Thus, it is possible to train a person to “re-discover” their natal discriminatory abilities. This was also incidentally noted in Polka and Bohn (2011), during their study testing adult perception of native and non-native German vowel contrasts. In testing English and German speakers’ perception of the German /u/ - /y/ and /ʊ/ - /ʏ/ contrasts, Polka and Bohn also tested two groups of Cantonese adults. What they discovered was that Cantonese adults not trained in transcription demonstrated the same asymmetries as the English group that was tested on German, but that the Cantonese group trained in transcription did not demonstrate the asymmetry. Asymmetries are more often observed (or more strongly observed) in non-native speakers, as native speakers typically demonstrate near-perfect performance in contrasts that are important to distinguishing meaning in the language. This indicates that after training, Cantonese speakers were made to exhibit similar perceptual acuity to native German speakers. These results further the claim that a person’s perception of a language is not permanent, and can be influenced to a greater or lesser degree of “native-ness.”

Much research has been devoted to examining the categorical perception of tone in native Mandarin and native non-tonal language speakers. The bulk of the literature seems to agree that the two groups do not use the same cues to discriminate tone, and do not exhibit the same performance in tonal categorical perception. In a 1976 study, Wang asked speakers of Mandarin and English to distinguish between Mandarin high-level tones and high-rising tones (as cited in Wang and Zhu, 2015). He found that Mandarin speakers more accurately perceived differences across categories than within them; that is, Mandarin speakers performed better at perceiving tones that were different, rather than the same. Conversely, English speakers were better at discriminating level and rising pitches, rather than two rising pitches that merely rose at different rates (common in many tonal systems as a phonemic tonal contrast). These results seem to suggest that English speakers are more sensitive to purely acoustic differences in tone, likely owing to a lack of meaning-making lexical tone in their own language. Meanwhile, the perception of Mandarin speakers, whose language uses tone to actually distinguish meaning, has been influenced by phonology and higher-level processing (Wang and Zhu, 2015).

Several cues are used to perceptually discriminate pitch. Most broadly, speakers typically use two cues: height and direction (Wang and Zhu 2015). Height refers to the average pitch of the pitch contour (“high”, “low”, etc.), whereas direction refers to the direction of pitch movement (“rising”, “falling”, etc.). Together, these two perceptual markers help indicate the overall tone contour. When pitch height is understood to be the average pitch across the entire contour, the order of the tones (from highest to lowest pitch) is Tone 1 = Tone 4 > Tone 2 > Tone 3. Thus, the Tone 2/3 pair has relatively close pitch, whereas Tone 1/3 is the most dissimilar (Tsao, 2008). Tsao (2008) also notes that in Mandarin-learning children, Tones 2 and 3 are acquired more slowly than Tones 1 and 4, possibly reflecting Tones 2 and 3’s similarity in average pitch height. Additionally, when processing pitch, speakers of tone languages place more weight on the tone contour’s direction than non-tone speakers (Wang and Zhu 2015).

When Mandarin-learning infants were tested on Mandarin tone contrasts in a head-turn procedure, Tsao (2008) found data to bear out the above observations. Mandarin infants were significantly better at discriminating the Tone 1 vs 3 contrast (M = 73.39%) than the Tone 2 vs 3 contrast (M = 60.74%) and the Tone 2 vs 4 contrast (M = 57.81%), in the order predicted by average pitch height if pitch height were used as a main discriminatory cue. In a follow-up test, infants were found to exhibit directional asymmetry in only one contrast, Tone 1 vs 3, in the direction Tone 1 → Tone 3. In keeping with previous observations about vowel asymmetry, tonal asymmetry also seems linked at least partly to physical features of the speech signal for native speakers.

Interestingly, this order of tone discrimination does not strictly bear out for English-speaking adults. While English speakers also find Tone 1 vs 3 to be the easiest contrast to discriminate, they also discriminated the Tone 2 vs 4 contrast more easily than the Tone 2 vs 3 contrast (Tsao, 2008). Recall that Mandarin infants found Tone 2 vs 4 and Tone 2 vs 3 to be equally hard to discriminate. Tsao (2008) speculates that these data indicate that Mandarin-speaking infants weigh pitch height more than contour when discriminating tones, just as English speakers were reported to do.

5. PURPOSE OF THE CURRENT STUDY

The object of the current study is to examine the interaction between lexical tone asymmetry and native language, ISI, and training effects.

5.1 METHODS

Participants

The participants in this study were 5 native American English speakers from the University of Florida (all female), 5 native Mandarin Chinese speakers from the University of Florida (one male, four females), and 10 native Mandarin Chinese speakers living in Hong Kong (four males and six females). The English speakers had no previous experience with tonal languages. None of the participants were reported as having a history of playing musical instruments, dyslexia, or hearing issues. One additional male Mandarin speaker from Hong Kong was tested, but was excluded from the data presented here due to being a strong outlier. All subjects were between the ages of 18 and 35.

Participants received either course credit or financial compensation for their time.

Stimuli

Thirty-two stimuli were created for this study: 16 stimuli were created with two male native speakers of Mandarin, and 16 identical stimuli were created with two female native speakers. The recordings consisted of the male and female speakers pronouncing each Mandarin tone, realized on the syllable “ma”. These stimuli were then paired for use in an AX discrimination task, providing all 16 possible tone combinations: 1 x 1, 1 x 2, 1 x 3, 1 x 4, 2 x 1, and so on. Male-produced tones were always paired with male-produced tones, and female-produced tones with female-produced tones.
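The pairing scheme can be sketched as follows. This is only an illustration under stated assumptions: the names `TONES`, `SPEAKER_SEXES`, and `build_ax_pairs` are hypothetical, and speaker sex is treated as a simple label, since the text does not specify how pairs were distributed over the individual speakers.

```python
from itertools import product

TONES = (1, 2, 3, 4)
SPEAKER_SEXES = ("male", "female")  # assumed labels, not the study's file names


def build_ax_pairs():
    """Enumerate every ordered tone pair (A, X) within each speaker sex."""
    pairs = []
    for sex in SPEAKER_SEXES:
        # product(..., repeat=2) yields all 16 ordered pairs, including 1 x 1.
        for a, x in product(TONES, repeat=2):
            pairs.append({"sex": sex, "a": a, "x": x, "same": a == x})
    return pairs


pairs = build_ax_pairs()
print(len(pairs))  # 32
```

Because A and X always share the same `sex` label, male-produced and female-produced tones are never mixed within a pair.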

5.2 PROCEDURE

Each session consisted of four major parts. Participants were seated at a keyboard in a quiet room wearing sound-insulated headphones.

Practice Trial

Before beginning the main trials, participants were given a practice trial. During the practice trial, participants were presented with an AX discrimination task with ten stimuli. In an AX discrimination task, the participant is presented with one sound (“A”), then another (“X”), and is asked to identify whether X is the same as or different from A. Participants were asked to press “1” on their keyboards if the sounds were the same, and “2” if they were different. They were then presented with feedback indicating whether their answer was correct. This task typically did not take more than a few minutes.


Pre-Training

After completing the practice trial, participants were presented with two pre-test trials, each at a different interstimulus interval (ISI): 250 ms and 1000 ms. ISI has been demonstrated to affect speech perception in non-native speakers. While there is some debate as to exactly how short a “short” ISI need be, it is generally hypothesized that while longer ISIs allow time for a listener to access higher-level modes of categorization (primarily phonological information), a short ISI only allows enough time for the listener to access more “basic”, phonetic perception (Wayland and Li 2008). Thus a 250 ms ISI can reasonably be expected to stand in for phonetic-only processing, while a 1000 ms ISI is representative of higher-level, phonological processing. Under this logic, we should expect native English speakers to perform better under the 250 ms ISI condition, as the 1000 ms ISI condition allows time for these listeners to access the phonological level (which in English does not contain tones) and potentially mis-map the tonal stimuli.

Participants were given the 250 ms pre-training trial and the 1000 ms pre-training trial in randomized order. Each trial contained 144 stimuli: 18 repetitions of each of the “same-same” tone pairs (1 x 1, 2 x 2, 3 x 3, and 4 x 4) and 6 repetitions of every other tone pair. The reason for this was to ensure that “same-same” tone pairs (4 x 18 = 72) were equal in number to “same-different” / “different-same” tone pairs (12 x 6 = 72), in order to prevent random guessing. Participants were then asked to press “1” on their keyboards if the two tones in a stimulus were the same, and “2” if they were different. Each ISI trial took about ten to fifteen minutes to complete, and participants were not given feedback.
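The composition of one such 144-stimulus trial can be checked with a short sketch. The names `SAME_REPS`, `DIFF_REPS`, and `build_block` are hypothetical, and shuffling the presentation order is an assumption made here for illustration; the text does not state how stimuli were ordered within a trial.

```python
import random
from itertools import product

TONES = (1, 2, 3, 4)
SAME_REPS = 18  # repetitions of each same-tone pair (1 x 1, 2 x 2, ...)
DIFF_REPS = 6   # repetitions of each different-tone pair


def build_block(seed=None):
    """Build one ISI block: a list of (A, X) tone pairs in shuffled order."""
    trials = []
    for a, x in product(TONES, repeat=2):
        reps = SAME_REPS if a == x else DIFF_REPS
        trials.extend([(a, x)] * reps)
    # Assumed: presentation order is randomized within the block.
    random.Random(seed).shuffle(trials)
    return trials


block = build_block(seed=0)
n_same = sum(a == x for a, x in block)
print(len(block), n_same, len(block) - n_same)  # 144 72 72
```

With these repetition counts, a participant who always answered “same” or always answered “different” would score exactly 50% correct.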

Training

After both pre-training trials, participants were given training on two tones: Tone 1 and Tone 4. The order of training was randomized between all participants. Participants were given 50 stimuli per tone, and merely had to click through and listen to the tone repeatedly. They were not asked to do any other task during the training. The next tone would not play unless the previous tone had completely finished playing.

Post-Training

After training, participants were given the same discrimination task as in pre-training, described above. The order of the ISI trials was randomized across participants, but held constant within each participant. That is, if a participant was given 250 ms ISI and then 1000 ms ISI during pre-training, he or she was presented with the ISI trials in the same order during post-training.
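One way to realize this ordering constraint is sketched below. The function name `assign_isi_orders` and the participant labels are hypothetical; the sketch only illustrates drawing a random ISI order once per participant and reusing it for the post-training session.

```python
import random

ISIS_MS = (250, 1000)


def assign_isi_orders(participant_ids, seed=None):
    """Draw one random ISI order per participant and reuse it post-training."""
    rng = random.Random(seed)
    orders = {}
    for pid in participant_ids:
        order = list(ISIS_MS)
        rng.shuffle(order)
        # The same order is stored for both sessions of this participant.
        orders[pid] = {"pre": tuple(order), "post": tuple(order)}
    return orders


orders = assign_isi_orders(["P01", "P02", "P03"], seed=0)
for pid, sessions in orders.items():
    print(pid, sessions["pre"], sessions["post"])
```

Storing the order once and copying it keeps any order effect identical across the pre- and post-training sessions of a given participant.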


6. RESULTS

The general results of the trials are presented below:

[Figure 1, panel “250 ISI-- English”: percentage correct (y-axis, 50 to 100) for each tone pair (x-axis, 11 through 44), pre- and post-training.]

Figure 1: Tone pairs as a function of percentage correct for American English speakers tested at 250 ms ISI.

[Figure 2, panel “1000 ISI-- English”: percentage correct (y-axis, 50 to 100) for each tone pair (x-axis, 11 through 44), pre- and post-training.]

Figure 2: Tone pairs as a function of percentage correct for American English speakers tested at 1000 ms ISI.

[Figure 3, panel “250 ISI-- Chinese”: percentage correct (y-axis, 50 to 100) for each tone pair (x-axis, 11 through 44), pre- and post-training.]

Figure 3: Tone pairs as a function of percentage correct for Chinese Mandarin speakers tested at 250 ms ISI.

[Figure 4, panel “1000 ISI-- Chinese”: percentage correct (y-axis, 50 to 100) for each tone pair (x-axis, 11 through 44), pre- and post-training.]

Figure 4: Tone pairs as a function of percentage correct for Chinese Mandarin speakers tested at 1000 ms ISI.

Simply from a cursory look at the data, we can see a rough sketch of a few key observations: performance seems to vary by L1, asymmetry is not present across every tone pair, and training seems to perhaps have a marginal effect. To more clearly understand the results, we will examine them for statistical significance. (For the rest of this paper, tones will be written simply as numbers, such that ‘11’ denotes the combination Tone 1 x Tone 1.)
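Using this two-digit notation, a directional asymmetry for a tone pair can be expressed as the difference in accuracy between its two presentation orders. The sketch below is illustrative only: the function name `directional_asymmetry` is hypothetical, and the percentages are made-up placeholder values, not data from this study.

```python
def directional_asymmetry(accuracy, a, b):
    """Accuracy on the order a-then-b minus accuracy on the order b-then-a."""
    return accuracy[f"{a}{b}"] - accuracy[f"{b}{a}"]


# Made-up percentages for illustration only; these are not the study's data.
example_accuracy = {"13": 90.0, "31": 80.0}
print(directional_asymmetry(example_accuracy, 1, 3))  # 10.0
```

A value of zero would correspond to symmetrical perception of the pair; a positive value means the order A then X = a, b was discriminated more accurately than the reverse order.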

These data were submitted to a repeated-measures ANOVA with L1 (Mandarin and English) and ISI (250 ms and 1000 ms) as the between-subject factors and Contrast and Time (pre- and post-training) as the within-subject factors. The analysis yielded significant main effects of L1 [F(1, 36) = 50.51, p < .001] and Contrast [F(1, 7.6) = 70.07, p < .001], but not of ISI [F(1, 36) = .31, p = .58] or Time [F(1, 36) = .20, p = .66], indicating that, overall, native Mandarin speakers outperformed native English speakers (97.6% vs 92.4%), and that performance varied as a function of contrast (ranging from 75% to 99.7%), but not of ISI (94.8% for 250 ms ISI and 95.2% for 1000 ms ISI) or training (95.% for pre-training and 94.9% for post-training).

In addition, there were significant two-way interactions between Contrast and L1 [F(1, 36) = 37.88, p < .001] and between Time and L1 [F(1, 36) = 4.83, p = .03], indicating that differences in performance between Mandarin and English speakers depend on the contrast and the test time (pre- and post-training). Follow-up tests (repeated-measures ANOVAs) were performed to further examine these significant interactions.

For the Contrast x L1 interaction, the analyses revealed that English speakers’ performance on some contrasts, notably 11, 33, and 44 (during both pre- and post-training), was significantly worse than on all other contrasts. In contrast, no significant difference was found across any contrasts for Mandarin speakers. The only difference that nearly reached significance was between contrast 11 and contrasts 13 and 14 pre-training (ps = .06).

For the Time x L1 interaction, results of the follow-up tests (multivariate ANOVAs) revealed that native Mandarin speakers outperformed native English speakers on contrasts 11, 22, and 33 before training, and on contrasts 11, 22, 33, and 44 after training.

7. DISCUSSION

From the statistical analysis, we can see that although Mandarin speakers exhibited no significant tonal asymmetries (as expected), English speakers displayed asymmetries on contrasts 11, 22, and 33 before training, and 11, 22, 33, and 44 after training. Immediately noticeable in these data is that all of the pairs on which English speakers demonstrated asymmetry are within-category contrasts. This means that English speakers consistently judged same-same tone pairs as different (when in reality they were not), but did not make the reverse error of judging between-category, same-different tone pairs as the same. That is, English speakers heard more tone pairs as “different” overall than really were different.
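This response pattern can be expressed as a comparison between the proportion of pairs that truly differ and the proportion that listeners label “different”, together with the rate at which same-tone pairs are labeled “different”. The following is only an illustrative sketch, assuming a hypothetical list of (contrast, response) records; the responses are invented and the function name `different_bias` is not taken from the study.

```python
# Hypothetical AX responses: (contrast, response). The contrast is a two-digit
# tone pair such as "11" or "14"; the response is "same" or "different".
# Values are invented for illustration only.
responses = [
    ("11", "different"),  # same-tone pair judged different
    ("11", "same"),
    ("22", "same"),
    ("33", "different"),  # same-tone pair judged different
    ("14", "different"),
    ("41", "different"),
    ("23", "different"),
    ("32", "different"),
]


def is_different_pair(contrast):
    """A pair is truly different when its two tone digits differ."""
    return contrast[0] != contrast[1]


def different_bias(responses):
    """Return (actual, reported, false_alarm) proportions.

    actual      -- share of pairs whose tones really differ
    reported    -- share of pairs the listener labeled "different"
    false_alarm -- share of same-tone pairs labeled "different"
    """
    n = len(responses)
    actual = sum(is_different_pair(c) for c, _ in responses) / n
    reported = sum(r == "different" for _, r in responses) / n
    same_pairs = [r for c, r in responses if not is_different_pair(c)]
    false_alarm = sum(r == "different" for r in same_pairs) / len(same_pairs)
    return actual, reported, false_alarm


actual, reported, false_alarm = different_bias(responses)
```

When `reported` exceeds `actual` while errors on truly different pairs stay near zero, the excess comes entirely from same-tone pairs, which is the pattern described for the English listeners.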

This observation suggests that English speakers failed to properly categorize the stimuli into abstract, phonological categories. Recall that each tone was produced by four different speakers (two male, two female), and that therefore the exact acoustic signal of each tone, even among tokens phonologically in the same tonal category, was not identical. A failure by English speakers to assign these phonologically identical tones, with their slightly different acoustic signals, to the same tonal category may have led them to mistake two identical tones for different ones owing to over-sensitivity to acoustic cues. Without an understanding of which cues are not crucial for discrimination of the speech signal, or of which deviations from the norm of a given cue are acceptable, listeners flagged all acoustic differences as differences in tone category.

What can we make of English speakers’ success with contrast 22 during the pre-tests, despite their failure with other within-category stimuli? If English speakers’ generally poor performance in within-category discrimination indicates a failure to form abstract tonal categories, then perhaps their success with 22 indicates that they are mapping this contrast onto something present in English that is ‘standing in’ for a tonal category.

In English, tone is not used for lexical discrimination, but is used prosodically to convey elements of information structure. Tone 2, a rising tone that begins in the mid-range and ends in the high range, can perhaps be said to have a rough analogue in English rising intonation, commonly used for questions. Could it be that English speakers’ success with Tone 2 was due to mapping Tone 2 onto English rising intonation?

Also notable in the data is that after training, English speakers’ performance on 22 drops, leading to poor discrimination of all within-category pairs. What could explain this? The purpose of the training given to participants between the two tests was ostensibly to give them greater exposure to the tones and thereby better mastery over their categorization. Yet it seems that the opposite effect occurred: English listeners not only failed to categorize the tones they had previously had little success with, but also failed to categorize the one tone pairing that they had had success with. Here, it seems that English speakers not only failed to develop abstract tonal categories with the help of training, but that, because of this failure, their increased sensitivity to the acoustic differences between instantiations of each tone led to more frequent judgements that each individual instantiation was unique, i.e., of a separate category.

In future research, we may want to examine the above hypothesis, wherein English speakers incorrectly categorized same-category tone pairs as different owing to speaker variation, by testing listeners with tone pairs spoken by only one speaker. Moreover, would listeners show a preference for a particular speaker gender? Would they find a female speaker’s tones more discriminable than a male speaker’s? Is there no difference? If there is no difference, and participants continue to demonstrate this within-category asymmetry even when all tokens are produced by the same speaker and all stimuli of a given tone are identical, then what might account for this asymmetry? It cannot be anything in the auditory signal itself. Although research on novelty in the auditory domain is relatively sparse, it has been noted in previous literature that novelty in the visual domain can enhance visual perception (Schomaker and Meeter, 2012).

In an experiment testing visual perception, Schomaker and Meeter (2012) presented participants with a series of Gabor patches in an oddball task and asked them to report whether the patches were vertically aligned or angled. (A Gabor patch is a stimulus in which a sinusoidal grating is presented under a Gaussian envelope, producing an effect where it appears as though several blurred bars are tilted at a given angle.) The study found that the participants’ sensitivity for the novel patches increased in general as participants grew more familiar with the default, familiar cue. Although this is not a direct analogue to the question of the auditory perception of tones, it provides the start of another potential avenue for exploration and explanation.

Within the present study, English participants overall marked more tone pairs as different than as the same. Could English speakers’ relative inability to discriminate between-category pairs have any explanation in a relative lack of novelty for such pairs?

Additionally, future research may want to explore questions of English to Mandarin prosody-to-tone mapping. Is it the case that English speakers can successfully utilize English prosody as an analogue to Mandarin tone contours? If they can do so, do they do so?

8. CONCLUSION

In light of these data, we conclude that tonal perception varies significantly as a result of native language, as well as tonal pair. It does not vary significantly as a result of ISI or training. As expected, Mandarin speakers do not exhibit significant tonal asymmetry, presumably because their native language requires constant, near-perfect recognition of tones. However, English speakers exhibited significant asymmetry effects. In particular, English speakers had difficulty with contrasts 11, 33, and 44 before training, and contrasts 11, 22, 33, and 44 after training. Such a result is perhaps surprising: although it demonstrates that asymmetrical tonal perception is significant among English speakers, it does not offer any particular insight into what perceptual cues English speakers may be using to distinguish tones, since the tone pairs English listeners had difficulty with were within-category, same-same tone pairs.

Instead, the English speakers’ poor performance in discriminating within-category pairs seems to indicate a failure on the part of the English speakers to develop abstract tonal categories. In marking both true same-different pairs and same-same pairs as different, they perceived there to be more different pairs within the stimuli than there really were. A failure to develop abstract tonal categories would mean that English speakers did not have a framework for acceptable variation within a category, or perhaps even for what a prototype of each category might be. Thus, while the English speakers correctly discriminated difference in tonal pairs, they did not correctly discriminate similarity. Connected to this observation is the possibility that English speakers’ failure to discriminate within-category pairs was influenced by the variety of speakers used to produce each stimulus. Further investigation should examine whether the English speakers’ asymmetries persist if all stimuli per category are identical and spoken by the same speaker.

In the case of English speakers’ success in discriminating the 22 pair before training, a possible explanation is that they mapped English rising intonation (used prosodically to signal questions) onto the Mandarin rising tone, thereby creating an ad-hoc tonal category paradigm. This ad-hoc paradigm seems to disappear after training, and the training seems to over-sensitize English listeners to individual acoustic cues without prompting them to create abstract tonal categories. Thus, after training, English speakers lost their ad-hoc rising intonation/rising tone mapping paradigm and did not replace it with anything, leading to performance on the 22 pair as poor as on 11, 33, and 44. Future research should consider examining how intonation is mapped onto tone cross-linguistically.

ACKNOWLEDGEMENTS

I would like to acknowledge Ratree Wayland, Edith Kaan (and the University of Florida Linguistics Lab), and all of the participants in this study. Thanks also to Caroline Wiltshire for further reviewing this paper and offering her extremely valuable edits and comments.

REFERENCES

Campbell, F., and J. J. Kulikowski. (1966). Orientational selectivity of the human visual system. Journal of Physiology, 187, 437-45.

Cao, Rui, Ratree Wayland, and Edith Kaan. (2012). The role of creaky voice in Mandarin Tone 2 and Tone 3 perception. 13th Annual Conference of the International Speech Communication Association 2012. 1. 426-429.

Carrasco, M., C. P. Talgar, and E. L. Cameron. (2001). Characterizing visual performance fields: Effects of transient covert attention, spatial frequency, eccentricity, task, and set size. Spatial Vision, 15, 61-75.

Dar, Miriam, et al. (2018). An order effect in English infants’ discrimination of an contrast. The Journal of , 67, 49-64.

Gilbert, Christopher, and Paul Bakan. (1973). Visual Asymmetry in Perception of Faces. Neuropsychologia, 11, 355-362.

I-Ping Wan, and Jeri Jaeger. (1998). Speech Errors and the Representation of Tone in Mandarin Chinese. Phonology, 15, 417–461.

Karim, A. K. M. Rezaul, and Haruyuki Kojima. (2010). The what and why of perceptual asymmetries in the visual domain. Advances in Cognitive Psychology, 6, 103-115.

Kuhl, P. K. (1991). Human adults and human infants show a “perceptual magnet effect” for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50, 93-107.

Maddieson, Ian. (2013). Tone. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Politzer-Ahles, et al. (2016). Asymmetries in the Perception of Mandarin Tones: Evidence From Mismatch Negativity. Journal of Experimental Psychology. 42, 1547-70.

Polka, Linda, and Ocke-Schwen Bohn. (1996). A cross-language comparison of vowel perception in English-learning and German-learning infants. Journal of the Acoustical Society of America, 100, 577-92.

Polka, Linda, and Ocke-Schwen Bohn. (2003). Asymmetries in vowel perception. Speech Communication, 41, 221-231.

Polka, Linda, and Ocke-Schwen Bohn. (2011). Natural Referent Vowel (NRV) framework: An emerging view of early phonetic development. Journal of Phonetics, 39, 467-478.

Polka, Linda, and Janet F. Werker. (1994). Developmental Changes in Perception of Nonnative Vowel Contrasts. Journal of Experimental Psychology: Human Perception and Performance, 20, 421-435.

Rosch, Eleanor. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology, 104, 192-233.

Rosch, Eleanor. (1973). Natural Categories. Cognitive Psychology, 4, 328-350.

Schomaker, Judith, and Martijn Meeter. (2012). Novelty enhances visual perception. PloS one 7 (12).

Tsao, Feng-Ming. (2008). The Effect of Acoustical Similarity on Lexical-Tone Perception of One-Year-Old Mandarin-Learning Infants. Chinese Journal of Psychology, 50, 111-124.

Tyler, et al. (2014). Perceptual Assimilation and Discrimination of Non-Native Vowel Contrasts. Phonetica, 71, 4-21.

Wang, Caiyu, and Xiaonong Zhu. (2015). Tone Perception. The Oxford Handbook of Chinese Linguistics, edited by William S.-Y. Wang and Chaofen Sun, Oxford University Press, 503–525.

Wayland, Ratree, and Bin Li. (2008). Effects of two training procedures in cross-language perception of tones. Journal of Phonetics, 36, 250-267.

Williams, Douglas, and Bela Julesz. (1992). Perceptual Asymmetry in Texture Perception. Proceedings of the National Academy of Sciences of the United States of America, 89, 6531–6534.
