Macquarie University PURE Research Management System

This is the author version of an article published as:

Tsukada, K., & Kondo, M. (2019). The Perception of Mandarin Lexical Tones by Native Speakers of Burmese. and Speech, 62(4), 625–640.

Access to the published version: https://doi.org/10.1177/0023830918806550

Copyright 2019. Version archived for private and non-commercial, non- derivative use with the permission of the author/s. For further rights please contact the author/s or copyright owner.

The perception of Mandarin lexical tones by native speakers of Burmese

Kimiko Tsukada

Macquarie University, Australia

The University of Melbourne, Australia

University of Oregon, USA

[email protected]

Mariko Kondo

Waseda University, Japan

[email protected]

Editorial correspondence:

Kimiko Tsukada

Macquarie University

North Ryde NSW 2109

AUSTRALIA e-mail: [email protected]

Abstract

This study examined the perception of Mandarin lexical tones by native speakers of Burmese who use lexical tones in their first language (L1) but are naïve to Mandarin. Unlike Mandarin tones which are primarily cued by pitch, Burmese tones are cued by type as well as pitch. The question of interest was whether Burmese listeners can utilize their L1 experience in processing unfamiliar Mandarin tones. Burmese listeners’ discrimination accuracy was compared to that of Mandarin listeners and Australian English listeners. The Australian

English group was included as a control group with a non-tonal background. Accuracy of perception of six pairs (T1-T2, T1-T3, T1-T4, T2-T3, T2-T4, T3-T4) was assessed in a discrimination test. Our main findings are 1) Mandarin listeners were more accurate than non-native listeners in discriminating all tone pairs, 2) Australian English listeners naïve to

Mandarin were more accurate than similarly naïve Burmese listeners in discriminating all tone pairs except for T2-T4, and 3) Burmese listeners had the greatest trouble discriminating

T2-T3 and T1-T2. Taken together, the results suggest that merely possessing lexical tones in

L1 may not necessarily facilitate the perception of non-native tones, and that the active use of phonation type in encoding L1 tones may have played a role in Burmese listeners’ less than optimal perception of Mandarin tones.

Keywords: cross-language speech perception, phonation, Mandarin lexical tones, Burmese listeners, Australian English listeners

1. Introduction

Cross-linguistic processing of suprasegmental characteristics such as lexical tones is attracting increasing attention of researchers who are involved in speech science/technology and language pedagogy. For adults who are learning non-native , the impact of previous linguistic experience is particularly significant (Strange, 1995). However, it is not always clear what kind of linguistic experience helps or hinders cross-language speech processing. This study reports on the perception of Mandarin lexical tones by 16 native speakers of Burmese who were naïve to Mandarin. Their tone discrimination accuracy was compared to that of ten native Mandarin listeners and ten Australian English listeners. The results of these two control groups of listeners from native and non-native backgrounds were presented in our previous research (Tsukada et al., 2015, 2016; Tsukada & Han, in press).

The aim of the present study was to test the hypothesis that speakers of a tonal language such as Burmese are better able to process unfamiliar non-native Mandarin sounds compared with native speakers of a non-tonal language like English. This is because the Burmese speakers would be facilitated by their L1 tone knowledge. Below we provide a comparison of the

Burmese and Mandarin tonal systems, which motivated this study. Mandarin is one of the most representative tonal languages, with four lexical tones1. It differentiates the four tones

(Tone 1 (T1): high level (ā), Tone 2 (T2): high rising (á), Tone 3 (T3): dipping (ǎ), Tone 4

(T4): high falling (à)), which means that incorrect use of tone directly impacts on communication (e.g. 妈 mā in T1 ‘mother’ vs 马 mǎ in T3 ‘horse’ or 买 mǎi ‘to buy’ vs 卖 mài in T4 ‘to sell’). The four Mandarin tones are differentiated from each other in features such as pitch onset and shape (e.g. Li et al., 2017; Liu & Samuel, 2004; Xu, 1997). T3 is the most variable tone in its phonetic realization (e.g. Chang et al., 2016; Gandour, 1978; Hallé et al. 2004; Wang & Li, 1967). Depending on the context, it is variably realized as low dipping

1 Although there is the fifth neutral tone, it has a different function and will not be discussed in this paper.

rising, low dipping (without the rise) and as rising (T2) in the case of tone sandhi where it is directly followed by another T3 (e.g. Chang & Yao, 2016; Lin, 1985; Wang & Li, 1967), thus possibly adding to its perception and/or production difficulty. To alleviate non-native learners’ burden, Lin (1985) proposed a novel pedagogical approach, namely that T3 should be treated as the norm with the other tones as derivations from this norm.

Fundamental frequency (F0) is widely recognized as the primary cue for Mandarin lexical tones (e.g. Chuang et al., 1975; Gandour, 1978; Wang et al., 2006, see, however, Yang, 2015 for the role of phonation cues). Furthermore, it has been reported that Mandarin listeners weigh pitch direction more heavily than pitch height in tone identification (Gandour, 1983).

The perceptual relevance of other features such as duration and amplitude has been examined in several studies (e.g. Blicher et al., 1990; Liu & Samuel, 2004; Whalen & Yu, 1992), and there is general agreement that those cues are secondary to F0.

Burmese is also classified as a tonal language (e.g. Chang, 2009; Thein Tun, 1982; Watkins,

2001, 2003; Wheatley, 2009). While there is disagreement between different researchers as to the number of categories in the Burmese tone system (varying from three to five), the accepted wisdom seems to acknowledge four tones (Thein Tun, 1982; Wheatley, 2009), i.e. low, high, creaky and killed (Watkins, 2001, 2003). This makes Burmese and Mandarin comparable in terms of the tone inventory size, each with four categories.

Watkins (Figure 1, 2003) pointed out that the pitch contours of creaky and killed tones are very similar; both are short and fall sharply (Figure 1), similar to T4 in Mandarin. Following the Perceptual Assimilation Model (PAM) proposed by Best and her colleagues (Best, 1995), which aims to predict the discrimination levels of diverse non-native contrasts, Mandarin T4 may be assimilated to both creaky and killed tones by native Burmese listeners. If T4 is produced with a creaky , then Burmese listeners may associate it with a creaky tone rather than a killed tone. However, without cross-linguistic perceptual assimilation data, it is

difficult to predict if native Burmese speakers would assimilate Mandarin T4 equally well to these two Burmese tone categories, or to different degrees.

Figure 1 about here

Another striking feature of the Burmese tone system is the apparent lack of level tone, as none of the four tones seems to remain static (Figure 1). The low tone falls and the high tone rises throughout the tone-bearing . This raises the possibility that T1, which is the only level tone in Mandarin (Burnham et al., 2015; Reid et al., 2015), may not be readily assimilated to any of the L1 Burmese tone categories and that tone pairs involving T1 may pose some perceptual difficulties to the Burmese listeners depending on how (dis)similar T1 and the other member of the tone pair are perceived to be. However, it is also possible that both T1 and T2 are assimilated to a single Burmese high tone, i.e. Single Category assimilation according to the PAM (Best, 1995). Alternatively, following Flege’s Speech

Learning Model (1995, 2007), if the high-level tone is not “equated” with any existing L1 sounds, and instead is perceived as a “new” (as opposed to “similar”) L2 sound by Burmese listeners, we might predict the formation of a new L2 category for T1, and tone pairs including T1 may be accurately distinguished in the long run.

Furthermore, the way in which lexical tones are encoded seems to differ substantially between Burmese and Mandarin; the former employs phonetic properties other than F0, such as phonation type and duration, to a greater extent than the latter (e.g. Brunelle & Kirby,

2016; Thein Tun, 1982; Watkins, 2001; Wheatley, 2009). Some scholars even regard

Burmese as a rather than a tonal language (Bradley, 1982). Thus, the two languages may differ in informativeness of pitch variations.

Most previous research on the processing of Mandarin lexical tones involved speakers of other tonal languages (e.g. Cantonese: Gandour, 1983; Lee et al., 1996; So & Best, 2010;

Hmong: Wang, 2013; Thai: Li, 2016; Li et al., 2017; Wu et al., 2014; Vietnamese: Li et al.,

2017) or non-tonal European languages (e.g. Dutch: Leather, 1987; English: Hao, 2012,

2018; Lee et al., 1996; Pelzl et al., in press; Shen & Froud, 2016; So & Best, 2010; Wang et al., 1999; Wang, 2013; French: Hallé et al., 2004; German: Peng et al., 2010; Swedish: Gao,

2016). There have been a few exceptions (So & Best, 2010; Tsukada et al., 2016; Wang,

2013 for Japanese; Tsukada & Han, in press; Zhang, 2016 for Korean), but to our knowledge, this is the first study to empirically assess Burmese listeners’ cross-linguistic perception of

Mandarin tones. In the study, Burmese listeners’ perception accuracy was compared to that of native Mandarin listeners and Australian English listeners. These two groups were included to serve as controls and their results were reported previously (Tsukada et al., 2015, 2016;

Tsukada & Han, in press). In this study, we focused on the Burmese group with a view to adding to our current understanding of the manner in which suprasegmental features such as lexical tones are processed, by conducting an empirical cross-linguistic study on the perception of non-native tones. In sum, Burmese was selected, because Burmese tones are cued by phonation type as well as pitch whereas Mandarin tones are primarily cued by pitch, even though the two languages may be comparable in the tone inventory size (i.e. four categories each). We were interested in finding out if and how the unique feature of Burmese, i.e. active use of phonation type, may affect cross-linguistic perception of non-native tones by native Burmese listeners.

Of relevance to the present study, naïve, non-native speakers have been shown to differ from native speakers in their perception/production of Mandarin lexical tones with the former frequently confusing T2 and T3 (e.g. Hallé et al. 2004; Huang & Johnson 2010; Lee et al.

1996; So & Best 2010; Tsukada et al., 2015, 2016; Tsukada & Han, in press; Wang et al.,

1999). However, there is still lack of consensus as to whether the role of knowledge and experience of L1 tones facilitates or inhibits, as briefly reviewed below.

Li (2016) found that native Thai speakers were better than native speakers of British English at identifying Mandarin lexical tones, even though neither group had prior exposure to

Mandarin. Lee et al. (1996) also found that native Cantonese speakers who had experience of lexical tones in their L1 outperformed native speakers of American English in distinguishing different Mandarin tones. However, in the same study, the reverse did not hold true when native Mandarin speakers listened to Cantonese tones. The difference in the results was attributed to the difference in complexity between the Cantonese and Mandarin tonal systems, with the former being more complex than the latter. Apart from L1 tonal experience, there are also studies that have examined the effect of musical experience on the processing of non-native tones (e.g. Cooper & Wang, 2012; Lee & Hung, 2008; Li & DeKeyser, 2017).

In general, music experience was found to facilitate the processing of non-native tonal languages including Mandarin (see, however, Zhao & Kuhl, 2015).

On the other hand, So and Best (2010) found no significant difference between Canadian

English and Cantonese-speaking listeners in their identification of Mandarin tones. Wang

(2013) also found that native speakers of Hmong, a language with seven tone categories, identified Mandarin lexical tones less accurately than native speakers of English and

Japanese. However, despite initial difficulties, the Hmong speakers’ perception improved after training and they were not disadvantaged by their L1 tonal system in learning Mandarin tones compared to the English and Japanese groups of learners. Focusing on both tones and intonation of Mandarin, Yang and Chan (2010) found no effect of Mandarin learning experience (i.e. native vs non-native listeners) on the identification of intonation signaling statements or questions as a function of lexical tones. Somewhat surprisingly, native listeners were not necessarily better than American learners in the identification of question

intonation. Specifically, first-year learners were better than advanced learner and native listener groups in the identification of question intonation. On the other hand, native listeners were more accurate than learner groups in the identification of statement intonation. A lack of positive L1 transfer effect was also demonstrated by Wu et al. (2012), who found that

Mandarin and English groups did not differ in the identification of Japanese pitch accent.

This was despite the expectation that the Mandarin listeners who use pitch variations linguistically as lexical tones in their L1 may be more efficient than the English listeners in processing pitch accent in Japanese.

If it is the case that pitch variations are less informative in Burmese than in Mandarin, due to its functional use of phonetic cues other than pitch, then the need to reinterpret unfamiliar

Mandarin tones with respect to existing L1 tone categories may pose a significant challenge to Burmese listeners. On the other hand, the absence of L1 tone categories may not necessarily disadvantage Australian English listeners’ cross-linguistic tonal perception, because it might enable them to evade possible L1 interference and so aid unbiased speech processing. In fact, some researchers observed that English (non-tonal language) speakers made greater post-training progress than Mandarin (tonal language) speakers in recognizing

Cantonese tones (Francis et al. 2008). Thus, we predict that, while both groups of naïve listeners would be less accurate in discriminating Mandarin tones than native listeners, how the performance of Australian English and Burmese listeners would differ from each other may be determined by factors such as the acoustic (dis)similarity between the two Mandarin tones being compared, between Mandarin tones and their L1 categories and perceptual cues that are salient to each group of listeners. For example, based on previous studies (e.g.

Gandour, 1983; Hallé et al. 2004; Huang & Johnson 2010; Lee et al. 1996; So & Best 2010;

Tsukada et al., 2015, 2016; Tsukada & Han, in press; Wang et al., 1999), we predict that both groups of non-native listeners would be inaccurate in their responses to T2-T3 (acoustically

similar) and accurate in their responses to T1-T3, T2-T4 and T3-T4 (acoustically dissimilar).

T1-T2 and T1-T4 may show between-group differences depending on how the two tones are assimilated to L1 Australian English or L1 Burmese categories.

The novel aspect of the present study is the inclusion of native speakers of Burmese as they process non-native Mandarin tones. To our knowledge, perception of unfamiliar, non-native tones by native speakers of Burmese has not been previously reported in readily accessible publications. Thus, we can gain valuable insights into how listeners from an underexplored tonal language which differs qualitatively from Mandarin may process non-native tone categories. Specifically, we were interested in determining if functional use of phonation in

Burmese may affect L1 Burmese listeners’ perception of unknown Mandarin tones.

Furthermore, information on how Australian English speakers process/acquire Mandarin tone is limited (see Kiriloff, 1969 for a notable exception) because most previous studies on this topic examined speakers of other varieties of English (i.e. American or British English).

Another novelty of this research is the use of a discrimination task which facilitates testing of listeners without any knowledge of Mandarin. Previous studies typically assessed listeners’ categorical perception and/or identification (Li et al., 2017 for T1-T4; Shen & Lin, 1991 for

T2-T3; Shen & Froud, 2016 for T1-T4 and T2-T3; Hallé et al., 2004 for T1-T2, T2-T4 and

T3-T4). However, such a method presupposes listeners’ familiarity with Mandarin tone categories and so would not be suitable for the naïve listeners in this study. It would be of practical importance if we could simulate the initial state of Mandarin tone processing and determine which tone pairs are most or least distinguishable for listeners from tonal

(Burmese) and non-tonal (Australian English) L1 backgrounds. The results will have implications for the role of previous tonal experience in cross-language speech perception and will be of use to foreign language pedagogy as well as tone perception/acquisition.

2. Methods

We assessed Mandarin tone discrimination accuracy by three (Australian English, Burmese,

Mandarin) groups of listeners via the four-alternative forced-choice discrimination task described in 2.3 Procedures.

2.1 Speakers and speech materials

The experimental stimuli and procedures were identical to those used in previous research

(Tsukada et al., 2015, 2016; Tsukada & Han, in press). Eight (4 males, 4 females) native

Mandarin speakers were recruited from the undergraduate student population at Macquarie

University in Sydney. They all received primary and secondary education in standard

Mandarin prior to arriving in Australia as young adults, and identified themselves as native speakers of Mandarin. The speakers were recorded in a sound-attenuated studio on the university campus under the supervision of a Mandarin-English bilingual experimenter, and were compensated for their participation. Figure 2 shows the mean F0 contours of four lexical tones by the eight native Mandarin speakers averaged across 7 words (Table 1). The

F0 contours are generally in good agreement with comparable data from previous studies (So

& Best, 2010; Wang et al., 1999; Xu, 1997).

Figure 2 about here

For the recording, the speakers uttered a total of 76 monosyllabic words, including the 28 test words shown in Table 1. The words were presented on a computer screen one word at a time in random order, and the speakers produced each word twice in isolation and once in a short carrier sentence (我读______这个字 wǒ dú ___ zhè ge zì “I read the word ___”).

Table 1 about here

The recorded speech materials were digitized at 44.1 kHz and the target words were segmented and stored in separate files. The tokens produced in isolation were used as the stimuli for this study. The stimuli presented to listeners were seven CV (where

C=/p, t, m/ and V=/i, a, u/) across all four Mandarin tones. All materials were transcribed in

Chinese characters along with pinyin (the Romanized spelling system of Chinese characters with tone symbols indicated by diacritics) to minimize any ambiguity of pronunciation. The pace of presentation to listeners was controlled by the experimenter. The listeners’ tone discrimination accuracy was assessed in a categorial discrimination test (CDT) with a four- alternative forced-choice oddity task, as described in the Procedures section.

2.2 Participants

Three groups of listeners participated in this study. The first group included 16 (6 males, 10 females) native Burmese-speaking listeners with a mean age of 35.2 years (sd = 9.5). They were born in Myanmar and lived there until they came to Japan as adults. Their average length of residence in Japan was 4.0 years. They were all naïve to Mandarin (i.e. had no knowledge of Mandarin) nor had any extensive training in music. They were either undergraduate or postgraduate students at Waseda University in Japan or members of the local Tokyo community. The second group included ten (1 male, 9 females) functionally monolingual speakers of Australian English with a mean age of 30.5 years (sd = 13.6) who were also all naïve to Mandarin with no extensive training in music. All of them participated in the study in Australia. The third group included ten (2 males, 8 females) college-educated native Mandarin speakers (mean = 25.4 years, sd = 4.3), none of whom participated in the recording sessions. They participated in the study in Australia (n = 5), Japan (n = 1) or

Singapore (n = 4) according to their place of residence and availability. The second and third groups were included as controls. All the participants were tested in a sound-attenuated booth or quiet classroom on the university campus between 2013 and 2016 and received monetary compensation for their participation. All participants reported having normal hearing and no language impairment in their L1s.

2.3 Procedures

This study used a categorial discrimination test (CDT) with a four-alternative forced-choice oddity task, which was employed in previous L2 speech research (e.g. Flege, 2003; Flege et al., 1999, Wayland & Guion, 2003, 2004). This task does not require lexical access, and it is suitable for examining phonetic processes used in cross-language speech perception by participants who have no prior experience with the target language. As described in Wayland and Guion (2003: 118), this is ‘a version of ABX discrimination task’ and ‘is designed to minimize response bias (guessing)’. A high level of performance in this task would require not only the use of purely auditory information but also the formation of phonetic categories for one or both sounds in any sound pair. The listeners were tested individually in a session lasting approximately 45 to 60 minutes. The presentation of the stimuli and the collection of perception data were controlled by UAB software (University of Alabama at Birmingham;

Smith, 1997). The listeners responded to the stimuli on a notebook computer at a self-selected comfortable level using high-quality headphones.

The stimuli, which consisted of monosyllabic CV words differing in lexical tones, were presented in triads and the listeners were given four (‘1’, ‘2’, ‘3’, ‘NO’) response categories.

The following six combinations of tones were tested: T1-T2, T1-T3, T1-T4, T2-T3, T2-T4,

T3-T4. Each of these six pairs was tested by change and no-change (catch) trials. The three

“word” tokens in all trials were spoken by three different speakers, so even in no-change

trials the three presented sounds were always physically different, because this was considered a better measure of listeners’ perceptual capabilities in real world situations

(Strange & Shafer, 2008).

The listeners were asked to decide if there was any “odd word” that was different from the other two. The change trials always contained an odd item. For example, a change trial testing the T1-T2 pair might consist of ‘mā2’-‘mā1’-‘má3’ (where the subscripts indicate different speakers). The correct response for change trials was the button (‘1’, ‘2’, or ‘3’) indicating the position of the odd item, which occurred with equal frequency in all three possible positions. The position of the “odd word” was not fixed, which increased task uncertainty. The change trials tested the participants’ ability to respond appropriately to relevant phonetic differences between tokens and to distinguish tones drawn from two different categories.

The correct response to no-change trials, which contained three different instances of a single category (e.g. /dǐ/3 /dǐ/1 /dǐ/2 or /bà/1 /bà /3 /bà/2), was a fourth button marked ‘NO’. The no- change trials tested the participants’ ability to ignore audible but phonetically irrelevant within-category variation (in e.g. voice quality). The listeners were required to respond to each trial, and were told to guess if uncertain. A trial could be replayed as many times as the listener wished, but responses could not be changed once given. The inter-stimulus interval in all trials was 0.5 s.

A total of 360 trials were presented in three blocks of 120 trials. A different randomization was used for each block. The first eight trials in each block were for practice and were not analyzed. The resulting 336 (3 blocks x 112) trials consisted of 252 change trials testing six pairs (42 trials each for T1-T2, T1-T3, T1-T4, T2-T3, T2-T4, T3-T4) and 84 no-change trials

(21 trials each for T1, T2, T3, T4). In selecting the stimuli, care was taken so that tokens by

each of the eight speakers would be distributed as evenly as possible. The experimental session was self-paced and the listeners could take a break after each block if they wished.

Responses to the change and no-change trials were used to calculate A-prime (A') scores

(Snodgrass et al., 1985), which are an index of discrimination accuracy. These scores were based on the proportion of ‘hits (Hs)’ obtained for each tone pair and the proportion of ‘false alarms (FAs)’. If the proportion of Hs equalled the proportion of FAs, then A' was set to 0.5.

If H exceeded FA, then A' = 0.5+((H–FA)∗(1+H–FA))/((4∗H)∗(1–FA)). However, if FA exceeded H, then A' = 0.5–((FA–H)∗(1+FA–H))/((4∗FA)∗(1–H)). An A' score of 1 indicated perfect sensitivity, whereas an A' score of 0.5 or lower indicated a lack of sensitivity.

3. Results

3.1 Overall between-group differences

Figures 3 and 4 show the distribution of discrimination scores (A') as a function of group and tone pair, respectively. The A' scores for each tone pair for each group are shown in Table 2.

The mean A' scores averaged across the six tone pairs were 0.72, 0.65 and 0.98 for Australian

English, Burmese and Mandarin groups, respectively. Not surprisingly, the Mandarin listeners who were included as controls showed a more accurate level of discrimination than the two non-native groups (Table 2). However, it is notable that the Australian English group had a higher mean A' score than the Burmese group, despite the lack of any L1 lexical tones.

The pattern of between-group difference depended on the tone pair.

Figure 3 about here

Figure 4 about here

Table 2 about here

A two-way ANOVA with Group (Australian English, Burmese, Mandarin) as the between- subjects factor and Tone Pair (T1-T2, T1-T3, T1-T4, T2-T3, T2-T4, T3-T4) as the within- subjects factor was run to explore differences between the three groups of listeners. Both

Group (G) and Tone Pair (T) showed significant effects, as well as showing a two-way interaction [G: F(2, 33) = 24.8, p < .05, T: F(5, 165) = 34.9, p < .05, G x T: F(10, 165) =

14.3, p < .05]. Subsequent Welch’s F-tests with Group as the independent variable showed significant differences for all six tone pairs. Table 3 shows the results of Welch’s F-tests and

Dunnett's Modified Tukey-Kramer pairwise multiple comparison post hoc tests. The

Mandarin group was significantly more accurate than the two non-native groups in discriminating all six tone pairs. The Australian English and Burmese groups did not significantly differ from each other for T1-T2, T1-T4, T2-T4 and T3-T4. However, the

Australian English group outperformed the Burmese group for T1-T3 and T2-T3 with substantial mean differences in A' scores (0.21 for T1-T32 and 0.20 for T2-T3, respectively).

Table 3 about here

As mentioned above, the Mandarin group was highly accurate in their tone discrimination for all six pairs (0.97-0.99). Welch’s F-test showed no significant difference in Tone Pair variables for the Mandarin group, but there were significant differences for the non-native groups [Australian English: F(5, 24.6) = 9.1, p < .001, Burmese: F(5, 41.6) = 15.0, p < .001].

Post hoc analyses for the non-native groups are presented in the next section.

2 An anonymous reviewer pointed out that “it is peculiar why the Burmese listeners could not exploit the presence of a in Mandarin tone 3 and perform better than the Australian listeners on T1-T3 contrast in their perception if this phonation type is encoded in their tone system”. As the creaky tone has a falling contour in Burmese (Figure 1), native Burmese listeners may associate a creaky voice with the falling contour only (not with T3) and benefit from L1 knowledge only when a creaky voice occurs with T4 (not with other tones).

3.2 Effect of tone pair for Australian English and Burmese groups

Figure 5 shows the distribution of discrimination scores (A') by the two groups of non-native listeners as a function of tone pair. Table 2 and Figure 5 show that except for T1-T4 and T2-

T4, the Australian English group had higher mean and median A' scores than the Burmese group despite the lack of L1 tones for either language group. One-sample t-tests showed that the Australian English and Burmese listeners’ A' scores did not differ significantly from 0.5

(i.e. threshold of sensitivity) for T1-T2 (t(9) = 1.9, ns) for the former and T1-T2 and T2-T3 pairs (t(15) = -1.0 – 0.2, ns) for the latter, respectively.

Figure 5 about here

Table 4 about here

As stated above, Welch’s F-tests with Tone Pair as the independent variable reached significance only for the Australian English and Burmese groups. While the highest mean A' score for both non-native groups was obtained for T3-T4 (0.89 for the Australian English group and 0.80 for the Burmese group), the two non-native groups differed in the accuracy with which the rest of the tone pairs were discriminated. Table 4 shows the results of Welch’s

F-tests and Dunnett's Modified Tukey-Kramer pairwise multiple comparison post hoc tests.

The Australian English group discriminated T3-T4 significantly more accurately than T1-T2,

T1-T4 and T2-T3. In contrast, the Burmese group discriminated T1-T4, T2-T4 and T3-T4 significantly more accurately than T2-T3. In addition, they discriminated T2-T4 and T3-T4 significantly more accurately than T1-T2.

4. Discussion and conclusions

We examined the discrimination accuracy of Mandarin lexical tones by three groups of listeners: Australian English, Burmese and Mandarin. Neither of the two non-native groups had previous experience of Mandarin, but the Burmese listeners use lexical tones contrastively in their L1. While the Mandarin listeners were consistently more accurate than the Australian English and Burmese listeners in discriminating all six tone pairs in their L1

Mandarin, the perception by the non-native listeners varied substantially between different tone pairs. Specifically, the Australian English and Burmese listeners discriminated T2-T3 and/or T1-T2 poorly, and T3-T4 accurately.

The Burmese listeners did not appear to benefit from their L1 tonal experience and were significantly less accurate than the Australian English listeners for T1-T3 and T2-T3.

Although the Burmese listeners may be experienced in interpreting F0 information linguistically in L1, reinterpreting unfamiliar non-native tones may be even more confusing and challenging for them than it might be for the Australian English listeners who are free from the preconceived notion of lexical tones.

In the case of the Burmese listeners, the difference between how lexical tones are phonetically realized in their L1 and in Mandarin may have made the cross-linguistic tone perception difficult. As described in the Introduction, Burmese is known to use phonetic features other than F0 substantially, such as phonation type and duration (e.g. Brunelle &

Kirby, 2016; Thein Tun, 1982; Watkins, 2001; Wheatley, 2009). Therefore, it is possible that the Burmese listeners in this study were not accustomed to devoting their attention to variations in F0, which may be required for the processing of Mandarin lexical tones. If that is true, it would be interesting to examine, cross-sectionally and longitudinally, how Burmese learners of Mandarin may shift their cross-language perception strategies. The results of such research would have implications for L2 pedagogy and plasticity of speech perception.

Despite being poorer at Mandarin tone discrimination at this stage, greater success may be expected for the Burmese listeners at a later stage in the language learning process relative to the English listeners (Cooper & Wang, 2012). In this connection, data from another Burmese participant with extensive (more than 20 years) Mandarin learning experience are highly suggestive. Her data were not included in the present study, but her results (T1-T2: 0.95, T1-

T3: 0.99, T1-T4: 0.96, T2-T3: 0.90, T2-T4: 0.99, T3-T4: 0.99) were very close to those of the native Mandarin listeners. Of course, we cannot exclude the possibility that she was simply an exceptional learner, but since native Burmese listeners have experience of using pitch variations in a lexically contrastive manner, the results might generalize to other learners.

It is worth pointing out that both T1-T2 and T2-T3 posed difficulty to the non-native listeners

(0.51 and 0.47 for the Burmese listeners and 0.62 and 0.66 for the Australian English listeners, respectively). Typically, T2-T3 is known to be highly confusing for listeners from diverse L1 backgrounds (e.g. Chuang et al., 1975; Hao, 2018; Kiriloff, 1969; Shen & Lin,

1991; Tsukada et al., 2015, 2016; Tsukada & Han, in press; Wong et al., 2005; Wong, 2013).

Frequent bi-directional T1-T2 confusion was reported in a recent study by Gao (2016) who examined 16 Swedish-speaking learners of Mandarin at two different high schools. These young Swedish learners identified T3 most accurately (83.8%) and T2 least accurately

(44.4%).

In our current study, it is possible that T1 and T2 sounded similar to both Australian English and Burmese listeners, but for different reasons. As Gandour (1983) found for the English listeners, the Australian English listeners in this study may have attached more importance to the pitch at the end of a tone and perceived T1 (high level) and T2 (rising) to be similar based on the local acoustic similarity. The Burmese listeners, on the other hand, may be accustomed to paying more attention to F0 contour than to average F0, as has been reported for other tonal language listeners (e.g. Mandarin, Thai). For the Burmese listeners, T1-T2 may be

difficult not just because of a lack of a level tone in their L1, but also because T1 is not sufficiently distinct from the existing L1 tones. This would be characterized as a Single

Category assimilation within the PAM framework.

In the Introduction, we raised the possibility that the lack of the high-level tone in L1

Burmese which could facilitate the formation of a “new” L2 category (Flege,

1995, 2007), may result in accurate discrimination of tone pairs involving T1. Our results did not support this prediction. A high-level tone (T1) in natural (as opposed to synthetic) speech may present perception problems, especially when produced by multiple speakers with different voice characteristics (e.g. high vs low pitched voice), because level tones may be less informative as a reference or less clearly defined than contour (dynamic) tones in the tonal system (Khouw & Ciocca 2007). In another study, inexperienced (first-year) learners identified T1 more poorly than the other tones in questions (Yang & Chan, 2010). T1 was misidentified as T4 (25%) and T4 was misidentified as T1 (about 20%) in statements by first- year learners. However, T1 was no longer problematic for more experienced (second-year and advanced) learners.

Our data demonstrate that possessing lexical tones in L1 may not necessarily facilitate the perception of these tones in unfamiliar languages. It suggests that active use of phonation type in L1 phonology may have affected the efficiency of non-native tone processing for the native Burmese listeners. While our findings are consistent with those of previous studies

(e.g. Francis et al., 2008; So & Best, 2010; Wang, 2013) in which listeners from tonal language backgrounds did not show any advantage in their cross-language tone perception compared with listeners from non-tonal language backgrounds, there are other studies that have reported a positive influence of having tone language experience (e.g. Li, 2016). This split in the literature may be partially related to the methodological characteristics as well as the linguistic environments/demands facing the participants.

Although the two non-native groups in this study were equally naïve to Mandarin, in addition to the presence vs absence of lexical tones in their L1, their surrounding linguistic environments also differed substantially. The reason is that while the Australian English listeners participated in the study in a familiar environment in their home country, the

Burmese participants took part in the study in Japan. At the time of testing, they had been removed from living in a Burmese-speaking environment for more than 4 years on average.

They did have opportunities to use Burmese regularly with family members and/or Burmese friends, but because most of them were not sufficiently proficient in Japanese, they used

English at work or university with non-Burmese colleagues or fellow students. Therefore, processing unknown Mandarin tones in this type of complex multilingual environment may be very different from being engaged in the same task in a familiar home environment. The same Burmese listeners may have found the task simpler and performed better if they had participated in the study in their home country where the linguistic demands are less.

One of the limitations of the current study is the relatively small number of listeners, which limits the possibility of drawing strong general conclusions from the results. However, we are confident that the between-group differences which we uncovered are real and that they provide rare and valuable insights into the Burmese listeners’ cross-linguistic speech perception. We hope this preliminary study will promote further investigations of the dynamic nature of spoken language processing.

Acknowledgements

This research was supported by the 11th Hakuho Foundation Japanese Research Fellowship

(2016-2017) and the 2018 Endeavour Research Fellowship awarded to the first author. We thank the Associate Editor and two anonymous reviewers for their thoughtful comments and

suggestions, Yue Sun and Yumi Ozaki for providing research assistance, and participants for making the study possible.

References

Best, C. T. (1995). A direct realist perspective on cross-language speech perception. In W.

Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171-204). Timonium, MD: York Press.

Blicher, D. L., Diehl, R. L., & Cohen, L. B. (1990). Effects of duration on the perception of the Mandarin tone 2/tone 3 distinction: Evidence of auditory enhancement.

Journal of Phonetics, 18, 37-49.

Bradley, D. (1982). Register in Burmese. In D. Bradley (Ed.), Papers in South-East Asian linguistics. No. 8: Tonation (pp. 117-132). Canberra: Australian National University.

Brunelle, M. & Kirby, J. (2016). Tone and phonation in Southeast Asian languages.

Language and Linguistics Compass, 10, 191-207.

Burnham, D., Kasisopa, B., Reid, A., Luksaneeyanawin, S., Lacerda, F., Attina, V., Xu

Rattanasone, N., Schwarz, I.-C., & Webster, D. (2015). Universality and language-specific experience in the perception of lexical tone and pitch. Applied Psycholinguistics, 36, 1459-

1491.

Chang, C. B. (2009). English loanword adaptation in Burmese. Journal of the Southeast

Asian Linguistics Society, 1, 77-94.

Chang, C. B. & Yao, Y. (2016). Toward an understanding of heritage prosody: Acoustic and perceptual properties of tone produced by heritage, native, and second language speakers of

Mandarin. Heritage Language Journal, 13, 134-160.

Chuang, C. K., Hiki, S., Sone, T., & Nimura, T. (1972). The acoustical features and perceptual cues of the four tones of standard colloquial Chinese. Proceedings of the 7th

International Congress on Acoustics, 297-300.

Chuang, C. K., Hiki, S., Sone, T., & Nimura, T. (1975). Acoustical features of the four tones in monosyllabic utterances of standard Chinese. The Journal of the Acoustical Society of

Japan, 31, 369-380.

Cooper, A. & Wang, Y. (2012). The influence of linguistic and musical experience on

Cantonese word learning. The Journal of the Acoustical Society of America, 131, 4756-4769.

Ding, H., Hoffmann, R., & Jokisch, O. (2011). An investigation of tone perception and production in German learners of Mandarin. Archives of Acoustics, 36, 509-518.

Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In W.

Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233-277). Timonium, MD: York Press.

Flege, J. E. (2003). Methods for assessing the perception of in a second language. In

E. Fava & A. Mioni (Eds.), Issues in clinical linguistics (pp. 19-44). Padova: UniPress.

Flege, J. E. (2007). Language contact in bilingualism: Phonetic system interactions. In J. Cole

& J. I. Hualde (Eds.), Laboratory Phonology (pp. 353-382). Berlin: Mouton de Gruyter.

Flege, J. E., MacKay, I. R. A., & Meador, D. (1999). Native Italian speakers’ perception and production of English vowels. The Journal of the Acoustical Society of America, 106, 2973-

2987.

Francis, A., Ciocca, V., Ma, L., & Fenn, K. (2008). Perceptual learning of Cantonese lexical tones by tone and non-tone language speakers. Journal of Phonetics, 36, 268-294.

Gandour, J. T. (1978). The perception of tone. In V. A. Fromkin (Ed.), Tone: A linguistic survey (pp. 41-76). New York: Academic Press.

Gandour, J. T. (1983). Tone perception in Far Eastern languages. Journal of Phonetics, 11,

149-175.

Gao, M. (2016). Perception of lexical tones by Swedish learners of Mandarin. Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for

Language Acquisition at SLTC 2016. Linköping Electronic Conference Proceedings 130: 33-

40.

Gårding, E., Kratochvil, P., Svantesson, J.-O., & Zhang, J. (1986). Tone 3 and tone 4 discrimination in modern standard Chinese. Language and Speech, 29, 281-293.

Hallé, P., Chang, Y.-C., & Best, C. T. (2004). Identification and discrimination of Mandarin

Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32, 395-421.

Hao, Y.-C. (2012). Second language acquisition of Mandarin Chinese tones by tonal and non- tonal language speakers. Journal of Phonetics, 40, 269-279.

Hao, Y.-C. (2018). Second language perception of Mandarin vowels and tones. Language and Speech, 61, 135-52.

Hiki, S., Imaizumi, K., Sunaoka, K., & Yang, L. (2004). Teaching tonal discrimination based on statistical properties and acoustical characteristics of the Chinese four tones: With regard to the contrast between tone-2 and tone-3. Proceedings of the IWLeL 2004: An interactive workshop on language e-learning, 53-62.

Huang, T. & Johnson, K. (2010). Language specificity in speech perception: Perception of

Mandarin tones by native and non-native listeners. Phonetica, 67, 243-267.

Khouw, E. & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. Journal of

Phonetics, 35, 104-117.

Kiriloff, C. (1969). On the auditory perception of tones in Mandarin. Phonetica, 20, 63-67.

Leather, J. (1987). F0 pattern inference in the perceptual acquisition of second language tone.

In A. James & J. Leather (Eds), Sound patterns in second language acquisition (pp. 59-81).

Dordrecht: Foris.

Lee, C.-Y. & Hung, T.-H. (2008). Identification of Mandarin tones by English-speaking musicians and nonmusicians. The Journal of the Acoustical Society of America, 124, 3235-

3248.

Lee, Y.-S., Vakoch, D. A., & Wurm, L. H. (1996). Tone perception in Cantonese and

Mandarin: A cross-linguistic comparison. Journal of Psycholinguistic Research, 25, 527-542.

Li, B., Shao J., & Bao, M. (2017). Effects of phonetic similarity in the identification of

Mandarin tones. Journal of Psycholinguistic Research, 46, 107-124.

Li, M. & DeKeyser, R. (2017). Perception practice, production practice, and musical ability in L2 Mandarin tone-word learning. Studies in Second Language Acquisition, 39, 593-620.

Li, Y. (2016). English and Thai speakers’ perception of Mandarin tones. English Language

Teaching, 9, 122-132.

Lin, W. C. J. (1985). Teaching Mandarin tones to adult English speakers: Analysis of difficulties with suggested remedies. RELC Journal, 16, 31-47.

Liu, S. & Samuel, A. G. (2004). Perception of Mandarin lexical tones when F0 information is neutralized. Language and Speech, 47, 109-138.

Miracle, W. C. (1989). Tone production of American students of Chinese: A preliminary acoustic study. Journal of the Chinese Language Teachers Association, 24, 49-65.

Pelzl, E., Lau, E. F., Guo, T., & DeKeyser, R. (in press). Advanced second language learners’ perception of lexical tone contrasts. Studies in Second Language Acquisition.

DOI:10.1017/S0272263117000444.

Peng, G., Zheng, H.-Y., Gong, T., Yang, R.-X., Kong, J.-P., & Wang, W. S.-Y. (2010). The influence of language experience on categorical perception of pitch contours. Journal of

Phonetics, 38, 616–624.

Reid, A., Burnham, D., Kasisopa, B., Reilly, R., Attina, V., Xu Rattanasone, N., & Best, C.

T. (2015). Perceptual assimilation of lexical tone: The roles of language experience and visual information. Attention, Perception & Psychophysics, 77, 571–591.

Shen, G. & Froud, K. (2016). Categorical perception of lexical tones by English learners of

Mandarin Chinese. Journal of the Acoustical Society of America, 140, 4396-4403.

Smith, S. (1997). User manual for UAB software. University of Alabama at Birmingham

(UAB), Department of Rehabilitation Sciences.

Snodgrass, J. G., Levy-Berger, G., & Haydon, M. (1985). Human experimental psychology.

New York: Oxford University Press.

So, C. K. & Best, C. T. (2010). Cross-language perception of non-native tonal contrasts:

Effects of native phonological and phonetic influences. Language and Speech, 53, 273-293.

Strange, W. (1995). Speech perception and linguistic experience: Issues in cross-language research. Timonium, MD: York Press.

Strange, W. & Shafer, V. L. (2008). Speech perception in second language learners: The re- education of selective perception. In J. G. Hansen Edwards & M. L. Zampini (Eds.),

Phonology and second language acquisition (pp. 153-191). Amsterdam: John Benjamins.

Thein Tun, U. (1982). Some acoustic properties of tones in Burmese. In D. Bradley (Ed.),

Papers in South-East Asian linguistics. No. 8: Tonation (pp. 77-116). Canberra: Australian

National University.

Tsukada, K. & Han, J.-I. (in press). The perception of Mandarin lexical tones by native

Korean speakers differing in their experience with Mandarin. Second Language Research.

DOI: 10.1177/0267658318775155.

Tsukada, K., Kondo, M., & Sunaoka, K. (2016). The perception of Mandarin lexical tones by native Japanese adult listeners with and without Mandarin learning experience. Journal of

Second Language Pronunciation, 2, 225-252.

Tsukada, K., Xu, H. L., & Xu Rattanasone, N. (2015). The perception of Mandarin lexical tones by listeners from different linguistic backgrounds. Chinese as a Second Language

Research, 41, 141-161.

Wang, W. S-Y. & Li, K-P. (1967). Tone 3 in Pekinese. Journal of Speech and Hearing

Research, 10, 629-636.

Wang, X. (2013). Perception of Mandarin tones: The effect of L1 background and training.

The Modern Language Journal, 97, 144-160.

Wang, Y., Jongman, A., & Sereno, J. A. (2006). L2 acquisition and processing of Mandarin tones. In P. Li, E. Bates, L. H. Tan & O. Tseng (Eds.), The handbook of East Asian psycholinguistics (pp. 250-256). Cambridge: Cambridge University Press.

Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649-

3658.

Watkins, J. W. (2001). Illustrations of the IPA: Burmese. Journal of the International

Phonetic Association, 31, 291-295.

Watkins, J. W. (2003). Tone and intonation in Burmese. Proceedings of the 15th International

Congress of Phonetic Sciences, 1289-1292.

Wayland, R. & Guion, S. (2003). Perceptual discrimination of Thai tones by naïve and experienced learners of Thai. Applied Psycholinguistics, 24, 113-129.

Wayland, R. P. & Guion, S. (2004). Training English and Chinese listeners to perceive Thai tones: A preliminary report. Language Learning, 54, 681-712.

Whalen & Yu, (1992). Information for Mandarin tones in the amplitude contour and in brief segments. Phonetica, 49, 25-47.

Wheatley, J. K. (2009). Burmese. In B. Comrie (Ed.), The world’s major languages (pp. 724-

740). London: Routledge.

Wu, X., Munro, M. J., & Wang, Y. (2014). Tone assimilation by Mandarin and Thai listeners with and without L2 experience. Journal of Phonetics, 46, 86-100.

Wu, X., Tu, J.-Y., & Wang, Y. (2012). Native and nonnative processing of Japanese pitch accent. Applied Psycholinguistics, 33, 623-641.

Xu, Y. Contextual tonal variation in Mandarin. Journal of Phonetics, 25, 61-83.

Yang, C. & Chan, M. (2010). The perception of Mandarin Chinese tones and intonation by

American learners. Journal of the Chinese Language Teachers Association, 45, 7-36.

Yang, R.-X. (2015). The role of phonation cues in Mandarin tonal perception. Journal of

Chinese Linguistics, 43, 453-472.

Yip, M. (2002). Tone. New York: Cambridge University Press.

Zhang, H. (2016). Dissimilation in the second language acquisition of Mandarin Chinese tones. Second Language Research, 32, 427-451.

Zhang, X., Samuel, A. G., & Liu, S. (2012). The perception and representation of segmental and prosodic Mandarin contrasts in native speakers of Cantonese. Journal of Memory and

Language, 66, 438-457.

Zhao, T. C. & Kuhl, P. K. (2015). Effect of musical experience on learning lexical tone categories. The Journal of the Acoustical Society of America, 137, 1452-1463.