TONAL PRODUCTION AND PERCEPTION PATTERNS OF CANADIAN RAISED SPEAKERS

Kwok Lai Connie So B.Sc., University of Victoria, 1996

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF ARTS

in the Department of Linguistics

© Kwok Lai Connie So 2000 SIMON FRASER UNIVERSITY July 2000

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without permission of the author.

Abstract

This thesis examines the tonal systems of two groups of Canadian Raised Cantonese speakers (CRCs) in Vancouver, British Columbia, from the perspectives of the production and perception of Cantonese tones. Thirty participants, aged 18 to 24 years, were recruited as both speakers and listeners for the production and perception experiments. They were divided into three groups. The first group consisted of ten Canadian Raised Cantonese speakers who had immigrated to Canada in their teens (TCRC). The second group consisted of ten Canadian Raised Cantonese speakers who were either born in Canada or had moved to Canada as young children (YCRC). The third group (the comparison group) was made up of ten native (Hong Kong) Cantonese speakers (NCAN) who had moved to Canada less than two years before the time of recording.

Twelve Chinese words from the root-words /si/ and /fu/, associated with the six lexical tones in Cantonese, were used as target words in the production experiment and as listening stimuli in the perception experiment.

In the production experiment, two acoustic correlates of tones, vowel duration and fundamental frequency (Fo), were examined from 3547 samples. With regard to duration, the CRC speakers maintain relative durational patterns among the Cantonese tones.

The analysis of fundamental frequency resulted in two important findings. First, the tonal patterns of the YCRC group deviated to a greater degree than those of the TCRC group. Second, tonal reductions of the two CRC groups appear to be constrained by certain hierarchies.

In the perception experiment, the thirty participants identified the six lexical tones associated with the target words in two sets of stimuli, /si/ and /fu/. Forty-eight stimuli in citation form, produced by four experienced Cantonese language instructors, were presented to the listeners. The results were consistent with those of the production experiment.

The findings of this thesis have two important implications. First, there exists a correlation between tonal reduction and the degree of mastery of tones. Second, an inverse relationship can be recognized between the acquisition of tones and the hierarchical order apparent in the tonal reduction patterns.

To my beloved parents, with deep appreciation for their endless love and support

Acknowledgments

Hallelujah, thank God for guiding me through all the difficulties and frustrations, so that my thesis could become a reality.

I must thank my supervisor Dr. Zita McRobbie for her encouragement throughout my years of graduate study, and for her thoughtful and valuable comments on this thesis. Special thanks also go to Dr. Murray Munro for his insightful and critical comments on this thesis, and to Dr. Edwin Pulleyblank for his invaluable comments on the Cantonese materials in this thesis.

I should express my thankfulness to the Department of Linguistics for providing me with financial support in the form of teaching assistantships and a university fellowship during my academic years. I would also like to thank Sheilagh, Georgina, Rita, Carol, Gladys, and Grace for their secretarial assistance.

Special thanks go to Renee McCallum for her friendship and for her important contribution to the editing and proofreading of my thesis. My gratitude also goes to Dr. Teresa Yu for helping me to get assistance from Cantonese language instructors at Simon Fraser University. My sincere gratitude also goes to Dr. Billy Ng, Jenny Tse, and Alison Winters (from SFU), Salina Leung and Mr. Lo (from the Chinese Culture Center in Vancouver), as well as all the participants in this thesis study. I sincerely thank them all for their contribution to my thesis.

I must mention my gratitude to Dr. Wyn Roberts for his consistent and kind encouragement and generous offerings of tea and coffee during the "cuppa-time" throughout the years of my studies. I also thank my fellow graduate students in the Linguistics Department for their support and friendship.

Finally, my deepest gratitude goes to my parents and family members for their support, and to my forever friends Herman, Derek, Ken, Andersen, Esther, May, Karen, Memory, Pen, Queenie, and Anita for their endless support, encouragement, and friendship throughout the years.

Table of Contents

Title Page
Approval Page
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures

CHAPTER ONE: INTRODUCTION

1.1. Tone Language
    1.1.1. Lexical Tones
    1.1.2. Production of Tones
    1.1.3. Perception of Tones
    1.1.4. Representation of Tones: Chao's Tone Letters
1.2. Hong Kong Cantonese
    1.2.1. Structure
    1.2.2. Tonal System
    1.2.3. Factors Affecting Tonal Contours
1.3. Literature Review of Cantonese Studies
    1.3.1. Production Studies
    1.3.2. Perception Studies
1.4. Purpose of the Present Study
    1.4.1. Limitation in Fo Analyses
    1.4.2. Different Bilingual Groups: CRC Speakers vs. Hong Kong Speakers
    1.4.3. The Outline of the Present Study

CHAPTER TWO: TONAL PRODUCTION EXPERIMENT

2.1. Method
    2.1.1. Participants
    2.1.2. Materials
    2.1.3. Description of the Experiment
    2.1.4. Analyses for Two Acoustic Correlates of Tones
    2.1.5. Measurements
        2.1.5.1. Vowel Duration Measurement
        2.1.5.2. Fundamental Frequency (Fo) Measurement
2.2. Vowel Durations
    2.2.1. Measurement Results
    2.2.2. Discussion
        2.2.2.1. Durational Patterns of the CRC Groups
        2.2.2.2. Durational Patterns among Lexical Tones in Speaker Groups
2.3. Fundamental Frequency (Fo)
    2.3.1. Tonal Patterns for the Speaker Groups
    2.3.2. Tonal Deviation
        2.3.2.1. Analysis of Tonal Deviation
        2.3.2.2. Unit of Measurements
            2.3.2.2.1. Fo Ratios
            2.3.2.2.2. Percentage Change in Fo
    2.3.3. Results
        2.3.3.1. Tonal Space
        2.3.3.2. Spatial Relationships for Level Tones
        2.3.3.3. Fo Intervals for Tones
        2.3.3.4. Rising Tones
            2.3.3.4.1. Contour Shapes (in Δ%)
            2.3.3.4.2. Spatial Relationship
    2.3.4. Discussion
        2.3.4.1. Reduction Patterns in the Two CRC Groups
        2.3.4.2. Fo Ratios
2.4. Conclusion

CHAPTER THREE: TONAL PERCEPTION EXPERIMENT

3.1. Method
    3.1.1. Participants
    3.1.2. Materials: Preparation of Natural Stimuli
    3.1.3. The Pilot Study and the Selected Stimuli
    3.1.4. Stimuli Identification Task
    3.1.5. Procedures
    3.1.6. Analyses
3.2. Results
    3.2.1. Identification Test
    3.2.2. Confusion Matrices
3.3. Discussion
    3.3.1. Group Differences
    3.3.2. Level Tone Confusions
        3.3.2.1. Tone 3 and Tone 6
        3.3.2.2. Tone 6 and Tone 4
3.4. Conclusion

CHAPTER FOUR: CONCLUSION

4.1. Tonal Patterns Observed in the Two CRC Groups
    4.1.1. Findings of the Production Study
        4.1.1.1. Vowel Duration
        4.1.1.2. Fundamental Frequency (Fo)
    4.1.2. Findings of the Perception Study
        4.1.2.1. Identification Test
        4.1.2.2. Confusion Matrices
4.2. Deviant Tonal Patterns
    4.2.1. Mastery of Tonal System
    4.2.2. Hierarchical Orders of Tonal Reductions
4.3. Conclusion
4.4. Limitation and Future Research

Appendices

Appendix A  Language Background Questionnaire
Appendix B  /si/ Reading List
Appendix C  /fu/ Reading List
Appendix D  Identification Paradigm in Yiu & Fok (1995)
Appendix E  /si/ Identification Paradigm in the Present Study
Appendix F  /fu/ Identification Paradigm in the Present Study

References

List of Tables

Chapter One

Table 1-1. Syllable structure of Cantonese
Table 1-2. Initial consonants in Cantonese
Table 1-3. The 53 finals in Cantonese
Table 1-4. Vowels in Cantonese
Table 1-5. Cantonese Tones

Chapter Two

Table 2-1. The means (in ms) and standard deviations of vowel durations of the lexical tones produced by the three speaker groups
Table 2-2. Results of the ANOVA for vowel duration
Table 2-3. The ranked mean durations of the lexical tones for the speaker groups (in ms)
Table 2-4. Mean Fo values at the five percentage points and their corresponding values
Table 2-5. The T-space ratios of the three speaker groups
Table 2-6. The mean L-spatial ratios of the level tones of the three speaker groups
Table 2-7. Results of the ANOVA for L-spatial ratio
Table 2-8. The mean C-space ratios of the contour tones of the three speaker groups
Table 2-9. Results of the ANOVA for C-space ratio
Table 2-10. Means and standard deviations of percentage change at the four durational range-sections of tone 2 and tone 5 of the speaker groups
Table 2-11. Results of the ANOVA for rising tones
Table 2-12. Summary of contour shapes for tone 2 for the three speaker groups
Table 2-13. Summary of contour shapes for tone 5 for the three speaker groups
Table 2-14. Summary of the spatial relationship for tone 2 and tone 5 for the three speaker groups
Table 2-15. Summary of the results of the statistical analysis of fundamental frequency
Table 2-16. The hierarchical orders of the reduced tonal patterns of the CRC speakers

Chapter Three

Table 3-1. The mean correct scores by the speaker groups in the perception test
Table 3-2. Confusion matrix for the NCAN group
Table 3-3. Confusion matrix for the TCRC group
Table 3-4. Confusion matrix for the YCRC group
Table 3-5. Summary of the results of the identification test
Table 3-6. Summary of the tonal confusions of the three listener groups

Chapter Four

Table 4-1. Comparison of hierarchical orders of tonal reduction in the present study and tonal acquisition in Tse's study

List of Figures

Chapter One

Figure 1-1. Tone letters for Cantonese
Figure 1-2. Relationship of durational range-sections and their ΔFo

Chapter Two

Figure 2-1a. The mean vowel durations of the three speaker groups
Figure 2-1b. The mean vowel durations of the three female speaker groups
Figure 2-1c. The mean vowel durations of the three male speaker groups
Figure 2-2. The tonal patterns for NCAN female speakers
Figure 2-3. The tonal patterns for NCAN male speakers
Figure 2-4. The tonal patterns for TCRC female speakers
Figure 2-5. The tonal patterns for TCRC male speakers
Figure 2-6. The tonal patterns for YCRC female speakers
Figure 2-7. The tonal patterns for YCRC male speakers
Figure 2-8a. Durational range-sections and percentage change
Figure 2-8b. Corresponding durational range-sections and percentage change in bar chart
Figure 2-9. The four durational range-sections of tone 2 for the three speaker groups
Figure 2-10. The four durational range-sections of tone 5 for the three speaker groups

Chapter Three

Figure 3-1. The correct percentages of the three speaker groups in the perception test

CHAPTER ONE

INTRODUCTION

The present study investigates the tonal system of two groups of Canadian Raised Cantonese (CRC) speakers¹ by way of speech production and perception experiments. In the production study, speech signals were compared, in terms of vowel duration and fundamental frequency, with those of native Hong Kong Cantonese speakers. In the perception study, identification scores were examined. It was found that (i) the tonal patterns of these two CRC groups showed various degrees of deviation from those of the native Hong Kong Cantonese speakers, and that (ii) the tones that deviated were subject to certain hierarchical orders (e.g., contour tones deviate more easily than level tones). Implications of the findings will be discussed.

This thesis consists of four chapters. This introductory chapter provides a brief description of the characteristics of tone languages, some background information about Hong Kong Cantonese and its tonal system, and a review of the relevant literature. In the concluding section of this chapter, an outline of the present study will be presented. Chapters Two and Three contain detailed descriptions of the production and perception experiments, respectively, together with their findings and discussion. The concluding chapter (Chapter Four) gives a general discussion of the results.

¹ The term Canadian Raised Cantonese (CRC) speakers refers to speakers whose first language (L1) is Cantonese and who were raised or grew up in a Canadian English environment. This term will be defined in detail in Chapter Two.

1.1. TONE LANGUAGE

The majority of the world's languages are tone languages (Gandour, 1994: 3116). According to Pike (1948: 3), "a tone language may be defined as a language having lexically significant, contrastive, but relative pitch on each syllable". That is to say, words from the same root-word syllable (i.e., with the same sequence of segments) are phonologically distinct if they are associated with different pitches (McQueen and Cutler, 1997: 578). Slightly different uses of pitch will affect the lexical or dictionary meaning of a (native) word in a tone language (Ladefoged, 1993: 253). Pike (1948: 3) suggests that each syllable of a tone language carries at least one significant pitch unit, which is known as a lexical tone. Examples can be seen in Cantonese: the root-word [ji] produced with a high level tone means "doctor", but it means "child" if the root-word is produced with a low falling tone. Tone languages also include languages that employ pitch differences in their morphology to make changes in grammatical (morphological) meaning. For example, the notion of possession in Igbo (equivalent to English "of") can be expressed by a high tone (Ladefoged, 1993: 252). In the phrase meaning "the jaw of a monkey", [àgbá èŋwè], the words "jaw" [àgbà] and "monkey" [èŋwè] both occur in isolation with two low tones, but within the phrase, the low tone on the second syllable of the word "jaw" [àgbà] is replaced by a high tone (ibid.).

1.1.1. Lexical Tones

Lexical tone is a suprasegmental feature, which "is superimposed on segments" (Ioup & Tansomboon, 1987: 342; Lehiste, 1996: 227). Tone is different from intonation. The former is considered to be phonemic² and functions distinctively at the word level, while the latter is freely applied at the sentential level or on longer stretches of speech (Ioup & Tansomboon, 1987: 342; Lehiste, 1970/1996). Concerning the development of lexical tones, previous studies of Thai (Tuaycharoen, 1977, cited in Li & Thompson, 1978: 274), Mandarin (Chao, 1951; Li & Thompson, 1977), and Cantonese (Tse, 1977; Tse, 1982; Clumeck, 1980: 260) have shown that children are able to control tonal production before the completion of segmental production. Generally, acquisition of tones³ is completed by 2 years of age (ibid.).

² Tone is considered phonemic, like other phonemic segments, since it is lexically associated with specific words (Ioup & Tansomboon, 1987: 342).

Lexical tones can be classified into two major categories, contour and level tones. Contour tones involve (gliding) movements of pitch throughout the time of the vocalic portion of a syllable (Ladefoged, 1993: 254). Contour tones, or dynamic tones, can be subdivided into simple and complex contour tones. Rising tones and falling tones are considered simple contour tones, while rising-falling and falling-rising tones are treated as complex contour tones.

Level tones, or static tones, ideally refer to a single point in the pitch range, and there is no gliding movement throughout the time of the vocalic portion of a syllable (Ladefoged, 1993: 243, 254). However, previous studies have found that the production of level tones can involve a limited degree of falling movement (Abramson, 1978, 1997 for Thai; Fok-Chen, 1974; Yiu and Fok, 1995; Bauer & Benedict, 1997: 115 for Cantonese). Level tones can be classified as high, mid, and low.

³ A discussion of the acquisition of lexical tones is not included in the present study. The reason for mentioning it here will become clear in Chapter Four, where reference to the acquisition of tones will be made in relation to some of the findings in this thesis.

A classic example illustrating the classification of lexical tones is the root-word [ma] in Mandarin.⁴ When the word is pronounced with tone 1 (high level tone), it means "mother". However, when it is pronounced with tone 3 (low falling-rising tone), it means "horse"; with tone 2 (high rising tone), it becomes "hemp"; and with tone 4 (high falling tone), it is the word "scold" (Ladefoged, 1993: 255). From this example, we see that tone 1 is the only level tone in the system, while the other three are contour tones, of which tone 3 is an example of a complex contour tone.
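The Mandarin example can be restated as a small lookup table, which makes explicit that the segmental string [ma] is constant and only the tone selects the meaning. The following is a minimal sketch; the identifiers `MANDARIN_TONES`, `MA_MEANINGS`, `gloss_ma`, and `tones_of_category` are hypothetical names chosen for illustration, and the glosses and tone categories are those given in the text.

```python
# Illustrative sketch only: the four Mandarin tones on [ma], with the glosses
# cited in the text (Ladefoged, 1993: 255). All identifiers are hypothetical.

# tone number -> (pitch description, tone category)
MANDARIN_TONES = {
    1: ("high level", "level"),
    2: ("high rising", "simple contour"),
    3: ("low falling-rising", "complex contour"),
    4: ("high falling", "simple contour"),
}

# tone number -> meaning of the root-word [ma] carrying that tone
MA_MEANINGS = {1: "mother", 2: "hemp", 3: "horse", 4: "scold"}


def gloss_ma(tone):
    """Return the meaning of [ma] when produced with the given tone number."""
    if tone not in MA_MEANINGS:
        raise ValueError(f"Mandarin has no lexical tone numbered {tone}")
    return MA_MEANINGS[tone]


def tones_of_category(category):
    """Return the tone numbers that belong to a category, in ascending order."""
    return sorted(t for t, (_, cat) in MANDARIN_TONES.items() if cat == category)


if __name__ == "__main__":
    for tone in sorted(MANDARIN_TONES):
        description, category = MANDARIN_TONES[tone]
        print(f"[ma] tone {tone} ({description}, {category}): {gloss_ma(tone)}")
```

Calling `tones_of_category("level")` returns only tone 1, which mirrors the observation that tone 1 is the sole level tone in the system.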

1.1.2. Production of Tones

Fundamental frequency (Fo) and duration are the most prominent acoustic correlates of tone.⁵ Fo is considered to be the primary acoustic correlate of tone (Fok-Chen, 1974), because the primary physiological correlate of tone is the vibration (i.e., opening and closing) of the vocal folds. Fo is measured in Hertz (Hz), and its perceptual correlate is pitch.⁶ There is a direct relationship between the vibrations of the vocal folds and the obtained values of Fo: the greater the number of vibrations of the vocal folds per second, the higher the Fo value. For example, an Fo value of 180 Hz indicates that the vocal folds have opened and closed 180 times within one second. There are two basic mechanisms that produce changes in the rate of vibration of the vocal folds (Ladefoged, 1993: 251; Lehiste, 1996: 232; Bauer & Benedict, 1997: 110-111). The first is a change in the air pressure below the vocal folds (the subglottal pressure). For instance, a high tone will be produced when the subglottal pressure is increased. The second mechanism is related to the tension of the vocal folds: greater tension produces a higher pitch. In the majority of Chinese tonal studies in phonetics (Howie, 1976; Tseng, 1990; Fon & Chiang, 1999 for Taiwanese Mandarin; Fok-Chen, 1974; Bauer & Benedict, 1997; Bauer, 1999; So, 1998 for Cantonese; Zee, 1978 for Taiwanese), Fo is used to describe the contour patterns of tones.

⁴ The Mandarin tonal system consists of four lexical tones: tone 1 (high level), tone 2 (high rising), tone 3 (low falling-rising), and tone 4 (high falling). In addition, there is a neutral tone, which is assigned to many grammatical words in Mandarin (Matthews & Yip, 1994: 22).

⁵ Intensity may also be considered an acoustic correlate of tone. It is measured in decibels (dB), and its perceptual correlate is loudness. There exists relatively little research on intensity in relation to tones (Fok-Chen's Cantonese study, 1974; Zee's Taiwanese study, 1978). The results appear to be predictable, because both the rate of change and the overall average level of intensity seem to have a direct relationship with Fo (Fok-Chen, 1974; Zee, 1978; Gandour, 1994: 3118).
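The relationship between vocal fold vibration and Fo described above is simple arithmetic: Fo in Hz is the number of opening-and-closing cycles per second, and the duration of one cycle is its reciprocal. The sketch below illustrates this with the 180 Hz example; the function names `f0_from_cycles` and `glottal_period_ms` are hypothetical and are not taken from any measurement tool used in this study.

```python
# Minimal sketch of the arithmetic linking vocal fold cycles and Fo.
# Function names are hypothetical and chosen for illustration only.

def f0_from_cycles(n_cycles, duration_s):
    """Fo in Hz: number of vocal fold cycles divided by the time span in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_cycles / duration_s


def glottal_period_ms(f0_hz):
    """Duration of one opening-and-closing cycle in milliseconds."""
    if f0_hz <= 0:
        raise ValueError("Fo must be positive")
    return 1000.0 / f0_hz


if __name__ == "__main__":
    # 180 cycles in one second correspond to an Fo of 180 Hz.
    print(f0_from_cycles(180, 1.0))
    # The same rate observed over a quarter of a second: 45 cycles.
    print(f0_from_cycles(45, 0.25))
    # At 180 Hz each cycle lasts a little over 5.5 ms.
    print(round(glottal_period_ms(180.0), 2))
```

In practice Fo is estimated over short stretches of the vowel rather than over a full second, which is why the cycle count is divided by the observed time span.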

The other acoustic correlate of tone is duration. It refers to time as it is measured in connection with the production of a certain tone. In phonetic studies (Kao, 1971; Kong, 1987; Tseng, 1990; among others), the vocalic portion of a syllable (i.e., the vowel), corresponding to its Fo contour, is measured. Durational values are given in milliseconds (ms), and the perceptual correlate of duration is length. The duration of a vowel varies depending on the tone. For example, in acoustic studies of Thai (Abramson, 1962), Taiwanese Mandarin (Tseng, 1990), Taiwanese (Zee, 1978), and Cantonese (Kao, 1971; Kong, 1987), it has been documented that a rising tone tends to be longer than other tones. Another well-known fact is that the glottalized tones in some Chinese languages,⁷ such as Cantonese (Fok-Chen, 1974: 27; Matthews and Yip, 1994: 23) and Shanghainese (Rose, 1988: 56), are associated with a relatively short durational pattern. For instance, according to Fok-Chen's experimental study (1974) on Cantonese tones, the vowel duration of a glottalized tone⁸ is generally half the duration of other normal (non-glottalized) tones. Rose considers the durational parameter of tones as a secondary correlate⁹ for Modern , because it can be "linguistically analyzed as phonemically non-distinctive" (ibid.).

⁶ Although the terms pitch and Fo are different, this study uses the terms interchangeably, following the practice established in the literature (for example, Bauer & Benedict, 1997).

⁷ Here I prefer to use "languages" instead of "dialects", on the basis of the mutual intelligibility criterion (Steinbergs, 1992: 350).

⁸ The term "glottalized tones" refers to the Entering tones in Cantonese (i.e., syllables with an unreleased stop in coda position).

⁹ Similar to duration, another well-known secondary correlate or concomitant feature of tones is the creaky phonation associated with falling tones in Chinese and some Southeast Asian languages (Sagart, 1988: 84; Rose, 1988: 56). This latter correlate of tone is not examined in this study.

1.1.3. Perception of Tones

It is well known that Fo varies with the production of individual speakers, ranging from high Fo for children and women to low Fo for men. In order to perceive lexical tones in speech, listeners normalize tonal patterns (Ching, 1984; Gandour, 1994: 3116; Abramson, 1997). Listeners are assumed to infer the speech source in order to extract similar tonal patterns from the acoustic input (Ching, 1984: 317). In other words, listeners need to infer the speaker's voice range, ignore the absolute Fo values, and extract the invariant Fo patterns of the speech (Abramson, 1976; Leather, 1983). According to Fourcin (1978: 49), normalization depends on an ability to perceive or extract the (tonal) patterns; thus, this perceptual skill is not innate, but acquired. Ching (1984: 325) further suggests that it is a learning process that exists at the basic level of lexical tone labeling.
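The idea of ignoring absolute Fo values while keeping the invariant pattern can be illustrated computationally. The sketch below is an assumption made for illustration and is not the procedure of any study cited above: it expresses each Fo value as a z-score of log-transformed Fo within one speaker, so that two speakers who produce the same relative pattern at different absolute heights receive the same normalized contour. The function name `normalize_f0` and the sample values are hypothetical.

```python
# Illustrative sketch (an assumption, not a method from the cited studies):
# speaker-internal z-scores of log Fo remove differences in absolute voice range.
import math
from statistics import mean, pstdev


def normalize_f0(f0_values_hz):
    """Return z-scores of log Fo computed within a single speaker's values."""
    if any(v <= 0 for v in f0_values_hz):
        raise ValueError("Fo values must be positive")
    logs = [math.log(v) for v in f0_values_hz]
    centre = mean(logs)
    spread = pstdev(logs)
    if spread == 0:
        raise ValueError("Fo values are constant; no pattern to normalize")
    return [(x - centre) / spread for x in logs]


if __name__ == "__main__":
    # Hypothetical values: the second voice is exactly one octave lower.
    higher_voice = [200.0, 250.0, 300.0]
    lower_voice = [100.0, 125.0, 150.0]
    print(normalize_f0(higher_voice))
    print(normalize_f0(lower_voice))
```

Because the second voice is a constant multiple of the first, the log transform turns the difference into a constant shift, which the z-score then removes; the two printed contours agree up to floating-point rounding.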

A considerable amount of research has provided evidence for the normalization of lexical tones in Chinese languages (Ching, 1984; Wong, 1998 for Cantonese; Leather, 1983; Fox & Qi, 1990; Moore & Jongman, 1997 for Mandarin). Because the purpose of this thesis is not to investigate evidence for the process of normalization of tones, the issue of this kind of auditory accommodation of tonal patterns will not be discussed further.

Most perception studies have examined only Fo, because it is considered to be the primary perceptual cue (Abramson, 1976, 1978, 1997 for Thai; Fok-Chen, 1974; Vance, 1977; Gandour, 1979, 1981, 1984; Ching, 1984 for Cantonese). In his study on Thai, Abramson (1978) found that though Fo height is sufficient to distinguish the level tones, slow Fo movements enhance the identification of tones. He also found that fairly rapid Fo movements were required for high intelligibility of contour tones. Similarly, Gandour's Cantonese studies (1979, 1981, 1984) found that three perceptual cues, (i) Fo height, (ii) contour (slope), and (iii) direction (rising or falling), were important for Cantonese tonal perception. Furthermore, Tseng (1990) found that vowel duration is not a crucial acoustic parameter of Mandarin tonal production. Consistent with Tseng's findings, Moore & Jongman (1997) reported that vowel duration alone did not distinguish tone 2 (rising tone) from tone 3 (low falling-rising) in Mandarin. However, it became an important cue when combined with another cue, namely the change in Fo (in the falling direction) from the onset of the tone. Thus, it appears that combinations of perceptual cues could be crucial to certain tones, especially to those tones that have similar patterns.

1.1.4. Representation of Tones: Chao's Tone Letters

In the past (before the 1960s), tone patterns were commonly analyzed by ear, and perceptual impressions were represented by musical notes¹⁰ (e.g., Jones & Woo, 1912; Chao, 1956 for Cantonese) and stylized miniature graphs (Chao, 1928 (for Wu dialect); 1930; 1947: 12 (for Cantonese)). Tone patterns could also be described in words (Eitel, 1947, cited in Fok-Chen, 1974: 18). More recently, lexical tones (especially the Chinese ones) have typically been described by the tone letter values devised by Chao in 1930 (Hashimoto, 1972; Yip, 1980/1990; Fu, 1995; Bauer & Benedict, 1997; among others). This system provides "simplified time-pitch graphs of the voice" to make the tones visible as graph miniatures (Chao 1930: 24). Tone letters are represented on two vertical lines, and tonal movement is from left to right, representing the starting (initial) and ending (final) points of a contour. The left vertical line is usually not shown on the miniature graph. The tonal range is divided into four equal parts with five reference (or scale) points on each vertical line. The tone letter with the value 1 indicates the lowest tonal value, while the letter with the value 5 indicates the highest tonal value. Tones are represented by two or three letter values to indicate the initial and the final, or the initial, the medial, and the final pitch levels (Chao 1930: 24-27). Figure 1-1 is an example of Chao's tone letters for Cantonese;¹¹ the corresponding values for the tone letters are presented in square brackets [ ] beside the tones.

¹⁰ The advantage of using musical notation is that it is possible to discount the differences in individual ranges; however, it is uncertain whether Fo is perceived in the same way as musical tones (Lehiste, 1970: 81).

Figure 1-1. Tone letters for Cantonese¹²

Tone 1 [55]   Tone 2 [25]   Tone 3 [33]   Tone 4 [21]   Tone 5 [23]   Tone 6 [22]

¹¹ Individual tone letters for the six lexical tones in Cantonese can also be seen in Table 1-5. The reason for presenting the system in this way will become clear in Chapter Two, where I present the tonal system of the three participant groups in this form.

¹² In Bauer & Benedict, 1997; So, 1997; So, 1999a, 1999b, it was found that the onsets of the two rising tones start from a similar Fo level. Therefore, I prefer to use [25] to describe tone 2 and [23] to describe tone 5 throughout the thesis.

There are two advantages to using Chao's tone letters. First, similar to musical notes, Chao's tone letters can discount the differences in individual ranges (Lehiste, 1970: 81). The second advantage is related to versatility (or flexibility). For people who do not have a musical background, Chao's tone letters are "quite helpful in symbolizing tones" (Bauer & Benedict, 1997: 119). As a result, because of its convenience and versatility, Chao's tone letter system was adopted by the International Phonetic Association (Bauer & Benedict, 1997: 114) in 1989 (Fu, 1995: 8) to represent graphically the tone contours of other languages.

In this thesis, the Cantonese tonal contours are described with Chao's tone letter values rather than the graph miniatures. The values are placed at the end of the root-word as superscripts, or they are presented in square brackets [ ] when they stand on their own.
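The five-point scale underlying Chao's tone letters can be sketched as a mapping from a speaker's Fo range onto the values 1 to 5, with the range divided into four equal parts. The sketch below is illustrative only: it assumes a linear division in Hz (other scales, such as semitones, are equally conceivable), and the names `chao_value`, `tone_letters`, and `CANTONESE_TONE_LETTERS` as well as the 100-200 Hz range are hypothetical. The six Cantonese values are those adopted in this thesis.

```python
# Illustrative sketch, assuming a linear Hz scale: map Fo values onto Chao's
# five-point scale (1 = lowest, 5 = highest). Identifiers are hypothetical.
import math

# Tone letter values for the six Cantonese lexical tones as used in this thesis.
CANTONESE_TONE_LETTERS = {1: "55", 2: "25", 3: "33", 4: "21", 5: "23", 6: "22"}


def chao_value(f0_hz, range_min_hz, range_max_hz):
    """Return the scale point (1-5) nearest to f0_hz within the speaker's range."""
    if range_max_hz <= range_min_hz:
        raise ValueError("range_max_hz must exceed range_min_hz")
    # Position in the range from 0.0 (bottom) to 1.0 (top), clamped to the range.
    position = (f0_hz - range_min_hz) / (range_max_hz - range_min_hz)
    position = min(1.0, max(0.0, position))
    # Four equal parts give five reference points; round half up to the nearest.
    return 1 + int(math.floor(4 * position + 0.5))


def tone_letters(onset_hz, offset_hz, range_min_hz, range_max_hz):
    """Return a two-digit tone letter value from the onset and offset Fo."""
    start = chao_value(onset_hz, range_min_hz, range_max_hz)
    end = chao_value(offset_hz, range_min_hz, range_max_hz)
    return f"{start}{end}"


if __name__ == "__main__":
    # Hypothetical speaker whose tonal range spans 100-200 Hz.
    print(tone_letters(125.0, 200.0, 100.0, 200.0))  # a rise from low to top of range
    print(tone_letters(200.0, 200.0, 100.0, 200.0))  # a level tone at the top of range
```

For this hypothetical speaker, a contour rising from 125 Hz to 200 Hz is labeled with the same value as Cantonese tone 2, and a contour that stays at 200 Hz with the same value as tone 1.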

1.2 HONG KONG CANTONESE

Cantonese, a tone language, is a variety of one of the Chinese languages,¹³ Yue. The Yue language is distributed across southeast China, including Guangzhou, Hong Kong, and Macao (Arendrup, 1994: 521). It serves as a regional standard (Ramsey, 1987: 99) and lingua franca for many parts of southern China and Southeast Asia (Lee, 1993: 3; Tang & Maidment, 1996: 2; Bauer & Benedict, 1997: xxxi). Cantonese is widely spoken in many overseas Chinese communities in Malaysia, Indonesia (Tang & Maidment, 1996), San Francisco, Sydney, and Vancouver. It is also one of the Chinese languages taught in many universities in North America and in weekend Chinese schools set up by local Chinese communities (UCLA Language Profile, 2000).

¹³ The other six Chinese languages are Mandarin, Wu, Gan, Xiang, Kejia (or Hakka), and Min (Bauer & Benedict, 1997: xxxv).

Cantonese includes four sub-varieties: Siyi, Gaoyang, Yuehai, and Guinan (ibid.). Hong Kong Cantonese belongs to the Yuehai dialect, represented by the dialect of Guangzhou City (ibid.). Unlike in Guangdong, Cantonese in Hong Kong plays an important role in society, because it is (i) the medium of instruction in many schools, (ii) the language used on most television and radio programs, and (iii) the choice for official government functions (ibid.).

Since this study focuses only on the tonal system of Hong Kong Cantonese, the term native Cantonese speakers is used to refer to native (Hong Kong) Cantonese speakers. The following sections provide a brief description of Cantonese syllable structure (Section 1.2.1), the tonal system (Section 1.2.2), and the factors affecting the tonal contour, namely tonal sandhi, tonal coarticulation, and tone change (Section 1.2.3).

1.2.1. Syllable Structure

The syllable in Cantonese is typically considered to be the Tone Bearing Unit (TBU) (Wang 1967; Hashimoto 1972; Yip 1980/1990).¹⁴ A lexical tone is assumed to be superimposed on the segments of the whole syllable. The structure of Cantonese syllables is simple.

Table 1-1. Syllable structure of Cantonese (adapted from Hashimoto, 1972; and Bauer & Benedict, 1997).

where
Ci: optional initial consonant/glide.
V: obligatory vowel(s) as a nucleus.
Csyl: syllabic consonant (only present when it stands on its own).
Cf: final consonant/glide.

The basic structure of a Cantonese syllable and its relationship with lexical tones are shown in Table 1-1 above. A syllable is made up of maximally three segments from two components: an optional "Initial" (onset) and a "Final" (rhyme). The optional initial (if present) is either a consonant or a glide. Cantonese does not allow consonant clusters in the syllable. The final consists of a nucleus and an optional coda. In open syllables, a final can be a single vowel or a diphthong; otherwise it is a vowel followed by a consonant or glide. In a closed syllable, the coda can be (i) one of the semivowels /w/ or /j/, (ii) one of the nasals /m/, /n/, or /ŋ/, or (iii) one of the voiceless stops /p/, /t/, or /k/.
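The constraints just listed (at most one initial, an obligatory nucleus, and a coda drawn from a closed set of eight segments) can be expressed as a simple well-formedness check. The sketch below is only an illustration of those three constraints: the name `is_well_formed` and the representation of a syllable as a list of initial segments, a nucleus string, and an optional coda are assumptions, and the check does not verify that the segments themselves belong to the Cantonese inventory.

```python
# Illustrative sketch of the syllable constraints described above.
# The function name and the data representation are hypothetical.

# Codas permitted in closed syllables: semivowels, nasals, and voiceless stops.
ALLOWED_CODAS = {"w", "j", "m", "n", "ŋ", "p", "t", "k"}


def is_well_formed(initial, nucleus, coda=None):
    """Check an (Initial)(Nucleus)(Coda) syllable against the stated constraints.

    initial: list of onset segments (empty list if there is no initial)
    nucleus: string holding the vowel nucleus (obligatory)
    coda:    one final segment, or None for an open syllable
    """
    if len(initial) > 1:  # no consonant clusters in the onset
        return False
    if not nucleus:  # the nucleus is obligatory
        return False
    if coda is not None and coda not in ALLOWED_CODAS:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(["s"], "i"))       # open syllable /si/
    print(is_well_formed([], "a", "p"))     # no initial, stop coda
    print(is_well_formed(["s", "t"], "a"))  # onset cluster is rejected
    print(is_well_formed(["f"], "u", "l"))  # /l/ is not a permitted coda
```

The two root-words used in this study, /si/ and /fu/, are both open syllables with a single initial and therefore pass the check.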

Another type of Cantonese syllable is made up of a single syllabic nasal, [m̩] or [ŋ̩]. This type of syllabic nasal syllable is also found in other Yue dialects and behaves as if each syllabic nasal were linked to both the onset and the nucleus (Yip 1982: 659). Some Cantonese examples are shown below.

(a) [m̩21] 'not'
(b) [ŋ̩23] 'Ng' (surname)
(c) [ŋ̩23] 'five'
(d) [ŋ̩23] 'noon'

[Note: (b), (c) and (d) are homophones, but they are different in their Chinese characters.]

Regarding the status of syllabic nasal syllables in Cantonese, a number of different proposals can be found in the literature. The prevalent proposal15 found in most of the recent literature (Kao, 1971; Hashimoto, 1972; Cheung, 1972; and Bauer & Benedict, 1997) treats the syllabic nasals as nasal nuclei (similar to vocalic nuclei). The advantage of this treatment is that it ensures that each syllabic nasal is an obligatory and central component of the syllable (Kao, 1971: 34). Therefore, the syllabic nasals are treated as members of the 53 finals in Cantonese (see Table 1-3).

14 For a different view, see Duanmu (1990). He has argued that the TBU of all Chinese languages is not the syllable, but "always the moraic segment" (Duanmu 1990: 152), or the segment(s) in the rhyme (Duanmu, 1990: 149).

15 The proposal was suggested first by Wong (1954 S. 9-10). He found that the segments could be syllabic or non-syllabic, and syllabic nasals are treated as the allophones of /m/ and /ŋ/. However, since there exists no concrete evidence to prove that the syllabic nasals are finals, Wong (1982: 100) tentatively assumed that each syllabic nasal "may be an independent final" and placed them as finals.

The following tables provide, for reference, the inventories of (i) Cantonese initial consonants (Table 1-2), (ii) the 53 final combinations at the phonetic level (Table 1-3), and (iii) the vowels in Cantonese (Table 1-4).

Table 1-2. Initial consonants in Cantonese (adapted from Zee, 1999; Bauer & Benedict, 1997)

Manner of articulation (MOA) by place of articulation (POA):
Stops: bilabial p, pʰ; alveolar t, tʰ; velar k, kʰ; labio-velar kʷ, kʷʰ
Affricates: alveolar ts, tsʰ; palatal (tʃ), (tʃʰ)
Fricatives: labio-dental f; alveolar s; palatal (ʃ); glottal h
Nasals: bilabial m; alveolar n; velar ŋ
Lateral approximant: alveolar l
Approximants (glides): palatal j; labio-velar w

Notes: the palatal consonants placed in parentheses, [tʃ], [tʃʰ], and [ʃ], are phonetic variants of /ts/, /tsʰ/ and /s/.

Table 1-3. The 53 finals in Cantonese (adapted from Hashimoto, 1972; Cheung, 1972).

Table 1-4. Vowels in Cantonese (Bauer & Benedict, 1997).

Columns: Front, Central, Back, each divided into unrounded (UR) and rounded (R).
HIGH: i, y, u, (ɪ), (ʊ)
MID: (e), (ɵ), (o), ɛ, œ, ɔ

The vowels placed in parentheses, [ɪ], [e], [ɵ], [ʊ], [o], are the phonetic variants of the vowels /i/, /ɛ/, /œ/, /u/, /ɔ/, respectively.

1.2.2. Tonal System

The tonal system of Cantonese is more complex than that of other Chinese languages, such as Mandarin, which has only four lexical tones,16 and Taiwanese, which has only five lexical tones (Zee, 1977). Cantonese has preserved the four types of tones from ancient Chinese,17 and they have developed into the Upper and the Lower registers18 (Fok-Chen 1974: 10) following two historical tonal splits (Bauer & Benedict, 1997: 154-157).

Table 1-5. Cantonese tones (in tone letters).

16 See the previous section (1.1.1.).

17 Ancient Chinese refers to the time period about 1300 years ago, approximately the 7th Century (Bauer & Benedict, 1997: 154).

18 Today, many Chinese languages do not have the Entering Tones in their tonal systems, but Cantonese and Shanghainese have them.

19 The high level tone [55] is described in the literature (Bauer & Benedict, 1997: 162; Chao 1947: 26; Cheung 1972: 10; Hashimoto 1972: 112; Kao 1971: 84) as a result of Cantonese tone sandhi (details in the next section, 1.2.3.). It is derived from the high falling tone [53] whenever the high falling tone [53] is followed by another high falling tone [53] or an upper Entering tone [5]. However, at present, most Hong Kong speakers seem to lack the high falling tone [53], or they do not use this tone in the same way as Guangzhou speakers (Bauer & Benedict, 1997: 117).

Table 1-5 above presents the modern Cantonese tonal categories that have developed from the ancient four-tone system: "Even" (Ping), "Rising" (Shang), "Going" (Qu), and "Entering" (Ru). This classification was documented by a 5th Century scholar (441-513 AD) (Wang & Cheng, 1987: 515).

After the Great Tone Split20 took place, the original single register split into two registers: the "Upper" (Yin) and the "Lower" (Yang).

The combinations of the four types of tones and the two registers yielded eight different tones. The Second Tone Split affected the

Upper Entering tone21. This second split created a new Middle register in the Entering tone category. At present a total of nine tones are found in Cantonese.

With respect to the number of tones in Cantonese, several proposals have been suggested, depending on the method of classification. According to Killingley (1985), Cantonese has five tones; Cheung (1972) considers there to be seven tones.22 Most other scholars consider Cantonese to have nine tones in phonetic form23 (see Table 1-5). Among the nine tones, tone 1 to tone 6 are considered tonemes24 (similar to phonemic forms) in Jones and Woo, 1912; Chao, 1947; Kao, 1971; Hashimoto, 1972; Fok-Chen, 1974; Yip, 1980/1990; Kong, 1987; Matthews and Yip, 1994; Zee, 1995; among others. Under this classification, the three Entering tones, tone 7, tone 8, and tone 9, are typically treated as the allophonic variants of tone 1, tone 3, and tone 6, respectively. The phonetic environment is highly predictable (Chao, 1947; Fok, 1974; Hashimoto, 1972; and Lee 1993), because the syllables which carry the Entering tones end with one of the following stops, /p/, /t/, or /k/, that have short duration. As mentioned above (Section 1.1.2), in her experimental study, Fok-Chen (1974) found that the vowel duration of an Entering tone (tone 7, 8, or 9) is generally half the duration of the other tones.

20 It was the first split, which resulted from the loss of the voicing distinction of the original voiced obstruent initial consonants (Bauer & Benedict, 1997: 155).

21 The splitting of the "Upper Entering tone" (the 2nd tone split) yielded the "Upper Entering tone" (tone 5) and the "Mid Entering tone" (tone 3). The splitting was conditioned by vowel length: syllables with phonetically short vowels moved into the Upper Entering tone category, and syllables with long vowels moved to the Mid Entering tone category (Bauer & Benedict, 1997: 156). Examples are shown below.
(a) [pit5] "must, certainly"; [pik3] "to force"
(b) [fut5] "wide"; [fuk3] "a roll of map/pictures"

22 Cheung (1972) considered the high level tone [55] and the high falling tone [53] to be two tones. However, according to Bauer (1999: 3-4), most Cantonese speakers in present-day Hong Kong have the high level tone, and the high falling tone, which was originally found in Guangzhou, seems to be lost.

23 Cantonese does not have a neutral tone (Matthews & Yip, 1994: 22).

24 Killingley (1985) proposed that there are only five lexical tones, instead of six: tone 5 need not be counted in the inventory. Her study was based on Malayan Cantonese.

These nine tones can be described as either glottalized or non-glottalized. The Entering tones are considered to be glottalized tones, because the final stops are produced with glottalization. They are associated with a distinctively short and abrupt quality (Bauer & Benedict, 1997: 159). Examples of these tones are:

Tone 7: [sik5] "colour"
Tone 8: [sik3] "a kind of metal"
Tone 9: [sik2] "to eat"

All other syllables can be called non-glottalized. Each may carry one of the six contrastive tones. The syllable /ji/ provides an example:

Tone 1: /ji55/ "cure"
Tone 2: /ji25/ "chair"
Tone 3: /ji33/ "opinion"
Tone 4: /ji21/ "child"
Tone 5: /ji23/ "ear"
Tone 6: /ji22/ "two"

It should be pointed out that not every root-word syllable has referential meaning for all six contrastive tones or all three glottalized tones. For example, the syllable /tai/ is associated with only two tones, tone 2 "a married woman" and tone 3 "too (much)". Similarly, glottalized root-word syllables may not carry all three Entering tones. For instance, /fɐt/ carries only tone 7 "suddenly" and tone 9 "buddha", but not tone 8.

1.2.3. Factors Affecting Tonal Contours

Although the six contrastive tones have their own tonal values as stated in the previous section, it should be mentioned that three factors can affect the Fo values of Cantonese tones: (i) tone sandhi,

(ii) tonal coarticulation and (iii) tone change.

Tone sandhi (TS) occurs when a tone is conditioned by the tonal environment. Although Cantonese has a large number of contrastive tones, the phenomenon of tone sandhi is relatively limited. Two kinds of tone sandhi have been documented (Hashimoto, 1972: 112):

(TS1) 53 --> 55 / __ 53, 55, 5
(TS2) 21 --> 22 / __ 21, 22

For TS1, the high falling tone [53] becomes high level [55] when it is followed by another high falling, high level, or upper Entering tone. For TS2, the low falling tone [21] becomes the lower level tone [22] when it precedes another low falling or level tone. In general, it can be concluded from these two environments that a falling tone, [53] or [21], becomes level whenever it is followed by a tone that starts with the same pitch value, regardless of the type of tone.
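The two sandhi rules can be sketched as a small function over a sequence of tone-letter values. This is only an illustrative sketch: the function name, the string encoding of the tones, and the choice to test each rule against the citation value of the following tone are assumptions made here, not part of Hashimoto's (1972) description.

```python
# Tone values are written as Chao tone-letter strings, e.g. "53", "55", "5".
TS1_TRIGGERS = {"53", "55", "5"}  # following tones that trigger 53 --> 55
TS2_TRIGGERS = {"21", "22"}       # following tones that trigger 21 --> 22


def apply_tone_sandhi(tones):
    """Return a new list with TS1 and TS2 applied to every non-final tone.

    Assumption: each rule inspects the citation (underlying) value of the
    following tone; the text does not specify rule ordering or iteration.
    """
    result = list(tones)
    for i in range(len(tones) - 1):
        current, following = tones[i], tones[i + 1]
        if current == "53" and following in TS1_TRIGGERS:
            result[i] = "55"
        elif current == "21" and following in TS2_TRIGGERS:
            result[i] = "22"
    return result


print(apply_tone_sandhi(["53", "53", "33"]))  # ['55', '53', '33']
print(apply_tone_sandhi(["21", "22"]))        # ['22', '22']
```

In the first call only the first [53] changes, because the second [53] is followed by [33], which lies outside both rule environments.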

Tone sandhi occurs in normal discourse in connected speech (Hashimoto, 1972: 113). Tonal coarticulation is a factor that causes modifications of the Fo contours of contrastive tones due to the influence of adjacent tones on syllables in connected speech (Gandour, 1994: 3117). Tonal coarticulation is different from tone sandhi. The former occurs freely in speech, while the latter tends to occur in only one of two specific tonal environments.

In a Mandarin acoustic study, Shen (1990) found that the influence of tonal coarticulation on tones could be bi-directional, both "carryover" and "anticipatory". Similar findings have been reported by Peng (1995) and Xu (1997). Abramson (1978: 320-1) also found the phenomenon in his study on Thai tones: (i) the level tones are not very different from the contour tones; (ii) a high tone can be described as a high rising tone; (iii) a high rising tone can be described as a low rising tone; and (iv) a low tone could be viewed as a low falling tone, etc. The author of the present study also observed the phenomenon when examining the Fo contours of the collected data in sentential forms. For example, a rising tone became a level tone. Similar observations of tonal coarticulation were also reported by Chan (1987: 36) in her study of Cantonese songs. She suggested that this pattern might be due to the interaction between tones and intonation. A faster rate of speech would "destroy the initial pitch shape on the rising tones, while greater constraint would exist to minimize loss".

The third factor affecting tones in Cantonese is referred to as "changed tones" (pin jam) or "tone change". Tone change may be defined as follows: a tone becomes another tone due to certain morphological or syntactic environments (Hashimoto, 1972: 93), or semantic factors (Matthews & Yip, 1994: 23). Tone change tends to occur more in colloquial speech, reduplications, and compounds.

There are two changed tones that are frequently observed in Cantonese: a high level tone (tone 1) [°55]25 or a kind of high rising tone [*25].26 The [*25] is very similar to, but different from, the high rising tone, which used to be described as [35] (Chao, 1947: 34). In other words, they were different in their initial tone values. Examples of the two changed tones are illustrated below.

(a) a33 ji21 --> a33 ji°55 "aunt"27
(b) ka55 tse25 --> ka55 tse°55 "older sister"28

(c) huŋ21 huŋ21 tei25 --> huŋ21 huŋ*25 tei25 "a little red"29
(d) sœŋ33 --> sœŋ*25 "photo"30

25 The raised circle [°] is adopted from Chao's expression (1947), indicating that the original tone has been changed.

26 This expression was first used by Chao (1947: 34) and later adopted by other scholars (Hashimoto, 1972; Bauer & Benedict, 1997). The asterisk (*) indicates that the original tone has been changed. The tonal value [25] is assigned to the word, since Chao found that the changed tone was different from the regular high rising tone [35] at that time.

27 Hashimoto (1972: 97).

28 Bauer & Benedict (1997: 169).

According to Bauer & Benedict (1997: 169-71), today the Fo contour of the high rising changed tone [*25] is phonetically identical to and indistinguishable from the high rising tone -- tone 2 with tone value [25]. It is possible that the two tones have merged together, since native speakers now use the same tone value [25] to describe the high rising changed tone as well as the high rising tone (tone 2).

1.3. LITERATURE REVIEW OF CANTONESE STUDIES

1.3.1. Production Studies

Eitel (1947: cited in Fok, 1974: 18) provided detailed impressionistic descriptions of the tones in Cantonese. He made the following observations:

(i) The mid level tone (tone 3) is the nearest to that of the speaker's voice (pitch values associated with one's normal speaking);

(ii) The low level tone (tone 4) has the lowest pitch, while the high level tone (tone 1) has the highest pitch;

(iii) In both high level and low level tones (tone 1 & tone 6), the voice is sustained for a movement with a smooth and easy effort;

(iv) Both the high rising and low rising tones (tone 2 & tone 5) are marked by inflection and by a gradual sliding upwards;

(v) The high rising tone (tone 2) gradually ascends and the low rising tone (tone 5) starts from a relatively lower pitch level, but it does not ascend as high as others;

(vi) The mid low level tone is hard to differentiate from the low level tone. They both are similar in pitch level and quality of voice, and are relatively shorter than the other tones.

29 Hashimoto (1972: 99).

30 Bauer & Benedict (1997: 169).

Fok-Chen (1974) recruited seven speakers (4 males and 3 females), two of whom produced the root-word /sig/, while the other five speakers produced the root-word /jyn/. She examined three physical properties: fundamental frequency (Fo), vowel duration, and vowel intensity. Her observations were as follows:

(i) Fo ranges varied among speakers, but speakers manipulated the six tones according to the typical patterns;

(ii) There was no consistent pattern evident in duration and intensity among the six tones.

On the basis of her findings, she concluded that pitch variation was the most important parameter in Cantonese tones, while intensity and duration were of less importance.

Contrary to the findings of Fok-Chen (1974), Kong (1987) observed that there was a correlation between vowel duration and tones, and thus that vowel duration should be considered important in Cantonese. He examined the six lexical tones for the root-word /si/ from the productions of three speakers, focusing on the relationship between the vowel duration and the six tones. His findings are listed below:

(i) The high rising tone (tone 2) was the longest tone and the low level tone (tone 4) was the shortest one.

(ii) Among the 4 level tones (tone 1, tone 3, tone 4, and tone 6),31 the mid level (tone 3) was the longest, the mid low level tone (tone 6) was intermediate, and both the high level (tone 1) and the low level (tone 4) were the shortest;

(iii) Among the four level tones, there was a mid-point (average Fo) in the Fo range, from the high level (tone 1 -- the highest) to the low level (tone 4 -- the lowest). The farther the Fo was from the mid-point, the shorter the vowel duration was for that tone. Conversely, the closer the Fo was to the mid-point, the longer the vowel duration for that tone (i.e., the vowel duration of tone 3 is the longest, while other tones located on both sides are shorter).

31 Kong treated tone 4 as a level tone in his study.

Zee (1995) investigated the temporal organization of syllable production in Cantonese. He examined a list of Cantonese monosyllables of the types (C1)V:, (C1)D, (C1)V:C2, and (C1)VC2 associated with 9 different tones. The target words were all embedded in the middle of a carrier frame. He made several observations:

(i) Temporal compensation does not take place between C and V in the CV syllables;

(ii) Tone plays a part in determining the temporal pattern of the syllables -- the duration of the vowel or diphthong in CV syllables associated with tone 4 is shorter than the vowel durations of the corresponding syllables with the other five lexical tones;

(iii) Reduction in terms of vowel duration occurred in the first vocalic segment of the syllables containing a diphthong or a VN(asal).

So (1998) investigated differences in tonal (Fo) patterns and vowel duration between two groups of Canadian Raised Cantonese speakers32 (YCRC and TCRC) and a group of native Cantonese speakers (NCAN).33 The YCRC group consisted of young male high school speakers who had moved to Canada before the age of seven. The TCRC group consisted of adult male speakers who had moved to Canada between 10 and 15 years of age. Six tones associated with target words from the root-word syllable /si/ were embedded into different sentence carrier frames to elicit relatively casual speech signals. The findings are as follows:

(i) The TCRC group had similar tonal patterns to the NCAN group, while the YCRC group showed irregular tonal patterns (e.g., the falling tone was produced with a rising pattern). This kind of deviation could be attributed to the fact that the YCRC speakers were too young when they moved to Canada, and had not fully mastered the tonal patterns. The new environment that they were exposed to may not have provided proper and sufficient linguistic input for them to develop the tonal system of Cantonese.

(ii) The vowel durations of the YCRC group were the longest, while those of the TCRC group were the shortest. The longer durational pattern for the YCRC group might be a result of interference from the durational patterns of English, because English is the major means of communication in their daily activities. The shorter durational pattern for the TCRC group may be due to the unconscious emphasis of the relatively shorter durational patterns for Cantonese as compared to English.

32 See the beginning of this chapter for definition.

33 These abbreviations, YCRC, TCRC, and NCAN, will be explained in Chapter Two, Section 2.1.1.

In her other study (1999a), So focused on comparing the two Cantonese rising tones (tone 2 and tone 5) of adult YCRC speakers and adult NCAN speakers. Four target words in citation form were employed for /si/ and /fu/. Two acoustic correlates of tones, duration and Fo, were examined. Duration was compared between the tones and speaker groups for three domains: (i) the whole syllable, (ii) the prevocalic consonant, and (iii) the vowel. For the Fo parameter, the normalized vocalic duration associated with the Fo pattern for each group was divided into 4 percentage regions at 25% intervals; hereafter, they are labelled as durational range-sections: R1 (0-25%), R2 (25%-50%), R3 (50%-75%), and R4 (75%-100%). The changes in Fo values of the two rising tones in each range-section were compared.34 Figure 1-2 shows the relationship of the 4 durational range-sections and their changes in Fo values (ΔFo) for a high rising tone (tone 2) with hypothetical data for easy illustration.

Figure 1-2. Relationship of durational range-sections and their ΔFo. [x-axis: normalized vocalic duration, divided into the 4 durational range-sections]
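The division of a normalized vowel into four range-sections, with a ΔFo value for each, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the use of linear interpolation between Fo samples, and the sample values are all hypothetical and are not taken from So (1999a).

```python
def delta_fo_by_section(times_ms, fo_hz, n_sections=4):
    """Return the Fo change (end minus start) in each equal-width
    section of the normalized vowel duration (R1..R4 by default)."""
    start, end = times_ms[0], times_ms[-1]

    def fo_at(fraction):
        # Linearly interpolate Fo at a normalized time point (0.0 to 1.0).
        t = start + fraction * (end - start)
        for i in range(len(times_ms) - 1):
            t0, t1 = times_ms[i], times_ms[i + 1]
            if t0 <= t <= t1:
                return fo_hz[i] + (fo_hz[i + 1] - fo_hz[i]) * (t - t0) / (t1 - t0)
        return fo_hz[-1]  # guard against rounding at the right edge

    bounds = [fo_at(k / n_sections) for k in range(n_sections + 1)]
    return [bounds[k + 1] - bounds[k] for k in range(n_sections)]


# Hypothetical Fo track (Hz) of a rising tone, sampled every 50 ms.
times_ms = [0, 50, 100, 150, 200]
fo_hz = [110, 108, 112, 125, 140]
print(delta_fo_by_section(times_ms, fo_hz))  # [-2.0, 4.0, 13.0, 15.0]
```

With these invented values, ΔFo is slightly negative in R1 (the initial dip) and largest in R3 and R4, which is the kind of per-section comparison described above.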

The following were the observed findings:

(i) Durations for the three domains (syllable, prevocalic consonant, and vowel) of tone 2 tended to be longer than those of tone 5, but the differences were not statistically significant;

(ii) No significant durational difference was found between the two speaker groups (the NCAN and the YCRC) in any domain;

(iii) The Fo range (Fomax - Fomin), describing the two rising tones of the NCAN group, was significantly larger than that of the YCRC group;

(iv) The contours of tone 2 and tone 5 could be differentiated in terms of ΔFo in the four durational range-sections in each speaker group: for the NCAN speakers, significant differences in ΔFo values were found between tone 2 and tone 5 in the R2, R3, and R4 range-sections, whereas for the YCRC group, significant differences in ΔFo values were found between tone 2 and tone 5 in the R3 and R4 range-sections.35 This implies that there is a difference between the NCAN speakers and the YCRC speakers in terms of the timing for differentiating the two rising tones: the NCAN speakers start to differentiate the two rising tones earlier than the YCRC speakers (R2 vs. R3).

34 This type of comparison will be described in detail in Chapter Two.

35 This implies that the difference in ΔFo values between tone 2 and tone 5 in the R1 range-section was non-significant. This supports the findings of Bauer & Benedict (1997) and So (1997) that the two rising tones start at similar Fo values.

So's more recent study (1999b) attempted to find characteristic differences between the two Cantonese rising tones in two native Hong Kong Cantonese speaker groups (female speakers vs. male speakers). Four target words of the syllables /si/ and /fu/ were used, and acoustic correlates of tone (duration and Fo) were investigated. In the study, So used the citation form for the analysis of the Fo parameter, so as to eliminate the effect of tonal coarticulation (mentioned in Section 1.2.3), and the sentential form for the durational analyses, in order to minimize the variation in durational measurements from the citation form. Durations were compared on three dimensions: (i) syllable, (ii) prevocalic consonant, and (iii) vowel. Further, durational ratios (C/V) were also examined. The fundamental frequencies were compared between the speaker groups in a fashion similar to that of her previous study (So, 1999a).

With respect to the durational parameter, it was found that:

(i) There exists no durational difference between tone 2 and tone 5 with regard to sex.

(ii) The difference in vowel durations between the two tones was non-significant, but the differences in the durations of syllables and prevocalic consonants, as well as in the C/V ratios, between tone 2 and tone 5 were observed to be significant. These results correlate with consonant duration. The findings suggest that the tones do not influence vowel duration, but appear to affect consonant duration only.

Concerning the Fo parameter, the following findings emerged:

(i) The starting points and the dip points of the contours of the two rising tones, for both females and males, are located at similar Fo levels. No statistically significant difference was found. Thus, these results support the observations made in recent literature (Bauer & Benedict, 1997; So, 1997), according to which Cantonese high rising tones should be described with tone letter [25] rather than [35].

(ii) There were significant differences in Fo ranges of the tones between the female and male speakers. This is possibly related to their physiological differences. Therefore, So (1999b) attempted to use Fo ratios (Fo maximum / Fo minimum) to compare each durational range-section of the rising tones between the two groups, so as to factor out gender difference. The results, as expected, revealed that the Fo ratios between tone 2 and tone 5, in both gender groups, were significantly different. This suggests that the Fo ratio is more appropriate than the Fo range for Fo analysis when two sex groups are involved in a study.

(iii) For both the female and male speaker groups, the timing for tone 2 and tone 5 showed differences in terms of ΔFo values, which were located in the R2 range-section (i.e., 25-50%) throughout the vocalic duration. This corresponds to the results in So's previous study (1999a).
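The difference between the Fo range (Fomax - Fomin) and the Fo ratio (Fomax / Fomin) mentioned in finding (ii) can be illustrated with a short sketch. The Fo values below are hypothetical, and the assumption that one register is an exact multiple of the other is an idealization chosen only to make the arithmetic transparent; real female and male voices do not differ by a constant factor.

```python
def fo_range(fo_values):
    """Fo range in Hz: maximum minus minimum."""
    return max(fo_values) - min(fo_values)


def fo_ratio(fo_values):
    """Fo ratio (unitless): maximum divided by minimum."""
    return max(fo_values) / min(fo_values)


# Hypothetical Fo samples (Hz) for the same rising contour produced in a
# lower and a higher register; the second is the first scaled by 2.
lower_register = [100.0, 105.0, 125.0, 150.0]
higher_register = [2 * f for f in lower_register]

print(fo_range(lower_register), fo_range(higher_register))  # 50.0 100.0
print(fo_ratio(lower_register), fo_ratio(higher_register))  # 1.5 1.5
```

Under this idealization the range doubles with the register while the ratio stays the same, which is why a ratio is less sensitive to overall pitch level than a range in Hz.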

Bauer (1999) conducted an acoustic analysis of Cantonese Fo contours. Four speakers were instructed to read the target words from a list containing all possible Cantonese tones in citation form. From each Fo contour, Fo values at onset, dip (turning points) and end point were extracted and compared. The observations from these comparisons are listed in the following:

(i) The speakers did not distinguish between the high level tone [55] and high falling tone [53].

(ii) The tone letter values for tone 2 (high rising tone) were no longer [35] but [25]. Tone 2 and tone 5 begin at a mid-low point (i.e., equivalent to tone letter [2]).

(iii) The high rising changed tone [*25] proved to be identical to tone 2 [25], with similar Fo values. This implies that the two have merged.

(iv) In Chao's tone letters, the value of tone 5 is [23], not [13].

1.3.2. Perception studies

Perception studies of tone languages tend to focus on the Fo cue rather than other acoustic cues. Fok-Chen (1974), in investigating the performance of participants in lexical tone perception of Cantonese, found that the six contrastive tones were not equally vulnerable to perceptual confusion. Tone 1, tone 2, and tone 4 were the most salient among the tones in the perception of Cantonese tones. Confusion was confined to tones with similar patterns. For example, rising tones were confused with other rising tones, and falling and level tones were confused with one another (tone 4 & tone 6).

Vance (1977) attempted to find the range of variation of

pitch-time contours for each lexical tone in Cantonese, using

synthetic stimuli in order to vary Fo trajectories. A few observations

were made:

(i) Listeners had more confidence in labeling their responses as tone 1 when high level or high falling contours were given to them;

(ii) Slope or "gradient" was essential in discriminating between tone 2 and tone 5;

(iii) Tone 3 was perceived within the entire range;

(iv) Tone 4 did not receive high identification scores; and

(v) Tone 6 was the default tone label when listeners felt unsure in identifying a certain tone.

Gandour (1979, 1981) employed a multidimensional analysis of Fok's confusion data for Cantonese. He found that three underlying dimensions, CONTOUR (i.e., slope), DIRECTION (i.e., rising or falling), and HEIGHT (i.e., average pitch), were important to perception. The first two dimensions proved to be of particular importance for the variance in listener responses. However, in a later perception study (1984) with synthetic tokens, Gandour included Mandarin and Taiwanese listeners, and found that

Cantonese listeners attached relatively more importance to the

HEIGHT dimension than did Mandarin and Taiwanese listeners.

Ching (1984) used both natural and synthetic tonal tokens in order to assess the ability of Cantonese children to perceive the differences in Fo patterns. The ages of the children ranged from 4 to 10 years. Five types of tokens were used: (a) natural speech, (b) natural speech with vocal tract information essentially absent using a low pass filter at 1 kHz, (c) synthetic speech approximated to natural tokens, (d) synthetic speech with tonal patterns being transposed logarithmically, and (e) synthetic speech with tonal patterns expanded logarithmically by almost an octave. Ching's findings are listed as follows.

(i) Recognition ability improves with age.

(ii) Children at age six or younger require natural tokens to

make confident judgments.

(iii) Children older than six years of age are able to make

linguistic decisions (identification of tones) based on patterns.

(iv) At about age ten, children make confident judgments

based on both speech and pattern forms.

(v) Children who are good at labeling transposed and

expanded tonal patterns, also responded well to natural

tokens.

(vi) Synthetic tokens with transposed tonal patterns had better

responses than the ones with expanded tonal patterns.

(vii) The performance of children who make confident judgments in response to the transposed stimuli is comparable to that of adults.

(viii) Tone 4 is best identified in all stimulus types.

(ix) Most of the significant confusions in the responses of the children were between tone 3 and tone 6, and between tone 5 and tone 6.

1.4. PURPOSE OF THE PRESENT STUDY

Previous studies on bilingualism (Mack, 1984, 1988; Flege,

MacKay, & Meador, 1999) found a relatively strong correlation between age at the onset of second language acquisition and the second language performance: early bilinguals may perform similarly (or approximately) to monolingual native speakers of the second language (e.g., voice onset time (VOT) and vowel production and perception).

This raises the question of whether bilinguals are able to maintain their native language equally well. With respect to the maintenance of the first language by bilingual speakers, some previous studies reported that the native language of speakers was subject to phonetic change (e.g., in VOT) due to second language learning (Major, 1992, 1997; Sancier & Fowler, 1997).

What would be the case for the tonal system of the Canadian Raised

Cantonese (CRC) speakers? Although there exists a considerable amount of research in production and perception of Cantonese tones, relatively little attention has been paid to the tonal system of

Cantonese speakers who grew up in North America. It is to be hoped that this study will shed some light on this issue. The present study focuses on the issues involving the maintenance of the tonal system of CRC (bilingual) speakers in terms of (i) the extent of differences in the tonal system between the native Cantonese speakers and the CRC speakers, and (ii) the extent of differences in the tonal system in relation to the age of arrival (AOA) in Canada of the speakers of the two CRC groups. In fact, this study is an extension of the author's previous studies (1998, 1999a), in which both duration and fundamental frequency patterns of CRC speakers were found to be different from those of native Cantonese speakers. In this study, the tonal systems of two groups of bilinguals in Vancouver were examined through tonal production and perception experiments.

The two groups of bilinguals in this study had moved to Canada from Hong Kong at a relatively young age, or were born into immigrant families. They are called Canadian Raised Cantonese (CRC) speakers, because they were all raised (grew up) in an English-speaking environment, Canada, and the first language that they learned and/or were exposed to was Cantonese. The speech production and perception scores of the two CRC bilingual groups were compared with those of age-matched native Hong Kong Cantonese speakers (as a comparison group).

Before proceeding to the outline of this study, two issues must be discussed. First, there exists a limitation in the methodology of Fo analysis. Second, although it is generally believed that Hong Kong Cantonese speakers are bilinguals, they are actually more like monolingual speakers and different from bilinguals who were raised in North America.

1.4.1. Limitations of Fo Analyses

In most previous studies, the ways to analyze the acoustic parameter, Fo, generally involved comparisons on the bases of (i) descriptions (as in Eitel, 1947; cited in Fok-Chen, 1974: 18), (ii) descriptions with comparisons of Fo contour patterns (as in Fok-Chen, 1974), or (iii) descriptions, comparisons of Fo contour patterns, plus a comparison of the extracted Fo values at a few reference points along the contours, such as the start, the dip, and the end points (Hashimoto, 1972: 122-126; Bauer & Benedict, 1997: 128-130; Bauer, 1999: 10-22). However, these studies were limited in a number of ways.

Their comparisons were made on the basis of the Fo measurements of individual speakers, and no generalizations about the Fo parameter for a larger speaker group were made. Another problem is that even though the Fo contour for a group can be obtained by normalizing the vowel duration and extracting the Fo values at certain percentage points, as in So's studies (1998; 1999a), averaging the Fo data across female and male speakers is problematic: the computed mean Fo values represent neither the female group nor the male group.
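The averaging problem can be illustrated with a small numerical sketch. The Fo values below are hypothetical and chosen only for illustration; they are not data from this or any cited study, and `pooled_mean` is a helper name introduced here.

```python
from statistics import mean

# Hypothetical mean Fo values (Hz) for one tone; illustrative only.
female_fo = [220.0, 230.0]
male_fo = [110.0, 120.0]


def pooled_mean(*groups):
    """Average all values together, ignoring which group they came from."""
    return mean([value for group in groups for value in group])


female_mean = mean(female_fo)
male_mean = mean(male_fo)
pooled = pooled_mean(female_fo, male_fo)

# The pooled mean falls between the two groups and describes neither.
print(female_mean, male_mean, pooled)
```

With these made-up numbers, the pooled value lies far from both the female and the male means, which is the sense in which a pooled Hz average represents neither group.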

Moreover, the prevalent method in the majority of acoustic studies of Cantonese (Fok-Chen, 1974; Ching, 1984; Bauer & Benedict, 1997; Bauer, 1999; among others) -- using Fo values (in Hz) to illustrate or describe tonal contours -- does not provide a direct relationship between the obtained Fo contour patterns and the tone letter values employed in most of the literature for the description of tones (e.g., Kao, 1971; Yip, 1980/1990; Matthews & Yip, 1994; among others). It is uncertain whether the obtained Fo patterns match the description of tones. For example, a subject (HKF2) in Bauer's study (1999: 10) produced the low rising tone (tone 5) in Cantonese from an onset of 211.6 Hz to a peak of 251.1 Hz. These Fo values alone, however, do not show how closely the contour matches the tone letter values, [23], for tone 5.

Furthermore, there is no systematic analysis that examines the relationship among the lexical tones within a tonal system. Previous studies, such as Fok-Chen (1974) and Bauer & Benedict (1997), mainly focused on describing the overall shapes or patterns of the Fo contours of their speakers. Recently, both Gandour (1994: 3121-3122) and Abramson (1997) have suggested that the notion of "tonal space" should be introduced in order to investigate normalized Fo contours for large groups of speakers.

Accordingly, there is a need for a method that depicts the Fo contours of speaker groups in a more representative way. There is also a need to analyze the Fo parameter more systematically, taking the notion of tonal space into account. This study will attempt to resolve the issues discussed above.

1.4.2. Different Bilingual Groups: CRC speakers vs. Hong Kong speakers

It should be pointed out that the CRC bilinguals (English and Cantonese) in this study were considered to be different from the "bilinguals" in Hong Kong, since they differ in their degree of bilingualism. Although most young Hong Kong people today are considered to be bilingual because they all learn English at a young age, they actually behave more like "monolingual Cantonese speakers" than bilingual speakers in their society. It is true that most of them started learning English at age 3 or 4; however, outside the classroom environment, these Hong Kong people rarely use English for their daily activities (e.g., shopping). For most Hong Kong people, the status of bilingual is doubtful.

In their survey of 870 Hong Kong people in 1993, Bacon-Shone & Bolton (1998: 76-79) asked the question "do you consider yourself to be a bilingual?". They found that more interviewees (16.1%) considered themselves to be bilingual than in the 1983 survey (6%). However, the majority of interviewees also showed that they were unclear about their bilingual status: 19.7% of them did not consider themselves to be bilingual, 4% chose "partly bilingual", 5.6% selected "uncertain" about their status, and 54.7% answered that they "did not understand" the question.

Taking these facts into consideration, native Hong Kong Cantonese speakers should not be thought of as bilinguals having the same degree of usage of English in their daily activities as bilinguals in North America. In fact, they behave much the same as monolingual speakers in their society. Thus, it is appropriate to employ native Hong Kong Cantonese speakers as a comparison group in this study.

1.4.3. The Outline of the Present Study

In order to examine the tonal systems of the two CRC groups, both production and perception experiments were conducted.

Twelve target words for the root-words /si/ and /fu/, carrying the six lexical tones, were used in the two experiments.

The production experiment (as reported in Chapter Two) was used to investigate the differences in the tonal systems of the two CRC groups. The acoustic correlates of tone, duration and Fo, of the groups were compared with those of native Hong Kong speakers. Duration was compared in a way similar to previous studies (Kong, 1987; Zee, 1995): the comparison was based on vowel duration only. For the Fo analysis, the limitations mentioned in Section 1.4.1 are dealt with. First, instead of describing the Fo contours in Hz, this study adapted the method employed by Fon & Chiang (1999) to depict the Fo contours of the speaker groups in Chao's tone letters. Then, in order to analyze the Fo contours systematically, the following parameters were examined: (i) the tonal space, (ii) the spatial relationships among the level tones, (iii) the Fo intervals of the contour tones, and (iv) the overall contour shape and spatial relationship of the rising tones. The first three parameters were analyzed in terms of Fo ratios, as in So's study (1999b), in order to eliminate unnecessary sex-related differences. The last parameter was analyzed in terms of percentage change in Fo values.
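The two kinds of measure named above, Fo ratios and percentage change in Fo, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names `fo_ratio` and `percent_change` are introduced here, the exact ratio definitions used in the analysis are those given in Chapter Two and may differ, and the male and female values are hypothetical. The only figures taken from the text are the 211.6 Hz onset and 251.1 Hz peak quoted from Bauer (1999) in Section 1.4.1.

```python
def fo_ratio(upper_hz, lower_hz):
    """Ratio of two Fo values; unchanged if both are scaled by the same factor."""
    return upper_hz / lower_hz


def percent_change(start_hz, end_hz):
    """Change in Fo from start to end, as a percentage of the starting value."""
    return (end_hz - start_hz) / start_hz * 100.0


# Hypothetical speakers whose Fo values differ by a constant factor of 2.
female_ratio = fo_ratio(240.0, 180.0)
male_ratio = fo_ratio(120.0, 90.0)

# Onset and peak of the tone 5 example quoted in Section 1.4.1.
rise = percent_change(211.6, 251.1)

print(female_ratio, male_ratio, rise)
```

Because a ratio is unaffected when both values are scaled by the same factor, the two hypothetical speakers yield the same ratio despite very different values in Hz, which is why ratio measures help to set aside overall sex-related pitch differences.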

The purpose of the perception experiment (as reported in Chapter Three) was to provide additional evidence for the existence of the differences in the tonal systems of the speaker groups observed in the production study. Listeners of the three groups were required to identify natural stimuli produced by experienced Cantonese instructors. The perception test adapted the identification paradigm of Fok-Chen (1974; Yiu & Fok, 1995) as a way for listeners to assess the target words. The identification scores (in terms of percentage correct) of the three groups were compared. Then, confusion matrices were constructed, in the same way as in previous studies (Fok-Chen, 1974; Yiu & Fok, 1995), in order to show the responses of the listeners in the participant groups when they perceived the target tones. The patterns obtained in the confusion matrices reveal the kinds of tonal confusions the listeners had.

Finally, the correlations and implications observed in the results of the two experiments are discussed in the concluding chapter (Chapter Four).

CHAPTER TWO

TONAL PRODUCTION EXPERIMENT

The objective of the present study is to explore the extent to which the tonal system of two groups of CRC speakers may differ from that of the NCAN group.¹ The analysis of the tonal productions of these two groups was achieved by investigating two acoustic correlates of tone: duration and fundamental frequency (Fo).

Two root-words, /si/ and /fu/, were employed to elicit the production of the six lexical tones represented in two sets of target words (total of 12 words). These target words were embedded into twelve reading materials, consisting of three forms: citation, phrasal, and sentential forms. Among the three forms, the sentential and the citation forms were employed for analyzing vowel duration and Fo, respectively (see the discussions in Section 2.1.4).

As in previous studies (Kao, 1971; Kong, 1987; Lee, 1983; Lee, 1993), when analyzing the acoustic parameter of duration, only vowel durations were considered. For fundamental frequency (Fo), the method of analysis employed in this study may be considered innovative when compared to the methodology used in most descriptive studies (Kao, 1971; Hashimoto, 1972; Fok-Chen, 1974; Bauer & Benedict, 1997; Bauer, 1999).² As in Abramson's study (1976), Fo data for each speaker group were extracted at five locations (in percentage points), 0%, 25%, 50%, 75%, and 100%, throughout the vocalic durations of the syllables. The mean Fo values for the three speaker groups were transformed into Chao's tone letter values in order to examine the tonal patterns of the groups. This transformation was required because tonal patterns illustrated in Fo values (in Hz) do not directly indicate how closely the patterns match the descriptions of tones used in the Cantonese literature, in which Chao's tone letters are employed (Kao, 1971; Fok-Chen, 1974, 1979, 1984; Yip, 1980/1990; Bauer & Benedict, 1997; among others). The method used to obtain Chao's tone letter values was adapted from the approach in the study of Taiwanese Mandarin by Fon & Chiang (1999). In their study, mean Fo values at the five percentage points were first transformed into semitones, a kind of musical unit; then they were converted into tone letter values (see Section 2.3 for details). Subsequently, the tonal patterns of the two CRC groups were compared to those of the NCAN group by submitting the results of the Fo measurements to a series of statistical analyses in order to evaluate the deviations observed in the production of tones by speakers of the CRC groups. The tonal systems of the speaker groups were systematically examined within the parameters of: (i) the tonal space, (ii) the spatial relationship among the level tones, (iii) the Fo intervals (tonal space) of the contour tones, and (iv) the overall contour patterns and the spatial relationship of the rising tones (see Section 2.3.2). The first three parameters were compared in terms of Fo ratios and the last one was compared in terms of percentage change in Fo values (see Section 2.3.2.2).

¹ For the description of participant groups, see Section 2.1.1.

² See Sections 2.1.5.2 and 2.3.2 for the description of Fo analysis in this study.
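The two-step transformation described above (Hz to semitones, then semitones to tone letter values) might be sketched as follows. This is a hedged illustration, not the procedure of Fon & Chiang (1999) or of this study, whose details are given in Section 2.3. The semitone formula is the standard 12 × log2 of a frequency ratio; the choice of the lowest Fo as the reference, the division of the range into five equal bands, the function names, and the example values are all assumptions made here.

```python
import math


def hz_to_semitones(f_hz, ref_hz):
    """Distance of f_hz above ref_hz in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def to_tone_letters(contours_hz):
    """Map each Fo contour (Hz) to levels 1-5 within the overall pitch range.

    Assumption: the range between the lowest and highest Fo over all
    contours is split into five equal semitone bands (1 = lowest).
    """
    all_hz = [f for contour in contours_hz.values() for f in contour]
    ref = min(all_hz)
    span = hz_to_semitones(max(all_hz), ref)
    if span == 0:
        raise ValueError("contours have no pitch range")
    letters = {}
    for tone, contour in contours_hz.items():
        levels = []
        for f in contour:
            semitones = hz_to_semitones(f, ref)
            # The top of the range would fall in a sixth band, so cap at 5.
            levels.append(min(5, int(semitones / span * 5) + 1))
        letters[tone] = levels
    return letters


# Hypothetical contours (Hz); not data from this study.
example = to_tone_letters({"high": [200.0, 200.0], "low": [100.0, 100.0]})
print(example)
```

Because semitones are logarithmic, equal bands in semitones correspond to equal frequency ratios rather than equal differences in Hz.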

Three hypotheses were tested in this study. The first hypothesis predicted that the mean vowel durations of the two CRC speaker groups would parallel the results of the study by So (1998): the YCRC speakers would have the longest durational pattern and the TCRC speakers would have the shortest durational pattern. As mentioned earlier, the longer durational pattern produced by the YCRC speakers (teenagers) appears to be influenced by English durational patterns due to the language-contact situation. If this is the case, the relatively longer patterns should also be observed in this study. For the shorter pattern of the TCRC speakers, the exact cause was uncertain. They may have produced the vowels with a shorter pattern because they subconsciously realized the difference in vowel durational patterns between English and Cantonese. This study will look further at the durational pattern of the TCRC speakers to see if the patterns are consistently observed.

The second hypothesis predicted that the two CRC speaker groups should be able to maintain the relative durational patterns among the six tones³ in the same way as the NCAN speakers. Previous studies of other Chinese languages have suggested that the rising tone tends to be the longest, while the falling tone tends to be the shortest, for example, in Taiwanese (Zee, 1978) and in Mandarin (Tseng, 1990).⁴ For Cantonese, similarly, Kong (1987) reported that tone 2 (the high rising) was the longest among the six tones, and tone 4 (the low falling) was the shortest (ibid.; Zee, 1995). Moreover, Kong (1987) observed that Cantonese tone 3 was the longest among the level tones (tone 1, tone 3, tone 4, and tone 6).⁵ Thus, if the CRC speakers had acquired the tones, they should be able to maintain the relative durational patterns among the six lexical tones in the same way as the native Cantonese speakers.

³ The literature has documented that the vowel durations of native speakers are affected by the type of tone: certain tones (e.g., rising tones) are produced with a longer durational pattern, and some (e.g., falling tones, glottalized tones) with a shorter pattern. Also see the discussion in Chapter One, Section 1.3.1.

⁴ Tseng (1990) found that Mandarin tone 3 (the falling-rising tone) was the longest among the four lexical tones, while tone 2 (the rising tone) was in fact the second longest.

The third hypothesis predicted that the Fo patterns produced by the two CRC groups would be different from those produced by the NCAN group. This hypothesis was based on the results of previous studies (So, 1998, 1999a). In these studies, it was found that (i) the teenagers in the YCRC group had irregular tonal patterns (So, 1998), and that (ii) the adults in the YCRC group had a smaller Fo range for the two rising tones and showed delayed timing in differentiating the two rising tones (So, 1999a; see Chapter One for details). Thus, if the tonal patterns produced by the CRC speakers were different from those of the NCAN group, the patterns would be observed in this study as well.

⁵ Kong considered tone 4 to be a level tone in his study.

The results of the analyses in this study, concerning vowel duration and Fo, revealed a number of findings. First, no difference in durational patterns exists among the three speaker groups, and they appear to be aware of the relative durational patterns among the six tones. Second, tonal pattern deviations were less evident in the TCRC group than in the YCRC group. It will be argued that different degrees of tonal pattern deviation exist in the two CRC groups, and that tonal reduction patterns are subject to certain hierarchical orders.

This chapter will commence with the description of the experiment (Section 2.1). It will be followed by the analyses of the vowel durations (Section 2.2), and fundamental frequency (Fo) (Section 2.3). A summary of the findings of this production study will conclude this chapter (Section 2.4).

2.1. METHOD

2.1.1. Participants

Thirty adults (15 males and 15 females), between the ages of 18 and 24 years, were recruited as paid subjects participating in both the production and the perception studies. According to their age of arrival (AOA) in Vancouver, they were assigned to one of three speaker groups (i.e., each group consisted of five male and five female speakers).

The first group, Teenage Canadian Raised Cantonese (TCRC), consisted of speakers who were all born in Hong Kong and had moved to Canada during their early adolescence (from 10 to 15 years of age).⁶ They had learnt Cantonese at school in Hong Kong, as well as at home. At the time of the recordings, the speakers' ages ranged from 18 to 22 years (M = 19.60 years). Their mean AOA was 12 years, and their mean length of residence (LOR) was 7.6 years. According to the questionnaires given to them (see Appendix A), Cantonese was their primary language for communication with their family members and friends, as well as in their daily activities -- reading and speaking, and watching television (TV). This implies that in general they preferred to use Cantonese rather than English.

⁶ Among the 10 participants of the group, only one speaker came to Canada at age 15. The AOA of the other nine participants ranged from 10 to 13 years. Thus, it is justified to label them as TCRC.

The second group, Young Canadian Raised Cantonese (YCRC), consisted of seven speakers who were born in Canada, and three additional speakers who had moved to Canada from Hong Kong before the age of seven. The participation of the three Hong Kong born speakers may be questioned. Their inclusion, however, can be justified by the fact that their performance on the tonal production and their identification scores on the perception test were the same as those of the Canadian born Cantonese participants. The ages of the group ranged from 18 to 24 years (M = 19.60 years) at the time of the recordings. The AOAs of the three Hong Kong born speakers were 6, 6.5, and 6.5 years, respectively. The mean length of residence (LOR) was 20.29 years for the seven Canadian born speakers, and 11.17 years for the other three speakers. As reported in their questionnaires, these speakers had learnt Cantonese at typical Chinese schools in Vancouver⁷ for between 6 months and 11 years, except for one speaker, who reported that his Cantonese was learnt only from his parents. As expected, Cantonese was their primary language used for communication with their parents, while English generally was the primary language used to communicate with their siblings and friends, as well as in their daily activities -- reading, speaking and watching TV. This indicates that they preferred to use English rather than Cantonese in these activities.

⁷ "Typical Chinese schools in Vancouver" refers to institutes where children learn Chinese grammar, vocabulary, pronunciation, etc. Usually, these institutes provide part-time courses to the communities, such as weekend classes or evening classes, lasting from 1.5 to 3 hours.

The third group was a comparison group consisting of Native Hong Kong Cantonese speakers (NCAN), who were all born and raised in Hong Kong, and had lived in Canada less than two years at the time of the recordings. Their ages ranged from 18 to 22 years (M = 19.5 years). Their average AOA was 18.8 years. Their average LOR was 9.6 months (less than a year). Cantonese is their primary and preferred language for communication. These percentages reveal that they have a stronger preference for using Cantonese than English; the percentages for their use of English were relatively low.

2.1.2. Materials

Each participant was required to fill in a "Language Background Questionnaire" (see Appendix A), before completing the experiment, in order to provide general information about AOA, LOR, and the language backgrounds of the speakers in the three groups. This study used two Cantonese root-words, /si/ and /fu/, for both the production study and the perception test. Each root-word syllable can independently carry the six contrastive tones with unique lexical meaning (represented by different Chinese characters). The target words for this study are listed below.

          /si/                            /fu/

Tone 1:   /si55/ [獅] "lion"              /fu55/ [夫] "man"
Tone 2:   /si25/ [史] "history"           /fu25/ [虎] "tiger"
Tone 3:   /si33/ [試] "attempt"           /fu33/ [褲] "pants"
Tone 4:   /si21/ [時] "time"              /fu21/ [符] "symbol"
Tone 5:   /si23/ [市] "city"              /fu23/ [婦] "woman"
Tone 6:   /si22/ [士] "trained person"    /fu22/ [腐] "tofu"

These 12 Cantonese words were carefully selected to ensure each word was associated with an unambiguous semantic meaning. In the perception test, semantic differences were very important to listeners, especially to the YCRC listeners, in associating the perceived stimulus to the given answer choices.

The 12 target words, embedded in 12 groups of reading materials,⁸ were given to the speakers in 2 sets of Chinese reading lists (/si/ and /fu/). One set was composed of the 6 groups of reading materials associated with the root-word /si/, and the other was made up of the other 6 groups of reading materials associated with the root-word /fu/. Twenty different versions of the reading lists per root-word set were prepared for the speakers to choose from during the recording session. Altogether, 40 different versions of the reading lists were made. Each list contained six groups of materials, printed in random order.

⁸ For the format of the reading materials, see the next paragraph.

The format of the reading materials was somewhat different from traditional approaches. In previous studies, the acoustic correlates of tones were commonly analyzed from target words either in citation form (Kao, 1971; Bauer & Benedict, 1997) or in sentential form⁹ (Lee, 1983; Zee, 1995). For this study, the reading materials contained three forms of the target word: the citation form, repeated twice; the phrasal form, repeated three times, in which the target word was the second syllable; and one sentential form, in which the target word was in the middle position of a fixed sentential frame. At the end of each reading material, there was a picture cue corresponding to the given phrase. For example, if the target word was "lion" [獅], then the reading material¹⁰ would be written as follows.

⁹ In these studies the target words were placed in a sentence frame.

¹⁰ The corresponding English translation shown here is for illustration only. It was not given to the speakers on the reading lists.

lion, lion [symbol]
dance-lion, dance-lion, dance-lion [symbol]
This - is - dance-lion - picture. [symbol]
"This picture is about lion-dance."

The symbol [symbol] signalled the end of the repetition. The order of the citation form, the phrasal form, and the sentential form was randomly arranged to counterbalance¹¹ any potential practice effect from the order of the reading forms (see Appendices B and C for the examples of the root-words). The phrasal form and the picture cue were included mainly to facilitate accurate pronunciation of the target words; therefore, they were not subjected to any statistical analysis in this study. There were two reasons for the inclusion of these two forms in the reading materials. One reason was that some of the YCRC speakers could not recognize all of the Chinese words. Therefore, when they saw the picture, they knew how to "read" the materials. Another reason was that the phrasal form was employed to guide speakers in pronouncing the target words accurately. This is essential to the speakers, since some of the target characters could carry more than one tone and be associated with different meanings. For example, the target word [試], [si33], means "to try or to attempt". When it is used for the meaning "exam", it is pronounced as [si23]. Thus, with the phrasal forms provided, the participants would know exactly which target tone should be associated with the Chinese characters given.

¹¹ The word "counterbalance" here refers to the process of altering or varying the order of events so that the effects observed in an experiment are not due to the qualities of one particular order (LaJs, 1996: 554).

2.1.3. Description of the Experiment

The experiment consisted of two sessions: a recording and a listening session.¹² Half of the participants (five in each speaker group) were randomly assigned to proceed with the recording session first and later with the listening session. For the other fifteen participants, the listening tests were conducted first.

¹² The listening session will be described in Chapter Three.

Before the recording sessions, speakers were given time to practice the pronunciation of the target words. Speakers' recordings were made when they felt comfortable enough to do the task. This practice was particularly important to the YCRC speakers, who were unable to read Chinese; thus they needed the picture cues in order to pronounce the target words. In cases when they were uncertain about the pronunciation of a word, the author of this study clarified the pronunciation for them.

Each recording session was conducted in a sound-treated room in the Phonetics Laboratory at Simon Fraser University. The speakers were instructed to read the lists into a unidirectional microphone (GeneXXA 33.984 DCA), which was placed about 3 inches from their lips. The speech samples were recorded on a JVC

Double Cassette deck (TD W709).

As mentioned above, forty versions of reading lists for the two root-word sets (/si/ and /fu/) had been prepared before the actual experiment. For each root-word, each speaker was instructed to select at random eight versions of the reading list to read from the 20 prepared versions. In total, the speakers read 16 selected reading lists, containing the two root-words (8 versions of reading lists x 2 root-words). In other words, each speaker read 96 reading materials¹³ (8 lists x 6 tones x 2 root-words). Thus, altogether 2880 reading materials from the three speaker groups were recorded in this experiment (96 reading materials x 30 speakers). The signals were digitized with SoundEdit software (Version 2) at a sampling rate of 22 kHz, with a resolution of 16 bits. The signals were analyzed with Signalyze software (Version 3.12).

¹³ For the content of the reading materials, see Section 2.1.2.

To obtain the speakers' mean data for the six lexical tones,¹⁴ five tokens for every tone, in each form (citation or sentential),¹⁵ were randomly selected from the 2880 recorded reading materials. Accordingly, 3600 tokens were chosen for the analysis (1800 tokens per reading form (citation or sentential) = 5 tokens x 6 tones x 2 root-words x 30 speakers). It should be mentioned that some of the YCRC speakers occasionally mispronounced the target words by employing different tones. These mispronounced digitized signals produced by the YCRC speakers were judged by the experimenter (the author), a native Hong Kong Cantonese speaker. The signals were examined with the Signalyze software and the extracted Fo values of the signals were further compared with those of the correct signals produced by the same speakers. The personal mean data for the tones of the YCRC speakers were averaged on the basis of the 3 or 4 tokens that were correctly pronounced. Because of the occasional mispronunciation by the YCRC speakers, the number of tokens subjected to the statistical analysis was 3547. Although the means of the tones were obtained from only the tokens with correct tonal pronunciation, this should not affect the statistical results much, because all analyses compared the mean data of the speakers rather than the raw data of their individual tokens. The statistical analyses in this study were undertaken with the StatView program (Version 5.0).

¹⁴ The obtaining of personal means for the tones will be discussed in Section 2.1.4.

¹⁵ The phrasal form was not subjected to any analysis in this study (see Section 2.1.2).
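The counts reported in this section follow from simple multiplication, and can be checked with the short sketch below. All figures are taken from the text; the variable names are introduced here for illustration, and the number of excluded tokens is derived by subtraction rather than stated in the source.

```python
SPEAKERS = 30            # 10 speakers in each of the three groups
TONES = 6                # six lexical tones
ROOT_WORDS = 2           # /si/ and /fu/
LISTS_PER_ROOT_WORD = 8  # versions each speaker selected per root-word
TOKENS_PER_TONE = 5      # tokens selected per tone, root-word, and form
FORMS_ANALYZED = 2       # citation and sentential (phrasal form excluded)

materials_per_speaker = LISTS_PER_ROOT_WORD * TONES * ROOT_WORDS
total_materials = materials_per_speaker * SPEAKERS

tokens_per_form = TOKENS_PER_TONE * TONES * ROOT_WORDS * SPEAKERS
tokens_selected = tokens_per_form * FORMS_ANALYZED

TOKENS_ANALYZED = 3547   # reported after removing mispronounced tokens
tokens_excluded = tokens_selected - TOKENS_ANALYZED

print(materials_per_speaker, total_materials)
print(tokens_per_form, tokens_selected, tokens_excluded)
```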

2.1.4. Analyses for Two Acoustic Correlates of Tones

Before commencing with the description of the measurements, certain issues need to be clarified. These are (i) the domain for the analysis of the acoustic correlates of tones, (ii) the use of two different forms of the signals for analyses; and (iii) the method to obtain individual mean data for the two acoustic correlates.

The vocalic portion of the syllable (i.e., the vowel) was considered to be the domain of the two acoustic correlates, the duration and the fundamental frequency (Fo). This may suggest a discrepancy from the view that considers the Tone Bearing Unit (TBU) in Chinese as the relevant domain here. In Chapter One, the whole Chinese syllable is described as a TBU, but the measurements for tonal durations mentioned in this chapter are based on the vowels of syllables. Although tone may be best conceptualized as a property of the syllable, it is realized on syllabic segments (i.e., vowels) as an Fo contour (Lehiste, 1970: 84; McQueen and Cutler, 1997: 578). Furthermore, according to Chao (1968: 25), "the (Chinese) tone of every stressed CV structure spreads over the voiced part of the syllable". Since Fo extractions are sampled along the vocalic portion of a syllable, it is justified to obtain tonal durations of the target words by measuring the vocalic portion of the syllable. Consequently, in this study, only the vocalic segment of a syllable was measured for tonal durations, as in previous studies (for example, Kao, 1971; Kong, 1987; and Lee, 1993).

Two analytic forms -- sentential and citation -- were employed to analyze the two acoustic correlates. The sentential forms (i.e., each target word was embedded in the same carrier sentence, see Section 2.1.2) were submitted to durational analysis, because the vowel durations of the citation forms varied. Therefore, vowel durations were measured in sentential forms. The citation forms were used for the Fo analyses so as to avoid the possible effects of tone sandhi, tone change, and tonal coarticulation.¹⁶

¹⁶ See Chapter One, Section 1.2.3.

In order to obtain personal mean durations and Fo values, individual mean data were averaged according to the tone type of the target words. For example, the mean duration of a speaker's tone 1 was based on 10 tokens (five /si55/ and five /fu55/) of his or her tone 1. A similar method was employed to obtain the means of the Fo data. One may question the comparability of the two high vowels /i/ and /u/. It is well known that different vowels are associated with different intrinsic durations and fundamental frequencies. However, the two high vowels, /i/ and /u/, do not show great differences in the measurements of duration and Fo. Evidence from previous studies of English and Cantonese shows that the means of vowel durations and Fo values of the two high vowels do not differ substantially. For instance, Peterson and Lehiste (1960) embedded target words into a sentence frame, and found that the mean vowel durations of /i/ (240 ms) and /u/ (260 ms) for five native English speakers were close to each other. Similarly, Munro's study (1993) also reported that the mean vowel durations of /i/ (234.5 ms) and /u/ (232.2 ms) for 23 native American English speakers were similar. Concerning the intrinsic Fo values of the two vowels, Lehiste and Peterson (1961) observed that the mean Fo values of /i/ (183 Hz) and /u/ (182 Hz) for five speakers were comparable.

Studies of Cantonese present findings¹⁷ resembling those found in English (see above). Kao (1971) found that the durations of /i/ and /u/, from citation forms of two native speakers, were 254 ms and 275 ms respectively. Although the data suggested that /u/ is longer than /i/, this was simply due to the fact that the two target words were associated with different tones. In her study, the duration of /u/ was based on tone 2 in /fu25/, whereas the one for /i/ was based on tone 6 in /si22/. It is well known (see Chapter One) that tone 2 (high rising tone) in Cantonese is the longest one (Kong, 1987) and tone 6 (low level tone) is as short as the shortest tone 4 (falling tone) (Eitel, 1947; cited in Fok-Chen, 1974: 18). So (1999a) also found that the durations of /i/ and /u/ from citation forms of the two rising tones of six native Cantonese speakers were comparable. The mean durations for the vowels /i/ and /u/ in tone 2 were 304.71 ms vs. 317.73 ms. A similar pattern was also obtained in tone 5: they were 302.71 ms vs. 320.04 ms.

¹⁷ The durations of the two high vowels are longer in Cantonese than in English, according to Kao (1971) and So (1999a). This may be the result of using citation forms in Cantonese, whereas in the studies on English, Lehiste and Peterson (1961) and Munro (1993), contextual forms (target vowel embedded in a carrier sentence) were used for the measurements.

With respect to the intrinsic Fo values of the two vowels, Lee (1993) found that the mean Fo values for the two vowels of a male speaker were very close to each other. She found that the mean Fo values of /i/ and /u/ were 126.3 Hz and 125.4 Hz respectively. Thus, previous research on these issues indicates that the vowels /i/ and /u/ are comparable in terms of their intrinsic duration and fundamental frequency.

2.1.5. Measurements

2.1.5.1. Vowel Duration Measurement

Duration measurements were made according to standard segmentation practices (Peterson & Lehiste, 1960; Pollock, Brammer, and Hageman, 1993; Kehoe, Gammon, & Buder, 1995). Judgments of consonant and vowel boundaries (i.e., the onset and the offset of the vowel) were based on visual examination of the first and last identifiable periodic cycle in the time waveform and inspection of the waveform combined with the spectrogram.

2.1.5.2. Fundamental Frequency (Fo) Measurement

For the Fo measurement, individual Fo contours of the target words were obtained by employing the technique of Pitch Extraction in the Signalyze program, in which Fo values were extracted every 5 ms along the vocalic portion of the syllable. The vocalic duration of each Fo contour was further normalized in order to avoid intra- and interspeaker durational variations. The normalization procedure for the speakers' vocalic durations was adopted from the one applied in Abramson's Thai tone study (1976), in which Fo values were sampled at five locations, 0%, 25%, 50%, 75%, and 100%, along the vocalic portions of syllables (see also So, 1998, 1999a, 1999b, where the same method was used).
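The time normalization described above can be sketched as follows: a contour extracted at a fixed 5 ms step is reduced to five values at 0%, 25%, 50%, 75%, and 100% of the vocalic duration. This is a minimal sketch, not the Signalyze procedure. The function name `sample_at_percentages` is introduced here, the example contour is hypothetical, and the use of linear interpolation when a percentage point falls between two extracted frames is an assumption, since the source does not say how such cases were handled.

```python
def sample_at_percentages(fo_values, percentages=(0, 25, 50, 75, 100)):
    """Return Fo at the given percentage points of an evenly sampled contour.

    fo_values: Fo values (Hz) extracted at a fixed step (e.g., every 5 ms)
    from vowel onset to vowel offset. Points that fall between two frames
    are linearly interpolated (an assumption made for this sketch).
    """
    if not fo_values:
        raise ValueError("empty Fo contour")
    last = len(fo_values) - 1
    samples = []
    for p in percentages:
        position = p / 100.0 * last      # fractional frame index
        lower = int(position)
        upper = min(lower + 1, last)
        fraction = position - lower
        value = fo_values[lower] + fraction * (fo_values[upper] - fo_values[lower])
        samples.append(value)
    return samples


# Hypothetical rising contour with three frames; not data from this study.
five_points = sample_at_percentages([100.0, 120.0, 140.0])
print(five_points)
```

Because every contour is reduced to the same five relative positions, contours from vowels of different durations can be averaged point by point.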

2.2. VOWEL DURATIONS

2.2.1. Measurement Results

Table 2-1 shows the means and standard deviations of vowel durations associated with the six contrastive tones in the three speaker groups: the TCRC, the YCRC, and the NCAN groups. The mean durations of the six tones by the three speaker groups are presented in Figure 2-1.

Table 2-1. The means (in ms) and standard deviations of vowel durations of the lexical tones produced by the three speaker groups.

As seen in Figure 2-1a below, the mean vowel durations of the six tones for the NCAN group are consistently shorter than those of the two CRC groups. In examining the mean durations of the six lexical tones (in Table 2-1a), it appears that the long durational patterns of the two CRC groups can be attributed to the female speakers rather than the male speakers in the groups. Among the three female groups, the overall mean durations of the TCRC (175.60 ms) and the YCRC (170.97 ms) groups were longer than that of the NCAN (140.87 ms) group. This pattern (TCRC and YCRC > NCAN) was consistently observed across the six tones (see Table 2-1b).

Figure 2-1a. The mean vowel durations of the three speaker groups. [Figure: mean vowel durations by lexical tone (Tone 1 to Tone 6) for the NCAN, TCRC, and YCRC groups.]

Figure 2-1b. The mean vowel durations of the three female speaker groups. [Figure: mean vowel durations by lexical tone (Tone 1 to Tone 6) for the NCAN, TCRC, and YCRC female speakers.]

Figure 2-1c. The mean vowel durations of the three male speaker groups. [Figure: mean vowel durations by lexical tone (Tone 1 to Tone 6) for the NCAN, TCRC, and YCRC male speakers.]

In contrast to the pattern for the female groups, the mean vowel durations for the male speakers among the three groups may be considered to have the opposite durational pattern (NCAN > TCRC and YCRC). The overall mean durations for the TCRC and the YCRC male speakers (136.58 ms and 136.79 ms, respectively) were slightly shorter than that of the NCAN male speakers (146.59 ms). This durational pattern among the three male groups was consistently observed for each lexical tone (see Table 2-1c).

To test for effects of SPEAKER GROUPS, SEX, and TONES on the mean vowel durations, the measured values were submitted to a three-way mixed-design ANOVA, with SPEAKER GROUPS (3 levels: TCRC, YCRC, and NCAN) and SEX (2 levels: female and male speakers) as between-subjects factors and TONES (6 levels: tone 1, 2, 3, 4, 5, and 6) as a within-subjects factor. Results (see Table 2-2 below) showed that only the effect of TONES was significant, F (5, 120) = 11.773, p < 0.01. This indicated that there were significant differences in mean vowel durations among the six tones. The effects of SPEAKER GROUPS, SEX, and all the interactions among the factors were non-significant (ps > 0.05).

Table 2-2. Results of the ANOVA for vowel duration.

Factor                    df          F        p
SPEAKER GROUPS            (2, 24)     0.322    0.728
SEX                       (1, 24)     2.990    0.097
TONES                     (5, 120)    11.773   < 0.0001   S
SPEAKER GROUPS x SEX      (2, 24)     1.176    0.326
TONES x SPEAKER GROUPS    (10, 120)   0.708    0.716

S indicates the effect of the factor was significant.
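As a cross-check on the degrees of freedom reported for this design, the short Python sketch below derives them from the number of speakers and factor levels. It assumes the thirty participants mentioned in the abstract and the standard textbook formulas for a mixed design with two between-subjects factors and one within-subjects factor; the function name is hypothetical and no F or p values are recomputed.

```python
def mixed_anova_df(n_subjects, n_groups, n_sexes, n_tones):
    """Degrees of freedom (effect, error) for a groups x sex x tones mixed design."""
    # Between-subjects error: subjects minus the number of group-by-sex cells.
    error_between = n_subjects - n_groups * n_sexes
    # Within-subjects error: between-subjects error times (tone levels - 1).
    error_within = error_between * (n_tones - 1)
    return {
        "SPEAKER GROUPS": (n_groups - 1, error_between),
        "SEX": (n_sexes - 1, error_between),
        "SPEAKER GROUPS x SEX": ((n_groups - 1) * (n_sexes - 1), error_between),
        "TONES": (n_tones - 1, error_within),
        "TONES x SPEAKER GROUPS": ((n_tones - 1) * (n_groups - 1), error_within),
    }


# Assumed design: 30 speakers, 3 speaker groups, 2 sexes, 6 tones.
for factor, df in mixed_anova_df(30, 3, 2, 6).items():
    print(factor, df)
```

Under these assumptions the sketch yields (2, 24), (1, 24), (2, 24), (5, 120), and (10, 120), which matches the degrees of freedom listed for the five effects.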

2.2.2. Discussion

2.2.2.1. Durational Patterns of the CRC Groups

The results did not support the first hypothesis, according to which mean vowel durations of the YCRC group were expected to be the longest and those of the TCRC group to be the shortest. Differences in mean vowel durations among the three speaker groups were non-significant. In addition, although the effects of SEX and SEX x SPEAKER GROUPS were non-significant, it should be mentioned that female speakers in the CRC groups tended to produce relatively longer durations (approximately 20 - 50 ms) than the male speakers in the same groups.

The durational pattern observed in the three female groups (i.e., TCRC and YCRC > NCAN) was similar to the pattern for the three YCRC boys reported in So's study (1998). In her study, So (ibid.) found that the mean duration of the YCRC was longer than those of the NCAN and the TCRC adult males. So (ibid.) explained that this phenomenon might be attributed to the interference of English durational patterns.

This appears to be a plausible explanation because the research on English suggests that vowels in English tend to be longer than those in Chinese languages. Previous studies (Peterson and Lehiste, 1960; Munro, 1993; Wang, 1997; among others) have demonstrated that the English vowels /i/ and /u/ may have mean durations between 240 and 270 ms. On the other hand, Chinese vowel durations tend to be shorter (Wang, 1997 for Mandarin; Fok-Chen, 1974; Lee, 1983; Lee, 1993 for Cantonese).

Wang (1997) found that the mean durations of /i/ (224 ms) and /u/ (241 ms) were shorter for 15 Mandarin speakers than the durations of the corresponding English vowels (270 and 261 ms respectively) produced by 15 native English speakers. Similarly, studies in Cantonese also suggest that such durational differences exist between Cantonese and English. Fok-Chen (1974: 4) explicitly mentioned that Cantonese /i/ is not as 'tense and long' as the English vowel /i/, and it should be "somewhere between the English long /i/ and short /ɪ/". Lee (1983) found that the mean durations of /i/ and /u/ for three native Hong Kong Cantonese speakers were only 123 ms and 199 ms respectively. When examining the mean durations of the tones for the NCAN speakers in Table 2-1, it can be seen that the longest mean durations among the six lexical tones by the female speakers (i.e., 162 ms for tone 5) and the male speakers (160.33 ms for tone 2) are still considerably shorter than the respective English vowel durations (240 - 270 ms).

Thus, on the basis of the facts presented above, it may be assumed that the longer durations for the female speakers in the CRC groups are the result of English influence. This raises the question of why interference should occur for these female speakers but not for the male speakers in the same groups. According to their self-reported questionnaires, the CRC male speakers have a similar background to the CRC female speakers in terms of AOA and preferences in language choices. No explanation is available for this at this time. Further research is needed to provide a clearer understanding of the phenomenon of sex-related differences.

2.2.2.2. Durational Patterns Among Lexical Tones in Speaker Groups

To facilitate the discussion of the relative durational patterns among the six lexical tones in the three speaker groups, the mean durations in Table 2-1 were ranked in sequence (from long to short duration) and are presented in Table 2-3.

Table 2-3. The ranked mean durations of the lexical tones for the speaker groups (in ms).

NCAN  females: tone 5  > tone 2  > tone 3  > tone 1  > tone 4  > tone 6
               161.998 > 159.507 > 136.817 > 135.376 > 126.449 > 125.087
      males:   tone 2  > tone 5  > tone 3  > tone 1  > tone 4  > tone 6
               160.327 > 152.938 > 146.186 > 142.730 > 139.614 > 137.774

TCRC  females: tone 2  > tone 5  > tone 6  > tone 1  > tone 3  > tone 4
               186.200 > 185.483 > 179.582 > 174.143 > 172.887 > 155.320
      males:   tone 2  > tone 5  > tone 1  > tone 3  > tone 6  > tone 4
               151.718 > 149.645 > 137.333 > 136.470 > 127.864 > 116.450

YCRC  females: tone 2  > tone 5  > tone 3  > tone 1  > tone 6  > tone 4
               180.615 > 180.375 > 174.263 > 169.383 > 165.782 > 155.391
      males:   tone 2  > tone 3  > tone 5  > tone 1  > tone 4  > tone 6
               145.108 > 140.541 > 138.405 > 135.030 > 134.160 > 127.518

As seen in Table 2-3, the relative durational patterns for the speakers in the NCAN group in the present study did not fully support the claims of Kong (1987) and Zee (1995).18 In general, the rising tones (tone 2 and tone 5) appear to be long and the falling tone (tone 4) tends to be short. However, the Cantonese high rising tone (tone 2) in this study was not always the longest, since the low rising tone (tone 5) does occur as the longest (see the NCAN female speakers). The low falling tone (tone 4) was not always the shortest, as would be expected on the basis of previous Cantonese studies (Kong, 1987; Zee, 1995); in fact, tone 6 occurs as the shortest (e.g., in the NCAN speakers). In agreement with the pattern for the level tones described by Kong (1987), the mean duration of the mid level tone (tone 3) produced by the NCAN group was the longest among the level tones. The observed patterns indeed support the claim of Fok-Chen (1974) that no consistent sequence of durational patterns, determining which tone is the longest and which is the shortest, is observed among the tones. To test the effect of TONES on the mean vowel durations of the NCAN speaker group, a one-way ANOVA was carried out. Results indicated that the effect of TONES was significant, F (5, 45) = 6.036, p < 0.01. Post hoc Tukey tests revealed that each rising tone (tone 2 and tone 5) was significantly different from tone 4 and tone 6 individually (ps < 0.05). The results suggest that the rising tones (tone 2 and tone 5) were longer than the low falling tone (tone 4) and the low level tone (tone 6).

18 Their claims have been mentioned at the beginning of this chapter.

Accordingly, it is true that the mean durations of the six tones of the NCAN group were not strictly confined to a consistent ranked sequence (see Table 2-3). However, if the relative durations of tones are compared in tone pairs, the relative durational patterns among the six tones are still observable. The general durational pattern of the lexical tones can be described in the following sequence: the durations of tone 2 & tone 5 are the longest; the durations of tone 4 & tone 6 are the shortest;19 and those of tone 1 & tone 3 are moderate (see also Figure 2-1a for the graphic representation).

Comparing the mean durations of the tones produced by the TCRC and YCRC groups with those produced by the NCAN group, it was found that CRC speakers in general follow the relative durational patterns when producing the tones, with relatively few exceptions: (i) the male speakers in the YCRC group did not produce their rising tones (tone 2 and tone 5) as the longest; (ii) the female speakers in the TCRC group did not produce tone 4 and tone 6 as the shortest; and (iii) the female speakers in the TCRC group and the male speakers in the YCRC group did not have moderate mean durations for tone 1 and tone 3. Thus, with respect to these results, the second hypothesis stated at the beginning of the chapter is confirmed by the durational patterns for the three groups. This implies that the tonal durational patterns produced by the two CRC speaker groups are similar to those of the NCAN speaker group, and thus conform to the expected durational patterns for the six lexical tones.

19 This confirms the observations by Eitel (1947; cited in Fok-Chen, 1974: 18).

On the basis of the above discussion on vowel durations, although the statistical analysis shows that the differences in mean vowel durations among the three speaker groups were non-significant, a consistent tendency has been observed in the CRC female speakers. They appear to produce relatively longer durations than the male speakers in the same groups, whose mean durations in general were comparable to those of the NCAN speakers. The durations associated with the six lexical tones indicate that the TCRC and the YCRC speakers indeed follow the durational patterns of the lexical tones (i.e., the durations of tone 2 & tone 5 are the longest; those of tone 4 & tone 6 are the shortest; those of tone 1 & tone 3 are moderate).

In concluding the discussion on tonal duration, it may be stated that the results of the present study support the claim by Fok-Chen (1974), who does not consider duration to be an important parameter in production for differentiating the Cantonese tones. Similar claims have been suggested for Mandarin (Gandour, 1994: 3120) and Taiwanese Mandarin (Tseng, 1990: 27). Therefore, since duration is not an important parameter in distinguishing lexical tones in Chinese, it should be considered as a concomitant of tones.

2.3. FUNDAMENTAL FREQUENCY (Fo)

The mean Fo values (plain text in the table) of the six contrastive tones, which were taken at the five percentage points, for the speaker groups (NCAN, TCRC, and YCRC) are summarized in Table 2-4 (see below). The mean Fo values of the female and male speakers in the groups are presented in the left and right columns, respectively. Each column shows the mean Fo values for the NCAN, the TCRC, and the YCRC groups. The mean Fo values reveal a consistent pattern for both female and male speakers: those for the NCAN group are the highest, those for the YCRC group are the lowest, and those for the TCRC group fall in between.

As mentioned earlier (both at the beginning of this chapter and in Chapter One), using Fo values (in Hz) to depict tonal patterns may sufficiently describe the contour shape of a tone, but they fail to show the connection between the tone and Chao's tone letter values,20 which are employed in the majority of the literature to describe tones (Kao, 1971; Hashimoto, 1972; Fok-Chen, 1974; Bauer & Benedict, 1997; Bauer, 1999). This drawback can be overcome if tonal patterns are presented using Chao's tone letter values. According to Chao (1947: 25), the five reference points (scale 1 to 5) of the tone letter system can be described as analogous to five musical notes, C, D, E, #F, and #G, with "one whole tone" (i.e., 2 semitones21) as the interval between any two adjacent notes. Accordingly, tone letter values can be computed by transforming Fo values into (musical) semitones; the obtained semitones are then converted into Chao's tone letter values with a 2-semitone interval.

20 For details about Chao's tone letters, see Chapter One, Section 1.1.4.
21 A semitone is "the interval between any two successive [musical] notes of the chromatic scale" (Kerman, 1980: 548). It is a logarithmic scaling of one frequency relative to another reference frequency point (Lass, 1996: 564).

Table 2-4. Mean Fo values at the five percentage points and their corresponding tone letter values

Notes: mean Fo values are in plain text, and the corresponding tone letter values are in bold. The symbol "T" means tone.

Thus, in order to have a better understanding of the observed differences among the three groups in the present study, the mean Fo values of the three speaker groups presented in Table 2-4 above were transformed into Chao's tone letters (1930). The present study attempted to adopt the two formulas employed in the Taiwan Mandarin study by Fon & Chiang (1999: 17).

Formulas Employed in Fon & Chiang's Taiwanese Mandarin Study:

Formula 1: $N\ (\text{semitone}) = 39.86 \times (\log Fo_i - \log Fo_{\min})$ (see footnote 22)
Formula 2: $\text{Tone Letter Scale} = N/2 + 1$

In Formula 1, $Fo_i$ stands for the Fo value to be transformed, and $Fo_{\min}$ refers to the minimum Fo value, used as a reference point. In this study, $Fo_{\min}$ is the end point of tone 4 for the speaker groups. The letter N stands for the number of semitones obtained from the formula.

In Formula 2, there are two constant values. The first one, "2", refers to the interval between the tone letters, which are 2 semitones apart. The other constant, "1", is necessarily added to the formula because the lowest tone letter is "1", not "0".

22 This is a typical formula to transform Fo values into musical semitones. Also see Ross et al., 1992; Nooteboom, 1997: 645.

Using 2 semitones as an interval may be appropriate for Taiwan Mandarin, but it is not suitable for Cantonese in the present study. With a 2-semitone interval, the computed tone letter value can be greater than 5. For example, with Formula 2, the starting point of tone 1 [55] of the NCAN female speakers (272.26 Hz) would be converted to the tone letter 5.98. Obviously, Chao's tone letter system, being a 5-point scale, is unable to accommodate this value.

Consequently, in the present study the interval value was modified to 2.5 semitones in order to obtain the optimal tone letter values describing the tonal system of the speaker groups.

Formulas Employed in the Present Study:

Formula 1: $N\ (\text{semitone}) = 39.86 \times (\log Fo_i - \log Fo_{\min})$
Formula 2: $\text{Tone Letter Scale} = N/2.5 + 1$

In the example given above, the computed tone letter will now be 4.99. This can be considered an optimal value, because it is close to the tone letter "5". One may question the appropriateness of changing the interval value. However, the interval value is "not strictly fixed, but varies freely" (Pike 1943: 28) depending upon "sex, individual, and mood" (Chao, 1947: 25; Fok-Chen, 1984: 226; Ching, Williams & Van Hasselt, 1994: 557).
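The two-step conversion can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually run for the thesis: the function names are hypothetical, the logarithm is assumed to be base 10 (the constant 39.86 is approximately 12 / log10(2)), and the reference value of 100 Hz is invented, because the minimum Fo values of the speaker groups are not reproduced in this passage.

```python
import math

SEMITONE_CONSTANT = 39.86  # approx. 12 / log10(2), as in Formula 1


def semitones(f0_hz, f0_min_hz):
    """Formula 1: distance of f0_hz above the reference f0_min_hz, in semitones."""
    return SEMITONE_CONSTANT * (math.log10(f0_hz) - math.log10(f0_min_hz))


def tone_letter(f0_hz, f0_min_hz, interval=2.5):
    """Formula 2: map semitones onto the tone letter scale (lowest value is 1)."""
    return semitones(f0_hz, f0_min_hz) / interval + 1


# Hypothetical speaker whose lowest Fo (end point of tone 4) is 100 Hz.
f0_min = 100.0
for f0 in (100.0, 150.0, 180.0):
    two = tone_letter(f0, f0_min, interval=2.0)
    two_and_half = tone_letter(f0, f0_min, interval=2.5)
    print(f"{f0:.0f} Hz: interval 2 -> {two:.2f}, interval 2.5 -> {two_and_half:.2f}")
```

The reference frequency always maps to tone letter 1, and widening the interval from 2 to 2.5 semitones compresses every higher value toward the bottom of the scale, which is the effect the modified Formula 2 relies on.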

2.3.1. Tonal Patterns for the Speaker Groups

The converted tone letter values for the Fo means of the female and the male speakers in the three groups are listed in Table 2-4 (in bold fonts; see above). The tonal patterns are presented in Figures 2-2 and 2-3 for the female and the male speakers in the NCAN group; Figures 2-4 and 2-5, and Figures 2-6 and 2-7 for the female and male speakers in the TCRC and YCRC groups, respectively.

Figure 2-2. The tonal patterns for NCAN female speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

Figure 2-3. The tonal patterns for NCAN male speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

As seen in Figures 2-2 and 2-3 above, both the female and male speakers in the NCAN group show typical tonal patterns according to Chao's tone letter values. The computed tone letters at the five percentage points (0%, 25%, 50%, 75%, and 100%) for tone 1 correspond to "5" on the scale of tone letters. Those for tone 2 and tone 5 also conform to their description in the literature, because the dip points of these two rising tones are close to "2" on the scale and their end points reach "5" and "3", respectively. For the two level tones, tone 3 and tone 6, the slight falling pattern appears to be more prominent, but their end points are still close to their targets, "3" and "2" on the scale. Finally, the falling tone, tone 4, falls from "3" to "1". According to Chan (1987: 7), however, tones are understood as production from a starting pitch targeting another pitch (where the end points are the most essential), and the slight falling pattern is acceptable in level tones (Abramson, 1978: 324, 1997: 3; Tang and Maidment, 1997: 9). Bauer & Benedict (1997: 114) also claimed that the slightly falling pattern of tone "does not seem to be significant for the perceptions and productions" of native speakers. Thus, the tonal patterns in Figures 2-2 and 2-3 should be considered as typical patterns representing the NCAN speakers' lexical tones.

Figures 2-4 and 2-5 below show that the TCRC speakers' tonal patterns were by and large similar to those of the NCAN speakers, but some differences can be observed. These are (i) the contours for tone 1 are lower than those produced by the NCAN group, and (ii) the end points of the tone 2 contours are also lower than those produced by the NCAN speakers. The target tone letters here are best described using "4" rather than "5" on the scale. In addition, the portions (from 25% to 50%) of the contours of tone 2 and tone 5 of the TCRC female speakers are unusually close to each other.

Figure 2-4. The tonal patterns for TCRC female speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

Figure 2-5. The tonal patterns for TCRC male speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

Examining the mean Fo values in Table 2-4 above reveals that the dip point of tone 5 for female speakers in the TCRC group is at 50%, rather than 25%, of the normalized vocalic duration, as is the case for the NCAN speakers. This suggests that the female speakers in the TCRC group actually delay the timing of the rise from the turning point of the contour. This phenomenon was not found in the TCRC males' contour.

Finally, the tonal patterns of the YCRC speakers, as seen in Figure 2-6 and Figure 2-7, demonstrate that the tone letter values of tone 1 and tone 2 in both female and male speakers are far below the values described in the literature. Their tone 1, in fact, can be described as tone letters [44] instead of [55], while tone 2 (the high rising tone) is almost indistinguishable from tone 5 (the low rising tone). The relative spatial relationship between the two rising tones appears to be narrower than that for the TCRC and the NCAN groups. In the production of female speakers in the YCRC group, tone 5 has a delayed timing pattern from the turning point of the contour. Finally, there is a decrease in the tonal space between the level tones, tone 1 and tone 6, for the YCRC female speakers.

Figure 2-6. The tonal patterns for YCRC female speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

Figure 2-7. The tonal patterns for YCRC male speakers. [Figure: tonal contours plotted against normalized vocalic duration (0% to 100%).]

In summing up the above observations with regard to the TCRC and YCRC groups, it may be stated that because the obtained tone letter values of certain tones (e.g., tone 1 and tone 2) were far below the values described or employed in the literature (see above), it is reasonable to consider that they deviate from the typical spatial relationship of native Cantonese tones. These noticeable "tonal deviations" could be interpreted as different degrees of reduction (or decrease) of the tonal space, when compared to tones of the same types in the tonal system. The reduction patterns were particularly evident in connection with tone 1, tone 2, and tone 5; they were not as evident in connection with tone 3 (the mid level tone), tone 6 (the low level tone), and tone 4 (the falling tone). Moreover, the reduction appears to be greater for contour tones than for level tones. For example, while the Fo pattern (shape) for tone 1 is still maintained as a level tone, those for the two rising tones (tone 2 and tone 5) of the two CRC groups were considerably reduced. The domain of reduction can be seen in terms of (i) the contour shape, and (ii) the spatial relationship between tones of the same types, such as the two rising tones and the three level tones (see Figures 2-4 to 2-7). Finally, the deviations observed for the rising tones are more evident for the two groups of CRC female speakers than for the male speakers.

However, one may question whether using 2.5 semitones as an interval was suitable for the YCRC group: could it be that the 2.5-semitone interval may be applied to the NCAN and the TCRC groups but not to the YCRC group? To answer this question, the data from Table 2-4 were submitted to a series of statistical analyses to see if the tonal systems of the two CRC groups differed from that of the NCAN group.

2.3.2. Tonal Deviation

In order to find out whether the tonal systems of the TCRC and the YCRC groups were different from that of the NCAN group, the present analysis proceeded by attempting to answer four questions:

(1) Considering the tonal system as a whole, do the tonal spaces of the two CRC groups differ from that of the NCAN group? If they do, which group's tonal system is affected the most?

(2) For the three level tones (tone 1, tone 3, and tone 6), are there any differences in the spatial relationships among the tones in the TCRC and the YCRC groups when they are compared to those of the NCAN group?

(3) For the three contour tones (tone 2, tone 4, and tone 5), are there any differences in the tonal space (i.e., Fo interval) of the individual contour tones23 between the TCRC and the YCRC groups in comparison with the NCAN group?

(4) In the previous sections it was observed that the two rising tones (tone 2 and tone 5) of the two groups of CRC speakers appeared to show the greatest degree of reduction in terms of contour shape and spatial relationship. The question to be asked here is, to what extent did the rising tones of the TCRC and the YCRC groups differ from those of the NCAN group?

23 I use the term "tonal space" here for describing the interval of Fo contours (from the maximum to the minimum points).

2.3.2.1. Analysis of Tonal Deviation

The first part of the analysis was designed to compare the tonal space of the tonal system of the three speaker groups. The tonal space was examined in terms of the Fo ratio24 ($Fo_{\max} / Fo_{\min}$) in the tonal system of the individual speakers (hereafter, the ratio will be referred to as the T-space ratio). $Fo_{\max}$ was the starting point of tone 1 and $Fo_{\min}$ was the end point of tone 4 for each speaker.

The second part of the analysis focused on the spatial relations among the three level tones with regard to the speaker groups. The spatial relations were compared in terms of the Fo ratios obtained from the grand Fo means (average of the five percentage points: 0%, 25%, 50%, 75%, and 100%) of the three level tones (hereafter referred to as L-spatial ratios). In this part of the analysis, the individual speaker's Fo grand mean for tone 1 was compared with his or her own Fo grand mean for tone 3 (i.e., tone 1: tone 3). The same L-spatial ratio comparisons were made for the pairs tone 3: tone 6 and tone 1: tone 6.

24 See discussion in Section 2.3.2.2.

The third part of the analysis compared the tonal space of the individual contour tones among the three speaker groups. Again, comparisons were made based on the Fo ratios ($Fo_{\max} / Fo_{\min}$) of the contour tones (hereafter, the Fo ratios for the tonal space of the contour tones will be referred to as C-space ratios).

Finally, the fourth part of the analysis examined the two rising tones with regard to contour shape and spatial relationship. The objective here was to find out the extent to which the two rising tones of the two CRC groups differed from those of the NCAN group. The investigations were accomplished by comparing percentage change (Δ%) in Fo values at different durational range-sections25 along the contours, among the three speaker groups.
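The three ratio measures used in the first three parts of the analysis can be sketched as small Python helpers. The function names and the five-point contours below are hypothetical and serve only to illustrate the definitions given above; they are not data from the study.

```python
def grand_mean(contour):
    """Average of the Fo values at the five percentage points."""
    return sum(contour) / len(contour)


def t_space_ratio(tone1, tone4):
    """T-space ratio: starting point of tone 1 over end point of tone 4."""
    return tone1[0] / tone4[-1]


def l_spatial_ratio(higher_tone, lower_tone):
    """L-spatial ratio: grand Fo mean of one level tone over that of another."""
    return grand_mean(higher_tone) / grand_mean(lower_tone)


def c_space_ratio(contour):
    """C-space ratio: maximum over minimum Fo within one contour tone."""
    return max(contour) / min(contour)


# Hypothetical contours (Hz) for one speaker at 0%, 25%, 50%, 75%, 100%.
tone1 = [200, 200, 200, 200, 200]
tone3 = [160, 160, 160, 160, 160]
tone2 = [150, 140, 160, 190, 210]
tone4 = [130, 125, 120, 110, 100]

print(t_space_ratio(tone1, tone4))    # 2.0
print(l_spatial_ratio(tone1, tone3))  # 1.25
print(c_space_ratio(tone2))           # 1.5
```

Each measure is computed per speaker from that speaker's own contours, so the resulting ratios, rather than raw Hz values, are what enter the group comparisons.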

2.3.2.2. Unit of Measurements

Two kinds of units of measurement were employed during the course of the analysis: (i) Fo ratios and (ii) percentage change at different ranges along the contours. The former was the measurement unit for the first three parts of the analysis, while the latter was used for the fourth part.

25 An explanation of this term is given in Section 2.3.2.2.

2.3.2.2.1. Fo Ratios

Employing Fo ratios for comparing the tonal space among speaker groups is relatively recent.26 Traditionally, it was common to describe the tonal space in terms of Fo ranges (i.e., the difference between the highest and lowest Fo values); see, for example, Jin's study (1996: 95). Several Cantonese studies (Hashimoto, 1972; Fok-Chen, 1974; Vance, 1977) used the Fo range to describe the tonal space of individual speakers of the same sex or of a given set of synthetic stimuli. However, Fo range data for both sexes could be problematic because of the issue of comparability. Female speakers generally have a higher pitch and a larger Fo range than male speakers do.27 On the other hand, working with Fo ratios can minimize the intra- and inter-speaker variations in Fo measurements, and it proves to be a more appropriate method than using the Fo range for comparing data of female and male speakers within the same study.

26 So, 1999b; see the example provided below.
27 The cause of the gender differences may be attributed to physical differences between males and females: females tend to have a smaller vocal tract and smaller vocal folds, whereas males have a larger vocal tract with a longer pharynx and larger vocal folds (Nolan, 1997: 750).

So (1999b), for example, found that there was a significant difference in Fo ranges ($Fo_{\max} - Fo_{\min}$) for each rising tone contour between the two gender groups (5 females and 5 males) of native Hong Kong Cantonese speakers. The Fo ranges of the two rising tones for female and male speakers were 77 Hz vs. 44.74 Hz for tone 2, and 25.66 Hz vs. 18.38 Hz for tone 5. However, when the comparisons were made using the speakers' Fo ratios (i.e., $Fo_{\max} / Fo_{\min}$) for each rising tone contour, no significant difference in Fo ratios between the two gender (or sex) groups was found. The Fo ratios for female and male speakers were 1.426 vs. 1.440 for tone 2, and 1.142 vs. 1.185 for tone 5.

Another example is the data for tone 2 (see footnote 28) from two NCAN speakers (a female and a male) in this study. The maximum Fo, minimum Fo, Fo range, and Fo ratio for the female and the male speakers are shown below.

                      Fo max      Fo min      Fo range    Fo ratio
female (NCAN-F1)29    256.60 Hz   190.80 Hz   65.80 Hz    1.345
male (NCAN-M4)30      123.50 Hz   91.80 Hz    31.70 Hz    1.345

28 This mean data was the mean of 10 tokens (five /si25/ and five /fu25/) from each speaker.
29 NCAN-F1 means the first female speaker of the NCAN group.
30 NCAN-M4 means the fourth male speaker of the NCAN group.

As seen by examining the measurement data above, the mean Fo range for the female speaker (65.80 Hz) is more than twice the mean Fo range of the male speaker (31.70 Hz), but the Fo ratios for both speakers are the same (1.345). Thus, since comparisons based on the Fo ratios do not show sex-related differences, Fo ratios, rather than Fo ranges, were employed in the first three parts of the present analysis.
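The contrast between the two measures can be reproduced from the four values reported for these two speakers. The Python sketch below uses only those maximum and minimum Fo values; the variable and function names are illustrative.

```python
def fo_range(fo_max, fo_min):
    """Fo range: absolute difference in Hz."""
    return fo_max - fo_min


def fo_ratio(fo_max, fo_min):
    """Fo ratio: dimensionless, so independent of the speaker's overall pitch."""
    return fo_max / fo_min


# Maximum and minimum Fo (Hz) for tone 2, as reported for the two speakers.
speakers = {
    "NCAN-F1": (256.60, 190.80),
    "NCAN-M4": (123.50, 91.80),
}

for name, (fo_max, fo_min) in speakers.items():
    print(name,
          round(fo_range(fo_max, fo_min), 2),   # range in Hz
          round(fo_ratio(fo_max, fo_min), 3))   # ratio
```

The ranges come out as 65.8 Hz and 31.7 Hz, while both ratios round to 1.345: dividing rather than subtracting removes the effect of each speaker's overall pitch level.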

2.3.2.2.2. Percentage Change in Fo

Percentage change (Δ%) in Fo values at different durational range-sections along the contour was the second type of measurement unit, used in the fourth part of this analysis. Durational range-sections in this analysis were derived from the five percentage points (0%, 25%, 50%, 75%, and 100%) along each individual rising contour. They were labeled R1 (0-25%), R2 (25-50%), R3 (50-75%), and R4 (75-100%). In each durational range-section, the percentage changes (Δ%) in Fo values between two successive percentage points of the rising tones were obtained and compared among the speaker groups. With Δ% in Fo values at the four durational range-sections, a rough approximation of the contour shape can be depicted, because the Δ% in Fo values at each range-section indicates both (i) the direction, i.e., positive or negative, representing a rising or falling direction, and (ii) the degree of increase or decrease between two successive points along the Fo contour. The greater the Δ% in Fo values, the more rapid the rise or fall from one point to the next is assumed to be.

Figures 2-8a and 2-8b illustrate the relationship between the percentage points and the durational range-sections along the normalized vocalic duration of a rising tone (tone 2). The data at the five percentage points are hypothetical Fo values, imitating the typical tone 2 pattern of a native speaker of Cantonese, because those values are easier to calculate. Thus, on the basis of that contour, in R1 (from 0% to 25%), the percentage change of the Fo contour is obtained by the formula ((95 - 100) / 100) × 100, and the percentage change is -5%. This indicates that the Fo contour in the R1 range-section is in a falling direction. When the percentage changes are positive values, the Fo contours in these range-sections are rising. In this figure, the percentage changes in the R2, R3, and R4 range-sections show successive increases of 9.47%, 15.38%, and 16.67%, respectively.
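The calculation can be sketched in Python as follows. The first two contour values (100 and 95) come from the formula quoted above; the remaining three (104, 120, and 140) are an assumption, chosen because they reproduce the stated percentage changes, and the function name is hypothetical.

```python
def percent_changes(points):
    """Percentage change in Fo between each pair of successive points."""
    return [(b - a) / a * 100 for a, b in zip(points, points[1:])]


# Hypothetical Fo values at 0%, 25%, 50%, 75%, and 100% of a rising tone.
# 100 and 95 follow the text; 104, 120, and 140 are inferred, not quoted.
contour = [100, 95, 104, 120, 140]

labels = ["R1", "R2", "R3", "R4"]
for label, change in zip(labels, percent_changes(contour)):
    direction = "rising" if change > 0 else "falling"
    print(f"{label}: {change:+.2f}% ({direction})")
```

With these values the sketch gives -5.00% for R1 and +9.47%, +15.38%, and +16.67% for R2 to R4, so the sign marks the direction of the contour and the magnitude marks how steep each section is.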

Figure 2-8a. Durational range-sections and percentage change. [Figure: Fo contour of a hypothetical rising tone across the four durational range-sections (R1 to R4) of the normalized vocalic duration.]

Figure 2-8b. Corresponding durational range-sections and percentage change in bar chart. [Figure: bar chart of the percentage change in Fo in each of the four durational range-sections (R1 to R4).]

Therefore, the pattern illustrated here is a typical pattern of tone 2 (the high rising tone) produced by native Cantonese speakers.31 For tone 5 (the low rising tone), the pattern will be more or less the same as the one for tone 2, except that it will have smaller scales of Δ% in Fo values at all range-sections.

2.3.3. Results

2.3.3.1. Tonal Space

The mean T-space ratios for the speakers in the three groups are listed in Table 2-5 below. The mean ratios presented in this table show a clear tendency for the groups to conform to a certain ranking order for the mean T-space ratios (from largest to smallest), i.e., NCAN (1.800) > TCRC (1.709) > YCRC (1.523). In other words, the tonal space of the NCAN group was greater than that of the TCRC group, which was greater than that of the YCRC group. This pattern appears to be the same for both the female and male speakers.

31 See the pattern in a bar chart representation for the NCAN speakers in Figure 2-9.

Table 2-5. The T-space ratios of the three speaker groups

[Table body mostly lost in extraction; surviving row: TCRC 1.682 0.171 1.735 0.248 0.202]

In order to investigate whether the two factors, SPEAKER GROUPS and SEX, had any effect on the mean T-space ratios, the mean T-space ratios of the individual speakers were submitted to a two-way mixed-design analysis of variance (ANOVA), with SPEAKER GROUPS (3 levels) and SEX (2 levels) as between-subjects factors. The effect of SPEAKER GROUPS on mean T-space ratios was found to be significant, F (2, 24) = 6.694, p < 0.01, while the effects of SEX and of the SPEAKER GROUPS x SEX interaction were not, F (1, 24) = 0.934, p > 0.05, and F (2, 24) = 0.061, p > 0.05, respectively. Post hoc Tukey tests for SPEAKER GROUPS revealed that the difference in the mean T-space ratios between the NCAN and the YCRC groups was significant (p < 0.05). The Tukey tests also indicated that the mean T-space ratio of the TCRC group (1.709) was not significantly different from that of the YCRC group (1.523) or that of the NCAN group (1.800). Thus, although the TCRC ratio was numerically smaller than the NCAN ratio, the difference between the two groups did not reach statistical significance.

2.3.3.2. Spatial Relationships for Level Tones

Table 2-6 below shows the mean L-spatial ratios (tone 1: tone 3; tone 3: tone 6; and tone 1: tone 6) of the speaker groups. The ranked orders of the three mean L-spatial ratios among the speaker groups (from large to small ratios) again appear to follow a pattern: NCAN > TCRC > YCRC. This is observed in all three pairs of level tones:

tone 1: tone 3: NCAN (1.235) > TCRC (1.203) > YCRC (1.171)
tone 3: tone 6: NCAN (1.132) > TCRC (1.103) > YCRC (1.096)
tone 1: tone 6: NCAN (1.397) > TCRC (1.327) > YCRC (1.283)

In general, the pattern above applies to both female and male speakers from the three groups, with two exceptions in the L-spatial ratio (tone 3: tone 6): (i) for female groups, TCRC (1.120) > NCAN (1.117) > YCRC (1.071), and (ii) for male groups, NCAN (1.147) > YCRC (1.096) > TCRC (1.085).

Table 2-6. The mean L-spatial ratios of the level tones of the three speaker groups

[Only part of the NCAN rows survives in extraction.]
Group  Sex  Statistic  tone 1: tone 3  tone 3: tone 6  tone 1: tone 6
NCAN   F    Mean       1.245           1.117           1.390
            Sd         0.048           0.025           0.048
       M    Mean       1.225           1.147           1.405
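As a rough cross-check of the group means quoted above, the sketch below tests whether the three L-spatial ratios are mutually consistent. It assumes that each L-spatial ratio is a ratio of the mean Fo values of two level tones (the definition is given earlier in the thesis, not in this section), in which case tone 1: tone 6 should be close to the product of tone 1: tone 3 and tone 3: tone 6. The dictionary keys and the function name are hypothetical, and the agreement can only be approximate because the reported figures are group means of individual ratios.

```python
# Group means of the L-spatial ratios as reported in the text.
reported = {
    "NCAN": {"t1_t3": 1.235, "t3_t6": 1.132, "t1_t6": 1.397},
    "TCRC": {"t1_t3": 1.203, "t3_t6": 1.103, "t1_t6": 1.327},
    "YCRC": {"t1_t3": 1.171, "t3_t6": 1.096, "t1_t6": 1.283},
}


def implied_t1_t6(ratios):
    """Tone 1: tone 6 ratio implied by chaining the two adjacent ratios."""
    return ratios["t1_t3"] * ratios["t3_t6"]


gaps = {}
for group, ratios in reported.items():
    implied = implied_t1_t6(ratios)
    gaps[group] = abs(implied - ratios["t1_t6"])
    print(f"{group}: implied {implied:.3f}, reported {ratios['t1_t6']:.3f}")
```

Under this assumption the implied and reported values agree to within rounding for all three groups, which also supports reading the YCRC tone 1: tone 6 mean as 1.283.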

Again, a three-way ANOVA was performed to investigate the relative spatial patterns of the level tones, with SPEAKER GROUPS (3 levels) and SEX (2 levels) as between-subjects factors and PAIRS OF LEVEL TONES (3 levels: tone 1: tone 3, tone 3: tone 6, and tone 1: tone 6) as a within-subjects factor. Results (see Table 2-7 below) indicated that only the effects of SPEAKER GROUPS and PAIRS OF LEVEL TONES (but not SEX) on the L-spatial ratios were significant, F (2, 24) = 4.143, p < 0.03, and F (2, 48) = 139.440, p < 0.0001, respectively. The effects of interactions among the factors were not significant (ps > 0.05) (see Table 2-7).

Table 2-7. Results of the ANOVA for L-spatial ratio.

Source                               df       F      p
SPEAKER GROUPS x SEX                 (2, 24)  0.691  0.511
LEVEL TONES x SPEAKER GROUPS         (4, 48)  1.493  0.219
LEVEL TONES x SEX                    (2, 48)  0.601  0.552
LEVEL TONES x SPEAKER GROUPS x SEX   (4, 48)  0.759  0.557
[Rows for the three main effects did not survive extraction.]

S indicates the effect of the factor was significant.
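The degrees of freedom reported for the ANOVAs in this chapter follow from the design: thirty speakers, crossed between-subjects factors SPEAKER GROUPS (3 levels) and SEX (2 levels), and one or more within-subjects factors. The sketch below applies the standard textbook formulas for a mixed design; it is an illustration rather than the procedure of the statistics package used in the study, and the function and constant names are hypothetical.

```python
from math import prod

N_SPEAKERS = 30                                   # ten speakers per group
BETWEEN_LEVELS = {"SPEAKER GROUPS": 3, "SEX": 2}  # between-subjects factors


def f_test_df(effect_levels, within_levels=()):
    """Return (numerator df, denominator df) for one effect.

    effect_levels: number of levels of every factor in the effect.
    within_levels: levels of the within-subjects factors among them.
    """
    numerator = prod(k - 1 for k in effect_levels)
    # Between-subjects error df: speakers minus number of between-subjects cells.
    between_error = N_SPEAKERS - prod(BETWEEN_LEVELS.values())
    # Each within-subjects factor in the effect multiplies the error df by (levels - 1).
    denominator = between_error * prod(k - 1 for k in within_levels)
    return numerator, denominator


print("SPEAKER GROUPS:", f_test_df([3]))
print("PAIRS OF LEVEL TONES:", f_test_df([3], within_levels=[3]))
print("LEVEL TONES x SPEAKER GROUPS:", f_test_df([3, 3], within_levels=[3]))
print("DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS:", f_test_df([4, 3], within_levels=[4]))
```

A quick comparison against these values is a convenient way to spot degrees of freedom that were mistyped or garbled in a results paragraph.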

Because this study is only interested in the differences among the speaker groups in the mean L-spatial ratios, post hoc Tukey tests were used only to examine the factor SPEAKER GROUPS. Results revealed that only the pair NCAN vs. YCRC differed significantly in mean L-spatial ratios (p < 0.05). In other words, the mean L-spatial ratios of the YCRC group were significantly smaller than those of the NCAN group. On the other hand, the differences in the three mean L-spatial ratios between the NCAN and the TCRC groups were non-significant (p > 0.05), although the mean L-spatial ratios of the TCRC group appear to be smaller than those of the NCAN group (see Table 2-6).

To further investigate the mean L-spatial ratio differences between the NCAN and the YCRC groups, individual t-tests were used to compare their L-spatial ratios for tone 1: tone 3, tone 3: tone 6, and tone 1: tone 6. Results showed that there were significant differences between the NCAN and the YCRC groups in their mean L-spatial ratios for tone 1: tone 3 (1.235 vs. 1.171) and tone 1: tone 6 (1.397 vs. 1.283), ts (18) = 2.444 and 3.490, ps < 0.03, but not for tone 3: tone 6 (1.132 vs. 1.096), t (18) = 1.710, p > 0.05. These results suggest that the YCRC speakers produce smaller L-spatial ratios than the NCAN speakers in the two pairs tone 1: tone 3 and tone 1: tone 6.
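The between-group comparisons above, with 18 degrees of freedom for two groups of ten speakers, are consistent with an independent-samples t-test using a pooled variance. The sketch below shows that computation; whether the study used exactly this variant is an assumption, the function name is hypothetical, and the two lists are invented illustrative values, not the study's data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Invented per-speaker ratios for two groups of ten (illustration only).
group_a = [1.21, 1.25, 1.19, 1.28, 1.24, 1.22, 1.27, 1.23, 1.20, 1.26]
group_b = [1.15, 1.18, 1.12, 1.20, 1.17, 1.16, 1.19, 1.14, 1.13, 1.21]

t_value, df = pooled_t(group_a, group_b)
print(f"t({df}) = {t_value:.3f}")
```

The resulting t value would then be compared against the critical value of the t distribution for the given degrees of freedom.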

2.3.3.3. Fo Intervals for Contour Tones

The mean C-space ratios (tonal space ratios) of the three contour tones, tone 2, tone 4, and tone 5, for the speaker groups are tabulated in Table 2-8 below. As seen in the table, the ranked orders of the mean C-space ratios for the three speaker groups in the contour tones (from large to small ratios) all conformed to a pattern, NCAN > TCRC > YCRC:

Tone 2: NCAN (1.437) > TCRC (1.311) > YCRC (1.181)
Tone 4: NCAN (1.304) > TCRC (1.251) > YCRC (1.203)
Tone 5: NCAN (1.163) > TCRC (1.120) > YCRC (1.096)

Table 2-8. The mean C-space ratios of the contour tones for the three speaker groups

[Table body lost in extraction.]

A three-way ANOVA was performed to examine the mean C-space ratios for the contour tones, with SPEAKER GROUPS (3 levels) and SEX (2 levels) as between-subjects factors and CONTOUR TONES (3 levels: tone 2, tone 4, and tone 5) as a within-subjects factor. Results (see Table 2-9) indicated that the effects of SPEAKER GROUPS and CONTOUR TONES on the C-space ratios were significant, F (2, 24) = 34.147, p < 0.0001, and F (2, 48) = 73.617, p < 0.0001, respectively. The interaction of SPEAKER GROUPS x CONTOUR TONES was also found to be significant, F (4, 48) = 7.110, p < 0.001.

Table 2-9. Results of the ANOVA for C-space ratio.

Source                                 df       F       p
SPEAKER GROUPS                         (2, 24)  34.147  < 0.0001  S
SEX                                    (1, 24)  1.980   0.172
CONTOUR TONES                          (2, 48)  73.617  < 0.0001  S
SPEAKER GROUPS x SEX                   (2, 24)  0.108   0.898
CONTOUR TONES x SPEAKER GROUPS         (4, 48)  7.110   0.0001    S
CONTOUR TONES x SEX                    (2, 48)  0.232   0.794
CONTOUR TONES x SPEAKER GROUPS x SEX   (4, 48)  0.391   0.814

S indicates the effect of the factor was significant.

To explore the differences in the Fo intervals of the speaker groups in the three contour tones, three individual one-way ANOVAs were carried out to investigate the effect of SPEAKER GROUPS on the mean C-space ratios in tone 2, tone 4, and tone 5, respectively. The three ANOVAs yielded significant effects of SPEAKER GROUPS on the mean C-space ratios in each contour tone (tone 2, tone 4, and tone 5), Fs (2, 27) = 26.906, 6.422, and 16.149, ps < 0.01, respectively.

Post hoc Tukey tests were also conducted for each contour tone. The post hoc Tukey tests for tone 2 showed that the mean C-space ratios differed significantly among the speaker groups (NCAN vs. TCRC, NCAN vs. YCRC, and TCRC vs. YCRC), ps < 0.05. In other words, the C-space ratios (i.e., the Fo interval) for tone 2 in the two CRC groups were smaller, to various degrees, than that for the NCAN group. The results conformed to the pattern mentioned at the beginning of this section: NCAN (1.437) > TCRC (1.311) > YCRC (1.181). This implies that the two CRC groups had different degrees of reduction in the Fo interval for tone 2.

The post hoc Tukey tests for tone 4 revealed that only the mean C-space ratios of the NCAN and the YCRC groups were significantly different (p < 0.05), but not those of the pairs NCAN vs. TCRC and TCRC vs. YCRC (ps > 0.05). In other words, although the mean C-space ratio of the TCRC group was observed to be smaller than that of the NCAN group, the difference did not reach significance, while the mean C-space ratio difference between the YCRC and the NCAN groups was significant (p < 0.05). Again, this implies that there is a difference in the degree of reduction of tone 4 between the TCRC and the YCRC groups. This difference may be expressed in the following way: NCAN (1.304) = TCRC (1.251) > YCRC (1.203).32

Finally, the post hoc Tukey tests for tone 5 indicated that the mean C-space ratio differences in the pairs NCAN vs. TCRC and NCAN vs. YCRC were significant (ps < 0.05), while the ratio difference between the TCRC and YCRC groups was not statistically significant (p > 0.05). Accordingly, this implies that the mean C-space ratios (or the Fo interval) of tone 5 contours for the two CRC groups were reduced in more or less the same pattern when compared with the NCAN group. This pattern may be expressed in the following way: NCAN (1.163) > TCRC (1.120) = YCRC (1.096).

32 The symbol = here means "similar to" or "approximate to".

2.3.3.4. Rising Tones

Table 2-10 shows the means and standard deviations of percentage change (Δ%) in Fo values at the four durational range-sections of the two rising tones (tone 2 and tone 5) for the speaker groups. To facilitate the comparisons among the groups, the mean Δ% in Fo values from Table 2-10 are presented in Figures 2-9 & 2-10 for tone 2 and tone 5 (see below).

Table 2-10. Means and standard deviations of percentage change at the four durational range-sections of tone 2 and tone 5 of the speaker groups.

[Table body lost in extraction.]

Note: the minus sign " - " here signals the falling trend of the contour, and the rest of the data show a rising trend.

In examining the data in Table 2-10, the following observations can be made:

(1) A definite pattern may be observed with regard to mean Δ% in Fo values at the four durational range-sections for both female and male speakers in all three groups: NCAN > TCRC > YCRC (also see Figures 2-9 & 2-10 for the patterns of the group differences).

(2) Female speakers showed greater mean Δ% in Fo values than male speakers in the R1 range-section, but the male speakers tended to have a higher mean Δ% in Fo values than the females did in the R2 range-section.

(3) Tone 2 had a greater mean Δ% in Fo values than tone 5 in all four durational range-sections (also see Figures 2-9 & 2-10).

(4) The data showed various degrees of mean Δ% in Fo values in different durational range-sections. For example, negative values were always found in the R1 range-section, and the maximum mean Δ% in Fo values were mainly found in the R3 and R4 range-sections.

Figure 2-9. The four durational range-sections of tone 2 for the three speaker groups.

[Bar chart comparing the NCAN, TCRC, and YCRC groups. x-axis: Durational Range-sections.]

Figure 2-10. The four durational range-sections of tone 5 for the three speaker groups.

[Chart comparing the NCAN, TCRC, and YCRC groups.]

Table 2-11. Results of the ANOVA for rising tones.

Source                                                              df       F        p
SPEAKER GROUPS                                                      (2, 24)  19.537   < 0.0001  S
SEX                                                                 (1, 24)  5.532    0.027     S
RISING TONES                                                        (1, 24)  270.115  < 0.0001  S
DURATIONAL RANGE-SECTIONS                                           (3, 72)  358.993  [lost]
SPEAKER GROUPS x SEX                                                (2, 24)  0.535    0.593
RISING TONES x SPEAKER GROUPS                                       (2, 24)  18.992   < 0.0001  S
RISING TONES x SEX                                                  (1, 24)  0.029    0.867
DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS                          (6, 72)  9.957    < 0.0001  S
DURATIONAL RANGE-SECTIONS x SEX                                     (3, 72)  10.571   < 0.0001  S
RISING TONES x DURATIONAL RANGE-SECTIONS                            (3, 72)  40.546   < 0.0001  S
DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS x SEX                    [values lost]
RISING TONES x SPEAKER GROUPS x SEX                                 (2, 24)  0.066    0.937
RISING TONES x DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS           (6, 72)  3.137    0.009     S
RISING TONES x DURATIONAL RANGE-SECTIONS x SEX                      (3, 72)  1.568    0.205
RISING TONES x DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS x SEX     (6, 72)  2.135    0.060

S indicates the effect of the factor was significant.

In order to determine the effects of the four factors, SPEAKER GROUPS, SEX, RISING TONES, and DURATIONAL RANGE-SECTIONS, on the mean Δ% in Fo values, the data were submitted to a four-way mixed-design ANOVA, with SPEAKER GROUPS (3 levels) and SEX (2 levels) as between-subjects factors and RISING TONES (2 levels: tone 2 and tone 5) and DURATIONAL RANGE-SECTIONS (4 levels: R1, R2, R3, and R4) as within-subjects factors. Results of the ANOVA (see Table 2-11 above) revealed significant effects of SPEAKER GROUPS, SEX, RISING TONES, and DURATIONAL RANGE-SECTIONS on the mean Δ% in Fo values, ps < 0.03. Four individual two-way interactions, (i) RISING TONES x SPEAKER GROUPS, (ii) DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS, (iii) DURATIONAL RANGE-SECTIONS x SEX, and (iv) RISING TONES x DURATIONAL RANGE-SECTIONS, were found to be significant, ps < 0.0001. The three-way interaction of RISING TONES x DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS was also observed to be significant, F (6, 72) = 3.137, p < 0.01.

Due to the complexity of the design, the present analysis focuses only on the effect of RISING TONES x DURATIONAL RANGE-SECTIONS x SPEAKER GROUPS. The results will provide a better picture with regard to the two aspects of the rising tones mentioned in Section 2.3.2: (i) the contour shapes, and (ii) the spatial relationship in relation to timing of the two rising tones.

Coarse contour shapes of the two rising tones of the speaker groups could be approximated through both the direction and the degree of Δ% in Fo values at the four durational range-sections. To this end, the groups' mean Δ% in Fo values between successive reference-percentage points (i.e., at each durational range-section) along each rising contour were compared. Relative spatial relationships of the two rising tones were also compared through the speaker groups' mean Δ% in Fo values at each durational range-section. The results indicate the spatial relations in relation to timing between the two rising tones of the speaker groups.

2.3.3.4.1. Contour Shapes (in Δ%)

To investigate the differences in contour shapes of the rising tones among the three speaker groups, two individual two-way mixed-design ANOVAs were carried out with SPEAKER GROUPS as a between-subjects factor and DURATIONAL RANGE-SECTIONS as a within-subjects factor. The first ANOVA was for tone 2 and the second one was for tone 5, and they were employed to examine the effect of the above two factors on the mean Δ% in Fo values.

Results of the ANOVA for tone 2 revealed that there were significant effects of SPEAKER GROUPS and DURATIONAL RANGE-SECTIONS on mean Δ% in Fo values, F (2, 27) = 21.175, p < 0.0001, and F (3, 81) = 294.623, p < 0.0001, respectively. The interaction of SPEAKER GROUPS x DURATIONAL RANGE-SECTIONS was also found to be significant, F (6, 81) = 7.312, p < 0.0001. Post hoc Tukey tests for SPEAKER GROUPS further revealed that the differences in mean Δ% in Fo values were significant between the speaker groups in all pairs: NCAN vs. TCRC, NCAN vs. YCRC, and TCRC vs. YCRC (ps < 0.05). Then, to further explore the performance of the speaker groups at each durational range-section, individual ANOVAs were used. Results revealed that the effect of SPEAKER GROUPS on the mean Δ% was significant in the R2, R3, and R4 range-sections, Fs (2, 27) = 12.933, 17.794, and 14.139, ps < 0.0001. Post hoc Tukey tests indicated that the three groups were significantly different in three durational range-sections. The results of the Tukey tests are summarized in Table 2-12.

Table 2-12. Summary of contour shapes for tone 2 for the three speaker groups

Group pair       R1    R2           R3           R4
NCAN vs. TCRC    ---   ---          ---          ---
NCAN vs. YCRC    ---   p < 0.05**   p < 0.05**   p < 0.05**
TCRC vs. YCRC    ---   p < 0.05     p < 0.05**   p < 0.05**

** indicates p < 0.01

The results for tone 2 in Table 2-12 show an unambiguous pattern: consistent significant differences in mean Δ% in Fo values in the two pairs NCAN vs. YCRC and TCRC vs. YCRC were found in the three range-sections R2, R3, and R4, but not at the falling portion of the contour (i.e., the R1 range-section). The differences in mean Δ% in Fo values between the NCAN and the TCRC groups in these durational range-sections were non-significant. This implies that the speakers in the YCRC group do not have the same contour shape for tone 2 as the speakers in the NCAN group, but the speakers in the TCRC group do.

Results of the ANOVA for tone 5 also indicated significant effects of SPEAKER GROUPS and DURATIONAL RANGE-SECTIONS on mean Δ% in Fo values, F (2, 27) = 6.552, p < 0.01, and F (3, 81) = 139.729, p < 0.0001, respectively. The effect of the interaction of SPEAKER GROUPS x DURATIONAL RANGE-SECTIONS was also found to be significant, F (6, 81) = 4.075, p < 0.02. Post hoc Tukey tests for SPEAKER GROUPS further revealed that the differences in mean Δ% in Fo values were significant in the group pairs NCAN vs. TCRC and NCAN vs. YCRC (ps < 0.05), but not TCRC vs. YCRC (p > 0.05). Individual ANOVAs examining the differences in the mean Δ% in Fo values for the three speaker groups in each durational range-section revealed that the effect of SPEAKER GROUPS on the means was significant in the R3 and R4 range-sections, F (2, 27) = 4.724, p < 0.02, and F (2, 27) = 15.156, p < 0.0001, respectively. Post hoc Tukey tests showed that the three groups were significantly different in two durational range-sections. The results of the Tukey tests are summarized in Table 2-13 below. The table shows an unambiguous pattern: consistent significant differences in mean Δ% in Fo values in the two group pairs NCAN vs. TCRC and NCAN vs. YCRC were found in the R4 durational range-section. A significant difference in Δ% in Fo values between TCRC and YCRC in the R3 durational range-section was also observed. The results imply that the speakers in the TCRC and the YCRC groups did not have the same contour shape for tone 5 as the speakers in the NCAN group.

Table 2-13. Summary of contour shapes for tone 5 for the three speaker groups

** indicates p < 0.01

2.3.3.4.2. Spatial Relationship

The spatial relationship of the rising tones was examined by comparing percentage changes between tone 2 and tone 5 in the four durational range-sections for the speaker groups. A series of paired t-tests was used. The t-tests compared the mean Δ% in Fo values at the four range-sections of tone 2 with their counterparts of tone 5 from the same speaker group. The results are presented in Table 2-14 below:

Table 2-14. Summary of the spatial relationship for tone 2 and tone 5 for the three speaker groups

Group   R1    R2           R3           R4
NCAN    ---   p < 0.05**   p < 0.05**   p < 0.05**
TCRC    ---   p < 0.05**   p < 0.05**   p < 0.05**
YCRC    ---   ---          p < 0.05     p < 0.05**

As may be seen in Table 2-14, the results reveal that the speakers of both the NCAN and the TCRC groups have the same consistent pattern: at R1, the difference in mean Δ% in Fo values between tone 2 and tone 5 in both NCAN and TCRC was found to be non-significant (ps > 0.05); however, for the other three range-sections (R2, R3, and R4), the differences between the two rising tones were found to be significant (ps < 0.05). This implies that the relative spatial relationships between the two rising tones of the TCRC group were more or less the same as those of the NCAN group.

Contrary to the patterns shown by the NCAN and the TCRC groups, the difference in mean Δ% in Fo values between the two rising tones for the speakers in the YCRC group was found to be significant (p < 0.05) only for the R3 and the R4 range-sections. Thus, the speakers in the YCRC group failed to maintain the spatial relationship between the two rising tones the way speakers in the NCAN group did.
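The within-group comparison of tone 2 and tone 5 at a given range-section can be sketched as a paired t-test on per-speaker differences. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the two lists are invented values for ten speakers at a single range-section, not data from Table 2-10.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Paired t statistic on per-speaker differences; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    # Mean difference divided by its standard error.
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Invented percentage changes at one range-section for the same ten speakers.
tone2_change = [15.2, 14.8, 16.1, 13.9, 15.5, 14.2, 16.4, 15.0, 14.6, 15.9]
tone5_change = [6.1, 5.4, 7.0, 5.8, 6.5, 5.1, 7.2, 6.0, 5.6, 6.8]

t_value, df = paired_t(tone2_change, tone5_change)
print(f"t({df}) = {t_value:.3f}")
```

Because each speaker contributes both tones, the test works on the ten differences rather than on the two sets of values separately, which is why the degrees of freedom are 9 rather than 18.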

2.3.4. Discussion

2.3.4.1. Reduction Patterns in the Two CRC Groups

The findings of the above Fo analysis of the tonal systems of the speaker groups in terms of (i) tonal space, (ii) spatial relationships of the level tones, (iii) Fo interval of the contour tones, and (iv) contour shape and spatial relationship of the two rising tones, revealed that the tonal patterns associated with the two CRC groups did show various deviations from those of the NCAN group. To facilitate the following discussion, the results of the Fo analysis in the previous section are summarized in Table 2-15 below.

Table 2-15. Summary of the results of the statistical analysis of fundamental frequency.

[Table body lost in extraction.]

S means significant difference from the NCAN group in the investigated areas.
* means significant difference from the TCRC group in the investigated areas.

Table 2-15 shows the investigated areas in which the speaker groups differed significantly. Two tendencies may be observed: (i) the YCRC speakers show greater tonal deviations from the NCAN group in the above statistical analyses than the TCRC speakers; and (ii) hierarchical reduction patterns among the lexical tones may be identified.

In examining T-space ratios among the speaker groups, the ratio of the YCRC group was the only one found to be significantly different from the ratio of the NCAN group. This suggests that the YCRC speakers have a smaller tonal space in their tonal system than the native speakers. This may be interpreted further as a reduction (compression) of tonal space. Thus, the tonal space of the YCRC speakers is reduced while that of the TCRC speakers is not.

With regard to the comparison of Fo ratios of the three level tones, there are significant differences only between the NCAN and the YCRC groups. Among the three L-spatial ratios (tone 1: tone 3, tone 3: tone 6, and tone 1: tone 6), only the first and the last pairs showed differentiation between these two groups. This implies that the grand mean of tone 1 (in Fo values) of the YCRC group is crucial in determining whether the differences in the L-spatial ratios between the NCAN and the YCRC groups were significant or not. When the grand mean Fo value of tone 1 was small, it caused the L-spatial ratios with the other two tones (i.e., tone 1: tone 3, and tone 1: tone 6) to become smaller. This resulted in their values being significantly smaller than the corresponding values for the NCAN group. Thus, the overall results imply two things. First, tone 1 for the YCRC speakers was reduced. If that is the case, then the observed reduction of tonal space in the YCRC group may be attributed to their reduction of tone 1. Second, reduction has more influence on tone 1 (high level tone) than on the other two (non-high) level tones (tone 3 and tone 6). These findings indeed matched the tendency presented in Figures 2-6 and 2-7 (Section 2.3), in which tone 1 for speakers of the YCRC group was observed to be far below the tone letter value [55] described in the literature.33

With respect to the contour tones, the results of the comparisons also matched the observation reported in Section 2.3, in which the rising tones appeared to be reduced. However, the reduction in the C-space ratios seemed to have more impact on the two rising tones (tone 2 and tone 5) than on the falling tone (tone 4). Ohala (1978: 30-31) listed some of the characteristic differences between rising and falling tones that have been observed,34 such as (i) falling tones outnumber the rising tones in languages with tonal systems, and (ii) falling contours can be produced faster than rising ones within a given Fo interval. With regard to the characteristic differences between falling and rising tones, Hyman's study (1973) is more relevant to the present study. He found that, in cases of assimilation, the pitch (Fo) interval of a rising tone is more likely to be reduced than that of a falling tone. Further, Hombert's experimental study (1976) also reported that his subjects had more difficulty in producing rising tones than falling tones. Cheng (1977: 116) suggests that falling tones involve "least physiological effort in production". Thus, the findings of the present study were in agreement with the observations made by Hyman (1973) and the findings by Hombert (1976) inasmuch as they provide evidence for rising tones being more reduced than falling tones in the tonal production by speakers of the two CRC groups. Evidence can be seen in the results of the two groups of CRC speakers in this study: they have reduced tonal space (Fo intervals) for the two rising tones (tone 2 and tone 5). However, it should be mentioned that the Fo interval (tonal space) of tone 2 produced by the TCRC speakers was also significantly different from that of the YCRC group. For tone 4, only the YCRC group (but not the TCRC group) showed a reduced tonal space.

By comparing the mean percentage change (Δ%) in Fo values in the four durational range-sections, the speaker groups' contour shapes for the two rising tones can be approximated. It was found that the patterns observed in the speaker groups with regard to mean Δ% in Fo values for tone 2 and for tone 5 were quite consistent. For tone 2, the mean Δ% of the YCRC group differed from those of the NCAN and the TCRC groups in the R2, R3, and R4 range-sections, whereas the mean Δ% in Fo values of the TCRC group showed no significant difference from the NCAN group. The results indicate that the R2 range-section appears to be a crucial range-section, starting from which the NCAN and the TCRC speakers differ from the YCRC group. This implies that the speakers in the YCRC group do not have the same contour shape for tone 2 as the speakers in the NCAN group, but the speakers in the TCRC group do.

33 (Kao, 1971; Hashimoto, 1972; Fok-Chan, 1974, 1979, 1984; Yip, 1980/1990; Bauer & Benedict, 1997; among others).
34 However, according to Ohala (1978: 31), the causes for the differences between rising and falling tones were unclear.

For tone 5, the mean Δ% in Fo values in the R4 range-section for both the CRC groups was different from that of the NCAN group. In addition, there was a significant difference in Δ% in Fo values between TCRC and YCRC in the R3 durational range-section. However, the result may relate to the high Δ% in Fo values for the male speakers in the TCRC group (7.7108, in Table 2-10). The raw data relating to the TCRC male speakers were thus re-examined: it was found that the tendency of higher Δ% in Fo values was consistent among the male speakers in the TCRC and YCRC groups. No explanation is available at this time. The overall results imply that the speakers in the TCRC and the YCRC groups did not have the same contour shape for tone 5 as the speakers in the NCAN group.

These results from investigating the rising tone contours revealed that the speakers in the TCRC group failed to maintain the contour shape only for tone 5, whereas the speakers in the YCRC group failed to maintain the contours of both rising tones the way speakers in the NCAN group did. In addition, the fact that the TCRC speakers maintained the contour pattern of tone 2 suggests that tone 5 is more vulnerable to reduction than tone 2.

With respect to a typical spatial relationship between tone 2 and tone 5, it was observed that differences in terms of Δ% between the two rising tones started to show from the R2 range-section and continued through the R4 range-section. This pattern was found to be the same for both the NCAN and the TCRC groups. Contrary to the relationship patterns of these two speaker groups, the speakers in the YCRC group did not show this pattern: for the YCRC speakers, the difference between the two rising tone contours proved to be significant only in the R3 and R4 range-sections.

To summarize the discussion above on the reduced tonal patterns of the CRC groups, the findings of this study (see Table 2-15) revealed two facts. First, the YCRC speakers deviate more from the tonal patterns of the NCAN speakers than the TCRC speakers. For the TCRC speakers, only the rising tones appear to deviate from those of the NCAN speakers, whereas the YCRC speakers show deviations in all areas of comparison, with the exception of the L-spatial ratios for tone 3: tone 6.

Second, as may be seen in Table 2-15, the results (those areas found to be significantly different from the NCAN group) suggest that the reduction patterns appear to be hierarchical: contour tones (tone 2, tone 4, and tone 5) rather than the level tones (tone 1, tone 3, and tone 6) undergo greater degrees of reduction; for example, both the TCRC and YCRC groups showed reductions in rising tones. Among the contour tones (tone 2, 4, and 5), the rising tones (tone 2 and tone 5) are more easily reduced than the falling tone (tone 4), and tone 5 is more easily reduced than tone 2. Among the level tones (tone 1, 3, and 6), the high level tone (tone 1) is more likely to be reduced than the mid or low level tones (tone 3 and tone 6). These observed hierarchical orders of tonal reduction are presented in Table 2-16.

Table 2-16. The hierarchical orders of the reduced tonal patterns of the CRC speakers

General Pattern       contour tones > level tones
                      tone 2, tone 4, & tone 5 > tone 1, tone 3, & tone 6
(A) Contour Tones     rising tones > low falling tone
                      tone 2 & tone 5 > tone 4
    (i) Rising Tones  low rising tone > high rising tone
(B) Level Tones       high level tone > mid & low (or non-high) level tones
                      tone 1 > (tone 3 and tone 6)35

2.3.4.2. Fo Ratios

The present study attempted to use Fo ratios to control for sex-related differences within speaker groups. In order to verify the hypothesis (Section 2.3.2.2.1), each comparison of the present study examined the factor "SEX" for the purpose of identifying differences based on gender. The results, as expected (Section 2.3.2.2.1), did not show an effect of SEX in the first three parts of the Fo analysis. This may provide further support for the suitability of using the Fo ratio in place of Fo range when the research involves both sexes.

35 Since no further evidence was available in this study showing that one of the non-high level tones is reduced before another one, I did not separate the tones in order.

2.4 CONCLUSION

The aim of the above production study was to explore how the two groups of CRC speakers differ from the comparison group, the NCAN speakers, in the two acoustic correlates of tone (duration and Fo).

With regard to the vowel duration, no significant difference was found among the speaker groups, although the female speakers in the two CRC groups have a tendency to produce a longer durational pattern. Investigations of the relative durational patterns (see Section 2.2.2.2) among the six lexical tones revealed that the production of tones by the TCRC and the YCRC speakers conformed to the relative tonal patterns. This indicates that they had "knowledge" of the relative durational patterns when producing tones.

Concerning the Fo cue, this study observed that some tonal (or Fo) patterns of the TCRC and the YCRC groups were reduced from those of the NCAN group (see Figures 2-4 to 2-7 representing the two CRC groups). The mean Fo data at the five percentage points were submitted to a series of statistical analyses in order to further confirm that the tonal patterns observed in the TCRC and YCRC groups deviated from those of the NCAN group to various degrees (see Table 2-15 for areas of reduction). The findings revealed that the YCRC group showed a more severe reduction than the TCRC group. Meanwhile, the results suggest the existence of some hierarchical orders in reduction patterns: (i) contour tones (tone 2, 4, & 5) rather than level tones (tone 1, 3, & 6), (ii) the two rising tones (tone 2 & 5) rather than the falling tone (tone 4), (iii) the low rising tone (tone 5) rather than the high rising tone (tone 2), and (iv) the high level tone (tone 1) rather than the mid and low level tones (tone 3 & 6) showed a greater tendency towards reduction (see Table 2-16).

The next chapter presents a perception test with the same groups of participants. The percentage of correct responses given by the participants and the types of tonal confusions experienced by the participants will be compared and discussed.

TONAL PERCEPTION EXPERIMENT

It was shown in Chapter Two that there exist various degrees of reduction patterns in the production of tones by speakers of the two CRC groups. These reductions were more evident in connection with the YCRC group than the TCRC group. Furthermore, tonal reductions appear to have certain hierarchical orders (e.g., rising tones > level tones). However, it was still unclear whether these reduced tonal patterns, observed in the productions of the CRC speakers, reflect deviations with regard to the lexical tones in their mental representations. If this is the case, it is expected that a perception experiment would provide additional evidence confirming the assumption that the lexical tonal patterns of the CRC speakers are indeed "different" from those of the NCAN speakers1.

Thus, the purpose of the present perception study is to evaluate the tonal systems of the two CRC groups in order to shed some light on this issue.

In this perception study, productions of 48 target words, in citation form, by four native Hong Kong Cantonese language instructors were identified by the same groups of participants who took part in the production study. All 48 stimuli had been examined and selected in a pilot study before they were given to the participants. The stimuli were presented by multiple speakers in order to minimize the effect of familiarity with a particular speaker's voice on listeners' responses (Abramson, 1976: 2). Fok's identification paradigm (1974; Yiu & Fok, 1995) was adapted2 as a method for listeners to access the target words once they were given a stimulus.

Two aspects of the performance of the two CRC groups were compared with those of the NCAN group in this perception study. First, listeners' percentages of correct identifications were examined in order to determine which tones showed deviations. Second, as in previous studies (Fok, 1974; Ching, 1984; Yiu & Fok, 1995), this study employed confusion matrices as a study dimension to investigate the tones that were misidentified by the listener groups. Confusion matrices for individual groups were constructed to illustrate the results of the above comparison. Confusion matrices would show the responses by the listeners in the groups when a target tone was presented.

2 See Section 3.1.4 for details.

Two hypotheses were tested in this perception study. The first hypothesis predicted that identifications by the two CRC groups would differ from those of the NCAN group, and the correct identification scores for the TCRC group would be higher than those for the YCRC group. This hypothesis is based on the following assumption: if the CRC speakers have reduced tonal patterns in their mental representations, then their perception responses should reflect this deviation. In other words, the reduced patterns observed in the productions of the speakers in the two CRC groups should have corresponding patterns in their perception.

The second hypothesis predicted that hierarchical patterns in tones that were poorly identified would be found in the responses of the CRC listeners. That is, CRC listeners are expected to make more errors with certain tones. This hypothesis is based on the following assumption: if the tonal reduction patterns of the CRC speakers are confined to certain hierarchical orders observed in their productions (mentioned in Chapter Two), then their perception responses should reflect the patterns. For example, contour tones for the CRC speakers were found to be more easily reduced than level tones in their productions; accordingly, it was expected that the identification scores for contour tones would be lower than those for the level tones in this perception test.

This chapter commences with a description of the methodology employed in the present study (Section 3.1). The results of the two study dimensions are given in Section 3.2. The findings of the perception study will be discussed in the concluding section (Section 3.3).

3.1. METHOD

3.1.1. Participants

Five Cantonese language instructors (2 males and 3 females), from 36 to 43 years old, were invited to take part in the experiment: 4 of them participated as speakers and one as a listener in a pilot study (see below). They were all born and raised in Hong Kong as native Hong Kong Cantonese speakers, and had been teaching Cantonese for 3 to 16 years. These language instructors were all teaching Cantonese in Canada at the time of recording. They all reported normal speech and hearing.

Six listeners participated in a pilot study in order to evaluate the selected natural stimuli for the perception test. These listeners were native Hong Kong Cantonese speakers, including one experienced Cantonese instructor, three native Hong Kong Cantonese speakers and two TCRC speakers. Their ages ranged from 18 to 42 years, and they had been living in Canada from 2 to 25 years.

For the perception test, the listeners were the same 30 subjects who took part in the production experiment.3 According to the subjects' self-assessment, they had no hearing problems.

3.1.2. Materials: Preparation of Natural Stimuli

The same 12 Chinese words from the root-words /si/ and /fu/ in the production study were employed in this perception study as target words. All stimuli were selected from the natural stimuli produced by the four speakers (see above), and listeners were given six answer choices in visual form to choose from.

In preparing the natural stimuli, the four speakers were asked to produce 384 natural two-word phrase speech tokens. The 12 target words were deliberately put in the final words of the phrases, similarly to the ones used in the reading list (see Appendices B & C).

The reason for embedding the target words into phrases was to eliminate the production of an incorrect tone in the target words.

The speakers were instructed to read the phrases clearly and at a relatively slow rate, so that the target signals could be segmented from the phrases and would be suitable for presentation as stimuli in the perception test. Half of the recorded signals (192 in number) were randomly chosen and digitized into a Macintosh computer. Preliminary examinations of fundamental frequencies and tonal patterns were done using Signalyze software. This screening process was performed in order to ensure that the chosen signals were produced (i) with reasonable Fo heights (level) in relation to the tonal space of the individual speakers, and (ii) with typical Fo contour patterns. It also ascertained that the chosen signals were not produced in an exaggerated way, such as being too long in duration and/or too high in pitch level.4 Among the signals, only 96 tokens (48 tokens per root-word set) were selected for the pilot study.

3 See details in Chapter Two.

3.1.3. The Pilot Study and the Selected Stimuli

The pilot study -- with six listeners as participants -- was designed to select 48 natural stimuli from the above 96 for the perception test. Selections were made on the basis of one criterion -- each natural stimulus had to be correctly identified at least 75% of the time by the six listeners, so that the NCAN listeners would not have too much trouble in recognizing the stimuli in the perception test.

4 The syllable and vowel durational ranges, and the Fo ranges for the natural stimuli from the male and female speakers are listed here: Syllable durations: 417.60 ms - 619.68 ms. Vowel durations: 118.10 - 227.07 ms. Fo range: 83 - 126 Hz (male speakers), and 150 - 243 Hz (female speakers).

The selected 48 randomized natural stimuli (24 per root-word set) were dubbed through a JVC Double Cassette deck (TD W709) from the computer onto a SONY normal tape in two versions: one was prepared with the /si/ root-word set presented first and then the /fu/ root-word set; the other was prepared with the /fu/ root-word set presented first and then the /si/ set. As in the sub-test condition of previous studies (Fok-Chen, 1974; Wong, 1998), stimuli were presented with multiple speakers, in order to prevent responses triggered by familiarity with a particular speaker's voice (Abramson, 1976: 2).

5 The pure tone of 160 Hz with amplitude of 40 dB was created by the SoundEdit program.

Before each stimulus was presented, the listener would hear a stimulus number immediately followed by a 0.5 second pure tone5 indicating the beginning of the stimulus presentation. Next, the stimulus was presented twice, with the two successive tokens of the given stimulus separated by a 1.5 second interval. This was followed by 5 seconds of silence, in order to give individual listeners time to make a decision, after which a second pure tone indicated the end of testing for that stimulus. The way that each stimulus was presented is illustrated below:

No. 1, 0.5 sec pure tone, A, 1.5 sec. silent time, A, 5 sec. silent decision time, 1 sec. pure tone again.
No. 2, 0.5 sec pure tone, B, 1.5 sec. silent time, B, 5 sec. silent decision time, 1 sec. pure tone again.
No. ", "
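The presentation sequence above can be laid out as a simple timeline. The following is only a minimal sketch of that sequence, not the procedure actually used to prepare the tape: the function names and event labels are hypothetical, the token duration is passed in as a parameter, and the spoken stimulus number is left out because its duration is not reported.

```python
# Sketch of one trial's timeline, using the durations quoted in the text.
# trial_timeline, trial_length and the event labels are illustrative names.

def trial_timeline(stimulus_sec):
    """Return (label, onset, offset) tuples, in seconds, for one trial.

    The spoken stimulus number that precedes each trial is omitted,
    because its duration is not given in the text.
    """
    events = [
        ("start tone", 0.5),            # 0.5 sec pure tone
        ("token 1", stimulus_sec),      # first presentation of the stimulus
        ("silence", 1.5),               # 1.5 sec silent time
        ("token 2", stimulus_sec),      # second presentation of the stimulus
        ("decision silence", 5.0),      # 5 sec silent decision time
        ("end tone", 1.0),              # 1 sec pure tone
    ]
    timeline = []
    onset = 0.0
    for label, duration in events:
        timeline.append((label, onset, onset + duration))
        onset += duration
    return timeline


def trial_length(stimulus_sec):
    """Total length of one trial in seconds (spoken number excluded)."""
    return trial_timeline(stimulus_sec)[-1][2]
```

Under these assumptions, a token lasting 0.5 seconds gives a trial of 9.0 seconds, so the fixed tones and silences (8 seconds) account for most of each trial.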

3.1.4. Stimuli Identification Task

The present study adapted Fok's identification paradigm (1974; Yiu & Fok, 1995), which is designed to let listeners access the target words after they have perceived a given token. In assessing the subjects' perceptual ability, the pictures and their corresponding Chinese characters were provided simultaneously to the listeners. The advantage of this method is that it provides at least two routes at the semantic level to access the target words (Yiu & Fok, 1995: 84). One is a phoneme-ideograph route (auditory word - written word matching) and the other is a phoneme-semantic route (auditory word - picture matching; see Appendix D). The modified version of the identification paradigm in the present study deliberately provided one more route at the semantic level for listeners to access the target words. This extra route provided the corresponding English glosses (auditory word - visual English glosses matching) simultaneously with the other two routes -- pictures and Chinese characters -- to the listeners. This was important for the YCRC listeners, because they were either unable to read Chinese words or could only recognize a limited number of Chinese characters. Therefore, listeners were given answer choices with target words presented simultaneously in three routes: (i) the traditional Chinese characters, (ii) the semantic pictures, and (iii) the corresponding English glosses. Since the two root-words, /si/ and /fu/, were presented one set at a time (with 24 randomized stimuli), listeners were given six target words each time as possible choices. Accordingly, two papers, each containing 6 picture-choices corresponding to all six possible target words, were prepared for the listeners to identify (see Appendices E & F).

3.1.5. Procedures

The present research involved two sessions, the recording and the listening sessions. The former was described in the production experiment (Chapter Two), and the latter will be reported in this chapter.

The listening session was also conducted on an individual basis in a sound-treated room in the Phonetics Laboratory at Simon Fraser University. The participants were instructed to listen to the given stimuli of two root-word sets, /si/ and /fu/. The two sets of stimuli were played on a National stereo recorder (RX-FM23) through stereo headphones (a Panasonic and a Sony, MDR-CD170) to both the listener and the experimenter (the author). Listeners were given two papers, each containing a set of six picture-choices corresponding to all six possible target words (i.e., the identification paradigm; see Appendices E & F). They were asked to identify the target words by pointing to the appropriate picture, character, or English gloss (see Appendices E & F) once the presentation of a stimulus had been completed. The experimenter recorded their responses on an answer sheet.

3.1.6. Analyses

In order to explore the response differences among the listener groups systematically, statistical analyses were used focusing on two dimensions: (i) the correct percentages (%) on the identification task among the three groups, and (ii) confusion matrices for the individual groups.

First, comparisons among the correct % of the three listener groups were made, in order to establish (i) whether there were differences among the groups in the identification test, and (ii) if differences did exist, which tones were the ones that showed the differences (correct % scores) among the groups.

Second, confusion matrices, which have been used by Fok (1974), Ching (1984), and Yiu & Fok (1995) as a method to investigate listeners' tonal confusions, were constructed to justify the comparison in correct % of identifications. The patterns obtained from the confusion matrices would reveal how the listener groups responded to the given target tones (i.e., words). Specifically, they would indicate which pairs of tones showed significant confusions in the listener groups. For this study, the structure of each confusion matrix, and the way of determining whether or not the observed frequencies of tonal misidentifications (i.e., tonal confusions) were statistically significant, followed the study of Yiu & Fok (1995) (see below).

In each confusion matrix of this study, the numbers of correct responses with their converted percentages were indicated in the cells along the diagonal (from the upper left corner to the lower right corner) of the matrix. All these cells are shaded and the numerical data are in bold face. For each confusion matrix, it was assumed that if the listeners' responses were entirely random, the obtained frequency in each cell would be equal (ibid.: 84). On the other hand, individual cells (i.e., tonal misidentifications) in the confusion matrix would be considered to be statistically significant if the obtained frequency (for the incorrect responses) in the cell was larger than the expected frequency (ibid.). The expected frequency in each cell was calculated as the total number of possible responses divided by 36 cells (6 target tones x 6 response choices), and this corresponds to a 3% (1/36) chance level (ibid.). Accordingly, for the present study, the expected frequency of each cell for the listener groups (Tables 3-2 -- 3-4) was 13.33 (480/36 = 13.33). In other words, if any obtained frequency for a cell was greater than 13.33, confusion between the two tones (one is the target tone and the other is the responded tone) was considered to be statistically significant. The cell is highlighted by presenting the obtained frequency in bold fonts with an asterisk.
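The chance-level criterion described above can be written out as a short sketch. It only illustrates the rule as stated (an off-diagonal cell is flagged when its obtained frequency exceeds the expected frequency); the function names are hypothetical, and the example frequencies are the misidentification counts quoted in Section 3.2.2 for the TCRC group.

```python
# Sketch of the chance-level criterion: a misidentification cell is flagged
# when its obtained frequency is greater than total responses / 36.
# expected_frequency and flag_confusions are illustrative names.

def expected_frequency(total_responses, n_tones=6):
    """Expected count per cell if responses were spread evenly over all cells."""
    return total_responses / (n_tones * n_tones)


def flag_confusions(cells, total_responses, n_tones=6):
    """Map each off-diagonal (target, response) pair to True if flagged.

    cells: dict mapping (target_tone, response_tone) -> obtained frequency.
    Diagonal cells (correct responses) are skipped.
    """
    threshold = expected_frequency(total_responses, n_tones)
    return {
        pair: freq > threshold
        for pair, freq in cells.items()
        if pair[0] != pair[1]
    }


# Misidentification counts quoted in the text for the TCRC group.
tcrc_cells = {(3, 6): 15, (5, 2): 20, (2, 5): 10, (6, 3): 12}
tcrc_flags = flag_confusions(tcrc_cells, total_responses=480)
```

With 480 responses the threshold is 480/36, about 13.33, so the pairs tone 3 as tone 6 (15) and tone 5 as tone 2 (20) are flagged, while tone 2 as tone 5 (10) and tone 6 as tone 3 (12) are not.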

3.2. RESULTS

The mean correct scores for the identification task of the three groups in each lexical tone are tabulated in Table 3-1 below. As seen in the table, no prominent difference in the correct percentages between the female and the male speakers from the same group can be observed. Therefore, the analyses presented below for this perception test only compared the correct percentages among the three groups without further subdividing them according to gender.

Table 3-1. The mean correct scores for the speaker groups in the perception test.

              Tone 1  Tone 2  Tone 3  Tone 4  Tone 5  Tone 6  Total
NCAN  F         38      40      32      39      37      31    217 / 240
      M         37      38      30      38      36      24    203 / 240
      Group     75      78      62      77      73      55    420 / 480
TCRC  F         37      36      28      38      26      25    190 / 240
      M         38      34      34      37      32      22    197 / 240
      Group     75      70      62      75      58      47    387 / 480
YCRC  F         29      28      26      25      13      18    139 / 240
      M         30      25      27      29      19      16    146 / 240
      Group     59      53      53      54      32      34    285 / 480

Notes: the number in each cell corresponding to the sex groups (F = female and M = male) is out of 40 stimuli, and the number corresponding to the listener groups is out of 80 stimuli.

The overall mean for the correct percentages, in relation to the six tones of the three groups, are illustrated in Figure 3-1 below.

As expected, the grand mean correct percentage6 of the NCAN group was the highest among the three groups (M = 87.50%), while that of the YCRC group was the lowest (M = 59.38%). The grand mean for the TCRC group (M = 80.83%) was between the two groups. This pattern appears to be valid for almost every lexical tone in Figure 3-1, with two exceptions: the mean correct percentages of the NCAN and the TCRC groups for tone 1 and tone 3 were the same. Thus, in order to explore the response differences and the patterns of tonal confusions among the three groups systematically, the mean scores for the six tones of the groups were submitted to comparisons in (i) the identification test and (ii) confusion matrices.

6 This grand mean was obtained by averaging the mean correct percentages for the six lexical tones.
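The grand mean is described as the average of the six per-tone percentages. A minimal sketch of that arithmetic is given here for the NCAN group, using the group counts out of 80 stimuli per tone (the sums of the female and male counts in Table 3-1); the variable and function names are illustrative only.

```python
# Sketch of the grand-mean calculation for one listener group.
# NCAN_CORRECT holds the group counts (female + male) from Table 3-1.

NCAN_CORRECT = {1: 75, 2: 78, 3: 62, 4: 77, 5: 73, 6: 55}
STIMULI_PER_TONE = 80  # 10 listeners x 8 stimuli per tone


def percent_correct(counts, n_per_tone):
    """Convert correct counts per tone into percentages."""
    return {tone: 100.0 * count / n_per_tone for tone, count in counts.items()}


def grand_mean(percentages):
    """Average the per-tone percentages."""
    return sum(percentages.values()) / len(percentages)


ncan_pct = percent_correct(NCAN_CORRECT, STIMULI_PER_TONE)
ncan_grand_mean = grand_mean(ncan_pct)
```

Because every tone has the same number of stimuli, averaging the six percentages gives the same value as dividing the total number of correct responses by the total number of stimuli (420/480 for the NCAN group).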

Figure 3-1. The correct percentages of the three speaker groups in the perception test.

[Figure 3-1 x-axis: Lexical Tones (Tone 1, Tone 2, Tone 3, Tone 4, Tone 5, Tone 6)]

3.2.1. Identification Test

To evaluate the differences in the perception test among the three listener groups, the mean scores (in correct %) of the six tones for the groups were submitted to a two-way mixed-design ANOVA, with LISTENER GROUPS (3 levels: NCAN, TCRC, and YCRC groups) as a between-subjects factor and TONES as a within-subjects factor. Results revealed that the effects of LISTENER GROUPS and TONES were significant, F (2, 27) = 21.529, p < 0.0001, and F (5, 135) = 17.918, p < 0.0001, respectively. The effect of the interaction between LISTENER GROUPS and TONES was also significant, F (10, 135) = 2.016, p < 0.05. The results indicate that the mean scores of the three listener groups differed significantly with regard to certain lexical tones.
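The degrees of freedom reported above follow from the design (three groups of ten listeners and six tones). The check below is a sketch of the standard degrees-of-freedom formulas for a mixed design with one between-subjects and one within-subjects factor, assuming equal group sizes; the function name and dictionary keys are hypothetical.

```python
# Sketch: degrees of freedom for a two-way mixed-design ANOVA with one
# between-subjects factor (group) and one within-subjects factor (tone).
# mixed_anova_df is an illustrative name; equal group sizes are assumed.

def mixed_anova_df(n_groups, n_per_group, n_levels_within):
    n_subjects = n_groups * n_per_group
    df_group = n_groups - 1                      # between-subjects effect
    df_group_error = n_subjects - n_groups       # subjects within groups
    df_tone = n_levels_within - 1                # within-subjects effect
    df_tone_error = df_tone * df_group_error     # tone x subjects within groups
    df_interaction = df_group * df_tone          # group x tone
    return {
        "group": (df_group, df_group_error),
        "tone": (df_tone, df_tone_error),
        "group x tone": (df_interaction, df_tone_error),
    }


dfs = mixed_anova_df(n_groups=3, n_per_group=10, n_levels_within=6)
```

For this design the sketch gives (2, 27) for LISTENER GROUPS, (5, 135) for TONES, and (10, 135) for the interaction, which agrees with the F ratios reported in the text.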

Consequently, six individual one-way ANOVAs were carried out in order to compare the group performance for each tone. Results showed that the effect of LISTENER GROUPS was significant for tone 1, tone 2, tone 4, and tone 5, Fs (2, 27) = 7.223, 20.281, 24.130, and 8.270, ps < 0.01. Post hoc Tukey tests revealed that the differences in the mean scores between the pairs NCAN vs. YCRC and TCRC vs. YCRC were significant for the above four tones (ps < 0.05), and that the difference between the NCAN and TCRC groups for tone 5 was also significant (p < 0.05).

3.2.2. Confusion Matrices

Three separate confusion matrices (Tables 3-2 -- 3-4) for the NCAN, the TCRC, and the YCRC groups, respectively, were constructed in order to find out which tones were confused in each group. Table 3-2 below is the confusion matrix for the NCAN group. As seen in the table, although none of the lexical tones received a perfect score, relatively high mean correct % can be observed for tone 1, tone 2, tone 4, and tone 5, which were all over 90% (see the cells along the diagonal of the matrix). Only tone 3 and tone 6 had relatively low percentages (77.50% and 68.75% respectively).

Table 3-2. Confusion matrix for the NCAN group

[Table 3-2 body omitted; column totals: Tone 1 = 76, Tone 2 = 85, Tone 3 = 75, Tone 4 = 96, Tone 5 = 74, Tone 6 = 74, Total = 480.]

The expected frequency in each cell is 480/36 = 13.33. The shaded cells indicate correct identification with the corresponding percentages. * indicates that the obtained frequency for the incorrect responses was considered to be statistically significant, because it was greater than the expected frequency.

In examining the incorrect responses, it was found that the tones that were misidentified and the target tones mainly belonged to the same type of tones (i.e., a level tone was misidentified as another level tone). For example, tone 1 (high level) was misidentified as tone 3 (mid level), and tone 2 (high rising) was misidentified as tone 5 (low rising). There were, however, some exceptional cases, such as tone 4 (low falling) and tone 6 (low level): tone 6 was mainly misidentified as tone 4.7 The matrix also indicates that the NCAN listeners misidentified tone 3 as tone 6 (obtained frequency = 16), and tone 6 as tone 4 (obtained frequency = 18) at a significant level. In other words, confusions between tone 3 and tone 6, and between tone 6 and tone 4 were found.

Table 3-3 presents the confusion matrix for the TCRC group. The mean correct % for tone 1, tone 3 and tone 4 (93.75%, 77.50%, and 93.75%) were similar to those of the NCAN group (93.75%, 77.50%, and 96.25%, respectively). The mean correct % for the other tones, tone 2, tone 5, and tone 6 (87.50%, 73.75%, and 58.75%), were smaller than the correct % for the corresponding tones of the NCAN group (97.50%, 91.25%, and 68.75%, respectively).

7 A discussion of this issue is given in Section 3.3 of this chapter.

Table 3-3. Confusion matrix for the TCRC group

[Table 3-3 body omitted.]

The shaded cells indicate correct identification with the corresponding percentages. * indicates that the obtained frequency for the incorrect responses was considered to be statistically significant, because it was greater than the expected frequency.

There was a slightly higher frequency of misidentifications and greater variation in errors among the tones that were misidentified by listeners in the TCRC group than by those in the NCAN group. For example, in addition to the incorrect responses for tone 5 (low rising) being identified as tone 2 (high rising), tone 5 was also identified as tone 3 (mid level). Tone 6 (low level) was identified as tone 1 (high level), tone 3 (mid level), tone 4 (low falling), and tone 5 (low rising). The matrix reveals that the listeners in the TCRC group significantly misidentified tone 3 as tone 6 (obtained frequency = 15), and tone 5 as tone 2 (obtained frequency = 20). Moreover, these TCRC listeners also tended to frequently identify tone 2 as tone 5 (obtained frequency = 10), and tone 6 as tone 3 (obtained frequency = 12). Although the obtained frequencies for these two cases failed to reach significance (expected frequency = 13.33), the relatively high frequencies of misidentifications have to be acknowledged. On the basis of the above observations, the TCRC listeners exhibited confusions between tone 3 and tone 6, and between tone 5 and tone 2.

Table 3-4. Confusion matrix for the YCRC group

[Table 3-4 body omitted.]

The expected frequency in each cell is 480/36 = 13.33. The shaded cells indicate correct identification with the corresponding percentages. * indicates that the obtained frequency for the incorrect responses was considered to be statistically significant, because it was greater than the expected frequency.

Lastly, Table 3-4 above shows the confusion matrix for the YCRC group. The mean correct % of the six lexical tones for the YCRC group were lower than those of the NCAN and the TCRC groups (see Tables 3-2 and 3-3 for comparison). Unlike the NCAN listeners, who exhibited fewer confusions with regard to the different tones, the YCRC listeners demonstrated misidentifications for all tones. Among the six tones, tone 1 was best identified, with 73.75% accuracy.

The matrix also shows that the target tones, tone 2, tone 4, tone 5, and tone 6, were significantly misidentified as tone 5, tone 6, tone 2, and tone 4, respectively (obtained frequencies = 17, 16, 34, and 17, respectively). For tone 1, no specific confusion between tone 1 and any other tone was observed to be statistically significant, because the incorrect response-choices for tone 1 were spread over the other five tones. This kind of response-pattern was different from the pattern of the poorly identified tones (tone 2, tone 4, tone 5, and tone 6), in which confusions between the target tones and particular response tones were observed to be significant. It is also notable that the YCRC listeners frequently identified tone 3 as tone 6 (obtained frequency = 12), and tone 6 as tone 3 (obtained frequency = 11), although these obtained frequencies failed to reach significance. The results imply that they experienced confusions with all tones.

3.3. DISCUSSION

The results of the comparisons of (i) the mean correct % of identifications and (ii) the patterns of confusion matrices between the two CRC groups and the NCAN group conform to the findings of their tonal reductions in the production study (in Chapter Two). In other words, the findings of this perception test support the two hypotheses stated at the beginning of this chapter. Tonal confusions for level tones -- tone 3 misidentified as tone 6, and tone 6 misidentified as tone 4 -- were also noticed in all three listener groups. The following discussion will review both the differences and similarities among the groups.

3.3.1. Group Differences

The findings from the comparison of the mean correct % of tonal identifications for the three listener groups are summarized in Table 3-5 below.

Table 3-5. Summary of the results of the identification test

                 Tone 1  Tone 2  Tone 3  Tone 4  Tone 5  Tone 6
TCRC vs. NCAN                                      S
YCRC vs. NCAN      S*      S*              S*      S*

S indicates significant difference in the identification scores from the NCAN group in the investigated tones. * indicates significant difference in the identification scores from the TCRC group at the investigated tones.

As seen in the table, when the mean scores for the tones of the two CRC groups were compared with those of the NCAN group, it was found that the mean identification scores of the TCRC listeners were higher than those of the YCRC listeners. For the TCRC listeners, only the mean score for tone 5 was significantly lower than that of the NCAN listeners (73.75% vs. 91.25%). However, for the YCRC listeners, the mean scores for tone 1 (73.75% vs. 93.75%), tone 2 (66.25% vs. 97.50%), tone 4 (67.50% vs. 96.25%), and tone 5 (40.00% vs. 91.25%) were all significantly lower than those of the NCAN listeners. In addition, the mean scores for these four tones for the YCRC listeners were also found to be significantly lower than those of the TCRC listeners: tone 1 (73.75% vs. 93.75%), tone 2 (66.25% vs. 87.50%), tone 4 (67.50% vs. 93.75%), and tone 5 (40.00% vs. 72.50%). These results suggest that the TCRC group performed better on the tonal identification test than did the YCRC group. Moreover, the summary table indicates that the tones with significantly lower identification scores for the TCRC and the YCRC groups (i.e., tone 5 for the TCRC group, and tones 1, 2, 4, and 5 for the YCRC group) correspond to the tones that were considered to be reduced in the productions of the two CRC groups.

In Chapter Two, however, it was found that the TCRC group showed reduction patterns in both of the two rising tones (tone 2 and tone 5). Why does the perception test appear to be problematic in this respect? Could this imply that the findings of these two experiments -- production and perception studies -- are inconsistent? The confusion matrix (Table 3-3) indeed indicates that listeners in the TCRC group misidentified tone 2 as tone 5 with high frequency (obtained frequency = 10). For the NCAN listeners, the same kind of misidentification was made only twice out of the 80 given stimuli. Clearly, the TCRC listeners gave more incorrect responses for tone 2 than the NCAN listeners, but the frequency of their misidentifications was not high enough to reach statistical significance. Thus, since the TCRC listeners did show a relatively high frequency of misidentification for tone 2, this pattern should be considered a reflection of the observed reduction in tone 2 produced by the TCRC speakers reported in Chapter Two.

Concerning the hierarchical patterns of the tones that were poorly identified by the CRC listeners, they can be clearly seen in Table 3-5. Those tones are tone 1, tone 2, tone 4, and tone 5. The YCRC listeners had difficulty in identifying all four tones, whereas the TCRC listeners had problems identifying the two rising tones. Therefore, it can be stated that the CRC listeners had more difficulty in identifying the contour tones (tone 2, tone 4, and tone 5) than the level tone (tone 1). Among the three contour tones, tone 5 appeared to be the most difficult tone for the listeners in the two CRC groups, because the mean scores for both the TCRC and YCRC groups were significantly lower than that of the NCAN group. Tone 4, on the other hand, appeared to be the least difficult tone for the CRC listeners to identify, because only the YCRC listeners showed confusion between tone 4 and tone 6 (see Table 3-6 below). Tone 2 might be put in the middle of the two tones, since the YCRC listeners showed confusion with this tone, whereas the TCRC listeners only showed a tendency towards confusion between tone 2 and tone 5. Finally, among the level tones, tone 1 appeared to be the level tone that the YCRC listeners had difficulty identifying (see Table 3-5).

The tonal confusion patterns of the three listener groups are summarized in Table 3-6. The table only lists (i) confusions found to be statistically significant in the group, and (ii) confusing tone-pairs with prominently high frequencies of misidentifications, i.e., obtained frequency ≥ 10 (marked with H).

Table 3-6. Summary of the tonal confusions of the three listener groups

Target tone   NCAN                 TCRC                    YCRC
Tone 1
Tone 2                             H: Tone 2 --> Tone 5    Tone 2 --> Tone 5
Tone 3        Tone 3 --> Tone 6    Tone 3 --> Tone 6       H: Tone 3 --> Tone 6
Tone 4                                                     Tone 4 --> Tone 6
Tone 5                             Tone 5 --> Tone 2       Tone 5 --> Tone 2
Tone 6        Tone 6 --> Tone 4    H: Tone 6 --> Tone 3    Tone 6 --> Tone 4
                                                           H: Tone 6 --> Tone 3

As seen in Table 3-6, response-choices of the groups8 conform to the observation by Fok-Chen (1974) and Ching (1984), according to which confusion was confined to tones with similar patterns. For example, a high rising tone (tone 2) was replaced by a low rising tone (tone 5); a low level tone (tone 6) was misidentified as a low falling tone (tone 4). Thus, regardless of the number of confusion tone-pairs shown for the two CRC groups, the choices for the incorrect responses were confined to tones with similar patterns. This implies that the selection of choices by listeners of the two CRC groups was not by chance, i.e., they appeared to have some knowledge (e.g., contour patterns, relative height etc.) of the lexical tones.

8 Although the YCRC listeners made incorrect responses for all tones, the general pattern could still be observed (in Table 3-4).

In addition, the confusion patterns in Table 3-6 show that not al1 three listener groups supported the observations by Fok-Chen

(1974), which stated that (i) the six lexical tones are not equally vulnerable to perception confusion, and that (ii) tone 1, tone 2, and tone 4 are the most salient in the perception of Cantonese tones.

The results of the present study also show that, for the NCAN group, not only tone 1, tone 2, and tone 4, but also tone 5 appear to be equally salient among the tones: the percentages for the correct identifications were al1 over 90% (see Table 3-2), whereas tone 3 and tone 6 caused considerable confusions for the participant^.^ For the TCRC group, tone 1 and tone 4 appear to be the most salient tones, the correct scores being over 90% (see Table 3-3). In addition to confusing the two level tones ftone 3 and tone 6), results for the

TCRC group indicate that the TCRC group experienced confusions with the two rising tones, especially tone 5. Results for the YCRC group showed that a variety of confusions arose among the six lexical tones; it should be mentioned though, that responses for tone 1 were somewhat better than other five tones (see Table 3-4).

Tone 3 aud tone 6 for the three iistener groups wil1 be discussed in Section 3.3.2. Consequently, no single tone can be identified as being salient for the YCRC group. The above observations provide additional support that the results of the NCAN group were perfectly matched with those of the native speakers in the study of Fok-Chen (1974).

Furthermore, the results also indicated that the observed patterns of the two CRC groups deviated from those of the NCAN group to various degrees.

In summary, two findings have emerged on the basis of this perception study. First, the performance of the YCRC group was the poorest, because they made errors with all tones; by contrast, the performance of the TCRC group appeared to be more comparable with that of the NCAN group, with the exception of the two rising tones. This implies that the performance of the TCRC group is better than that of the YCRC group. Second, hierarchical patterns were also observed in the poorly identified tones for the two CRC groups. The CRC listeners made more errors in the contour tones (i.e., tone 2, tone 4, and tone 5) than in the level tone (i.e., tone 1). They appear to have much more difficulty in identifying the two rising tones (tone 2 and tone 5) than the falling tone (tone 4), and tone 5 is the one that is most frequently misidentified by the CRC listeners. Of the three level tones, tone 1 appears to be the most difficult tone to identify.

These two findings perfectly matched the reduced tonal patterns of the CRC groups discussed in Chapter Two. Therefore, the results of the two experiments -- production and perception studies -- are consistent.

3.3.2. Level Tone Confusions

3.3.2.1. Tone 3 and Tone 6

In examining the results shown in Table 3-6, it is noteworthy that listeners in all three groups had obvious confusions in the two non-high level tones, tone 3 and tone 6. NCAN listeners significantly misidentified tone 3 as tone 6, and tone 6 as tone 4. The TCRC and

YCRC listeners also show similar confusions in the target tones, tone

3 and tone 6.

The tendency to confuse the level tones has been reported in previous literature. Fok-Chen (1974) reported that her listeners confused the three level tones (tone 1, tone 3 and tone 6) in her study. Vance (1976) found that tone 3 was being perceived within the entire (Fo) range of the synthetic speech. Confusion between the two tones (tone 3 and tone 6) was not confined to adult native speakers of Cantonese: Ching (1984) reported that significant confusion between tone 3 and tone 6 also occurs in the responses of young Cantonese children. Wong (1998) observed that the level tones were poorly identified. Abramson's study (1978) on Thai tones indicated that "a considerable overlap" exists in the perception of the level tones. As in these previous studies, the present study also found that listener groups confused the three level tones, such as misidentifications of tone 1 as tone 3, tone 3 as tone 6, and tone 6 as tone 3 or tone 4.

According to Gandour (1994: 3120), Fo height is the dominant cue for distinguishing between tones with similar contours. In this case, since level tones in general do not involve much Fo movements

(except the acceptable falling pattern, which was mentioned in

Chapter One), the relative Fo height becomes a crucial factor.

Therefore, the observed tendency to confuse tone 3 and tone 6, but not tone 1, in this study might be attributed to the following three factors.

First, the fact that confusion was found particularly with tone

3 and tone 6, but not with tone 1, may relate to the fact that these two tones are closer to an individual's normal speaking voice. Eitel

(1947: cited in Fok-Chen, 1974: 18) has suggested that individual speaking voice is somewhere near tone 3. Similarly, Chao (1968: 26) also pointed out that it is normally centered near the lowest part of the speaker's voice range. Thus, because these two level tones (tone

3 and tone 6) are closer to the speaking voice of the speakers, listeners may find the two tones more confusing than tone 1. Moreover, association with higher pitch may cause tone 1 to become more salient than the other two tones (Fok-Chen, 1974;

Vance, 1976; Ching 1984). Therefore, it is less likely to be confused with any other tone.

Second, confusions might also relate to the relative Fo (pitch) distance (Nooteboom, 1997: 645) among the three level tones of

Cantonese. If the Fo distance between two tones is shorter, then more confusion is expected in identifying the two level tones (tone 3 and tone 6). In a normal speech situation, Jones and Wu (1912: xiv-xv)¹⁰ have suggested that tone 1 is approximately 3 semitones above tone 3, which is in turn 2 semitones higher than tone 6. In other words, the Fo distance between tone 3 and tone 6 is shorter than the distances between tone 1 and tone 3, and between tone 1 and tone 6. Thus, since the Fo distance between tone 3 and tone 6 is shorter, it is expected that more confusions are involved in identifying these two tones. Although the size of the scales may vary from person to person, the relative relationship among the tones will remain (ibid.).

¹⁰ Chao (1947: 25) adopted the musical notes in his book.

This might be the reason why the correct identifications for tone 1 in the NCAN group, as well as in the TCRC group, were remarkably high (93.75% in both groups), whereas the correct identifications for tone 3 and tone 6 were considerably lower. Accordingly, the greater the distance between two tones, the more salient their difference becomes in perception.
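The pitch-distance argument can be illustrated with a short sketch. It takes only the approximate intervals attributed to Jones and Wu (tone 1 about 3 semitones above tone 3, and tone 3 about 2 semitones above tone 6) and lists the pairwise distances among the three level tones. The function names and the 120 Hz reference value for tone 6 are illustrative assumptions, not values from this study.

```python
import math
from itertools import combinations

# Approximate heights in semitones above tone 6, from the intervals
# attributed to Jones and Wu (1912) in the text.
LEVEL_TONES = {"tone 1": 5.0, "tone 3": 2.0, "tone 6": 0.0}


def semitones_between(f_a_hz, f_b_hz):
    """Signed distance from f_b to f_a in semitones (12 per octave)."""
    return 12.0 * math.log2(f_a_hz / f_b_hz)


def hz_at(reference_hz, semitones_above):
    """Frequency lying a given number of semitones above a reference."""
    return reference_hz * 2.0 ** (semitones_above / 12.0)


def pairwise_distances(heights):
    """Absolute semitone distance for every pair of tones."""
    return {
        (a, b): abs(heights[a] - heights[b])
        for a, b in combinations(heights, 2)
    }


def closest_pair(heights):
    """The pair of tones separated by the smallest distance."""
    distances = pairwise_distances(heights)
    return min(distances, key=distances.get)


if __name__ == "__main__":
    for pair, distance in pairwise_distances(LEVEL_TONES).items():
        print(pair, distance)
    print("closest:", closest_pair(LEVEL_TONES))
    # Hypothetical speaker whose tone 6 sits at 120 Hz (assumed value).
    for tone, st in LEVEL_TONES.items():
        print(tone, round(hz_at(120.0, st), 1), "Hz")
```

Because the distances are expressed in semitones rather than hertz, the ordering of the pairs stays the same whatever reference frequency is chosen, which is the sense in which the relative relationship among the tones is said to remain across speakers.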

Finally, using multiple speakers to present the isolated stimuli may increase the difficulty for listeners to identify tone 3 and tone

6. The method of using multiple speakers to present stimuli has been used as a condition of the studies by Fok-Chen (1974) and

Wong (1998). Fok-Chen (1974) found that native Cantonese listeners in general, showed more misidentifications when multiple speakers were employed, than if a single speaker was employed (i.e., a female or a male). Moreover, incorrect responses were more frequently found in level tones than in contour tones in the condition employing multiple speakers. Similarly, in his study, Wong

(1998) examined the perception of the three level tones in

Cantonese (tone 1, tone 3, and tone 6) by native listeners in varying conditions. One of the conditions was employing multiple speakers to read stimuli in citation forms. The results revealed that the native listeners had great difficulty in identifying the three level tones in this condition. The mean correct % for identifying the three tones was 49.83%. Wong suspected that his subjects identified the tones almost by chance. For this study, using multiple speakers to present the isolated stimuli appeared to have more influence on tone 3 and tone 6, because native listeners (in the NCAN group) were able to identify the lexical tones with a remarkably high correct % of identification. Most of the lexical tones were correctly identified more than 90% of the time, except for tone 3 (77.50%) and tone 6 (68.75%).¹¹ The overall results appear to suggest that the confusing effects of using multiple speakers to present the isolated stimuli had more impact on tone 3 and tone 6. These patterns were quite similar to the results of Fok-Chen (1974).

The remaining question is why tone 3 and tone 6 are more readily confused in the perception test employing multiple speakers.

Under a normal situation, listeners will use the perceptual skill, normalization, to infer the information of the tonal patterns from given stimuli (Fourcin, 1978; Ching, 1984). However, in this case, when a set of stimuli from different speakers was presented successively, normalizing the stimuli became difficult. Every speaker has his or her unique pitch range (tonal space); it is possible that a high tone for one speaker has the same Fo values as a low tone for another speaker. Another possible factor might relate to the fact that contour tones can be perceived through other perceptual cues, such as the contour (shape), slope, and direction (rising or falling) (Gandour, 1979, 1981, 1984 in Cantonese) -- facilitating the identification of the target tones.

¹¹ Although the correct % of tone 3 and tone 6 were relatively lower than those of the other tones, the correct identifications of the tones (in %) in this study were still considerably higher than Wong's findings (1998). The better performance in this study might be attributed to the fact that the stimuli had undergone screening processes -- signal selections and the pilot study -- in the initial stages of the experiment, before the perception test was conducted.

3.3.2.2. Tone 6 and Tone 4

In addition to the points discussed above, this study also observed that tone 6 was frequently misidentified as tone 4 by both the NCAN and YCRC groups (see the summary of the confusion matrices for the three groups in Table 3-6). However, on re-examining the listeners' responses, it was found that 16 out of 18 of the errors in the NCAN group (see Table 3-2) were actually contributed by three listeners (2 males and 1 female). Of the two male listeners, one made 6 errors out of the given 8 stimuli for tone 6, and the other male listener made 5 errors out of 8. In fact, their scores constitute the total number of errors in the group of NCAN male listeners. The other five errors were made by one of the NCAN female listeners. This suggests that some of the native listeners are unable to distinguish between tone 6 and tone 4. Unlike the case of the NCAN group described above, the errors made by the listeners in the YCRC group were distributed evenly among all members of the group. Confusion between tone 4 and tone 6 might be attributed to two factors. First, tone 4 and tone 6 are similar to each other, because they resemble each other in pitch level and quality of voice, and are relatively shorter than the others in duration (Eitel, 1947: cited in Fok-Chen, 1974: 18). Second, because the tonal pattern for tone 6 appears to be associated with a slight falling pattern (as mentioned in Section 2.3.1), the two tones may sound more like each other in the perception test. This assumption is made by examining the tonal patterns in Figures 2-2 -- 2-7, in Chapter

Two. Consequently, tone 6 and tone 4 are frequently indistinguishable (Eitel, 1947: cited in Fok-Chen, 1974:

Hashimoto, 1972; Bauer & Benedict 1997).

3.4. CONCLUSION

The purpose of the perception study was to find out whether the production deviations in the two CRC groups were reflected in their perception scores. It was assumed that if their results were compatible with those of the NCAN group, it would imply that they had similar patterns in their tonal systems. If the results were not compatible, it would imply that their tonal systems deviate from that of the NCAN speakers. The findings of the perception study revealed that the perception scores of the TCRC group were comparable to those of the NCAN group (with the exception of the rising tones), whereas the perception scores for the YCRC group showed more deviations from those of the NCAN group (at tone 1, tone 2, tone 4, tone 5). Furthermore, the patterns in the confusion matrices confirmed that the choices of tones, which were confused in the selection process, were confined to tones with similar patterns. This implied that even listeners in the YCRC group had some knowledge about the six lexical tones. The results revealed that (i) listeners in the TCRC group showed fewer problems in identifying the tones, and

(ii) that hierarchical patterns were observed in the identification test; e.g., contour tones were more frequently misidentified than the level tones. These results matched the findings presented in Chapter Two, and suggest that the tonal representations of the CRC groups should be considered as reduced in comparison with the tonal representations of the NCAN group.

CHAPTER FOUR

CONCLUSION

The objective of this thesis was to examine the similarities and differences in the tonal systems of Canadian Raised Cantonese (CRC) speakers and native Hong Kong Cantonese speakers (NCAN). The implications of the production and perception studies (from

Chapter Two and Three) point to significant theoretical issues (see below) justifying the undertaking of the present research.

This final chapter begins with summaries of the findings of the production and perception studies presented in the previous two chapters (Section 4.1). Discussions about the implications of the deviated tonal patterns of the two groups of CRC speakers are provided in Section 4.2, and are followed by a conclusion section summing up the implications of the results (Section 4.3). The limitations of the present study and the direction of future studies will be discussed in Section 4.4.

4.1. TONAL PATTERNS OBSERVED IN THE TWO CRC GROUPS

In order to investigate the tonal system of the two groups of

CRC participants, production and perception experiments were conducted. In the production study, the two acoustic correlates of tone -- vowel duration and fundamental frequency (Fo) -- were systematically compared for the three speaker groups. In the perception study, speakers' perceptual ability to identify natural stimuli was tested in order to provide additional evidence from a perspective other than speech production. The perception experiment was evaluated by comparing (i) identification scores (% correct) and (ii) confusion matrices for the individual groups. The findings of the production and perception experiments were examined in relation to each other. The results were consistent and they suggested the existence of tonal pattern deviations at the phonemic level for the participants in the two CRC groups.

4.1.1. Findings of the Production Study

4.1.1.1. Vowel Duration

With respect to the acoustic correlate of vowel durations, a comparison was carried out to investigate (i) if there were any differences in vowel durations among the three speaker groups, and

(ii) whether the relative durational patterns were maintained by speakers of each group.

Durational differences between the two CRC groups and the

NCAN group did not reach the level of statistical significance. It should also be mentioned that female speakers in the TCRC and the YCRC groups had a tendency to produce longer durational patterns than any other speakers. These patterns can be attributed to interference from English vowel durational patterns, which results from a language-contact situation.

An examination of the durational patterns among the six lexical tones revealed that the durational cue alone was not enough to differentiate the six lexical tones in any of the speaker groups. No consistent pattern was observed; in many instances tone 5 was the longest, at other times it was tone 2 that had the longest vowel duration. However, a pattern revealing some relationship among the six lexical tones was observable: it appears that tone 2 and tone 5

(high and low rising tones) are the longest and tone 4 (low falling) and tone 6 (low level) are associated with the shortest duration, while the durations of tone 1 (high level) and tone 3 (mid level) fall between the two pairs of tones. This implies that the TCRC and YCRC speakers are aware of the relative durational patterns of all tones, and their production of tones appears to follow consistently the durational patterns mentioned here.

It may be concluded on the basis of the present study that the acoustical parameter of duration is not an important factor in distinguishing lexical tones in the production of Cantonese tones.¹ It is thus appropriate to consider duration as a concomitant feature of

Cantonese tones, but not a parameter for differentiating among the six lexical tones.

4.1.1.2. Fundamental Frequency (Fo)

With regard to the other acoustic correlate, fundamental frequency, it may be recalled that the mean Fo data for all speakers were converted into Chao's tonal letter values with an interval of 2.5 semitones. On the basis of the tonal patterns shown in the six figures (see Figures 2-2 - 2-7), it may be stated that both the TCRC and YCRC groups showed considerable reductions (or deviations) from those of the NCAN group.
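The conversion of Fo values to a five-point tone-letter scale can be sketched as follows. Only the 2.5-semitone interval comes from the text; the choice of the speaker's lowest Fo as the reference point, the binning rule, the clipping to levels 1-5, and the function name `chao_level` are assumptions made for illustration and may differ from the procedure actually used in Chapter Two.

```python
import math

STEP_SEMITONES = 2.5  # interval per scale step, as stated in the text


def chao_level(f0_hz, floor_hz, step=STEP_SEMITONES):
    """Map an Fo value to a 1-5 scale level.

    Assumed binning: level 1 starts at the speaker's lowest Fo
    (floor_hz) and each further level begins `step` semitones higher.
    Values outside the range are clipped to 1 or 5.
    """
    if f0_hz <= 0 or floor_hz <= 0:
        raise ValueError("frequencies must be positive")
    semitones_above_floor = 12.0 * math.log2(f0_hz / floor_hz)
    level = 1 + int(math.floor(semitones_above_floor / step))
    return max(1, min(5, level))


if __name__ == "__main__":
    # Hypothetical speaker with a 100 Hz floor (assumed value).
    for f0 in (100.0, 119.0, 150.0, 200.0):
        print(f0, "Hz -> level", chao_level(f0, 100.0))
```

Working in semitones relative to each speaker's own floor is one way of making tone-letter values comparable across speakers with different pitch ranges.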

To determine whether the observed reduction in tonal patterns existed in the TCRC and YCRC groups, the Fo data were submitted to a series of statistical analyses in order to investigate the issue systematically. This Fo analysis examined four areas related to the tonal systems of the speaker groups. They were (i) tonal space, (ii) spatial relationships among the level tones, (iii) tonal space (Fo interval) of contour tones, and (iv) contour shapes and spatial relationship of the two rising tones. Analyses were compared in terms of Fo ratios and percentage change (Δ%) in Fo values. The summaries of the findings -- Tables 2-15 & 2-16 -- are reproduced here in order to facilitate the discussions below.

¹ The same conclusion has been reached for Mandarin (Gandour, 1994: 3120) and for Taiwanese Mandarin (Tseng, 1990).

Table 2-15. Summary of the results of the statistical analysis of fundamental frequency.

S means significant difference from the NCAN group in the investigated areas. * means significant difference from the TCRC group in the investigated areas.

Table 2-16. The hierarchical orders of the reduced tonal patterns of the CRC speakers

tone 2, tone 4, & tone 5 > tone 1, tone 3, & tone 6
(A) Contour Tones: rising tones > low falling tone; tone 2 & tone 5 > tone 4
    (i) Rising Tones: low rising tone > high rising tone; tone 5 > tone 2
(B) Level Tones: high level tone > mid & low (or non-high) level tones; tone 1 > (tone 3 and tone 6)

² Since no further evidence was available in this study showing that one of the non-high level tones is reduced before the other, I did not separate the tones in order.

The first part of the analysis examined the tonal space of the three groups in terms of T-space ratio. It was found that (i) the T-space ratio of the TCRC group is similar to that of the NCAN group, and (ii) the T-space ratio of the YCRC group is significantly smaller than that of the NCAN group.

The second part of the analysis investigated the spatial relationships of the level tones (tone 1, tone 3, and tone 6) among the groups in terms of L-spatial ratios. Three L-spatial ratios were compared: tone 1: tone 3; tone 3: tone 6; tone 1: tone 6. It was found that the L-spatial ratios of the TCRC group were not significantly different from (or smaller than) those of the NCAN group. However, significant differences were found in the ratios between the YCRC and the NCAN groups. The results revealed that the ratios of tone 1: tone 3, and tone 1: tone 6 of the YCRC group are significantly smaller than those of the NCAN group. From this it follows that the mean Fo value of tone 1 becomes crucial to the final results. This further implies that their tone 1 was reduced.

The third part of the analysis evaluated the tonal space (Fo interval) of the contour tones (tone 2, tone 4, and tone 5) of the groups in terms of C-space ratios. For tone 2, the C-space ratio of the YCRC speakers was significantly smaller than the ratios of both the NCAN and the TCRC speakers. With respect to tone 5, the C-space ratios of the TCRC and YCRC groups were significantly smaller than those of the NCAN group. For the low falling tone (tone 4), the C-space ratio of the TCRC group was found to be similar to that of the NCAN group, whereas the ratio of the YCRC group was significantly smaller than that of the NCAN group.

The fourth part of the analysis examined (i) the contour shapes and (ii) the spatial relationship between the two rising tones (tone 2 and tone 5), in terms of percentage change (Δ%) in Fo values in the four durational range-sections: R1 (0-25%), R2 (25-50%), R3 (50-75%), and R4 (75-100%). In terms of contour shape, the tone 2 contour for the TCRC group was similar to that of the NCAN group, but the corresponding contour for the YCRC group turned out to be significantly different from those of the NCAN and the TCRC groups. For the tone 5 contour, the contours for both the TCRC and YCRC groups were significantly different from the tone 5 contour for the NCAN group (see Table 2-15).

Concerning the spatial relationship of the two rising tones, it was found that the Δ% in Fo values in the four durational range-sections of the TCRC group were comparable to those of the NCAN group. On the other hand, the YCRC speakers failed to maintain the spatial relationship between the two rising tones. For the speakers in the YCRC group, differentiation in Δ% in Fo values between the two rising tones began in the R3 durational range-section (50-75%).
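The percentage-change measure over the four durational range-sections can be sketched as below. The sketch assumes that Fo is sampled at 0%, 25%, 50%, 75%, and 100% of the vowel duration and that Δ% for a section is the change across that section relative to its starting value; the exact formula used in Chapter Two is not restated here, so this definition, the function name, and the sample contours are illustrative assumptions.

```python
SECTION_LABELS = ("R1", "R2", "R3", "R4")  # 0-25%, 25-50%, 50-75%, 75-100%


def percent_change_by_section(f0_points):
    """Percentage change in Fo across each quarter of the vowel.

    f0_points: five Fo values (Hz) taken at 0, 25, 50, 75 and 100%
    of the vowel duration (assumed sampling scheme).
    """
    if len(f0_points) != 5:
        raise ValueError("expected five Fo values")
    if any(value <= 0 for value in f0_points):
        raise ValueError("Fo values must be positive")
    return {
        label: 100.0 * (end - start) / start
        for label, start, end in zip(SECTION_LABELS, f0_points, f0_points[1:])
    }


if __name__ == "__main__":
    # Made-up rising contours, for illustration only.
    steep_rise = [150.0, 150.0, 165.0, 190.0, 220.0]
    shallow_rise = [150.0, 150.0, 152.0, 160.0, 170.0]
    print(percent_change_by_section(steep_rise))
    print(percent_change_by_section(shallow_rise))
```

Comparing the per-section values of two rising contours shows in which range-section they begin to diverge, which is the kind of comparison described for the two rising tones.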

Two conclusions can be drawn on the basis of the above findings with regard to the Fo analysis in the areas of tonal space, spatial relationships among the level tones, Fo interval (tonal space) of contour tones, and the contour shapes and spatial relationships of the rising tones.

First, tonal reductions had more impact on the YCRC than the TCRC group. The YCRC group had four reduced tones (tone 1, tone 2, tone 4, and tone 5), whereas the TCRC group only had two reduced tones (tone 2 and tone 5) (see Table 2-15).

Second, the reduced tones (see above) produced by the two CRC groups were confined to certain hierarchical orders. Reduction appears to have more impact on the contour tones than the level tones. Amongst the contour tones, it was the rising tones (tone 2 and tone 5) rather than the falling tone (tone 4) that were influenced the most. Furthermore, comparisons of the contour shape revealed that speakers in the two CRC groups failed to maintain the contour shape of tone 5, whereas in the case of tone 2, the speakers in the TCRC group did maintain the contour shape (see Table 2-15). This suggests that tone 5 is more easily reduced than tone 2. For the level tones, it was the high level tone (tone 1) rather than the other two level tones (tone 3 and tone 6) that was affected. The hierarchical orders are summarized in Table 2-16 above.

4.1.2. Findings of the Perception Study

The purpose of conducting a perception study was to provide additional evidence to confirm the results with regard to the reduction in the tonal systems of the two CRC groups observed in the production study (Chapter Two). Furthermore, the perception study aimed at examining the extent to which the reduced tonal patterns in the production study reflect deviations in the mental representation of lexical tones of the CRC participants.

In order to evaluate the tonal system of the TCRC and the YCRC groups on the basis of the scores of the identification test, comparisons were carried out on the scores and confusion matrices.

Comparisons in correct identification scores were made to find out whether the scores of the two CRC groups were different from the identification scores of the NCAN group (i.e., which tones of the

TCRC and the YCRC groups were different from (or lower than) those of the NCAN group). The confusion matrix of each group revealed the choices of the incorrect responses made by the listeners in the group. This provided a better understanding about which pairs of tones were confused by listeners in each group. In other words, these confusion matrices facilitated a clearer understanding by indicating the pattern of false responses of the listeners in the CRC groups in relation to those of the NCAN group.
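The two measures described above, identification scores and confusion matrices, can be derived from the same list of (target tone, response) pairs, as the following sketch shows. The function names are hypothetical. The example responses are made up: 55 correct answers and 18 tone 4 responses out of an assumed 80 trials are chosen so that they agree with the 68.75% score and the 18 tone 6 to tone 4 errors reported for the NCAN group, while the remaining 7 responses are arbitrary. The default threshold of 10 mirrors the high-frequency criterion used in this study for the summary of tonal confusions.

```python
from collections import Counter

TONES = (1, 2, 3, 4, 5, 6)


def confusion_matrix(trials):
    """Count responses per target tone from (target, response) pairs."""
    counts = Counter(trials)
    return {t: {r: counts[(t, r)] for r in TONES} for t in TONES}


def percent_correct(matrix, tone):
    """Identification score (% correct) for one target tone."""
    total = sum(matrix[tone].values())
    if total == 0:
        return None  # no stimuli presented for this tone
    return 100.0 * matrix[tone][tone] / total


def frequent_confusions(matrix, threshold=10):
    """Off-diagonal cells whose count reaches the threshold."""
    found = []
    for target in TONES:
        for response in TONES:
            count = matrix[target][response]
            if target != response and count >= threshold:
                found.append((target, response, count))
    return found


def example_trials():
    """Made-up tone 6 responses: 55 correct, 18 as tone 4, 7 as tone 3."""
    return [(6, 6)] * 55 + [(6, 4)] * 18 + [(6, 3)] * 7


if __name__ == "__main__":
    matrix = confusion_matrix(example_trials())
    print("tone 6 % correct:", percent_correct(matrix, 6))
    print("frequent confusions:", frequent_confusions(matrix))
```

Keeping the full matrix rather than only the scores makes it possible to see not just how often a tone was missed but which tone it was mistaken for.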

4.1.2.1. Identification Test

The result of the comparisons of the scores of identification (correct %) revealed that only the scores of the YCRC speakers were significantly lower than those of the NCAN and the TCRC speakers for tone 1, tone 2, tone 4 and tone 5. In addition, the identification score for tone 5 for the TCRC listeners was also significantly lower than that of the NCAN listeners. The summary of the results is reproduced from Chapter Three (Table 3-5 below).

Table 3-5. Summary of the results of the identification test

                  Tone 1   Tone 2   Tone 3   Tone 4   Tone 5   Tone 6
TCRC vs. NCAN                                         S
YCRC vs. NCAN     S *      S *               S *      S *

S indicates significant difference from the NCAN group at the investigated tones.
* indicates significant difference from the TCRC group at the investigated tones.

Two findings are revealed in the summary table (Table 3-5 above). First, there are degrees of difference in the scores of identification between the two CRC groups. Participants in the TCRC group made fewer errors than those in the YCRC group.

Second, on the basis of those tones (tone 1, tone 2, tone 4, and tone

5) that were poorly identified by the two CRC groups, it was found that hierarchical patterns were observed in these tones. Contour tones in general were less identifiable than level tones. Tone 5 (low rising) and tone 1 (high level) were the most poorly identified among the contour tones and the level tones, respectively. The results were in agreement with the reduced tones observed in the production study. This implies that CRC participants had difficulty both in producing and identifying these tones. This further suggests that CRC speakers' mental representations of lexical tones deviate somewhat from those of the NCAN speakers, and are reflected in their tonal productions as reduced tonal patterns.

4.1.2.2. Confusion Matrices

The confusion matrices for the listener groups (Tables 3-2 to 3-4) revealed that some tones are consistently confused at a statistically significant level. Furthermore, there are a few marginal cases, i.e., certain tones were frequently misidentified, but a level of statistical significance was not reached. The summary of the confusion matrices for the three listener groups in Table 3-6 is reproduced below, and those pairs which exhibited a prominently high frequency of misidentifications (frequency = 10 or above) are marked with the symbol (H).

Table 3-6. Summary of the tonal confusions of the listener groups

Tone 1:
Tone 2: H: Tone 2 --> Tone 5; Tone 2 --> Tone 5
Tone 3: Tone 3 --> Tone 6; Tone 3 --> Tone 6; H: Tone 3 --> Tone 6
Tone 4: Tone 4 --> Tone 6
Tone 5: Tone 5 --> Tone 2; Tone 5 --> Tone 2
Tone 6: Tone 6 --> Tone 4; Tone 6 --> Tone 4; H: Tone 6 --> Tone 3; H: Tone 6 --> Tone 3

As may be seen in the summary of the confusion matrices

(Table 3-6 above), one more piece of information about the hierarchical patterns exists in the poorly identified contour tones: tone 2 tended to be more poorly identified than tone 4.

There are certain similarities with regard to the patterns of misidentification among the three listener groups.

(i) Tones that were misidentified as other target tones were mainly of the same type, with similar contour shapes and/or tonal values (e.g., rising tones were misidentified as other rising tones).

(ii) Tone 3 was frequently misidentified as tone 6, or vice versa.

(iii) Tone 6 was misidentified as tone 4 (NCAN & YCRC).

Fok-Chen (1974) and Ching (1984) have reported on similar misidentification patterns in the perceptions of native speakers; thus the first observation is in accordance with expectations.

Although listeners in the two CRC groups made more errors in the test than the NCAN group, their choices of incorrect responses were confined to tones with similar patterns. This implies that the responses by the TCRC as well as the YCRC groups were not given by chance. Regardless of the number of errors they made, they did have some knowledge of the lexical tones (see Table 3-6).

The second observation may be related to the facts (i) that tone 3 and tone 6 are closer to one's speaking voice; (ii) that tone 3 and tone 6 have a shorter Fo (pitch) distance; and (iii) that employing multiple speakers in a perception study causes the listeners difficulty in the identification of tone 3 and tone 6. The last observation may be explained by the similarities of the two tones, tone 4 and tone 6 (e.g., their pitch (Fo) values and contour patterns).

To sum up the results of this perception study, an examination of the tones with poor identification scores and the patterns of the tonal confusions provided a better understanding of the CRC groups' tonal systems. The findings in the two CRC groups were consistent with the results of the production experiment (see the summary table, Table 2-15). Listeners in the TCRC group had difficulty in

identifying the two rising tones (see Table 3-6), whereas the YCRC

speakers had problems with four tones (tone 1, tone 2, tone 4, and

tone 5). Regardless of the number of errors they made, the selected choices for the incorrect responses revealed that most of their

misidentifications were confined to the tones with similar patterns.

This clearly identifiable pattern implies that the CRC participants

indeed have some knowledge of tones (both in terms of production

and perception). Moreover, hierarchical orders (from more difficult

to less difficult) were observed for those tones poorly identified by

the CRC listeners: (i) contour tones > level tones; (ii) tone 5 > tone 2

> tone 4; (iii) tone 1 > tone 3 and tone 6.

Thus, the findings of the production and perception studies

suggest that the reduction patterns in the production of the two groups of CRC speakers coincide with the deviant tonal patterns in their lexical representations. In addition, the reduction patterns were subject to certain hierarchical orders. Implications of these observations will be explained more explicitly in the following section (Section 4.2).

4.2. DEVIANT TONAL PATTERNS

The deviant tonal patterns observed in connection with the CRC speakers have two implications. First, the CRC groups (TCRC and YCRC) appear to have different degrees of mastery of the Cantonese tonal system (Section 4.2.1). Second, the hierarchies of their reduced tones illustrate inverse relationships to the sequences of tonal acquisition in Cantonese tones (Section 4.2.2).

4.2.1. Mastery of Tonal System

The findings of the production and perception studies (in

Section 4.1) provide clear evidence that (i) there are different degrees of tonal reduction in the speech of the TCRC and the YCRC groups, and (ii) the observed tonal reductions follow certain hierarchical orders. In addition, their performances in the production and perception experiments reflect an awareness of certain tonal features, such as tonal patterns (e.g., direction and shape) and general durational characteristics of the target tones (e.g., tone 2 is associated with a relatively longer duration pattern). In other words, they did have knowledge of tones. Evidence for different degrees of tonal reduction in the two CRC groups can be seen from their reduced tonal patterns in the productions and the choices of false responses in the perception test.

For example, in terms of production, the patterns of the tones (tone 1, tone 2, tone 4 and tone 5) of the speakers in the YCRC group were much more reduced than those of the speakers in the TCRC group (see Figures 2-6 & 2-7 vs. Figures 2-4 & 2-5). However, their production of these tones still exhibited the typical contour patterns of the lexical tones (e.g., a rising tone was produced with a rising trend, a falling tone was produced with a falling direction) and relative durational patterns (see Figure 2-1a). In terms of perception, the confusion matrix for the YCRC group revealed that most of their incorrect responses were confined to the tones with similar patterns or tonal values, associated with the given stimuli.

This evidence suggests that they were aware to some degree of tonal features. These observed phenomena from this research, as So

(1998) has proposed, appear to be related to the issue of "degree of mastery" of tones.

As mentioned in Chapter One, tonal systems are acquired generally before two years of age (Chao, 1951; Li & Thompson, 1976 for Mandarin; Tse, 1978 for Cantonese);³ however, confusions regarding the lexical tones among young native speaking children were still observed both in speech production and perception. Li & Thompson (1976) found that the 17 Mandarin-speaking Taiwanese children (aged from 18 to 34 months), who participated in their study, showed difficulty in learning the rising tone and the dipping tone. Similarly, Light (1977) reported that his daughter (during the time from 30 to 36 months) had problems producing rising contour tones in Cantonese. In terms of perception tests, Clumeck (1976) found that his two Mandarin-speaking participants (aged 3:5 and 2:10, respectively) confused the rising tone and the dipping tone. Finally, Ching (1984) found that Cantonese children (from age 4 to age 10) confused the tone pairs, tone 3 vs. tone 6, and tone 5 vs. tone 6.⁴

The findings of these studies imply that children indeed have not fully mastered the tonal system More the age of two. In connection with the perception test, Ching (1984) further suggested that they have not yet fully mastered the perceptual skiil, Le.,

These studies reported that acquisition of tones precedes that of tbe segmental system (see Cbapter One, Section 1.1.1.). see Chapter One, Section 1.3.2 for detaiIs.

183 normalization of tonal patterns: which is acquired to perceive signals; this ski11 also improves through experience (ibid.). She found that children at about the age of 10 make "confident judgment based on both speech and pattern forms" of the given stimulL6 In other words, children are able to fully master the tonal system (tonal contrast among the tones) by approximately the age of 10.

Concerning the cases of the two CRC speaker groups in this study, speakers' age of arriva1 (AOA) in Vancouver may provide some dues with regard to their performance on both the production and perception experiments in this research. The TCRC speakers moved to Canada in their early teens (10 - 15 years old). Acquisition of the tonal system should have been finished before their migration. Thus their performance should be comparable to the

NCAN group. The findings in this study generally demonstrated this assumption with the exception of the reduction patterns evident in the rising tone by speakers in the TCRC group.7 On the other hand, speakers in the YCRC group were either born in Vancouver, or moved to Vancouver before the age of seven. For the Canadian born

Cantonese speakers, the exposure to Cantonese environments

' See chapter One, Section 1.1.3 for details. See Chapter One, Section 1.3.2 for details about Ching's study (1984). ' An explanation of this phenornenon is offered below in this section. necessary to acquire the lexical tones was fa- more Limited than that of Cantonese children in Hong Kong. The YCRC speakers who were born in Hong Kong, undoubtedly had been exposed to the native environment for several years before they moved to Canada. These

YCRC speakers were too young when they amved in Canada, thus -- in accordance with Ching's observation (1984), mentioneci above -- they may not have fdly mastered the tonal patterns of Cantonese before they left Hong Kong. it is thus conceivable that members of the YCRC group had not fulIy mastered the tonal system of

Cantonese when they took part in the experiments for the present study. This might explain why the YCRC speakers demonstrated severe tonal reductions in the production and perception studies.

4.2.2. Hierarchical Orders of Tonal Reductions

The hierarchical orders of tonal reduction observed in this study exhibited a mirror image of the hierarchical sequences of tonal acquisition⁸ in Cantonese reported by Tse (1977).⁹ In the order of tonal acquisition, Tse observed that a child learns the simple tones prior to the relatively difficult ones.¹⁰ Accordingly, level tones are learnt prior to contour tones. Among the level tones, the high level (tone 1) is first acquired, then mid level (tone 3) and low level (tone 6). For the contour tones, the falling tone (tone 4) is acquired before the two rising tones (tone 2 and tone 5), and the acquisition of the high rising tone (tone 2) precedes that of the low rising tone (tone 5).¹¹

⁸ An exception was found in the order for the level tones, and it will be discussed later in this section.
⁹ Tse's (1977) study was a longitudinal case study of a native Cantonese child.
¹⁰ Here the assumption of difficulty is made on the basis of the sequences of tonal acquisition.
¹¹ According to Tse (1977: 200-201), allowing "one's voice rise all the way up to the highest pitch (F0) level than to try to stop it at the mid level" is always easier. Further, this might also relate to the frequency distributions of tones, in that tone 2 has a higher frequency of occurrence than tone 5 (Fok-Chen, 1974: 35, 99 (19.4% vs. 11.3%); 1979: 155 (19% vs. 11.4%); Ng & Kwok, 1999 (16.63% vs. 10.41%)).

The order of tonal reduction reported in this study showed patterns that were the reverse of those of tonal acquisition reported by Tse (1977). It was assumed that the relatively difficult tones were going to be reduced before the simpler ones. The reduction patterns were found in the following orders: The general pattern was that reduction of contour tones (tones 2, 4, & 5) preceded reduction of level tones (tones 1, 3, & 6). Among the three level tones, tone 1 was subject to reduction before the other two level tones. For the contour tones, the low falling tone (tone 4) was reduced before the two rising tones (tone 2 & 5), and tone 5 was reduced before tone 2. The hierarchical orders of tonal reduction and of tonal acquisition are listed in Table 4-1 below.

Table 4-1. Comparison of hierarchical orders of tonal reduction in the present study and tonal acquisition in Tse's study (1977).

As may be seen in Table 4-1, the sequences of hierarchical orders of the reduction and the acquisition of tones show a reverse pattern. However, it should also be noticed that the sequences of tonal reduction and of tonal acquisition for level tones are the same. The fact that tone 1 was reduced first may be related to the observation that this high level tone requires more effort to produce than the other level tones. Eitel (1947; cited in Fok-Chen, 1974: 18) and Chao (1968: 26) mentioned that the individual speaking voice is normally centered somewhere between tone 3 and tone 6 (see Chapter Three, Section 3.3.2.1). In order to produce a tone with a higher pitch level than the normal voice pitch, one needs to increase the rate of vibration of the vocal folds through one or both mechanisms, i.e., increasing (i) the tension of the vocal folds, and/or (ii) the subglottal pressure.¹³ Thus, taking this into consideration, production of tone 1 requires more effort than the production of the other two level tones. In Hombert's experiment (1976), his subjects were found to have more difficulty in producing the higher tone than the lower tones. This might be the reason why the YCRC group's tone 1 was the first to be reduced among the three level tones.

¹² In Tse's classification of tones, tone 4 was considered a level tone rather than a low falling tone, since he assigned the tone letter value [11] rather than [21].
¹³ See Chapter One (Section 1.1.2).

If the production of tone 1 requires more physiological effort, as mentioned above, then what is the reason for tone 1 to be acquired much earlier than the other two level tones? This might relate to the frequency of occurrence. It is well known that motherese or caregiver speech in English is characterized by higher pitch and exaggerated intonation (Cho and O'Grady, 1992: 423-424). Similarly, child-directed speech in Cantonese is also associated with high mean F0 values and a larger F0 range (Tang and Maidment, 1996). Because children are first exposed to these high pitch utterances, they acquire the high level tone (tone 1) relatively earlier than the other two level tones (tone 3 and tone 6).

4.3. CONCLUSION

On the basis of the production and perception experiments of this study, it may be concluded that the two groups of CRC speakers demonstrated various degrees of reduction in their tonal systems in comparison with that of the NCAN speakers. This may be attributed to the fact that they had different degrees of mastery of lexical tones when they moved to Canada, and before they participated in these experiments.

In addition, this study also found that tonal reductions had much more impact on certain lexical tones; thus hierarchical orders could be observed among the following tones: tone 1, tone 2, tone 4, and tone 5. It was found that contour tones were subject to greater degrees of tonal reduction than the level tones. Among the contour tones, the falling tone (tone 4) was reduced later than the two rising tones (tone 2 and tone 5), and tone 5 occurred as a reduced tone more frequently than tone 2. With regard to the level tones, it seems that only tone 1 has become reduced. Consequently, it appears that tonal reductions are facilitated if the tones (both level and contour tones) (i) are more complex or require more effort to produce, and (ii) occur less frequently. The observed hierarchies in tonal reduction exhibited inverse patterns to the sequences of tonal acquisition in Cantonese, in which the relatively difficult tones are learnt at the end of the acquisition period.

Therefore, it seems plausible to assume that if a CRC speaker has not fully mastered the tonal system, what s/he learnt last will be reduced first. Consequently, the order of reduction of his/her tones will be in the opposite direction of the hierarchies of tonal acquisition.

4.4. LIMITATION AND FUTURE RESEARCH

The present study is to be considered as an initial step in identifying the similarities and differences of the tonal systems of the CRC speakers in relation to the NCAN speakers. The areas that have been investigated and the methodologies to examine the acoustic correlates were all exploratory. Future studies could further expand on the issues listed below.

First, this study has observed sex (gender) differences in vowel durations in relation to the tonal system of the CRC speaker groups. Future research has to establish whether this sex-related difference can be associated with any sociolinguistic factors.

Second, although the present research has identified the hierarchical tonal reduction patterns in the tonal system of the CRC groups, the causes for these reductions are still uncertain. The question whether this phenomenon can be attributed to first language attrition due to a language-in-contact situation, or whether what we are witnessing here is a case of a simplification process, will have to be answered by future research.

Lastly, in the perception experiment, the author has proposed that the frequent misidentifications of tone 3 as tone 6 may be partly attributed to the use of multiple speakers in the experiments. To eliminate this possible factor, future studies may employ a subset in a perception test, in which the stimuli are blocked by speaker, in order to investigate the listeners' performance in the identification of tone 3 and tone 6.

Appendix A Language Background Questionnaire

SEX: ____ AGE (at present): ____ years old. NORMAL HEARING ABILITY? Yes / No

( 1 ) Were you born in Canada? Yes / No

If "No",
( a ) Where were you born?
( b ) How long did you live there? ____ years.
( c ) When did you come to Canada? 19____; age (arrival): ____ years old.
( d ) How long have you lived in Vancouver / Canada? ____ months (i.e., ____ years).
( e ) Do you have any musical training in your background? Yes / No; how long? ____

Grandfather    Grandmother    Father    Mother    Siblings (brothers / sisters)    Classmates / friends

( 3 ) In the range from 0% to 100%, how would you rate your degree of comfort or preference in the following areas:

Areas of language use    English    Cantonese    Preferred language in these areas
Reading (books, magazines, etc.)    ____ %    ____ %
Speaking    ____ %    ____ %
Media (TV programs, films, videos, songs, etc.)    ____ %    ____ %

( 4 ) Have you ever learned any language other than English and Cantonese? Yes / No

If Yes, please list them in the table, and your level of proficiency in that language in both the written and spoken forms, on a scale of 1-10 (the worst - the best).

(5)Didymkaray~iaCiataoeoamH~~ Yes or No

( 6 ) What is your academic level? If you have a university degree, what is/are the subject(s) that you majored in? (1) ____ (2) ____

( 7 ) Have you recently been learning Cantonese? Yes or No

( 8 ) Have you ever learned Cantonese before? Yes / No (If "Yes", go to Q9)

( 9a ) Where did / do you take your Cantonese lessons?
( 9b ) How many hours per week? ____ hours.
From 19____ / ____ (year / month); Age (at that time) ____ years old
To 19____ / ____ (year / month); Age (at that time) ____ years old

Appendix C /fu/ Reading List

Appendix D Identification Paradigm in Yiu & Fok (1995)

Appendix E /si/ Identification Paradigm in the Present Study

Appendix F /fu/ Identification Paradigm in the Present Study

References

Abramson, A. S. (1976). Thai tones as a reference system. In T. Gething, H. J. Kullavanijaya (Eds.), Tai Linguistics in Honor of Fang Kuei-Li. Bangkok: Chulalongkorn University Press.

______. (1978). Static and dynamic acoustic cues in distinctive tones. Language and Speech. 21 (4), 319-325.

______. (1997). The Thai tonal space. In A. S. Abramson (Ed.), Southeast Asian Linguistic Studies in Honour of Vichin Panupong. Bangkok: Chulalongkorn University Press. 1-10.

Arendrup, B. (1994). Chinese. In R. E. Asher & J. M. Y. Simpson (Eds.), The Encyclopedia of Language and Linguistics. England: Pergamon Press. 516-524.

Bacon-Shone, J. & Bolton, K. (1998). Charting multilingualism: language censuses and language surveys in Hong Kong. In M. C. Pennington (Ed.), Language in Hong Kong at Century's End. 43-90.

Bauer, R. S. (1999). Modern Cantonese. New York: Mouton de Gruyter.

Bauer, R. S. & Benedict, P. K. (1997). Modern Cantonese Phonology. Berlin: Mouton de Gruyter.

Chan, M. K. M. (1987). Tone and melody in Cantonese. Retrieved September 6, 1999 from World Wide Web: <http://deall.ohio-state.edu/chan.9/articles/bls13.htm>. Also in Proceedings of the Thirteenth Annual Meeting, Berkeley Linguistic Society, 26-37.

Chao, Y. R. (1928). Studies in the Modern Wu-Dialects. Peking: Tsing Hua College Research Institute. Monograph No. 4.

______. (1930). A system of tone letters. Le Maître Phonétique, 45, 24-27.

______. (1947). Cantonese Primer. New York: Greenwood Press.

______. (1951). The Cantian idiolect: an analysis of the Chinese spoken by a twenty-eight-month-old child. Semitic and Oriental Studies. Berkeley and Los Angeles: University of California Press, 27-44.

______. (1956). Tone, intonation, singing, chanting, recitative, tonal composition, and atonal composition in Chinese. In M. Halle, H. G. Lunt, H. McLean, and C. H. Van Schooneveld (Eds.), For Roman Jakobson: Essays on the Occasion of His Sixtieth Birthday. The Hague: Mouton. 52-59.

______. (1968). A Grammar of Spoken Chinese. Berkeley & Los Angeles: University of California.

Cheng, C. C. (1977). Tonal correlations in Chinese dialects: a quantitative study. Studies in the Linguistic Sciences. 7 (2), 115-128.

Cheung, S. H. N. (1972). Cantonese as Spoken in Hong Kong. Hong Kong: The Chinese University of Hong Kong.

Ching, Y. C. T. (1984). Lexical tone pattern learning in Cantonese children. Language Learning and Communication. 3 (3), 243-414.

Ching, Y. C. T., Williams, R., & Van Hasselt, A. (1994). Communication of lexical tones in Cantonese alaryngeal speech. Journal of Speech and Hearing Research, 37, 557-571.

Cho, S. W., & O'Grady, W. (1992). Language acquisition: the emergence of a grammar. In W. O'Grady & M. Dobrovolsky (Eds.), Contemporary Linguistic Analysis: An Introduction. Toronto: Copp Clark Pitman Ltd. 423-424.

Clumeck, H. (1976). Acquisition of the tonal contrasts of Mandarin. Unpublished manuscript, University of California, Berkeley.

______. (1980). The acquisition of tone. In G. H. Yeni-Komshian, J. F. Kavanagh, and C. A. Ferguson (Eds.), Child Phonology: Production. Volume 1. New York: Academic Press. 257-277.

Duanmu, S. (1990). A Formal Study of Syllable, Tone, Stress and Domain in Chinese Languages. Ph.D. dissertation, MIT.

Eitel, E. J. (1947). A Chinese-English Dictionary in the Cantonese Dialect.

Flege, J. E., MacKay, I. R. A., & Meador, D. (1999). Native Italian speakers' perception and production of English vowels. Journal of the Acoustical Society of America, 106 (3), 2973-2987.

Fok Chen, Y. Y. (1974). A Perceptual Study of Tones in Cantonese. Occasional Papers and Monographs, 18. Center of the Asian Studies, University of Hong Kong.

______. (1979). The frequency of occurrence of speech sounds. In R. Lord (Ed.), Hong Kong Language Papers. Hong Kong: Hong Kong University Press. 150-157.

______. (1984). The teaching of tones to children with profound hearing impairment. British Journal of Disorders of Communication, 19, 225-236.

Fon, J., & Chiang, W. Y. (1999). What does Chao have to say about tones? A case study of Taiwan Mandarin. Journal of Chinese Linguistics. 27 (1), 13-37.

Fourcin, A. J. (1978). Acoustic patterns and speech acquisition. In N. Waterson & C. Snow (Eds.), The Development of Communication. London: John Wiley.

Fox, R., & Qi, Y. Y. (1990). Context effects in the perception of lexical tone. Journal of Chinese Linguistics, 18, 261-283.

Fu, B. N. (1995). A System of Tone Features and Its Implications for the Representation of Tone. Ph.D. dissertation. Simon Fraser University.

Gandour, J. (1979). Perceptual dimensions of Cantonese tones: a multidimensional scaling reanalysis of Fok's tone confusion data. Southeast Asian Linguistic Studies (4). Canberra: Department of Linguistics, Research School of Pacific Studies, Australian National University. 415-429.

______. (1981). Perceptual dimensions of tone: evidence from Cantonese. Journal of Chinese Linguistics. 9 (1), January, 20-36.

______. (1984). Tone dissimilarity judgments by Chinese listeners. Journal of Chinese Linguistics. 12 (2), June, 235-260.

______. (1994). The phonetics of tone. In R. E. Asher & J. M. Y. Simpson (Eds.), The Encyclopedia of Language and Linguistics. Vol. 6, Oxford: Pergamon, 3116-3123.

Hashimoto, O. A. (1972). Studies in Yue Dialects 1: Phonology of Cantonese. Cambridge University Press.

Hombert, J. M. (1976). Difficulty of producing different F0 in speech. Abstract. Journal of Acoustic Society of America, 60, 544-545.

Howie, J. M. (1976). Acoustical Studies of Mandarin Vowels and Tones. Cambridge University Press.

Hyman, L. M. (1973). The role of consonant types in natural tonal assimilation. In L. M. Hyman (Ed.), Consonant Types and Tone. Southern California Occasional Papers in Linguistics, 1, 151-179.

Ioup, G. & Tansomboon, A. (1987). The acquisition of tone: a maturational perspective. In G. Ioup & S. H. Weinberger (Eds.), Interlanguage Phonology: The Acquisition of a Second Language Sound System. Cambridge: Newbury House Publishers, 321-332.

Jin, S. (1996). An Acoustic Study of Sentence Stress in Mandarin Chinese. Ph.D. dissertation, The Ohio State University.

Jones, D. & Woo, K. T. (1912). A Cantonese Phonetic Reader. University of London Press.

Kao, D. L. (1971). Structure of the Syllable in Cantonese. Mouton & Co., Netherlands.

Kehoe, M., Gammon, C. S. & Buder, E. H. (1995). Acoustic correlates of stress in young children's speech. Journal of Speech and Hearing Research, 38, 338-350.

Kerman, J. (1980). Listening. 3rd edition. New York: Worth.

Mack, M. (1984). Early bilinguals: how monolingual-like are they? In M. Paradis & Y. Lebrun (Eds.), Early Bilingualism and Child Development. Lisse: Swets and Zeitlinger B.V., 163-173.

______. (1988). Phonological transfer in a French-English bilingual child. In Papers presented at Language Contact and Conflict Conference. Brussels: Belgium.

Major, R. C. (1992). Losing English as a first language. The Modern Language Journal. 76, 190-208.

______. (1997). L2 acquisition, L1 loss, and the critical period hypothesis. In A. James and J. Leather (Eds.), Second-Language Speech: Structure and Process. Germany, Berlin: Mouton de Gruyter, 147-159.

Matthews, S. & Yip, V. (1994). Cantonese: A Comprehensive Grammar. London: Routledge.

McQueen, J. M. and Cutler, A. (1997). Cognitive processes in speech perception. In W. J. Hardcastle and J. Laver (Eds.), The Handbook of Phonetic Sciences. Cambridge: Blackwell. 566-585.

Moore, C. B., & Jongman, A. (1997). Speaker normalization in the perception of Mandarin Chinese tones. Journal of the Acoustical Society of America. 102, 1864-1877.

Munro, M. J. (1993). Productions of English vowels by native speakers of Arabic: acoustic measurements and accentedness ratings. Language and Speech. 36 (1), 39-66.

Ng, M. L. & Kwok, C. L. (1999). Frequency of occurrence of Cantonese word initials, finals, and tones. ICPhS. San Francisco. 797-799.

Nolan, F. (1997). Speaker recognition and forensic phonetics. In W. J. Hardcastle and J. Laver (Eds.), The Handbook of Phonetic Sciences. Cambridge: Blackwell Publishers Ltd., 744-767.

Nooteboom, S. (1997). The prosody of speech: melody and rhythm. In W. J. Hardcastle and J. Laver (Eds.), The Handbook of Phonetic Sciences. Cambridge: Blackwell Publishers Ltd., 645.

Ohala, J. J. (1978). Production of tone. In V. A. Fromkin (Ed.), Tone: A Linguistic Survey. New York: Academic Press Inc., 5-39.

Peng, S. H. (1995). Perceptual evidence of tonal coarticulation. OSU Working Papers in Linguistics, 45, 160-165.

Peterson, G. E. & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of Acoustic Society of America, 32 (6), 693-703.

Pike, K. L. (1943). Phonetics: A Critical Analysis of Phonetic Theory and a Technic for the Practical Description of Sounds. Ann Arbor: The University of Michigan Press.

______. (1948). Tone Language. Ann Arbor: University of Michigan Press.

Pollock, K. E., Brammer, D. M. & Hageman, C. F. (1993). An acoustic analysis of young children's production of word stress. Journal of Phonetics, 21, 183-203.

Ramsey, S. R. (1987). The Languages of China. Princeton: Princeton University Press.

Rose, P. J. (1988). On the non-equivalence of fundamental frequency and pitch in tonal description. In D. Bradley, E. J. A. Henderson, and M. Mazaudon (Eds.), Prosodic Analysis and Asian Linguistics: to honor R. K. Sprigg. Pacific Linguistics, C-104. The Australian National University.

Ross, E. D., Edmondson, J. A., Seibert, G. B., & Chan, J. L. (1992). Affective exploitation of tone in Taiwanese: an acoustical study of tone latitude. Journal of Phonetics. 20, 441-456.

Sagart, L. (1988). Glottalized tones in China and South-East Asia. In D. Bradley, E. J. A. Henderson, and M. Mazaudon (Eds.), Prosodic Analysis and Asian Linguistics: to honor R. K. Sprigg. Pacific Linguistics, C-104. The Australian National University.

Sancier, M. & Fowler, C. A. (1997). Gestural drift in a bilingual speaker of Brazilian Portuguese and English. Journal of Phonetics, 25, 421-436.

Shen, X. N. (1990). Tonal coarticulation in Mandarin. Journal of Phonetics, 18, 281-295.

So, L. K. H. (1997). Tonal changes in Hong Kong Cantonese. In S. Wright & H. Kelly-Holmes, One Country, Two Systems, Three Languages: A Survey of Changing Language Use in Hong Kong. Clevedon: Multilingual Matters. 80-85.

So, C. K. L. (1998). Difference in tonal productions between Canadian raised Cantonese and native Cantonese speakers. In K. Lee and M. Oliveira (Eds.), Proceedings of the 14th North West Linguistics Conference, held at Simon Fraser University on March 7-8, 1998, Department of Linguistics, SFU, 156-165.

______. (1999a). A comparative study on Cantonese rising tones: native Cantonese speakers and Canadian raised Cantonese speakers. Canadian Acoustics. 27 (3), 94-95.

______. (1999b). An acoustic analysis of Cantonese rising tones. Poster session presented at the 138th Acoustical Society of America meeting (November 1-5, 1999), Columbus, OH.

Steinbergs, A. (1992). The classification of languages. In W. O'Grady & M. Dobrovolsky (Eds.), Contemporary Linguistic Analysis: An Introduction. Toronto: Copp Clark Pitman Ltd. 350.

Tang, J. S. Y. & Maidment, J. A. (1996). Prosodic aspects of child-directed speech in Cantonese. In V. Hazan, S. Rosen, & M. Holland (Eds.), Speech Hearing and Language (an online version), Department of Phonetics and Linguistics, University College London. Retrieved February 6, 1999 from the World Wide Web: http://www.phon.ucl.ac.uk/home/shl9/tang/tangjm

Tseng, C. Y. (1990). An Acoustic Phonetic Study on Tones in Mandarin Chinese. Institute of History & Philology, Academia Sinica, Special Publications, No. 94, Taipei: NanKang.

Tse, K. P. J. (1977). Tone acquisition in Cantonese: a longitudinal case study. Journal of Child Language. 5, 191-204.

Tse, S. M. (1982). The acquisition of phonology: English versus Cantonese. Acta Psychologica Taiwanica. 24 (1), 1-7.

Tuaycharoen, P. (1977). The Phonetic and Phonological Development of a Thai Baby: From Early Communicative Interaction to Speech. Unpublished Ph.D. dissertation, University of London.

Vance, T. J. (1977). An experimental investigation of tone and intonation. Phonetica. 33, 368-342.

UCLA Language Materials: Cantonese Language Profile. (2000, January 11). California, LA: UCLA. Retrieved January 11, 2000 from World Wide Web: http://www.lmp.ucla.edu/profiles/profc01.htm

Wang, J. (1997). The Acquisition of English Vowels by Mandarin ESL Learners: A Study of Production and Perception. MA Thesis, Simon Fraser University.

Wang, W. S. Y. (1967). Phonological features of tone. International Journal of American Linguistics. 33 (2), 93-105.

Wang, W. S. Y. & Cheng, C. C. (1987). Middle Chinese tones in modern dialects. In R. Channon, I. Lehiste, and L. Shockey (Eds.), In Honor of Ilse Lehiste: Ilse Lehiste Pühendusteos. Foris Publications. 515-523.

Wong, P. C. M. (1998). Speaker Normalization in the Perception of Cantonese Level Tones: Effect of Context-Target Pitch Distance. MA Thesis. (an online version). Retrieved January 3, 2000 from World Wide Web:

Yiu, E. M. & Fok, A. Y. (1995). Lexical tone disruption in Cantonese aphasic speakers. Clinical Linguistics & Phonetics. 9, 79-92.

Zee, E. (1977).

______. (1978). Duration and intensity as correlates of F0. Journal of Phonetics. 6, 213-220.

______. (1995). Temporal organization of syllable production in Cantonese Chinese. Proceedings of the XIIIth International Congress of Phonetic Sciences. Stockholm, Sweden, 250-3.

______. (1999). Chinese (Hong Kong Cantonese). In International Phonetic Association, Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University Press. 58-60.