UNIVERSITY OF HAWAI'I LIBRARY

THE ACQUISITION OF ENGLISH SPEECH RHYTHM BY

ADULT CHINESE ESL AND EFL LEARNERS

A DISSERTATION SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI'I IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

IN

LINGUISTICS

August 2003

By

Te-fang Hua

Dissertation Committee:

Ann M. Peters, Chairperson

Patricia J. Donegan

Kenneth L. Rehg

James Dean Brown

Martha Crosby

© Copyright 2003 by Te-fang Hua

ACKNOWLEDGEMENTS

My sincerest thanks to my advisor, Dr. Ann M. Peters, for her advice and encouragement over the years I have spent conceptualizing, implementing, and writing up this dissertation. Her extensive, detailed comments on earlier drafts of this dissertation proved to be extremely insightful and beneficial to my later revisions. I thank her for always supporting and encouraging me to live up to my full potential and for being a wonderful role model.

I extend the deepest gratitude to my committee members. I thank Dr. James Dean Brown for inspiring me to pursue this topic and for his valuable comments on the methodology of this dissertation. I thank Dr. Kenneth L. Rehg and Dr. Patricia J. Donegan for giving me valuable comments and suggestions which led to significant revisions. To my other committee member, Dr. Martha Crosby, I give thanks for encouraging me to look at my dissertation from a broader perspective.

I am deeply indebted to all the participants of the study: the students at Tamkang University, Taipei, the University of California, Irvine, and the University of Hawai'i who generously agreed to participate in this experiment. My special thanks to Dr. Sharon Fahn of Tamkang University, Dr. Safia Agarwa, and Ms. Sushama Archaya for their generous help in recruiting participants, and to Dr. Hak-khiam Jang, Dr. I-Ru Su, Dr. Arnold Li, Laura Sacia, and Joan Wylie for their valuable input on earlier drafts of the test stimuli.

I have profited immensely from Dr. Victoria Anderson, from whom I acquired the knowledge and skills necessary to conduct the phonetic experiment. Her advice was valuable to my dissertation.

My gratitude goes to Dr. William O'Grady, for his advice and for being an excellent role model as a scholar and a teacher.

I would like to acknowledge the Chiang Ching-kuo Foundation for International Scholarly Exchange for a one-year fellowship that allowed me to conduct my dissertation research. I would also like to thank the Department of Linguistics of the University of Hawai'i for providing a Graduate Assistantship over the first three years of my doctoral study, which made it possible for me to continue my doctoral program at the University of Hawai'i.

I thank my teachers at the Department of Linguistics, Dr. Anatole Lyovin, Dr. Byron W. Bender, Dr. Robert A. Blust, Dr. Michael L. Forman, Dr. George W. Grace, Dr. William O'Grady, and Dr. Lawrence Reid, who have guided me to discover the beauty and complexity of linguistics.

My special thanks go to Wendy Onishi and Jennifer Kanda, secretaries at the Department of Linguistics, and Caroline Paet, secretary at the Department of Second Language Studies, for their professional assistance with administrative matters.

Thanks go to my linguistics buddies, Shanshan Wang, Dr. Katsura Aoyama, Kazuko Imoto, Cathy Kawahata, Dr. Terry Klafehn, Marilyn Plumlee, Dr. Moriyo Shimabukuro of the University of the Ryukyus, Caroline Steele, Dr. Sarka Stivarova, and Min Sun Song, who were there to toss ideas around, eat, laugh, hang out, and give me encouragement at various stages of this dissertation research.

My deepest thanks go to my family: my father, Chih-long Hua, who inspired me to study foreign languages and literatures; my mother, Yu-hsiao Shih, for encouraging me to do whatever makes me happy; my brothers and sisters, John, Teresa, Darren, Brigitte, and Davan, for always believing that I am the best; and my husband, Dr. Jonas Chen, for his technical support with the computer-assisted data analysis of this dissertation, and for his love, support, patience, and interest in everything I do.

Last but not least, my warmest thanks go to my furry friends, Xiongxiong and Maomao, for their loving company throughout the years.

ABSTRACT

THE ACQUISITION OF ENGLISH SPEECH RHYTHM BY ADULT CHINESE ESL AND EFL LEARNERS

by Te-fang Hua

Chairperson of the Supervisory Committee: Professor Ann M. Peters Department of Linguistics

Mandarin Chinese speakers are frequently reported by ESL professionals to speak English in a syllable-timed rhythm. However, little empirical evidence is available to physically characterize their speech rhythm in English. In view of the paucity of information available on this issue, the current study compares speech samples of Taiwan Mandarin (TM) and English speakers with respect to their difficulties in producing English rhythm by analyzing three well-attested correlates of stress in English: duration, intensity, and pitch.

The participants in this study were 10 native speakers of English, 10 TM speakers learning English as a Second Language (ESL), and 10 TM speakers learning English as a Foreign Language (EFL). The subjects were requested to read two prosodically diverse sets of sentences, with Type A featuring a single strong syllable or two widely spaced strong syllables, and Type B featuring a regular alternation between strong and weak syllables.

The results showed that the TM ESL and EFL speakers experienced difficulties with Type A but not with Type B rhythm. For Type A sentences, the TM speakers produced relatively shorter, softer, and lower-pitched strong syllables and relatively longer, louder, and higher-pitched weak syllables than the English speakers. The combination leads to less duration, intensity, and pitch differentiation between the strong and the weak syllables. Additionally, the TM speakers produced fewer levels of stress than the English speakers did. Increased proficiency and exposure are correlated with positive changes in the use of duration, intensity, and pitch as correlates of stress.

The current study strongly challenges using "syllable-timing" as a cover term in describing the speech rhythm of TM speakers because they were apparently able to manage at least one type of English stress-timing well. We propose multiple parameters under the traditional rhythmical category "stress-timing" by building in possible language-specific variations as to the number of unstressed syllables permitted within a foot and the number of prosodically weak syllables within a higher prosodic domain.

TABLE OF CONTENTS

Acknowledgements

Abstract

List of Tables

List of Figures

CHAPTER 1: Introduction

CHAPTER 2: Characteristics of rhythm in English and in Chinese

2.1 Defining rhythm

2.2 Speech rhythm in English

2.2.1 Characteristics of English speech rhythm

2.2.1.1 Tendency toward

2.2.1.2 Metrical representation of stress

2.2.1.3 The acoustic correlates of stress in English

2.2.2 Acquisition of English rhythm

2.3 Characteristics of speech rhythm in Mandarin Chinese

2.3.1 Foot structure in Mandarin

2.3.2 Full-toned versus neutral-toned syllables

2.3.3 Idiosyncrasies of lexical rhythm across dialects

2.3.4 The phonetic correlates of stress in Mandarin Chinese

2.3.5 Are BM and TM stress-timed, syllable-timed, or foot-timed?

2.4 Potential difficulties Taiwan Mandarin and Beijing Mandarin speakers might have with the production of English speech rhythm

CHAPTER 3: The present study

3.1 Purpose of the study

3.2 Research questions

3.2.1 Do Taiwan Mandarin speakers have difficulties with the duration, intensity, or pitch of strong syllables, weak syllables, or both?

3.2.2 Do Taiwan Mandarin speakers produce less differentiation in duration, intensity, or pitch between strong and weak syllables than English speakers?

3.2.3 Do Taiwan Mandarin speakers correlate duration, intensity, and pitch with the target stress patterns in an English-like way?

3.2.4 Do TM speakers produce more English-like duration, intensity, and pitch patterns with improved proficiency and exposure to English?

3.2.5 Are duration, intensity, and pitch coordinated as correlates of stress for Taiwan Mandarin speakers?

3.3 Method

3.3.1 Subjects

3.3.2 Materials

3.3.3 Procedures

3.3.4 Analyses

3.3.4.1 Acoustical analyses

3.3.4.2 Normalization of data

3.3.4.3 Statistical analyses

CHAPTER 4: Results and discussion (1): Duration

4.1 Duration patterns of Type A sentences

4.1.1 Absolute duration patterns

4.1.2 Relative duration patterns

4.1.3 Significant differences between NS, ESL and EFL

4.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

4.1.3.2 Significant differences between ESL and EFL

4.1.4 Correlation of duration patterns between speaker groups

4.1.5 Test reliability

4.1.6 Summary of results for Type A sentences

4.2 Duration patterns of Type B sentences

4.2.1 Absolute duration of syllables

4.2.2 Relative duration patterns

4.2.3 Significant differences between NS, ESL and EFL

4.2.3.1 Differences between NS and ESL vs. differences between NS and EFL

4.2.3.2 Significant differences between ESL and EFL

4.2.4 Correlation of duration patterns between speaker groups

4.2.5 Test reliability

4.2.6 Summary of duration results for Type B sentences

4.3 Discussion of the duration in Type A and Type B sentences

4.3.1 Do TM speakers have difficulty lengthening strong syllables, shortening weak syllables, or both?

4.3.2 Do TM speakers use duration as a correlate of stress?

4.3.3 Do TM speakers produce smaller duration contrasts between strong and weak syllables than English speakers?

4.3.4 Do TM speakers produce duration patterns that are closer to the target speech rhythm with improved proficiency and increased exposure to English?

4.3.5 Summary for the discussion of duration

CHAPTER 5: Results and discussion (2): Intensity

5.1 Intensity patterns of Type A sentences

5.1.1 Intensity patterns in dB

5.1.2 Relative intensity patterns

5.1.3 Significant differences between NS, ESL and EFL

5.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

5.1.3.2 Significant differences between ESL and EFL

5.1.4 Correlation of intensity patterns between speaker groups

5.1.5 Test reliability

5.1.6 Summary of intensity results for Type A sentences

5.2 Intensity patterns of Type B sentences

5.2.1 Intensity patterns in dB

5.2.1.1 Relative intensity patterns

5.2.2 Significant differences between NS, ESL and EFL

5.2.2.1 Differences between NS and ESL vs. differences between NS and EFL

5.2.3 Significant differences between ESL and EFL

5.2.4 Correlation of intensity ratios between speaker groups

5.2.4.1 Test reliability

5.2.5 Summary of intensity results for Type B sentences

5.3 Discussion of the intensity in Type A and Type B sentences

5.3.1 Do TM speakers have difficulty with the intensity of strong syllables, the intensity of weak syllables, or both?

5.3.2 Do TM speakers use intensity as a correlate of stress?

5.3.3 Do TM speakers produce smaller intensity contrasts between strong and weak syllables than English speakers?

5.3.4 Do TM speakers with improved proficiency and increased exposure to English produce intensity patterns that are closer to the target speech rhythm?

5.3.5 Summary for the discussion of intensity

CHAPTER 6: Results and discussion (3): Pitch

6.1 F0 patterns of Type A sentences

6.1.1 F0 frequency in Hz

6.1.2 F0 frequency in semitone ratios

6.1.3 Significant differences between NS, ESL and EFL

6.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

6.1.3.2 Significant differences between ESL and EFL

6.1.4 Correlation of pitch patterns between speaker groups

6.1.4.1 Test reliability

6.1.5 Summary of pitch results for Type A sentences

6.2 Pitch patterns of Type B sentences

6.2.1 F0 frequency in Hz

6.2.2 F0 frequency in semitone ratios

6.2.3 Significant differences between NS, ESL and EFL

6.2.3.1 Differences between NS and ESL vs. differences between NS and EFL

6.2.3.2 Significant differences between ESL and EFL

6.2.4 Correlation of pitch patterns between speaker groups

6.2.5 Test reliability

6.2.6 Summary of results for Type B sentences

6.3 Discussion of the pitch in Type A and Type B sentences

6.3.1 Do TM speakers have difficulty with the pitch of strong syllables, the pitch of weak syllables, or both?

6.3.2 Do TM speakers use pitch as a correlate of stress?

6.3.3 Do TM speakers produce smaller pitch contrasts between strong and weak syllables than English speakers?

6.3.4 Do TM speakers with improved proficiency and increased exposure to English produce pitch patterns that are closer to the target speech rhythm?

6.3.5 Summary for the discussion of pitch

CHAPTER 7: Coordination among duration, intensity, and pitch

7.1 The coordination of duration, intensity, and pitch of syllables in non-final positions

7.1.1 Significant differences in duration, intensity, and pitch

7.1.2 The variations in duration, intensity, and pitch and stress

7.1.3 Differentiation between strong and weak syllables in non-final position

7.2 Duration, intensity, and pitch of syllables in final position

7.2.1 Significant differences in duration, intensity, and pitch between pairs of subject groups

7.2.2 The variations in duration, intensity, and pitch and stress

7.2.3 Differentiation between strong and weak syllables in final position

7.3 Summary

CHAPTER 8: Conclusion

8.1 Findings and implications

8.2 Strengths and limitations of the current study

8.2.1 Strengths

8.2.2 Limitations

8.3 Directions for further research

8.3.1 Production of speech rhythm

8.3.2 Perception of stress

8.3.3 A quantitative representation of speech rhythm

8.3.4 Rhythm and speech processing

8.3.5 Pedagogy for teaching speech rhythm

Appendix A: List of Experimental Sentences

A.1 Type A Sentences

A.2 Type B Sentences

Appendix B: Segmentation Criteria

B.1 Type A Sentences

B.2 Type B Sentences

Bibliography

LIST OF TABLES

Table 2.1 Tones in Mandarin Chinese

Table 3.1 Profile of the three subject groups

Table 3.2 Labels, stress patterns, and syllable numbers of test sentences

Table 3.3 The Hertz to Semitone conversion chart

Table 3.4 Mathematic derivation of semitones from frequency in Hz

Table 4.1 Mean syllable durations in ms for Type A sentences

Table 4.2 Mean syllable duration as its percentage of the total sentence duration for Type A sentences

Table 4.3 Student's t-test scores for duration (%) of individual syllables between pairs of groups for Type A sentences

Table 4.4 Number of strong and weak syllables with duration (%) significantly different from NS in non-final vs. final positions for Type A sentences

Table 4.5 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL in non-final vs. final position for Type A sentences

Table 4.6 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL speakers categorized as content and function in non-final vs. final position

Table 4.7 Pearson Product-moment Correlation Coefficients for mean syllable duration between groups for Type A sentences

Table 4.8 Test-retest reliability for syllable duration from three productions for Type A sentences

Table 4.9 Mean syllable durations in ms for Type B sentences

Table 4.10 Mean syllable duration as its percentage of the total sentence duration for Type B sentences

Table 4.11 Student's t-test scores for durations (%) of individual syllables between groups for Type B sentences

Table 4.12 Number of strong and weak syllables with durations (%) significantly different from NS in non-final vs. final positions for Type B sentences

Table 4.13 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL in non-final and final positions for Type B sentences

Table 4.14 Pearson Product-moment Correlation Coefficients for mean syllable duration between pairs of groups for Type B sentences

Table 4.15 Test-retest reliability for syllable duration from three productions of Type B sentences

Table 4.16 Number of strong and weak syllables with duration (%) significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Table 4.17 Number of strong, weakly stressed, and unstressed syllables with duration (%) significantly different from NS in non-final vs. final position for Type A sentences

Table 4.18 Group average duration (%) of strong and weak syllables in non-final and final positions for Type A and Type B sentences

Table 4.19 Group average duration (%) of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A sentences

Table 4.20 Number of strong and weak syllables classified as "+LRPS" or "-LRPS" in non-final and final positions for Type A and Type B sentences

Table 4.21 Number of "+LRPS" vs "-LRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Table 4.22 Average duration (%) of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Table 4.23 Duration contrasts (%) between strong and weak syllables in non-final position for Type A sentences

Table 4.24 Duration contrasts (%) between strong and weak syllables in final position for Type A sentences

Table 4.25 Duration contrasts (%) between strong and unstressed syllables in non-final and final position for Type A sentences

Table 4.26 Duration contrasts (%) between stressed and unstressed syllables in non-final and final position for Type A sentences

Table 4.27 Duration contrasts (%) between strong and weak syllables in non-final positions for Type B sentences

Table 4.28 A comparison of duration results between ESL and EFL

Table 5.1 Group Average intensity in dB of individual syllables for Type A sentences

Table 5.2 Group Average intensity ratios of individual syllables for Type A sentences

Table 5.3 Student's t-test scores for intensity ratios of individual syllables between groups for Type A sentences

Table 5.4 Number of syllables with intensity ratios significantly different from NS as strong or weak in non-final and final positions for Type A sentences

Table 5.5 Number of strong and weak syllables with intensity ratios significantly different between EFL and ESL in non-final vs. final positions for Type A sentences

Table 5.6 Number of strong and weak syllables with intensity ratios significantly different between EFL and ESL speakers categorized as content and function in non-final vs. final positions for Type A sentences

Table 5.7 Pearson Product-moment Correlation Coefficients for mean syllable intensity ratios between groups for Type A sentences

Table 5.8 Test-retest reliability for intensity ratios from three productions for Type A sentences

Table 5.9 Group Average intensity in dB of individual syllables for Type B sentences

Table 5.10 Group Average intensity ratios of individual syllables for Type B sentences

Table 5.11 Number of syllables with intensity ratios significantly different from NS in categories of strong and weak in non-final and final positions for Type B sentences

Table 5.12 Number of syllables with intensity ratios significantly different between EFL and ESL in categories of strong and weak in non-final and final positions for Type B sentences

Table 5.13 Pearson Product-moment Correlation Coefficients for mean intensity ratios between pairs of groups for Type B sentences

Table 5.14 Test-retest reliability for intensity ratios from three productions of Type B sentences

Table 5.15 Number of strong and weak syllables with intensity ratios significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Table 5.16 Number of strong, weakly stressed, and unstressed syllables with intensity ratios significantly different from NS in non-final vs. final position for Type A sentences

Table 5.17 Group average intensity ratios of strong and weak syllables in non-final and final positions of Type A and Type B sentences

Table 5.18 Group average intensity ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences

Table 5.19 Number of strong and weak syllables classified as "+IRPS" or "-IRPS" in non-final and final position for Type A and Type B sentences

Table 5.20 Number of "+IRPS" vs "-IRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Table 5.21 Average intensity ratios of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Table 5.22 Intensity contrasts (ratio) between strong and weak syllables in non-final positions for Type A sentences

Table 5.23 Intensity contrasts (ratio) between strong and weak syllables in final positions for Type A sentences

Table 5.24 Intensity contrasts (ratios) between strong and unstressed syllables in non-final and final position for Type A sentences

Table 5.25 Intensity contrasts (ratios) between stressed and unstressed syllables in non-final and final position for Type A sentences

Table 5.26 Intensity contrasts (ratio) between strong and weak syllables in non-final positions for Type B sentences

Table 5.27 A comparison of intensity results between ESL and EFL

Table 6.1 Group Average F0 in Hz of individual syllables for Type A sentences

Table 6.2 Group Average F0 in semitone ratios of individual syllables for Type A sentences

Table 6.3 Student's t-test scores for semitone ratios of individual syllables between NS, ESL and EFL for Type A sentences

Table 6.4 Number of syllables with semitone ratios significantly different from NS as strong and weak in non-final and final positions for Type A sentences

Table 6.5 Number of strong and weak syllables with semitone ratios significantly different between EFL and ESL in non-final vs. final positions for Type A sentences

Table 6.6 Pearson Product-moment Correlation Coefficients for mean syllable semitone ratios between groups for Type A sentences

Table 6.7 Test-retest reliability for F0 frequency in semitone ratios from three productions of Type A sentences

Table 6.8 Group Average F0 in Hz of individual syllables for Type B sentences

Table 6.9 Group average F0 (Hz) of strong content words vs. weak function words in non-final and final positions for Type B sentences

Table 6.10 Group Average F0 in semitone ratios of individual syllables for Type B sentences

Table 6.11 Student's t-test scores for semitone ratios of individual syllables between groups for Type B sentences

Table 6.12 Number of strong and weak syllables with semitone ratios significantly different from NS in non-final vs. final positions for Type B sentences

Table 6.13 Number of strong and weak syllables with semitone ratios significantly different between EFL and ESL in non-final vs. final positions for Type B sentences

Table 6.14 Pearson Product-moment Correlation Coefficients for mean semitone ratios between groups for Type B sentences

Table 6.15 Test-retest reliability for syllable semitone ratios from three productions of Type B sentences

Table 6.16 Number of strong and weak syllables with semitone ratios significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Table 6.17 Number of strong, weakly stressed, and unstressed syllables with semitone ratios significantly different from NS in non-final vs. final position for Type A sentences

Table 6.18 Group average semitone ratios of strong and weak syllables in non-final and final positions of Type A and Type B sentences

Table 6.19 Group average semitone ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences

Table 6.20 Number of strong or weak syllables classified as "+PRPS" or "-PRPS" in final and non-final position for Type A sentences

Table 6.21 Number of "+PRPS" vs "-PRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Table 6.22 Average semitone ratios of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Table 6.23 Group average semitone ratios of strong vs. weak syllables in non-final position for Type A sentences

Table 6.24 Group average semitone ratios of strong vs. weak syllables in final position for Type A sentences

Table 6.25 Pitch contrasts (ratios) between strong and unstressed syllables in non-final and final position for Type A sentences

Table 6.26 Pitch contrasts (ratios) between stressed and unstressed syllables in non-final and final position for Type A sentences

Table 6.27 Group average semitone ratios of strong vs. weak syllables in non-final position for Type B sentences

Table 6.28 A comparison of F0 results between ESL and EFL

Table 7.1 Number of strong and weak non-final syllables for Type A sentences with significantly greater or smaller duration, intensity, or semitone ratios than those of NS

Table 7.2 Number of strong and weak non-final syllables with duration, intensity, or semitone ratios significantly greater or smaller than those of NS for Type B sentences

Table 7.3 Number of strong and weak non-final syllables classified as Raised or Lowered for Type A sentences

Table 7.4 Number of strong and weak non-final syllables classified as "+SRPS" or "-SRPS" for Type B sentences

Table 7.5 Duration, intensity, and semitone ratios of strong versus weak syllables in non-final positions for Type A sentences

Table 7.6 Duration, intensity, and semitone ratios of strong versus weak syllables in non-final positions for Type B sentences

Table 7.7 Number of strong and weak syllables in final position of Type A sentences classified as significantly different in duration, intensity, and/or pitch

Table 7.8 Number of strong and weak syllables in final position of Type B sentences classified as significantly different in duration, intensity, and/or pitch

Table 7.9 Number of strong and weak final syllables classified as "+SRPS" or "-SRPS" for Type A sentences

Table 7.10 Number of strong final syllables classified as "+SRPS" or "-SRPS" for Type B sentences

Table 7.11 Duration, intensity, and semitone ratios of strong versus weak syllables in final positions for Type A sentences

LIST OF FIGURES

Figure 3.1 The sampling of the Extreme Point F0s for the utterance Jane is not my mom

Figure 3.2 The corresponding Extreme Point F0s on the original pitch track for the utterance Jane is not my mom

Figure 3.3 The Middle F0s, the F0s at Peak Intensity, the Mean F0s, and Extreme Point F0s for the utterance Jane is not my mom

Figure 4.1 Mean syllable durations (ms) for sentences A1 through A7

Figure 4.2 Mean syllable durations (%) for sentences A1 through A7

Figure 4.3 Distribution of significant differences between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

Figure 4.4 Mean syllable durations (ms) for sentences B1 through B7

Figure 4.5 Mean syllable durations (%) for sentences B1 through B7

Figure 4.6 Distribution of significant differences in syllable duration between ESL and EFL speakers for Type B sentences

Figure 4.7 Number of strong and weak non-final syllables classified as "+LRPS" or "-LRPS" for Type A and Type B sentences

Figure 4.8 Number of strong and weak final syllables classified as "+LRPS" for Type A and Type B sentences

Figure 4.9 Group average duration (%) of strong vs. weak syllables in non-final position for Type A sentences

Figure 4.10 Group average duration (%) of strong vs. weak syllables in final position for Type A sentences

Figure 4.11 Group average duration (%) of strong vs. weak syllables in non-final position for Type B sentences

Figure 5.1 Group Average peak syllable intensity in dB for sentences A1 through A7

Figure 5.2 Group Average intensity ratios for sentences A1 through A7

Figure 5.3 Distribution of significant differences in intensity ratios between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

Figure 5.4 Group Average peak syllable intensity in dB for sentences B1 through B7

Figure 5.5 Mean syllable intensity ratios for sentences B1 through B7

Figure 5.6 Number of strong and weak syllables classified as "+IRPS" or "-IRPS" for Type A and Type B sentences

Figure 5.7 Group average intensity ratios of strong vs. weak syllables in non-final position for Type A sentences

Figure 5.8 Group average intensity ratios of strong vs. weak syllables in final position for Type A sentences

Figure 5.9 Group average intensity ratios of strong vs. weak syllables in non-final position for Type B sentences

Figure 6.1 Group Average F0 Frequency (Hz) for sentences A1 through A7

Figure 6.2 Group Average F0 Frequency in semitone ratios for sentences A1 through A7

Figure 6.3 Distribution of significant differences in semitone ratios between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

Figure 6.4 Group Average F0 Frequency (Hz) for sentences B1 through B7

Figure 6.5 Group Average F0 Frequency in semitone ratios for sentences B1 through B7

Figure 6.6 Number of strong and weak syllables classified as "+PRPS" or "-PRPS" for Type A and Type B sentences

Figure 6.7 Group average semitone ratios of strong vs. weak syllables in non-final position for Type A sentences

Figure 6.8 Group average semitone ratios of strong vs. weak syllables in final position for Type A sentences

Figure 6.9 Group average semitone ratios of strong vs. weak syllables in non-final position for Type B sentences

Figure 7.1 Percentage of strong non-final syllables classified as "+SRPS" and weak non-final syllables classified as "-SRPS" for Type A sentences

Figure 7.2 Percentage of strong non-final syllables classified as "+SRPS" and weak non-final syllables classified as "-SRPS" for Type B sentences

Figure 7.3 Duration, intensity, and pitch contrasts between strong and weak non-final syllables for Type A sentences for NS, ESL, and EFL speakers

Figure 7.4 Duration, intensity, and pitch contrasts between strong and weak non-final syllables for Type B sentences for NS, ESL, and EFL speakers

Figure 7.5 Percentage of strong final syllables classified as "+SRPS" and "-SRPS" final syllables classified as "+SRPS" for Type A sentences

Figure 7.6 Percentage of strong final syllables classified as "+SRPS" for Type B sentences

Figure 7.7 Duration, intensity, and pitch contrasts between strong and weak syllables in final position for Type A sentences for NS, ESL, and EFL speakers

CHAPTER 1: INTRODUCTION

As fundamental as rhythm is in the perception and production of speech, the rhythm of many languages has not been positively determined for native speakers, much less even considered for non-native speakers. To date, how second language learners acquire the rhythm of a new language remains largely an empirical question. Mandarin Chinese speakers, for example, are frequently reported by professionals teaching and researching English as a second language to experience difficulties with English speech rhythm. Their rhythm has been informally described as staccato or syllable-timed. However, little empirical evidence is available to physically characterize what it is that makes their speech rhythm different from that of the native speakers of English.

What makes this inquiry particularly interesting is that although learning to produce a new speech rhythm seems one of the most difficult tasks for second language learners, it is one of the first linguistic features discovered by children learning their first language. It seems that prosodic aspects of language, which are usually acquired at an early stage for children learning their first language, may be especially difficult to break away from for second language learners.

Psycholinguists have long noted that newborn infants can discriminate languages differing in rhythmic structure, even when the speech itself has been low-pass filtered, that is, when only suprasegmental information is available (Mehler et al., 1988; Moon et al., 1993). At 9 months of age, infants already show a reliable preference for unfamiliar words with a rhythmic pattern that is dominant in their parental language (Jusczyk, Cutler, & Redanz, 1993). There is also evidence of rhythmic constraints on children's early speech. Numerous studies on the omission of syllables in children's production of polysyllabic words have reported that English-speaking children develop a strong preference for the Strong-Weak metrical footing at an early stage of language acquisition (Allen & Hawkins, 1980; Klein, 1978, 1981; Wijnen et al., 1994; Gerken, 1994, 1996).

Furthermore, speech rhythm plays an essential role in the processing of speech. Over the past two decades, there has been increasing evidence that native listeners draw heavily on rhythmic units as the most natural and efficient way of segmenting continuous speech into words (Mehler et al., 1981; Cutler et al., 1986; Cutler and Norris, 1988; Cutler and Butterfield, 1992; Otake et al., 1993; Cutler, 1994). Results of these studies generally suggest that in a stress-timed language such as English, native listeners segment speech at the beginning of a stressed syllable, whereas in a syllable-timed language such as French, segmentation takes place at each syllable. In other words, the units of rhythm, perceptual prominence, and segmentation tend to converge for a given language. This means that placing stress on the right syllables is essential to the comprehension of speech.

For second language learners, deviance in speech rhythm is not only a key element in the detection of a foreign accent, but it also negatively affects the intelligibility of one's speech (Adams, 1979; Wang, 1987; Anderson-Hsieh, Johnson, & Koehler, 1992). If the speaker makes a stress mistake within a word, the listener may not be able to understand the word, even if all the individual sounds are pronounced correctly. Likewise, if the speaker places stress on the wrong word(s) within a sentence, the listener may misunderstand the meaning of the sentence, even if all the individual sounds are pronounced correctly. Studies have shown that native speakers of English are less tolerant of prosodic deviance than of segmental errors when they are asked to judge the intelligibility, acceptability of accent, or overall pronunciation of foreigners' speech (James, 1976; Johansson, 1978; Anderson-Hsieh, Johnson, & Koehler, 1992).

Although the rhythmic structure of English has been more extensively researched than that of most other languages, what we know about the second language acquisition of English speech rhythm to date reveals only the tip of the iceberg. Little empirical evidence is available in the form of acoustic measurements which physically characterize the difficulties non-native speakers might experience with English speech rhythm. There is still considerable speculation regarding which parameters (fundamental frequency, intensity, or duration) cause difficulty in speech rhythm for non-native speakers of English. Yet if foreign learners want to acquire the stress-timed rhythm of English with any degree of success, it is apparent that they must know the means to produce English stress and, in particular, what it is that distinguishes stressed from unstressed syllables in the stream of speech.

Previous research in this area is generally limited to the examination of the duration of isolated words in the speech of a small number of non-native speakers drawn from a single proficiency level (see Chapter Two for a full review). In view of the paucity of information available on the second language acquisition of speech rhythm and the limitations of the previous research, the current study compares Taiwan Mandarin and English speakers with respect to their phonetic realization and distribution of stress by analyzing all three well-attested correlates of stress in English and using longer stretches of speech produced by a relatively larger number of subjects representing two English proficiency levels.

Results of this study are expected to contribute toward our understanding of the acquisition of second language rhythm, particularly how second language learners develop the new rhythm, including using a phonetic process reserved for marking a different set of phonemic contrasts in their native language. In this case, the pitch correlate of stress in English is also used for signaling phonemic pitch contrasts in Mandarin Chinese. Results of the study may also be used for comparing first and second language acquisition of speech rhythm and for investigating the ways in which speakers of various native languages are similar or different in their use of duration, intensity, and pitch as correlates of stress in English. In addition, the current results may have implications for the construction of a theoretical model of the rhythm of language. Second language studies provide additional data that might help reveal any natural constraints on the structure or development of a speech rhythm.

The study may also interest psycholinguists researching the link between rhythm and speech segmentation. The results of the current study are expected to reveal rhythmic patterns of Taiwan Mandarin (TM) ESL/EFL learners, which may affect their strategy for segmenting speech in English. Although the link between rhythm and speech segmentation in the native language has been an active area of investigation, whether a similar link exists in the second language is generally unknown. In addition, the current research methods, including our approaches for sampling, measuring, and normalizing duration, intensity, and pitch, may become useful for researchers who want to conduct experiments on the prosodic aspects of language.

The results are expected to raise the awareness of second language professionals about the kinds of difficulties that learners might have in learning to produce a new speech rhythm. In effect, the study hopes to encourage second language teachers and researchers to reexamine the ways in which stress and rhythm are introduced in current classroom practices. The current findings are expected to have implications for the development of teaching materials appropriate for TM speakers learning English as a second language.

The organization of the chapters is as follows. Chapter One is a general introduction. Chapter Two reviews the phonetics and the phonology of rhythm in English and in Mandarin. Based on the review, a comparison is made between these two languages to make preliminary predictions about the types of difficulties TM speakers might have with English speech rhythm. With these predictions in mind, Chapter Three presents the research questions and details regarding the methodological design of the current study. Chapters Four through Six each focus on the analysis of a different phonetic cue, in the order duration, intensity, and pitch. Results from both acoustical and statistical analyses are presented. Immediately following the presentation of results is a discussion of the results, which directly addresses the research questions. Chapter Seven combines results from duration, intensity, and pitch to examine possible interactions. Chapter Eight summarizes results and draws conclusions for the whole study. Finally, the limitations and strengths of the study are discussed along with proposals for further research on the second language acquisition of rhythm.

CHAPTER 2: CHARACTERISTICS OF RHYTHM IN ENGLISH AND IN CHINESE

The organization of this chapter is as follows. In section 2.1, I discuss speech rhythm in general. In section 2.2, I discuss the characteristics of speech rhythm in English and review previous research on the second language acquisition of English speech rhythm. In section 2.3, I discuss the characteristics of speech rhythm in Beijing Mandarin (BM) and Taiwan Mandarin (TM). In section 2.4, I project possible difficulties TM speakers might have with English speech rhythm based on similarities and differences between TM and English with regard to their characteristics of speech rhythm.

2.1 DEFINING RHYTHM

Speech rhythm is defined as the phonological representations of isochrony, i.e., "the process of providing phonological representations of time with phonetic substance (i.e., duration)" (Podesva, 2003). Rhythm is defined as a durational phenomenon although in some languages (e.g., English), stress may serve to define rhythm (to be discussed in section 2.2).

Most linguists discuss rhythm in terms of languages being 'foot-timed' (also referred to as 'stress-timed'), 'syllable-timed', or 'mora-timed'. The distinction is based on the notion of isochrony such that duration is perceived to remain constant from one foot to the next in a foot-timed language (e.g., English), from one syllable to the next in a 'syllable-timed' language (e.g., French), and from one mora to the next in a 'mora-timed' language (e.g., Japanese) (Pike, 1945; Abercrombie, 1967; see McCawley, 1968 for 'mora-timing'). It can be illustrated by the following three sentences:

(1) (English): This is absolutely ridiculous.
    (French): C'est absolument ridicule.
    (Japanese): Kore-wa zettai-ni bakageteiru.

The perceptual impression that stresses in English, syllables in French, and moras in Japanese seem to occur at equal intervals of time is perhaps one of the first triggers for proposing these three different timing units. However, it is important to note that isochrony is only intended by native speakers as a mental target. Phonetic isochrony in speech production is rarely confirmed empirically (for evidence against perfectly isochronous stress, see Classe, 1939; O'Connor, 1968; Uldall, 1971; Lea, 1975; Faure et al., 1980; for evidence against perfectly isochronous syllables, see Delattre, 1966; Olsen, 1972; Pointon, 1980; Balasubramanian, 1980; Wenk and Wioland, 1982; for evidence against perfectly isochronous moras, see Beckman, 1982). Factors such as the intrinsic duration of a segment, the number of segments in a mora/syllable/foot, the number of syllables in a foot, and the position of a mora/syllable/foot in an utterance (i.e., final ones tend to be longer than non-final ones) do matter (Dauer, 1982). However, if one could keep all of these factors under control, it might be possible to approximate phonetic isochrony.

2.2 SPEECH RHYTHM IN ENGLISH

I first discuss the characteristics of English speech rhythm in section 2.2.1. Then I review previous research on the second language acquisition of English speech rhythm in section 2.2.2.

2.2.1 CHARACTERISTICS OF ENGLISH SPEECH RHYTHM

2.2.1.1 Tendency toward isochrony

English is commonly referred to as an archetypal stress-timed language. This view originates from the impression that stressed syllables in English tend to occur at more or less equal intervals of time, as illustrated in (2).

(2) The 'teacher is 'interested in 'buying some 'books. (Pike, 1945, p.34)

Other things being equal, the more syllables there are in an interstress interval, the shorter they tend to be (Gimson, 1962; Halliday, 1970; Allen, 1975). Compare the following examples:

(3) a. Ken's here.
    b. Kenny's here.
    c. Kennedy's here.

Although each of them contains a different number of syllables in the first foot, the duration of the foot does not increase proportionately to the number of syllables. For example, sentence (3c) has twice the number of syllables as sentence (3a), but it does not take twice as long to say as (3a). Apparently, the addition of a syllable to a foot does not take a fixed amount of time to utter. Instead, the durations of syllables can be stretched or compressed according to the number of syllables within the foot.

In addition, other things being equal, the more syllables there are in an interstress interval, the shorter the stressed syllable tends to be. Lehiste (1972) studied the timing of words spoken in isolation compared to the timing of the same words with monosyllabic or bisyllabic suffixes. Two native speakers of American English produced base words such as stick, speed, sleep, and shade in isolation, and then the same words followed by monosyllabic suffixes such as -y and -er and bisyllabic suffixes such as -ily and -iness. Lehiste found that her subjects consistently compressed the duration of the base words according to the number of syllables added to them, supporting the hypothesis that the foot is the domain of timing.

One of the major consequences of stress-timing is thus the wide variation in syllable length. If a speaker intends to say each stretch of varying numbers of syllables within similar time limits, stretches with more syllables must be spoken faster than stretches with fewer syllables. It seems that the more syllables there are in an interval, the more compressed the duration of the unstressed syllables tends to become as well. This can cause the unstressed syllables to undergo various reductions or modifications. For example, consonants or vowels in unstressed syllables can be reduced or completely omitted in English.

(4) a. /pəred/ [pʰred] 'parade'
    b. /pəlis/ [pʰlis] 'police'
    c. /fənetɪks/ [fnetʰɪkʰs] 'phonetics'
    d. /ay so hɪm/ [ay so ɪm] 'I saw him.'
    e. /ay wɪl go/ [ayl go] 'I will go.'

It is important to note that although there is a tendency toward isochronous stresses in the production of English (i.e., the more syllables there are in a foot, the shorter they become), stresses in English are by no means physically isochronous in speech production. Physical evidence for perfect isochrony is slight. Many experimental studies have disproved the existence of phonetic isochrony in English speech production.

Classe (1939) measured interstress intervals in English and found that they are measurably longer when they contain more syllables. Bolinger (1965) examined the recordings of two lengthy sentences read by six English speakers. Accents were marked and the intervals between the accents were measured. His results did not support isochronous stresses, either.

O'Connor (1965) recorded a strictly rhythmic limerick in English spoken with a tap of the hand at each stress. However, even under such favorable conditions, isochrony could not be demonstrated. In a follow-up study (1968), he used a set of seven utterances, each containing three monosyllabic feet, with the first and third foot held constant, but with the second varied in segmental length from three to nine segments. Duration measurements showed that the variable foot had a clear tendency to greater duration as the number of segments increased. He found no evidence for equal interstress intervals in English.

Uldall (1971) studied the recording of a passage from "The North Wind and the Sun" read by David Abercrombie. The 45-second passage was divided into metrical feet, whose durations ranged from 260 to 870 ms. In line with previous studies, Uldall reported that the average duration of the feet varied considerably according to the number of syllables. The results showed that the lengths of the inter-stress intervals were not regular.

Lea (1980) found a linear relationship between the number of intervening unstressed syllables and the duration of the inter-stress intervals. The results seriously contradict the claim that the number of intervening unstressed syllables has no effect on the duration of the interval.

Faure, Hirst, and Chafcouloff (1980) had two British subjects record a number of sentences. The recordings were played to three English phoneticians to identify the stresses. The durations of the inter-stress intervals were then measured. They concluded that stressed syllables in English were not even roughly isochronous because the inter-stress intervals varied considerably in length with the number of syllables.

Results from such studies show that it is extremely difficult to confirm isochrony of stresses in production by measuring interstress intervals instrumentally. However, evidence shows that native listeners seem to perceive stresses as more isochronous than they actually are.

Lehiste (1975; 1977) found that English speakers tend to perceive isochrony in utterances which are not actually isochronous. In one of her experiments, she played sequences of four noise-filled intervals to a group of 30 listeners. Of the four intervals in a sequence, three were of equal length, and the fourth was either longer or shorter. The durations of the intervals corresponded to the range observed in actual productions of metric feet in another production experiment. The subjects were first asked to identify the longest intervals, and on a second presentation, to identify the shortest intervals. Results showed that in order to obtain significant agreement that a given interval was actually longer or shorter than the rest, an increment of between 30 and 100 milliseconds was needed. Increments smaller than 30 milliseconds, the range that covers most of the differences observed in the production experiment, were never reliably detected. The results suggest that some of the differences between the lengths of intervals are actually below the perceptual threshold. In fact, the just noticeable differences in actual speech may be even larger than those obtained for synthesized noise, since there is independent evidence that listeners have better discrimination of length with non-speech than with speech materials (Fujisaki, Nakamura, & Imoto, 1973).
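The perceptual argument can be illustrated with a minimal Python sketch. The foot durations are invented for illustration only, and the single 30 ms cutoff is an assumption that simplifies the 30-100 ms range reported above; the function name is hypothetical and this is not Lehiste's procedure.

```python
# Hypothetical sketch: which feet deviate from the average foot duration
# by less than an assumed 30 ms detection threshold?
JND_MS = 30  # assumed fixed cutoff: lower bound of the 30-100 ms range


def below_threshold(durations_ms, jnd_ms=JND_MS):
    """For each foot, return True if its deviation from the mean is smaller than jnd_ms."""
    mean = sum(durations_ms) / len(durations_ms)
    return [abs(d - mean) < jnd_ms for d in durations_ms]


# Invented foot durations in ms; not data from any study cited here.
feet = [400, 420, 390, 470]
flags = below_threshold(feet)
print(flags)
```

Under these assumptions, feet flagged as True differ from the average by an amount that listeners would be unlikely to detect, so a physically irregular sequence could still be heard as regular.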

2.2.1.2 Metrical representation of stress

The perception of timing in English centers around stress. To understand the characteristics of English rhythm, it is important to understand how stress is organized in English. In English, stress is a property of words and phrases. Therefore, the citation form of the verb "perMIT" has its stress on the second syllable and the noun "PERmit" has its stress on the first syllable. The requirement of stress often exempts function words, which are usually realized as stressless in English. For example, English the is usually pronounced as unstressed [ðə] in connected speech, but stressed as [ði] or [ðʌ] when pronounced in isolation.

When individual words are strung together to form connected speech, the underlying stress patterns of the words undergo readjustments and a new hierarchy of prominence is formed. For example, when considered separately, both words "baby" in (5a) and "sitter" in (5b) share the same Strong-Weak rhythmic pattern. But when they are connected into a single utterance in (5c), their stress patterns are reorganized in a hierarchical way.

(5)                              x
       x           x             x     x
       x  x        x   x         x  x  x   x
    a. ba by    b. sit ter    c. ba by sit ter

Following Metrical Theory, stress is expressed, not as features, but as relative prominence. Prominence is represented by a metrical grid, which depicts a sequence of beats that vary in strength.

(6)         x
    x       x
    x    x  x  x
    Look at Annie.

The grid columns are intended to depict the temporal structure of the beats. The height of the grid columns represents the relative prominence. Intrinsically, various levels of rhythm may be present in an utterance at one time. That is, a given utterance may contain several nested hierarchical rhythmic patterns. For example, given the six-syllable word reconciliation, which has the stress pattern shown in (7), a native speaker of English will find it natural to tap on one, two, three, or six syllables, but not to tap on four or five syllables (Hayes, 1995).

(7)                     x           1 tap   A
    x                   x           2 taps  B
    x         x         x           3 taps  C
    x    x    x    x    x    x      6 taps  D
    re   con  ci   li   a    tion

By definition, in any rhythmic structure that has multiple levels, any beat on a higher level must also serve as a beat on all lower levels. Stresses on any level are thus nested within the lower levels of the hierarchy. For example, in (7), level D includes all stresses at levels A, B, and C; level C includes all stresses at levels A and B; and so on.
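The nesting requirement can be stated as a simple subset check. The following Python sketch encodes the grid for reconciliation from (7); the data structure and the function name are illustrative assumptions, not part of Metrical Theory itself.

```python
# Hypothetical encoding of the grid in (7): each level holds the zero-based
# indices of the syllables that carry a beat at that level.
SYLLABLES = ["re", "con", "ci", "li", "a", "tion"]
GRID = {
    "A": {4},
    "B": {0, 4},
    "C": {0, 2, 4},
    "D": {0, 1, 2, 3, 4, 5},
}


def is_well_nested(levels):
    """Check that every beat on a higher level is also a beat on the next lower level.

    `levels` must be ordered from highest to lowest. Checking adjacent pairs
    is sufficient because the subset relation is transitive.
    """
    return all(higher <= lower for higher, lower in zip(levels, levels[1:]))


ordered = [GRID[name] for name in ("A", "B", "C", "D")]
well_nested = is_well_nested(ordered)
tap_counts = [len(level) for level in ordered]
print(well_nested, tap_counts)
```

The tap counts computed from the grid correspond to the tapping patterns described for (7): one, two, three, or six taps, with no level that would yield four or five.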

The metrical representation of stress is consistent with three major phonological characteristics of stress (Kager, 1995; Hayes, 1995). First, stress is "culminative." In English, there is a single strongest syllable at the foot level and at the intonational phrase level. For example, the four-syllable word babysitter defines two stress feet at the lowest level. The constituents at the lower level are grouped together to form larger units at higher levels. At any given level of the prosodic hierarchy, there are elements of greater and lesser prominence, but there is only a single strongest stressed syllable within each domain.

Second, stress is rhythmically distributed. In English, equal timing of stresses tends to occur at multiple levels, as depicted in (8) below.

(8)                                         x
    x                   x                   x
    x         x         x         x         x         x
    x    x    x    x    x    x    x    x    x    x    x    x
    twen ty   se   ven  Mis  sis  sip  pi   le   gis  la   tors
    twenty-seven Mississippi legislators (Hayes, 1995, p.28)

Third, stress is hierarchical. In English, there are multiple levels of stress. At least three levels of stress are present in English: (1) primary stress, (2) secondary stress, and (3) no stress.

(9)  2  1     3
     ab strac tion

This is not to suggest that there is a direct mapping between the strength of the acoustic signals and the levels of prominence on a metrical grid. The location in which a syllable occurs in an utterance, for example, could have an effect on its acoustic realization. For instance, the pitch contours over a syntactic unit generally follow a downward trend in most languages. This general pitch lowering is often known as pitch declination. Thus it is common for a high pitch accent earlier in a sentence to have a higher fundamental frequency than a high pitch accent later in a sentence. Not only does the overall pitch drift lower, the range within which pitch varies also narrows toward the end of the utterance. Sometimes the range can be so narrow that it is difficult to distinguish a high pitch accent from a low pitch accent. Furthermore, the intensity of speech also tends to decrease toward the end of an utterance. As a result, it is common for a stressed syllable earlier in an utterance to have a greater intensity than a stressed syllable that occurs later in the utterance.

2.2.1.3 The acoustic correlates of stress in English

Rhythmic beats in English are signaled by recurrences of stressed syllables, which are usually longer, louder, and higher in pitch than the unstressed ones (Ladefoged, 1993).

Lieberman (1960) recorded 16 speakers of American English producing 24 word pairs in which a change of grammatical category from noun to verb is commonly associated with a shift of primary stress from the first syllable to the second. An example of this type of word is 'progress versus pro'gress. The results indicated that 90% of their stressed syllables had higher pitch, 87% had greater intensity, and 66% had a greater duration. Although duration has the most direct impact on the temporal aspect of speech among the phonetic cues, it is highly susceptible to variations in segmental composition across syllables. This may explain in part why the percentage of stressed syllables having a greater duration turns out to be much lower than the percentage of stressed syllables having higher pitch or greater intensity in Lieberman's study.

Although stressed syllables in English are often pronounced with high pitch, they can also be uttered with low pitch. The shapes of the pitch contours on stressed syllables are often used to signal different meanings (Pierrehumbert and Hirschberg, 1990). For example, high pitch can be used to signal certainty of the information whereas low pitch can be used to signal uncertainty. An example is given in (10) to illustrate that a change of pitch accent could lead to a change in intonational meaning. Note that these intonation contours are transcribed following the ToBI conventions,¹ where H* stands for high pitch accent, L* for low pitch accent, H- for high phrase accent, and H% for high boundary tone.

(10) May I interrupt you?
     H* H-H% (I have confidence that it is okay.)
     L* H-H% (I am uncertain if it is okay.)

Even though duration, intensity, and pitch all contribute to signaling stress in English, they are not equally important with regard to the perception of stress (Lehiste, 1970). Among these three acoustic correlates of stress, native listeners of English seem to rely most heavily on pitch and least heavily on intensity.

1 The ToBI framework is a set of community-wide standard conventions for transcribing the intonation and prosodic structure of spoken utterances in a language variety. As intonation and prosody vary from language to language, there are language specific variations of ToBI systems.

Fry (1955) evaluated the relative importance of duration and intensity in the perception of stress. The material used was 125 synthesized test words created from five pairs of disyllabic words, where a difference of rhythm was associated with a difference of grammatical function, noun vs. verb. They included object, subject, digest, permit, and contract. The duration and intensity ratios of the first and the second vowels of each test word were varied independently in five steps within ranges based on the actual productions of these words by 12 native speakers of English. One hundred subjects were asked to listen to these synthesized test words and to identify the accented syllable in each case. The results showed that when duration and intensity varied in the same direction, there was excellent agreement among the listeners. That is, listeners agreed that a syllable was stressed when it was both long and loud, and unstressed when it was both short and soft. The results also suggested that both duration and intensity were linked to the perception of stress. The number of noun judgments increased with the increase of the duration ratio and of the intensity ratio of Vowel One to Vowel Two. However, the duration ratios of the two vowels had a stronger influence on the perception of stress than the intensity ratios. The whole range of variations in intensity contributed toward a 29% increase of "noun" judgments, whereas the variations in duration contributed toward an increase of 70% of "noun" judgments.
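The stimulus count follows from the design if one assumes that the five duration steps and the five intensity steps were fully crossed within each of the five words, an assumption that is consistent with the total of 125 but is not spelled out above. A minimal Python sketch of that enumeration, with step numbers standing in for the actual ratio values (which are not given here):

```python
from itertools import product

WORDS = ["object", "subject", "digest", "permit", "contract"]
DURATION_STEPS = range(1, 6)   # placeholder labels for five duration-ratio steps
INTENSITY_STEPS = range(1, 6)  # placeholder labels for five intensity-ratio steps

# Assumed fully crossed design: every word at every duration/intensity combination.
stimuli = list(product(WORDS, DURATION_STEPS, INTENSITY_STEPS))
n_stimuli = len(stimuli)
print(n_stimuli)
```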

Fry (1958) took pitch into account in two additional experiments. In the first experiment, he combined the five duration ratios between the first and the second vowels of the disyllabic word subject with step changes of fundamental frequency, while holding intensity constant throughout both syllables. With each duration ratio, the pitch of one syllable was varied in eight steps while the other syllable was held constant at a low pitch. A total of 80 different synthesized patterns of the test word was presented to 41 subjects for stress judgments. The results confirmed pitch as a factor in the perception of stress. In all duration contexts, when two syllables differed in fundamental frequency, the syllable with the higher F0 was more likely to be judged as stressed than the syllable with the lower F0.

In a subsequent experiment, Fry combined the five duration ratios with variations in fundamental frequency over one syllable while keeping the intensity ratios of the two vowels constant at all times. In each case, the fundamental frequency of one syllable featured either a linear or a curvilinear change within a fixed range while the frequency of the other syllable was held level at either the upper or the lower bound of the predetermined pitch range. For the linear pattern, the F0 changed continuously throughout the vowel, and for the curvilinear pattern, the F0 change took place in the second half of the vowel duration. The purpose was to simulate common intonation patterns on disyllabic words. Seventy-six English listeners took the stress judgment test on the 80 synthesized versions of the test word subject. The results showed that when two syllables differ in tone shape, the contour-tone syllable is more likely to be perceived as stressed than the level-tone syllable. In all duration patterns that contained both contour-tone and level-tone syllables, a significantly greater number of contour-tone syllables were judged as stressed than level-tone syllables. Whether the tonal contour was linear or curvilinear, rising or falling, did not change the judgment of stress. The results suggested that the pitch cue might outweigh the duration cue in the perception of stress in English. This finding was supported by Bolinger (1958), who indicated that a syllable could stand out from others by increasing or decreasing its pitch level, as well as by including corners or sharp points of the pitch curve.

2.2.2 ACQUISITION OF ENGLISH RHYTHM

Given that there are more second language learners of English than of any other language in the world, it is not surprising that most of what is known about the acquisition of a second speech rhythm comes from research on English as a target language. However, the majority of these studies focused solely on the duration parameter of rhythm and generally aimed to determine whether the inter-language produced isochronous stresses. Despite variability in the research methods and the native language backgrounds of the subjects, the results of these studies showed overall that non-native speakers of English produced less duration differentiation between stressed and unstressed syllables and stressed more syllables than did native speakers of English.

Adams and Munro (1978) studied the placement and the correlates of stress in the connected utterances of native and non-native speakers of English. Eight native (Australian English) and eight non-native graduate teachers of English, the latter speaking various Asian languages classified as syllable-timed, were asked to read aloud 12 short passages, including two nursery rhymes, three excerpts of verse with a strongly metrical rhythm, five fabricated equivalents of these items, and one passage each of colloquial and literary prose.

In the perception part of the experiment, 10 naive native listeners of Australian English were asked to listen to the recordings and identify the syllables that they perceived as "prominent." The results showed that the non-native speakers stressed more syllables than did native speakers of English. In addition to stressing syllables that were stressed by native speakers, non-native speakers also placed stress on a number of syllables that were not stressed by native speakers, including prepositions and conjunctions. The native speakers stressed an average of 52 syllables, while the non-native speakers stressed an average of 82 syllables.

In the production part of the experiment, the fundamental frequency (Hz), intensity (dB), and duration (ms) of the individual syllables were measured. Fundamental frequency is defined as the number of complete cycles of variations in air pressure per second caused by the opening and closing movements of the vocal cords. The intensity of a sound is related to the size of the variations in air pressure and is usually measured in decibels (dB). Overall, little evidence was found to suggest that native and non-native speakers of English used different phonetic cues to signal stress at the sentence level. Stressed syllables were associated with increased duration, a greater fall of amplitude from its peak, and a wider range of fundamental frequency for both groups of subjects. It is noteworthy that although very little difference was shown between native and non-native speakers in the duration of their stressed syllables, the non-native speakers' unstressed syllables were generally longer than those of the native subjects. From Table V given by Adams and Munro (1978, p.142), it is possible to calculate that the mean length of stressed syllables was 293 ms for the native speakers and 307 ms for the non-native speakers, whereas that of unstressed syllables was 226 ms for the native speakers and 277 ms for the non-native speakers.
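The size of the durational contrast can be made explicit by dividing the mean stressed duration by the mean unstressed duration for each group. The following Python sketch does only that arithmetic on the four means quoted above; the dictionary and function names are illustrative and do not come from Adams and Munro (1978).

```python
# Mean syllable durations in ms, as quoted above from Adams and Munro (1978).
# The names used here are illustrative assumptions, not the authors' own.
MEAN_DURATION_MS = {
    "native": {"stressed": 293, "unstressed": 226},
    "non-native": {"stressed": 307, "unstressed": 277},
}


def stress_duration_ratio(group):
    """Return mean stressed duration divided by mean unstressed duration."""
    durations = MEAN_DURATION_MS[group]
    return durations["stressed"] / durations["unstressed"]


if __name__ == "__main__":
    for group in MEAN_DURATION_MS:
        print(f"{group}: {stress_duration_ratio(group):.2f}")
```

On these means the ratio is larger for the native group than for the non-native group, which is consistent with the observation that the difference lies mainly in the unstressed syllables.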

This study concluded that the real difference in stress production between native and non-native speakers of English lies not in the mechanisms employed to signal stress, but in the distribution of stress. This was one of the few studies of rhythm that took perception into consideration and that examined all three phonetic correlates of stress. However, it had several limitations. (1) All of the non-native speakers were highly proficient English learners, whose stress production was less likely to differ significantly from that of English native speakers than the production of less proficient learners would be. (2) The great majority of the materials were highly rhythmical texts, which could induce a regular alternating rhythm in the speakers and therefore lead to an underestimate of the differences between native and non-native speakers. (3) The study did not specify the individual first languages represented among the non-native speakers, nor did it explain by what criteria these languages were classified as syllable-timed or stress-timed. It seems an oversimplification to assume that speakers of these different languages form a natural group. (4) The results were statistically analyzed based on the absolute measurements of fundamental frequency, intensity, and duration. Individual variations in pitch range, volume of speech, and speech rate were not taken into consideration. (5) Final and non-final syllables were not separated in the comparison between stressed and unstressed syllables. Given that final syllables are usually specially marked to signal the sentence boundary, it is desirable to analyze final and non-final syllables separately so that non-native speakers' difficulties with marking boundaries do not become confounded with their difficulties with stress.

Observations of non-native speakers of English over a number of years convinced Taylor (1981) that rhythm is perhaps one of the most widespread difficulties among foreign learners of English. In an informal survey, recordings of reading aloud and of unscripted speech prompted by questions were made of 49 experienced teachers of English with 23 different native languages. Of the 49 subjects, 25 were judged to have "considerable difficulties with rhythm". Of these 25, 15 were judged to have produced syllable-timed rhythm instead of stress-timed rhythm. The rhythm type of L1 was found to affect the rhythm in L2 to some extent. Of these 15 speakers, 11 were native speakers of syllable-timed languages. Insufficient length differentiation between stressed and unstressed vowels, particularly the inability to use reduced forms in unstressed syllables, seems to be the basic cause of syllable-timed rhythm among non-native speakers of English. None of the 15 subjects who were judged to produce syllable-timed rhythm reduced unstressed vowels properly, whereas all of those who did produce reduced forms produced acceptable English rhythm.

This study raised a very interesting question: Do English L2 learners transfer L1 rhythmic patterns into L2 or do they all start out with a syllable-timed rhythm regardless of their L1s? However, the methodology of this survey raises some serious doubts. First, there are no indications of what these 23 languages were and what criteria the author used to classify them into the two rhythmic types. Given that there is a lack of general consensus about how the distinction should be drawn, the classification of languages into rhythmic types seems to be an empirical question of its own. Second, there were no specified criteria or procedures as to how the subjects were judged to produce a stress-timed or a syllable-timed rhythm and by whom. Third, apart from duration and vowel quality, the effect of pitch and intensity on speech rhythm was largely ignored in the interpretation of why syllable-timed rhythm or stress-timed rhythm was perceived among learners of English.

Bond and Fokes (1985) studied the timing of words with zero, monosyllabic, and disyllabic suffixes. Three pairs of ESL learners, native speakers of Thai, Malaysian, and Japanese, were asked to read the base words stick, speed, sleep, and shade in isolation and with the monosyllabic suffixes -y and -er and the disyllabic suffixes -ily and -iness. Each subject said each word printed on cards three times and proceeded through the cards three times. The duration of the base word relative to the duration of the entire word was measured and compared with the results from two English native speakers in an earlier study (Lehiste, 1972). Instead of compressing the base words in proportion to the number of syllables in the suffixes as the English native speakers did, the non-native speakers produced about the same duration on the base words, regardless of the number of suffixed syllables. There was also variability between speakers of the same language and within the same speaker. They concluded that non-native speakers' difficulties were with consistency and proportion of timing. The results suggested that English learners did not observe the tendency toward isochrony of feet to the same degree nor as consistently as native speakers did.

Bond and Fokes (1985) also focused on the timing aspect of rhythm. Although pitch is the most important cue for the perception of stress in English, it has been largely unexplored in the study of rhythm. It would be interesting to see how the two speakers of Thai (a tone language) and the two speakers of Japanese (a pitch accent language) differed in their pitch patterns from native English speakers. However, one needs to exercise caution when using isolated words in the study of rhythm. Speakers are more likely to enunciate clearly when asked to read vocabulary removed from context, which makes the rhythm less natural. The small number of subjects from each language also makes the results hard to interpret. Given that individual variations are usually greater among non-native than among native speakers, it is desirable to have a large number of subjects in the sample to increase the reliability and generalizability of the data.

Mandarin speakers are frequently reported by English teachers to experience difficulties with speech rhythm in English (Chang, 1987). Their rhythm has been informally characterized as staccato or syllable-timed. However, this area of research has been little explored.

Juffs (1990) studied the stress errors of 19 Mandarin-speaking first-year undergraduates from Hunan Agricultural College. A recording was made of each student reading a 105-word passage chosen from their English textbook. Ten polysyllabic lexical items were chosen for analysis of stress placement and syllable structure. Defining word stress as primarily manifested by pitch height and tonic stress as primarily manifested by pitch movement, he found, based on his own perception as a native speaker of RP, i.e., the standard British English dialect, that Mandarin learners seemed to be using pitch movement to realize not only tonic accent, but also word stress. He suggested that Mandarin speakers might have interpreted English stress as tonal due to the fact that tone is such a salient feature in their first language. In addition, Mandarin speakers sometimes produced stress on the incorrect syllables, such as saying red'skinned for 'redskinned or conti'nent for 'continent. He further proposed that there might be a link between errors in stress placement and errors in syllable structure. Taking the word 'continent as an example, he found that several students produced it with stress on the final syllable. His hypothesis was that Mandarin speakers might have syllabified it as con-ti-nent. According to his analysis of the first syllable, the vowel /a/ is a weak vowel in Hunan dialect and the /n/ is not counted as a consonant in the same way as it is in Mandarin. This leaves the third syllable as the only heavy syllable eligible for stress assignment. He concluded that Mandarin speakers' errors with stress in English productions occurred not only in placement but also in the phonetic process used to achieve stress.

Juffs drew a number of very interesting conclusions: (1) Mandarin speakers did not seem to be making tonal distinctions between word stress and sentence stress and they tended to place pitch accent on both. (2) They might have interpreted English stress as Mandarin tone. (3) They sometimes misplaced stress on polysyllabic words. And (4) their difficulties with English syllable structure may be linked to their difficulties with stress placement in English.

However, Juffs' study suffers from a number of limitations. First, his observations were based on very limited supporting data. In support of each of the above-mentioned claims, only a very small portion of the results was analyzed and discussed. Second, his analysis was limited to the pitch aspect of stress and did not examine duration and intensity, the other phonetic correlates of stress. Without knowledge of what other phonetic cues Mandarin speakers actually use to achieve stress in English, it is difficult to determine (1) whether Mandarin speakers make other phonetic distinctions between word and sentence stress or (2) whether Mandarin speakers actually interpret English stress as a purely tonal event like Mandarin tones. Third, the distinction between pitch movement (sentence stress) vs. pitch height (word stress) was based on the judgment of a single native listener. Because the perception of whether or not there is a pitch movement could be subject to individual variation even among native listeners, it would be desirable to do acoustic measurements in addition to establishing inter-rater reliability by collecting judgments from several trained native English listeners. Fourth, the notion that word stress correlates most closely with pitch height and sentence stress correlates with pitch movement seems to be an overstatement. In English, stress is often but not always realized with higher pitch. Moreover, sentence stress, or accent, can assume various tonal shapes, including both level tones and contour tones, depending on the meaning one wants to convey. Fifth, although the materials were taken from the subjects' English textbook, they may still have made errors in stress placement during the recording. To increase the reliability of the results, it would be desirable to collect multiple reading samples from each subject. Finally, Juffs' study only examined stress errors on individual lexical items. Although misplacement of stress in polysyllabic words may contribute in part to their difficulties with English rhythm, such errors largely reflect difficulties with the rhythmic patterns within the words. The data did not adequately address rhythmical problems at the sentence level, which involve not only the prominence relations among syllables within words, but also the prominence relations among words. To tap into the latter aspect of speech rhythm, one would have to look at longer stretches of speech.

Anderson-Hsieh and Venkatagiri (1994) examined syllable duration and pausing in the reading samples of three low intermediate and three high intermediate Mandarin ESL speakers and three English native speakers. For syllable duration, they compared native and non-native speakers on two types of syllables representing two durational extremes: (1) tonic syllables with full vowels before a clause or sentence boundary, and (2) unstressed monosyllabic function words with reduced vowels in non-clause-final positions. Their results showed that although the tonic syllables were longer in duration than the unstressed syllables for all three groups, the tonic syllables were relatively much longer than the unstressed syllables for the English native speakers and the high intermediate group than for the low intermediate group. Ratios of tonic syllable and unstressed syllable duration to total sentence duration were 0.37 vs. 0.09 for English speakers, 0.37 vs. 0.08 for the high intermediate group, and 0.24 vs. 0.12 for the low intermediate group. It appears that the low intermediate speakers produce less differentiation between the two types of syllables.
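One way to compare the three groups on a single number is to divide the tonic proportion by the unstressed proportion. The sketch below performs only that division on the figures quoted above; the resulting tonic-to-unstressed contrast is a derived illustration and is not a statistic reported by Anderson-Hsieh and Venkatagiri (1994), and all names in the code are hypothetical.

```python
# Proportions of total sentence duration (tonic, unstressed) quoted above.
# The derived "contrast" value is an illustrative assumption, not a figure
# reported in the original study.
DURATION_PROPORTIONS = {
    "native": (0.37, 0.09),
    "high intermediate": (0.37, 0.08),
    "low intermediate": (0.24, 0.12),
}


def tonic_to_unstressed_contrast(group):
    """Return how many times longer the tonic syllables are, proportionally."""
    tonic, unstressed = DURATION_PROPORTIONS[group]
    return tonic / unstressed


if __name__ == "__main__":
    for group in DURATION_PROPORTIONS:
        print(f"{group}: {tonic_to_unstressed_contrast(group):.1f}")
```

Under this measure the low intermediate group shows roughly half the contrast of the other two groups.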

The findings of Anderson-Hsieh and Venkatagiri's study have two important implications. First, the durational contrast between stressed and unstressed syllables could constitute a major area of difficulty for Mandarin speakers. Second, the acquisition of stress-timing is possible as proficiency improves, but Mandarin speakers may start out with a more syllable-timed rhythm at early stages. However, we need to be very cautious with the interpretation and generalization of results found for such limited samples. There are also issues that are not adequately addressed in this study.

Overall, I found Anderson-Hsieh and Venkatagiri's study limited in three ways. First, other key components of speech rhythm in English, such as pitch and intensity, were excluded from the analyses. Because Mandarin speakers have to make the transition from using pitch for tones to using pitch for stress, the investigation of pitch as a correlate of stress in speech is particularly significant. Second, the sample size was very small. Third, the research design made it impossible to disentangle stress from syllable position as two potential variables affecting duration. Based on the durational underdifferentiation between these two types of syllables, one cannot generalize that Mandarin speakers would have the same problems with other types of syllables in other positions.

In summary, previous research in this area generally manifests a number of limitations: (1) Pitch and intensity have seldom been taken into consideration despite their proven influence on the perception of stress. (2) The number of subjects has usually been very small. Although acoustic analyses take a tremendous amount of time, it would improve the reliability and generalizability of the results to include a larger number of subjects. (3) Rhythm has seldom been analyzed at the sentence level or over longer stretches of speech. Most studies examined the rhythm of isolated words or highly selected syllables. (4) Few studies have considered possible developmental stages of speech rhythm by comparing learners of different proficiency levels. (5) Specifically, little is known about how non-native speakers realize stress and rhythm in the new language when their native language signals stress in a different way from their second language or when the phonetic cues for stress in the second language are used for different major linguistic functions in their native language.

2.3 CHARACTERISTICS OF SPEECH RHYTHM IN MANDARIN CHINESE

Rhythm is one of the least researched areas in Chinese phonology. Little empirical evidence is available to confirm the classification of Mandarin dialects into any of the three recognized categories of speech rhythm (foot-timed, syllable-timed, mora-timed). In this section, I discuss the characteristics of speech rhythm in Beijing Mandarin and Taiwan Mandarin based on recent analyses of their foot structure, the contrasts between full-toned (heavy and stressed) vs. neutral-toned (light and unstressed) syllables, and the phonetic realization of stress.

2.3.1 FOOT STRUCTURE IN MANDARIN

Duanmu (1999, 2001) proposes that Mandarin consists of left-dominant disyllabic feet, where prominence falls on the first syllable, which must be filled by a heavy (i.e., bimoraic) syllable. A heavy syllable has either a long vowel or a coda, whereas a light (monomoraic) syllable has a short vowel and no coda. Furthermore, he assumes the mora to be the tone-bearing unit in Mandarin and that full tones are sequences of two level tones (Duanmu, 1990). Following from these assumptions, heavy (i.e., bimoraic) syllables can carry full tones, but light syllables, which are monomoraic, cannot.

The central notion of his proposal is that a minimal word has two levels of metrical structure: the moraic level (which he calls the 'M-foot') and the syllabic level (which he calls the 'S-foot'). He uses 'M-foot' to refer to a bimoraic foot and 'S-foot' to refer to a disyllabic foot. Based on the fact that only the second, but not the initial syllable of a Mandarin disyllabic word can be pronounced with a neutral tone, he proposes that disyllabic feet in Mandarin are left-dominant (see also Lin, 2001). Because a disyllabic foot in Mandarin has prominence on the first syllable, the first syllable must be an 'M-foot'. The second syllable of a foot can be either an 'M-foot' or an unfooted single mora. According to Duanmu's analysis, a metrically acceptable foot in Mandarin consists of either two heavy syllables (heavy-heavy), as shown in example (11a), or a heavy followed by a light syllable (heavy-light), as shown in example (11b), where foot boundaries are shown in parentheses, σ is used to represent a syllable, and μ is used to represent a mora.

(11) a. (  σ      σ  )      b. (  σ      σ )      S-feet
         (μμ)   (μμ)           (μμ)     μ         M-feet
         heavy  heavy          heavy   light
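The two admissible foot shapes can be stated as a simple check on mora counts. The following Python sketch is only an illustration of the heavy-heavy and heavy-light condition described above; the function name and the encoding of a foot as a pair of mora counts are assumptions made for the example and are not part of Duanmu's formalism.

```python
# A foot is encoded as a sequence of mora counts, one per syllable:
# 2 = heavy (bimoraic), 1 = light (monomoraic). This encoding is an
# illustrative assumption, not Duanmu's own notation.
HEAVY = 2
LIGHT = 1


def is_acceptable_foot(mora_counts):
    """Return True for heavy-heavy or heavy-light disyllabic feet only."""
    if len(mora_counts) != 2:
        return False
    first, second = mora_counts
    return first == HEAVY and second in (HEAVY, LIGHT)


if __name__ == "__main__":
    print(is_acceptable_foot((HEAVY, HEAVY)))  # shape (11a)
    print(is_acceptable_foot((HEAVY, LIGHT)))  # shape (11b)
    print(is_acceptable_foot((LIGHT, HEAVY)))  # light first syllable
```

The check rejects any foot whose first syllable is light, which mirrors the requirement that the prominent initial position be filled by an M-foot.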

One kind of evidence for disyllabic feet in Mandarin comes from poetry. (12) shows how syllables fall into disyllabic feet. When only one syllable is available to form a foot, a silent beat, represented by Ø, is inserted after the syllable to keep the beat. Aside from being realized silently, Ø can also be realized as lengthening on the preceding syllable.

(12) (Zhang Lao) (San Ø), (wo wen) (ni Ø),
     Zhang Lao San, I ask you,
     'Zhang Lao San, let me ask you'

     (ni de) (jia xiang) (zai na) (li Ø)
     you Gen. home town at which place
     'Where is your home town?' (Duanmu, 2001, p.122).

(13) shows a similar example from a popular nursery rhyme in Taiwan Mandarin.

(13) (san lun) (che Ø), (pao de) (kuai Ø),
     three wheel car run fast
     'A three-wheel wagon is running fast.'

     (shang mian) (zuo ge) (lao tai) (tai Ø)
     top surface sit one old lady lady
     'With an old lady sitting on the top.'
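The grouping shown in these verse lines can be approximated by pairing syllables from left to right within each phrase and padding a leftover syllable with a silent beat. The Python sketch below makes that simplifying assumption explicit; it is not Duanmu's footing algorithm, the function name is hypothetical, and the silent beat is written Ø.

```python
SILENT_BEAT = "Ø"  # symbol used here for the silent beat


def parse_into_feet(syllables):
    """Pair syllables left to right; pad a final stray syllable with a beat.

    This is a simplified illustration that assumes strictly binary,
    left-to-right footing within a single phrase.
    """
    feet = []
    for i in range(0, len(syllables), 2):
        pair = list(syllables[i:i + 2])
        if len(pair) == 1:
            pair.append(SILENT_BEAT)
        feet.append(tuple(pair))
    return feet


if __name__ == "__main__":
    print(parse_into_feet(["Zhang", "Lao", "San"]))
    print(parse_into_feet(["shang", "mian", "zuo", "ge", "lao", "tai", "tai"]))
```

Applied to the phrase Zhang Lao San, the sketch yields the feet (Zhang Lao) and (San Ø), matching the first phrase of the verse example.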

Supporting evidence for disyllabic feet in Mandarin also comes from the fact that there exists a strong preference in all Chinese dialects for an expression to have at least two syllables (Duanmu, 1999). Examples (14) through (17) show that a monosyllabic expression is usually prefixed or suffixed with a semantically redundant syllable. The symbol '*' is used to indicate an ill-formed expression.

(14) Personal address
     *Wang         lao Wang               xiao Wang
     'Wang'        'Old Wang'             'Little Wang'

(15) Place names
     *Luo          LuoCheng               Shanghai
     'Los Angeles' Los Angeles City       top ocean
                   'Los Angeles'          'Shanghai'

(16) Country names
     *de           deguo                  riben
     Germany       Germany Country        'Japan'
     'Germany'     'Germany'

(17) Empty morphemes
     *sun          sunzi                  sunnyu
     grandson      grandson Nominalizer   grand daughter
     'grandson'    'grandson'             'granddaughter'

2.3.2 FULL-TONED VERSUS NEUTRAL-TONED SYLLABLES

Following Duanmu's analysis, the contrast between a heavy (bimoraic) and a light (monomoraic) syllable reflects a contrast between a stressed and an unstressed syllable, which must be manifested through the contrast between a full-toned and a neutral-toned syllable. But if we assume that the mora is the tone-bearing unit, the moraic account of foot structure already implies that Mandarin disyllabic feet are formed by sequences either of two full-toned syllables or a full-toned followed by a neutral-toned syllable. It seems redundant to assume an additional stress contrast between the two types of tone, i.e., to assume that stress is a rhythmic primitive for Mandarin.

Duanmu (2001) argues for stress as an independent construct in Mandarin by using it to solve the 'word length problem' (p.124). In Mandarin, there are monosyllabic and disyllabic synonyms as shown in (19).

(19) Monosyllabic     Disyllabic                 Gloss
     suan (garlic)    da-suan (big garlic)       'garlic'
     zhong (plant)    zhong-zhi (plant plant)    'to plant'

The examples in (20) illustrate the word-length constraint problem in BM and TM, where it is acceptable to have a monosyllabic verb (V) followed by a disyllabic object (O) (20b), but unacceptable to have a disyllabic V followed by a monosyllabic O (20d).

(20)    Verb          Object      Gloss
     a. zhong         suan        'to plant garlic'
     b. zhong         da-suan
     c. zhong-zhi     da-suan
     d. *zhong-zhi    suan

According to Duanmu, the word length problem can be resolved if one adopts the notion of stress in Mandarin, assuming that a word with more stress should not have fewer syllables than a word with less stress and assuming that O must be more stressed than V. Based on these assumptions, O may not be shorter than V in a V-O construction. This analysis provides one solution to the problem. However, it is not necessary to resort to the notion of stress to solve this problem. An alternative is to postulate that V must not be heavier (counting moras) than O. This illustrates how difficult it is to justify the reality of stress in Mandarin. In this study, I follow Duanmu's (2001) assumption that Chinese has stress. But it is important to be aware that there has been neither strong evidence nor a clear definition for stress in TM or BM.
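The generalization that O may not be shorter than V can be stated as a comparison of syllable counts. The Python sketch below assumes that syllables within a word are separated by hyphens, as in the romanized verb and object forms of the word-length examples; the function names are hypothetical, and the check encodes only the length condition, not any claim about stress itself.

```python
def syllable_count(word):
    """Count syllables in a hyphen-separated romanized word (assumed format)."""
    return len(word.split("-"))


def verb_object_is_acceptable(verb, obj):
    """Return True when the object has at least as many syllables as the verb."""
    return syllable_count(obj) >= syllable_count(verb)


if __name__ == "__main__":
    pairs = [
        ("zhong", "suan"),
        ("zhong", "da-suan"),
        ("zhong-zhi", "da-suan"),
        ("zhong-zhi", "suan"),
    ]
    for verb, obj in pairs:
        print(verb, obj, verb_object_is_acceptable(verb, obj))
```

Only the combination of the disyllabic verb zhong-zhi with the monosyllabic object suan fails the check, in line with the judgments reported for the four verb-object combinations.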

Despite the fact that it is difficult to justify stress in both TM and BM, it is uncontroversial that a neutral-toned syllable is prosodically weaker than a full-toned syllable. A neutral-toned syllable is relatively shorter in duration than a full-toned syllable and is variable in pitch. In order to illustrate the impact of the alternation between full-toned and neutral-toned syllables on the speech rhythm, I now review the general characteristics of full and neutral tones in Mandarin. Dialectal variations regarding the use of neutral tones between BM and TM will be discussed later.

There are four full tones and one neutral tone in Mandarin. Among the four full tones, one is a level tone and the other three are contour tones.

Table 2.1 Tones in Mandarin Chinese

Tone number   Description          Pitch      Tone letter   Example   Gloss
1             high level           55         ˥             mā        'Mother'
2             high rising          35         ˧˥            má        'Hemp'
3             low falling rising   214        ˨˩˦           mǎ        'Horse'
4             high falling         51         ˥˩            mà        'Scold'
5             neutral              variable                 ma        Particle

A popular view is that when a syllable is completely unstressed, its default tone disappears and becomes reduced to a neutral tone. When in isolation, the neutral tone is a low plateau close to the bottom of the speaker's pitch range and its duration is relatively short (Chao, 1968; Tseng, 1981). In connected speech, the pitch level of a neutral tone syllable is variable, depending on the pitch value of the preceding tone, except for those in sentence-final position, whose pitch level is determined by the intonation (Chao, 1956; Cheng, 1973; Tseng, 1981). When non-final, the pitch of a neutral tone is falling following Tones 1, 2, and 4, and rising following Tone 3 (Gao, 1980; Dreher and Lee, 1966). In an acoustic study conducted by Gao (1980), the tonal value of a neutral tone syllable at its starting point was lower than the ending point of a preceding Tone 1 or 2, higher than that of a preceding Tone 4, and continued upward from the pitch movement of a preceding Tone 3. His instrumental analyses indicated the following starting pitch values for the neutral tone following the four citation tones.

3 after Tone 1
3 after Tone 2
4 after Tone 3
2 after Tone 4
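These starting values can be set beside the final pitch digit of each preceding citation tone (55, 35, 214, 51) to check the relationships just described. The Python sketch below is an illustrative lookup only; the names are hypothetical, and reading the last digit of the five-level pitch value as the tone's ending point is an assumption made for the example.

```python
# Five-level pitch values of the citation tones and the neutral-tone
# starting values attributed above to Gao (1980).
CITATION_PITCH = {1: "55", 2: "35", 3: "214", 4: "51"}
NEUTRAL_ONSET_AFTER = {1: 3, 2: 3, 3: 4, 4: 2}


def onset_relative_to_preceding_end(tone):
    """Compare the neutral-tone onset with the preceding tone's final level.

    Assumes the last digit of the pitch value marks the tone's end point.
    """
    end_level = int(CITATION_PITCH[tone][-1])
    onset = NEUTRAL_ONSET_AFTER[tone]
    if onset < end_level:
        return "lower"
    if onset > end_level:
        return "higher"
    return "same"


if __name__ == "__main__":
    for tone in sorted(CITATION_PITCH):
        print(f"after Tone {tone}: {onset_relative_to_preceding_end(tone)}")
```

The comparison gives a lower onset after Tones 1 and 2, a higher onset after Tone 4, and an onset at the end level of Tone 3, from which the rise can continue.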

In the case of two or more successive neutral tone syllables, Chao (1933) proposed that the pitch level of each is determined by the immediately preceding tone. Thus if the pitch level of the first neutral tone is low, any following neutral tone syllables would also have low pitch. For example,

(21) kanjian              'to see'                      ˥˩ ˩
     kanjian le           'to have seen'                ˥˩ ˩˩
     kanjian le mei you   'to have or have not seen'    ˥˩ ˩˩˩˩

However, in running speech, the pitch level of the neutral tone syllables following the first neutral tone syllable is not so much determined by the preceding neutral tone. Instead, it is heavily influenced by the intonation contour of the sentence (Shen, 1994).

A neutral tone syllable may occur under two conditions: (1) in polysyllabic words, where any full-toned syllable except for the initial one may become reduced to a neutral-toned syllable, and (2) in functional categories (Chao, 1932; Shen, 1994).

For example, tone loss happens to the second syllable of the disyllabic word piányi 'inexpensive' (piányí, if said without tonal neutralization) and the middle syllable of the tri-syllabic word lǎogudǒng 'old conservative person' (lǎogǔdǒng without tonal neutralization). Syllables that are toneless are often found in the following grammatical categories of words.

(22) Categories              Examples
     Grammatical particles   a, ba, ne, ma, ye
     Plural marker           men as in women 'we', tamen 'they'
     Perfective marker       le as in chile 'have eaten', shuile 'is already asleep'
     Progressive marker      zhe as in kanzhe 'watching', zouzhe 'walking'
     Possessive marker       de as in wode 'my', nide 'your'
     Generic classifier      ge as in yigeren 'a person', yigemeng 'a dream'
     Personal enclitic       zi as in laozi 'father', haizi 'child'

Disyllabic words, which constitute the majority of the Mandarin lexicon, may consist of sequences of two full tones or a full tone followed by a neutral tone. According to Chao, the majority of disyllabic words belong to the former category, such as xianzai 'now' and jizer 'egg'², where the second syllable cannot lose its tone. Trochaic disyllabic words are fewer in number but higher in average frequency of occurrence. They account for words such as mianhua (cotton flower) 'cotton' and yiba 'tail', where the second syllables have lost their inherent lexical tones³ due to lack of stress.

According to Chao, when words or phrases have three or more syllables, the final syllable has primary stress, the first secondary, and the medial tertiary stress, as in huashengtang 'peanut candy' and xiashuobadao 'stuff and nonsense'⁴. Chao's position is supported by Yan and Lin (1988), who found that the last syllable in an isolated trisyllabic word is longest. However, their results were later challenged by Duanmu (2001) and Lin (2001) on the grounds that the longer duration of the final syllable may be attributed to final lengthening. Wang and Wang (1993) measured the duration of polysyllabic words ranging from one to four syllables placed within a carrier sentence and found that the first syllable tends to be the longest. Their results provide supporting evidence for Lin's assertion that Mandarin polysyllabic words have main stress on the initial syllable.

Stress is also used in BM to distinguish minimal pairs of disyllabic words that have identical segmental and underlying tonal composition and differ only in their stress patterns. There are approximately 200 such minimal pairs in Beijing Mandarin (Chen, 1984; Shen, 1993). A few examples are given in (23).

2 Jizer 'egg' is an idiosyncratic expression used in Beijing Mandarin. In Taiwan Mandarin, 'egg' is referred to as jidan.

3 In Taiwan Mandarin, both syllables in mianhua 'cotton' and yiba 'tail' are spoken with full tones: miánhuā and yǐbā.

4 Xiashuobadao 'stuff and nonsense' has a different expression in Taiwan Mandarin, namely hushuobadao.

(23) shìfēi    'right or wrong'    shìfei    'quarrel'
     mǎtóu     'horse's head'      mǎtou     'pier'
     shēngqì   'get angry'         shēngqi   'vitality'
     dàyì      'main point'        dàyi      'careless'

2.3.3 IDIOSYNCRASIES OF LEXICAL RHYTHM ACROSS DIALECTS

There are idiosyncrasies across various dialects of Chinese Mandarin when it comes to the rhythmic patterns of polysyllabic words. Unlike Beijing Mandarin where any syllable except the first may be pronounced with a neutral tone, Taiwan Mandarin seldom reduces any syllables in polysyllabic words to neutral tones. As a result, most of the neutral tone syllables in Taiwan Mandarin are grammatical morphemes (see examples in 22 above).

Many of the common trochaic disyllabic words in Beijing Mandarin are spoken with full tones on both syllables in Taiwan Mandarin.

(24) Disyllabic words
     Beijing Mandarin   Taiwan Mandarin   Gloss
     zhīdao             zhīdào            'know'
     běnshi             běnshì            'ability'
     míngbai            míngbái           'understand'

The stress rule that typically applies to polysyllabic words of three or more syllables in Beijing Mandarin is often non-existent in Taiwan Mandarin. In Beijing Mandarin, tone loss usually happens to the middle syllable of a tri-syllabic word. For example, the tri-syllabic word lǎoshǒuzhǎng 'old senior officer' is usually said with a middle neutral-tone syllable as lǎoshouzhǎng, lǎogǔdǒng 'old conservative person' as lǎogudǒng, and guòlùkè 'passerby' as guòlukè (Chao, 1948; Shen, 1994). In contrast, words like these are spoken with full tones on all syllables in Taiwan Mandarin, with appropriate tone sandhi⁵.

5 In Mandarin Chinese, when a 3rd tone is followed by another 3rd tone, the first one is usually realized as a rising tone.

(25) Tri-syllabic words
     Beijing Mandarin   Taiwan Mandarin   Gloss
     lǎoshouzhǎng       lǎoshǒuzhǎng      'old senior officer'
     lǎogudǒng          lǎogǔdǒng         'old conservative person'
     guòlukè            guòlùkè           'passerby'

Moreover, stress does not seem to be contrastive in Taiwan Mandarin. No minimal pairs of words are distinguished solely on the basis of stress. The minimal pairs of words distinguished by contrastive stress in Beijing Mandarin, as shown in (26), are homophones with full tones on both syllables in Taiwan Mandarin.

(26) Stress-based minimal pairs
     Beijing Mandarin   Taiwan Mandarin   Gloss
     shìfēi             shìfēi            'right or wrong'
     shìfei             shìfēi            'quarrel'
     mǎtóu              mǎtóu             'horse's head'
     mǎtou              mǎtóu             'pier'
     shēngqì            shēngqì           'get angry'
     shēngqi            shēngqì           'vitality'
     dàyì               dàyì              'main point'
     dàyi               dàyì              'careless'

It is possible for both Beijing and Taiwan Mandarin to have two consecutive neutral-toned syllables in the middle or at the end of a sentence. For example,

(27) jiejie de shu       'older sister's book'
     gege de pengyou     'older brother's friend'
     ni kanzhe ba        'You'll see.'

However, it is very rare in Taiwan Mandarin to have three consecutive neutral-toned syllables except when the last one is a sentence-final particle. For example,

(28) daoshihou, fangzi jiushi baba de le.
     'At that time, the house will have belonged to Daddy.'

Overall, it is more common to have two or more consecutive neutral tone syllables in BM than in TM due to the large number of neutral-toned syllables in polysyllabic words in BM. Long stretches of consecutive neutral tone syllables in BM are often interrupted by full tones in TM as described above.

(29) BM: kànjian le             'to have seen'
     TM: kànjiàn le

     BM: kànjian le mei you     'to have or have not seen'
     TM: kànjiàn le méiyǒu

     BM: dǎban le               'to have dressed up'
     TM: dǎbàn le

     BM: dǎban le mei you       'to have or have not dressed up'
     TM: dǎbàn le méiyǒu

In summary, although both BM and TM allow the same types of foot structure (i.e., heavy-heavy and heavy-light), a heavy-light foot is much more common in BM than in TM because tonal neutralization is much less frequent in TM.

2.3.4 THE PHONETIC CORRELATES OF STRESS IN MANDARIN CHINESE

Chao (1968, p.35), based on his auditory impressions, first described tonic stress in Mandarin Chinese as "primarily an enlargement in pitch range and time duration and only secondarily in loudness." His position later gained support from several instrumental analyses (Howie, 1976; Tseng, 1981; Lin, Yan, & Sun, 1983; Shen, 1994). When stressed, high level tones are pushed higher, rising tones rise higher, dipping tones dip lower, and falling tones start higher and drop deeper. Moreover, the stressed syllables also have longer duration to realize the extended F0 movement. Shen (1993) analyzed the natural speech of a female Beijing Mandarin speaker and the reiterated speech of three Beijing Mandarin speakers and reported a duration ratio of approximately 3:2 and a difference of nearly 8 dB between stressed and unstressed vowels of the same quality. In addition, contrastive stress is primarily achieved by changes in duration.

Note that there are two kinds of stress that need to be distinguished. One is lexical stress, which is associated with the presence or absence of tones. The other is emphatic stress, which is achieved by increasing pitch range, length, and loudness. To illustrate this contrast, an example is given in Figure 2.1 showing a female Mandarin speaker saying wobaba 'my father' first without and second with emphatic stress on the middle syllable.

Figure 2.1 Lexical stress vs. emphatic stress in Mandarin Chinese

The middle syllables in both utterances in Figure 2.1 carry lexical stress but the middle syllable in the second utterance also carries emphatic stress. Because both are lexically stressed, both were said with full tones. However, the emphatically stressed syllable was said with a much broader pitch range (192 Hz vs. 50 Hz) with the F0 contour starting at 385 Hz, peaking at 476 Hz, and ending at 284 Hz. Compare this with the same syllable without emphatic stress, whose F0 starts at 290 Hz, peaks at 295 Hz, and ends at 245 Hz. The emphatically stressed syllable is also slightly longer (238 ms vs. 208 ms) and louder (peak intensity 56 dB vs. 51 dB) than its non-emphatically stressed counterpart.
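The pitch ranges quoted above follow directly from the reported contour values. The short sketch below recomputes them as a check; the measurements are those given in the text, while the function name and the dictionary layout are assumptions made here for exposition.

```python
# Illustrative only: recompute the contrasts quoted for Figure 2.1.
# F0 values are in Hz (start, peak, end); names are hypothetical.

def pitch_range(f0_points):
    """Return the difference between the highest and lowest F0 sample."""
    return max(f0_points) - min(f0_points)

plain = {"f0": [290, 295, 245], "dur_ms": 208, "peak_db": 51}
emphatic = {"f0": [385, 476, 284], "dur_ms": 238, "peak_db": 56}

plain_range = pitch_range(plain["f0"])          # 295 - 245
emphatic_range = pitch_range(emphatic["f0"])    # 476 - 284
duration_gain_ms = emphatic["dur_ms"] - plain["dur_ms"]
intensity_gain_db = emphatic["peak_db"] - plain["peak_db"]
```

The pitch range nearly quadruples under emphatic stress, whereas duration and peak intensity increase by only 30 ms and 5 dB.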

Although pitch serves as a major correlate of emphatic stress in BM and TM, F0 information is not necessary in order for stress to be perceived. In her perception tests, Shen (1993) asked four Beijing Mandarin speakers to listen to natural and reiterant speech under three different conditions: (1) Low-pass-filtered, with a cutoff frequency of 400 Hz to eliminate segmental information, (2) Monotoned, with the F0 of the already filtered speech held constant at 135 Hz to eliminate F0 variations, and (3) Fixed intensity at a constant 60 dB so the only remaining cue is length. The subjects were then asked to circle syllables that they perceived as stressed. The results showed that the Chinese speakers perceived stress equally well under all three conditions. Neither the presence of F0 nor intensity variations changed the stress judgments significantly. She also found much stronger correlations between duration and stress judgments than between intensity and stress judgments, suggesting that duration is a more important perceptual cue of stress than intensity for Mandarin Chinese speakers.

2.3.5 ARE BM AND TM MORA-TIMED, SYLLABLE-TIMED, OR FOOT-TIMED?

From this perspective, how should the rhythms of BM and TM be categorized? Are they mora-timed, syllable-timed, or foot-timed? First, they cannot be mora-timed because evidence from poetry as shown in (11) and (12) indicates that BM and TM are syllable-counting (i.e., each foot must consist of two syllables, regardless of whether the second syllable is heavy or light), not mora-counting. Second, they cannot be syllable-timed, because both BM and TM maintain a distinction between heavy and light syllables. Third, if BM and TM are foot-timed, given that a foot can consist of either four moras (heavy-heavy) or three moras (heavy-light), I would expect their heavy-heavy feet and heavy-light feet to show approximately equal lengths. That is, the initial heavy syllable in a heavy-light foot would tend to be longer than that in a heavy-heavy foot.
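The foot-timing prediction can be made concrete with a small worked calculation. The sketch below is an illustration under two simplifying assumptions that go beyond the text: that feet are exactly equal in duration, and that a foot's duration is divided among its syllables in proportion to their mora counts (two for a heavy syllable, one for a light syllable). All names are hypothetical.

```python
from fractions import Fraction

# Mora counts per syllable weight, as assumed in the discussion above.
MORAS = {"heavy": 2, "light": 1}

def syllable_shares(foot, foot_duration=Fraction(1)):
    """Split a fixed foot duration among syllables in proportion to moras."""
    total_moras = sum(MORAS[weight] for weight in foot)
    return [foot_duration * MORAS[weight] / total_moras for weight in foot]

heavy_heavy = syllable_shares(["heavy", "heavy"])   # four moras
heavy_light = syllable_shares(["heavy", "light"])   # three moras

# Share of the foot taken by the initial heavy syllable in each foot type.
initial_in_hh = heavy_heavy[0]
initial_in_hl = heavy_light[0]
```

On these assumptions the initial heavy syllable takes half of a heavy-heavy foot but two thirds of a heavy-light foot, which is the direction of the difference predicted for a foot-timed language.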

2.4 POTENTIAL DIFFICULTIES TAIWAN MANDARIN AND BEIJING MANDARIN SPEAKERS MIGHT HAVE WITH THE PRODUCTION OF ENGLISH SPEECH RHYTHM

In order for a second language learner to produce a native-like rhythm in English, at least three important aspects of speech rhythm must be learned. First, one must place stress cues on the appropriate syllable(s). Second, one must employ the same phonetic cues to signal stress as native speakers. Although there are no invariant phonetic correlates for stress and no two stressed syllables are realized in exactly the same way, there are strong tendencies for stressed syllables to be longer, louder, and/or broader in pitch range. And third, one must produce native-like differentiation between stressed and unstressed syllables so that the rhythmical beats can be distinctively perceived. It is true that stress isochrony does not only mean lengthening stressed syllables and shortening unstressed ones. It means producing approximately equal time between stresses. However, because isochrony is only a mental target and phonetic isochrony is rarely established empirically in production, I will base the investigation on the three proposed criteria. The assumption is that if an ESL or EFL learner is capable of achieving all three, it would be a good indication that s/he is able to produce near native-like English speech rhythm.

While the speech rhythm of English has been widely researched, little empirical evidence is directly available to either characterize or classify the speech rhythm of Taiwan Mandarin or Beijing Mandarin into any of the three recognized rhythmic categories. However, I might be able to glean some ideas about the potential difficulties TM and BM speakers might have with the production of English speech rhythm from what I have learned about the two languages in sections 2.2 and 2.3.

First, TM speakers should be able to use duration, intensity, and pitch as correlates of stress in English with at least some degree of success because the same correlates are used to signal stress in TM (see Figure 2.1). However, because of the strong predisposition to pitch in their first language, TM speakers may rely more on pitch in producing the distinction between stressed and unstressed syllables than on duration and intensity. If this prediction is true, I would expect TM speakers to produce similar or greater pitch contrasts but smaller duration and intensity contrasts between stressed and unstressed syllables than English speakers do.

Second, because most syllables are produced with full tones in Mandarin, TM speakers might tend to assign tones to syllables. If they do, they might have difficulty with reducing unaccented syllables, including those that bear some stress and those that are totally unstressed. Toned syllables must be spoken with a minimal length in order for the tones to be sufficiently realized, which could limit the possible variations in duration across syllables. Also, because accented syllables must also be stressed in English, by assigning tones to syllables, TM speakers would likely be perceived as assigning stresses to syllables. Note that the distinction between accented and unaccented syllables is not necessarily realized within a foot in English because not all stressed syllables bear pitch accents. This would suggest that TM speakers' difficulty with English speech rhythm (assuming that there are as many levels of rhythm as levels of stress) may be realized at the foot level (when the distinction between accented and unaccented syllables coincides with the distinction between stressed and unstressed syllables) or at a level above the foot level (when the distinction is between accented syllables and unaccented weakly stressed or unstressed syllables).

(30)  x                         (Swwww)
      x    x
      x    x     x   x    x
      Mom  made  it  for  me.

Consider the example in (30), where the main-stressed syllable Mom, shown in boldface, is stronger in relation to all the other syllables (made, it, for, me) in the sentence. The distinction between syllables that bear main stress and those that do not is hereafter defined as 'Strong' (main-stressed) syllables versus 'Weak' (secondary-stressed and unstressed) syllables on a metrical grid in the sense that main-stressed syllables are relatively stronger than the others. At this level of the metrical grid, it is possible in English to have stretches of relatively weaker and tonally unspecified syllables, which could be potentially difficult for TM speakers.

TM speakers, in particular, may experience greater difficulties with the production of English speech rhythm than BM speakers do. Although both BM and TM consist of left-dominant disyllabic feet, alternating full-toned (stressed) and neutral-toned (unstressed) syllables in polysyllabic words are much less common in TM than in BM. From this perspective, TM may sound syllable-timed because most of the syllables in polysyllabic words are produced with full tones. In addition, TM speakers may sound relatively syllable-timed when they speak English because they may tend to produce most syllables with full tones. This makes TM and English a more interesting pair of languages to compare in terms of rhythm.

However, it is also possible that TM speakers may have little problem with the alternating stress pattern in English given that a Strong-Weak foot structure is legitimate in TM. If this is true, TM speakers would have little trouble realizing the alternating strong and weak syllables in English, particularly when the weak syllables consist of unstressed function words, as they usually do in TM.

Realizing the hierarchical organization of stresses in English could be a challenge for TM speakers. It is likely that TM speakers could produce a more English-like rhythm when words are produced in isolation as compared with the phrase or sentence level where reorganization of prominence relations is required. For example, they might have less trouble producing correct stress patterns for elevator and operator as two isolated words than as a single phrase. Instead of assigning greater prominence to the first than to the second word as shown in (31a), they might assign equal prominence to both words as shown in (31b).

(31) a.        S                     w
             /   \                 /   \
            S     w               S     w
           / \   / \             / \   / \
          S   w S   w           S   w S   w
           elevator              operator

     b.        S                     S
             /   \                 /   \
            S     w               S     w
           / \   / \             / \   / \
          S   w S   w           S   w S   w
           elevator              operator

If this were true, they would produce stress on all syllables bearing main lexical stress at the same level.

If these predictions are correct, it appears that unaccented ('weak') syllables (including those that bear some stress and those that are completely unstressed) would pose a great challenge to Taiwan Mandarin speakers. And rhythmic patterns of connected speech would be more difficult than words in isolation. Because TM allows alternating full-toned and neutral-toned syllables when the neutral-toned syllables are also grammatical morphemes, TM speakers may be able to produce alternating strong and weak (in this case, unstressed grammatical morphemes) syllables with some degree of success. Thus potentially, sentences that contain long stretches of weak syllables may be rhythmically more difficult than those that contain alternating weak syllables.

CHAPTER 3: THE PRESENT STUDY

3.1 PURPOSE OF THE STUDY

The current study investigates Taiwan Mandarin (TM) speakers of two different proficiency levels with respect to their difficulties in producing English speech rhythm by analyzing three well-attested correlates of stress in English - duration, intensity, and pitch - of Strong and Weak syllables in non-final versus final position over complete sentences. The purpose is to correct some of the limitations of previous research on the acquisition of English speech rhythm (as noted in 2.2.2) and to provide physical evidence that would help identify difficulties, if any, in two key components of speech rhythm: (1) the physical realization of stress, i.e., whether TM speakers are able to use duration, intensity and pitch effectively in the discrimination between syllables that have more stress versus syllables that have less stress, and (2) the distribution of stress, i.e., whether TM speakers are able to focus stress on the appropriate syllables.

3.2 RESEARCH QUESTIONS

Based on Anderson-Hsieh and Venkatagiri's findings and the characteristics of English and Mandarin speech rhythms as reviewed in 2.2 and 2.3, I have six general predictions. More formally they are:

Prediction 1

TM speakers should be able to use duration, intensity, and pitch as correlates of stress in English with at least some degree of success. Among the three variables, TM speakers may rely more on pitch in producing stress distinction than on duration and intensity.

Prediction 2

TM speakers might tend to assign tones to syllables and therefore have difficulty with reducing unaccented syllables, including those that bear some stress and those that are totally unstressed.

Prediction 3

TM speakers may produce less differentiation between strong and weak syllables than do English speakers.

Prediction 4

TM speakers may have little problem realizing the alternating strong and weak syllables in English, particularly when the weak syllables consist of unstressed function words.

Prediction 5

Realizing the hierarchical organization of stresses in English could be a challenge for TM speakers.

Prediction 6

TM speakers who have a higher overall proficiency in English may produce a more English-like rhythm than those who have a lower proficiency in English.

With these general predictions in mind, the current study will aim at addressing five research questions. In the following five subsections, 3.2.1 through 3.2.5, we will discuss specific predictions for each question and the kinds of data needed to answer them.

3.2.1 DO TAIWAN MANDARIN SPEAKERS HAVE DIFFICULTIES WITH THE DURATION, INTENSITY, OR PITCH OF STRONG SYLLABLES, WEAK SYLLABLES, OR BOTH?

If Prediction 2 is correct, reducing weak syllables is expected to be more difficult for TM speakers than strengthening strong syllables, where 'Strong' refers to syllables that bear main stress (including nuclear-pitch-accented and non-nuclear pitch-accented) and 'Weak' refers to those that are unaccented (including those that bear some stress and those that are totally unstressed). TM speakers may find it difficult to reduce weak syllables in English partly because they may tend to assign tones to syllables, and pitch accents on syllables are often perceived as stressed by English speakers.

In order to test this prediction, we will examine their strong and weak syllables in two types of sentences. Type A sentences feature long stretches of weak syllables and Type B sentences feature a highly regular rhythmic pattern of alternating strong and weak syllables. We will compare TM and English speakers with respect to the distribution of identifiable differences in strong and weak syllables in these two types of sentences. Specifically, we will focus on those differences in each of the three variables that usually weaken the contrasts between strong and weak syllables, namely, relatively shorter, softer, lower-pitched strong syllables versus relatively longer, louder, and higher-pitched weak syllables. Strong syllables which are longer, louder, and higher-pitched than those of English speakers or weak syllables which are shorter, softer, and lower-pitched are not considered to be difficulties because they strengthen rather than weaken the target rhythmic pattern by broadening the contrast between strong and weak syllables. As supporting evidence, we will also compare TM and English speakers with respect to the relative duration, intensity, and pitch of the strong versus weak syllables.
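The counting rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: it treats any learner value below the native value on a strong syllable, or above it on a weak syllable, as a contrast-weakening difference, with no tolerance threshold. The function name and all numerical values are hypothetical.

```python
# Illustrative sketch; values and names are hypothetical, not data from the study.

def count_weakening_differences(pattern, learner, native):
    """Count learner syllables that weaken the strong/weak contrast.

    pattern: string of 'S' (strong) and 'w' (weak), one symbol per syllable.
    learner, native: per-syllable values of one variable (e.g. duration in ms).
    Returns (strong syllables below native, weak syllables above native).
    Differences in the opposite direction are ignored, since they broaden
    rather than weaken the contrast.
    """
    reduced_strong = 0
    inflated_weak = 0
    for label, learner_value, native_value in zip(pattern, learner, native):
        if label == "S" and learner_value < native_value:
            reduced_strong += 1
        elif label == "w" and learner_value > native_value:
            inflated_weak += 1
    return reduced_strong, inflated_weak

pattern = "Swwww"                       # one strong syllable, four weak
native_ms = [300, 150, 90, 100, 160]    # hypothetical native durations
learner_ms = [260, 170, 120, 95, 160]   # hypothetical learner durations

reduced_strong, inflated_weak = count_weakening_differences(
    pattern, learner_ms, native_ms
)
```

The same function could be applied separately to intensity and pitch values, giving one pair of counts per variable.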

3.2.2 DO TAIWAN MANDARIN SPEAKERS PRODUCE LESS DIFFERENTIATION IN DURATION, INTENSITY, OR PITCH BETWEEN STRONG AND WEAK SYLLABLES THAN ENGLISH SPEAKERS?

If Prediction 3 is correct, I expect TM speakers to experience difficulties producing a native-like differentiation between strong and weak syllables. Furthermore, if Prediction 1 is also correct, TM speakers are expected to have greater difficulty realizing duration and intensity contrasts than realizing pitch contrasts between strong and weak syllables. It might be easier for TM speakers to discover a correlation between pitch and stress in English because they are predisposed by their first language to be sensitive to variations in pitch. If this is true, I expect them to produce a pitch differentiation between strong and weak syllables that is similar to or perhaps greater than that of English speakers.

There are two ways of measuring duration, intensity, and pitch contrasts between strong and weak syllables. Each approach tells only part of the story. One is to measure the contrasts between target (i.e., expected) strong and weak syllables. This provides a general index of how far TM speakers are from being native-like in a particular sentence. The other is to measure the contrasts between actual strong and weak syllables produced by TM and English speakers. This tells us whether TM speakers produce a greater or smaller differentiation between their strong and weak syllables than English speakers, regardless of any differences that they might have from English speakers in terms of the distribution of stress. However, the latter distinction is hard to determine based on physical measurements of duration, intensity, and pitch of syllables alone. As an alternative to the latter approach, we will try to identify possible factors (i.e., lexical categories such as content vs. function words) which might better predict the placement of strong vs. weak syllables in the speech of TM speakers when the expected strong and weak syllables do not match the actual strong and weak syllables.
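The first approach, measuring the contrast between target strong and weak syllables, could be operationalized in several ways. The sketch below shows one possibility, the ratio of the mean value of the target strong syllables to the mean value of the target weak syllables. The ratio measure, the function name, and the sample values are assumptions introduced for illustration; the text does not commit to a particular formula at this point.

```python
from statistics import mean

def strong_weak_ratio(pattern, values):
    """Ratio of mean strong-syllable value to mean weak-syllable value.

    pattern: target stress pattern, 'S' for strong and 'w' for weak.
    values: per-syllable measurements of one variable (e.g. duration in ms).
    """
    strong = [v for label, v in zip(pattern, values) if label == "S"]
    weak = [v for label, v in zip(pattern, values) if label == "w"]
    return mean(strong) / mean(weak)

pattern = "wSwSwS"                               # alternating target pattern
native_ms = [100, 200, 100, 220, 100, 240]       # hypothetical values
learner_ms = [150, 180, 160, 190, 150, 200]      # hypothetical values

native_ratio = strong_weak_ratio(pattern, native_ms)
learner_ratio = strong_weak_ratio(pattern, learner_ms)
```

A learner ratio closer to 1 than the native ratio would indicate less differentiation between strong and weak syllables on that variable.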

3.2.3 DO TAIWAN MANDARIN SPEAKERS CORRELATE DURATION, INTENSITY, AND PITCH WITH THE TARGET STRESS PATTERNS IN AN ENGLISH-LIKE WAY?

Even when the same sentences are presented to TM and English speakers within a context that strongly favors a particular stress pattern, there is no guarantee that TM speakers will produce the expected stress pattern. If Prediction 5 is correct, TM speakers are expected to have difficulty realizing the prominence relations among syllables of a sentence in production. Without assuming that TM speakers intend to produce the same stress patterns as English speakers, the current research question is indeed twofold. The first dimension to this question is whether TM speakers put main stress on the right syllable. The other dimension to this question is whether TM speakers correlate duration, intensity, and pitch with stress, no matter how different their stress patterns might be from those of English speakers. For the latter, the goal is to identify possible variable(s) that might influence the distribution of stress in the speech of TM speakers. For example, given that toneless syllables are usually function morphemes in Taiwan Mandarin (see 2.3.2), do they tend to associate lack of stress with monosyllabic function words and stress with monosyllabic content words in English?

We want to examine the extent to which variations in duration, intensity, and pitch correlate with the target stress patterns in Type A and Type B sentences. First we will trace the fluctuations of duration, intensity, and pitch across syllables by counting the number of strong and weak syllables classified as relatively longer, louder, and higher-pitched versus those that are relatively shorter, softer, and lower-pitched than their preceding syllables. In cases where TM and English speakers differ in their placement of stress, we will investigate whether other variable(s) (e.g., function words vs. content words) better correlate with the placement of stress for TM speakers.
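The tracing procedure described above amounts to comparing each syllable with the one before it and tallying the direction of change separately for strong and weak syllables. The following sketch illustrates this under the assumption that the first syllable, having no predecessor, is skipped and that equal values are tallied as "level"; the function name and pitch values are hypothetical.

```python
def rise_fall_counts(pattern, values):
    """Tally rises and falls relative to the preceding syllable.

    pattern: 'S'/'w' string, one symbol per syllable.
    values: per-syllable measurements of one variable.
    The first syllable is skipped because it has no preceding syllable.
    """
    counts = {
        "S": {"rise": 0, "fall": 0, "level": 0},
        "w": {"rise": 0, "fall": 0, "level": 0},
    }
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            change = "rise"
        elif values[i] < values[i - 1]:
            change = "fall"
        else:
            change = "level"
        counts[pattern[i]][change] += 1
    return counts

pattern = "wSwSwS"
pitch_hz = [180, 220, 185, 230, 190, 190]   # hypothetical mean F0 per syllable
counts = rise_fall_counts(pattern, pitch_hz)
```

A pattern that conforms to the target would show rises concentrated on strong syllables and falls concentrated on weak syllables.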

3.2.4 DO TM SPEAKERS PRODUCE MORE ENGLISH-LIKE DURATION, INTENSITY, AND PITCH PATTERNS WITH IMPROVED PROFICIENCY AND EXPOSURE TO ENGLISH?

If Prediction 6 is correct, TM speakers are expected to produce more English-like duration, intensity, and pitch patterns as their overall proficiency in English and exposure to an English-speaking environment increase. In order to compare two proficiency groups of TM speakers with respect to their degree of success in achieving the target rhythmic patterns using duration, intensity, and pitch, we will examine the following two aspects of their stress patterns: (1) the physical realization of stress, and (2) the placement of stress. We mentioned in Chapter Two that some of the prosodic cues used to mark rhythmic impulses in English are employed for other linguistic functions in TM. Although increased pitch range is used to signal emphatic stress (see Figure 2.1), variations in pitch within a syllable are primarily used for discriminating lexical tones in TM (see Table 2.1). And duration is often linked with the contrast between full-toned and neutral-toned syllables. We will investigate how successful TM speakers are in correlating pitch and duration with prominence in English at different stages of acquisition.

Second, as reviewed in 2.3.3, we have seen that at the lexical level stress is a feature of polysyllabic words in English but in Taiwan Mandarin polysyllabic words are often spoken with full tones on all syllables. At the sentence level, any lexically stressed syllable in English has the potential for bearing sentential stress. It is possible to pronounce the same English sentence with a variety of stress patterns depending on which syllable(s) are emphasized. Therefore, in order to produce native-like English speech rhythm with any success at the sentence level, TM speakers must also be able to focus stress on the appropriate syllables.

Answers to these two research questions will be drawn from a wide variety of sources. First, we will compare TM learners of English of two different proficiency levels with respect to the basic statistics of the relative duration, intensity, and pitch of individual syllables, including standard deviations, ranges and means where appropriate. The information will give a general picture of how native-like the two groups of TM speakers are with respect to each variable. Second, we will compare the two groups of TM speakers with respect to the number of strong syllables which are shorter, softer, and lower-pitched and the number of weak syllables which are longer, louder, and higher-pitched than those of English speakers. The results will provide crucial information about the degree of difficulty each group has with the duration, intensity, and pitch of strong versus weak syllables. Third, we will compare the two groups of TM speakers with respect to the number of strong versus weak syllables whose duration, intensity, or pitch values rise or fall from their preceding syllables. With this information, we will be able to gauge the extent to which their duration, intensity, and pitch patterns conform to the target stress patterns. Fourth, we will compare the two groups of TM speakers with respect to the correlation coefficients of their duration, intensity, and pitch across the syllables of the sentences with those of English speakers. A higher correlation will indicate a more native-like pattern. Finally, we will compare the two groups of TM speakers and the English native speakers with respect to the relative degree of differentiation between strong and weak syllables in each of the three variables. The results will provide information about the extent to which TM speakers produce native-like differentiation between strong and weak syllables at different stages of acquisition.
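The fourth comparison, correlating a learner's syllable-by-syllable values with those of English speakers, can be illustrated as follows. The sketch assumes a Pearson product-moment coefficient, which the text does not specify, and uses hypothetical duration values; the function and variable names are likewise illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical per-syllable durations (ms) for one four-syllable sentence.
native_ms = [100, 200, 100, 200]
learner_a_ms = [110, 190, 120, 180]   # follows the native alternation
learner_b_ms = [150, 160, 170, 150]   # little relation to the native pattern

r_a = pearson(native_ms, learner_a_ms)
r_b = pearson(native_ms, learner_b_ms)
```

In this toy example the first learner's coefficient is close to 1 and the second learner's is negative, so the first would count as the more native-like pattern on this measure.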

3.2.5 ARE DURATION, INTENSITY, AND PITCH COORDINATED AS CORRELATES OF STRESS FOR TAIWAN MANDARIN SPEAKERS?

After we have analyzed how successful TM speakers are in using each of the three variables for achieving the target stress patterns, an important question to ask is how well the three variables are coordinated among TM speakers. The purpose of this investigation is to identify any similarities and/or differences between TM and English speakers in the ways and the extent to which duration, intensity, and pitch coordinate with one another in the realization of stress.

Three different measures will be used for comparing TM and English speakers with respect to the coordination among these three variables. First, we will compare duration, intensity, and pitch in terms of the number and type of differences each shows between English native speakers and the two groups of TM speakers. Second, we will compare duration, intensity, and pitch with respect to their variations across the syllables of the sentences. Specifically, we will count the number of strong versus weak syllables whose duration, intensity, and pitch values rise or fall from their preceding syllables, assuming that syllables that are longer, louder, and higher-pitched are more stressed and syllables that are shorter, softer, and lower-pitched are less stressed. Third, we will compare duration, intensity, and pitch in terms of the differentiation between strong and weak syllables for each of the three subject groups.

The results will provide information about whether the three variables present similar problems to TM speakers, whether the three variables tend to rise and fall together according to stress, and whether variations in each of the variables are consistent with one another in creating a native-like differentiation between strong and weak syllables.

Last but not least, if Prediction 4 is correct, I expect TM speakers to show little trouble with strengthening strong syllables, weakening weak syllables, producing native-like contrasts between strong and weak syllables, correlating duration, intensity, and pitch with the target stress patterns, and coordinating duration, intensity, and pitch in the production of sentences that feature alternating strong and weak syllables in English, particularly when the weak syllables consist of unstressed function words.

3.3 METHOD

3.3.1 SUBJECTS

Table 3.1 Profile of the three subject groups

Group                 Number          Age       Onset Age of English    Time in US
                      All   M   F     (years)   Instruction (years)     (years)
Control
  English Native      10    5   5     24.8      -                       -
Experimental
  TM ESL              10    5   5     27.6      11.9                    3.8
  TM EFL              10    5   5     18.9      11.5                    -

The participants in this study were 10 native speakers of English, 10 TM ESL learners, and 10 TM EFL learners. The English speakers were either graduate or undergraduate students at the University of California in Irvine or at the University of Hawai'i and all came from the US west coast. None of them were raised as bilinguals or considered themselves as fluent in any foreign language. The average age of the English native speakers was 24.8. The two groups of TM speakers were distinguished from each other along two criteria: (1) exposure to an English-speaking environment, and (2) general proficiency. The members of the ESL group were 10 TM learners of English from Taiwan, who were enrolled in a variety of graduate programs at the University of Hawai'i. Nine out of the 10 speakers provided age information. The average length of stay in the United States was three years nine months. The average age of the TM ESL speakers was 27.6 and the average onset age of English instruction was 11.9 years old. The members of the EFL group were 10 TM learners in Taiwan, who were all freshmen students enrolled at the Tamkang University in Taipei and none of them had ever been to any English-speaking countries prior to the experiment. The average age of the TM EFL speakers was 18.9 and the average onset age of English instruction was 11.5 years old. Each of the three subject groups consisted of five males and five females.

3.3.2 MATERIALS

The materials include an Agreement to Participate Form, a short questionnaire, an instruction sheet, and two stacks of test cards.

The current project and its official Agreement to Participate Form were reviewed and approved by the Committee on Human Studies of the University of Hawai'i. The Form described specifically what the subjects were asked to do for the experiment, including the kinds of data that were collected and the ways in which these data were collected, stored and used.

Two separate questionnaires were developed, one for each language group, with one written in English and the other in Chinese. The content of the questionnaires generally overlapped except for a few questions, which were relevant only to specific language groups. For instance, the English speakers were asked about their home state and whether they had lived in any non-English speaking countries. The TM speakers were asked about their age at the onset of English instruction, their length of stay in the United States or any other English-speaking countries, and their TOEFL scores, if available.

The instructions were also presented in the subject's native language, with the English version printed on one side of a piece of laminated A4 paper and the Chinese version on the other side. The purpose was to ensure that the participants understood the procedures clearly. The two versions were identical in terms of content. Both included step-by-step directions, followed by a few practice examples in English. The English examples were designed to familiarize the subjects with the type of test stimuli that they would be asked to read and, as a precaution, to preclude any priming effect whereby TM speakers might tend to process the stimuli using their own native language after reading the instructions in Chinese.

The test stimuli comprise two prosodically diverse sets of sentences with seven tokens in each set. Type A sentences feature long stretches of weak syllables. Each of these sentences is narrow-focused and consists of either a single strong syllable or two widely spaced strong syllables. The classification of 'Strong' versus 'Weak' is defined at the top level of the metrical grid where a syllable is either 'Strong' (bears pitch accent) or 'Weak' (bears no pitch accent). The distinction is made based on the prediction that TM speakers may experience difficulty in reducing unaccented syllables (see discussion in section 2.4). The weak syllables include both unaccented syllables that bear some degree of stress and totally unstressed syllables. The example in (32) shows how syllables are classified into 'Strong' (S) or 'Weak' (W) on a metrical grid for a Type A sentence, where the syllables it, for, me are weak in relation to made, which is weak in relation to Jane, which bears main stress. This results in one strong and four weak syllables.

(32)  S    w     w   w    w
      x
      x    x
      x    x     x   x    x
      Jane made  it  for  me

Four subsets of rhythmic patterns are represented for Type A sentences. Sentences A1 and A2 have the five-syllable rhythmic pattern Swwww, sentences A3 and A4 have the seven-syllable rhythmic pattern SwwwwwS, sentences A5 and A6 have the seven-syllable rhythmic pattern wSwwwww, and sentence A7 has the eight-syllable rhythmic pattern SwwwwwSw.

Type B sentences are broad-focused and feature a highly regular rhythmic pattern of alternating strong and weak syllables where each strong syllable is immediately preceded and followed by a weak syllable. Three different rhythmic patterns are represented in the Type B sentences. Sentences B1 through B4 have the six-syllable iambic (in a poetic sense) pattern wSwSwS, sentences B5 and B6 have the seven-syllable trochaic pattern SwSwSwS, and sentence B7 has the eight-syllable iambic pattern wSwSwSwS. Furthermore, the alternating rhythmic pattern featured in Type B sentences was reinforced by choosing lexical categories so that all strong syllables are content words, which are more likely to bear stress than function words, which make up the weak syllables.
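The two sets of target patterns can be checked mechanically against the properties claimed for them. The sketch below encodes the patterns listed above as strings and verifies that every Type B pattern alternates strictly while every Type A pattern contains a long run of weak syllables. The helper names and dictionary keys are hypothetical conveniences, not labels used in the study.

```python
# Target rhythmic patterns as listed in the text ('S' = strong, 'w' = weak).
TYPE_A = {
    "A1, A2": "Swwww",
    "A3, A4": "SwwwwwS",
    "A5, A6": "wSwwwww",
    "A7": "SwwwwwSw",
}
TYPE_B = {
    "B1-B4": "wSwSwS",
    "B5, B6": "SwSwSwS",
    "B7": "wSwSwSwS",
}

def is_strictly_alternating(pattern):
    """True if no two adjacent syllables share the same strength."""
    return all(a != b for a, b in zip(pattern, pattern[1:]))

def longest_weak_run(pattern):
    """Length of the longest uninterrupted stretch of weak syllables."""
    return max(len(run) for run in pattern.split("S"))

type_a_runs = {name: longest_weak_run(p) for name, p in TYPE_A.items()}
type_b_alternating = all(is_strictly_alternating(p) for p in TYPE_B.values())
```

On this encoding the Type A patterns contain weak stretches of four or five syllables, whereas no Type B pattern contains two adjacent syllables of the same strength.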

Each test sentence was embedded within a brief context designed to help the subjects arrive at the same interpretation of the test sentences so as to elicit the target rhythmic patterns. Examples of these two types of test sentences with their contexts are given in (33) and (34) (see Appendix A for a full list of all sentences used in the experiment). The test sentences are shown in italics and the expected strong syllables are shown in boldface. In the actual experiment, the subjects were not made aware of the purpose of the study, nor of which sentences were being tested, nor which syllables were targeted as strong (i.e., no syllables were shown in italics or boldface in the actual experiment).

(33) Type A This is a beautiful bag! Did Mary make it for you? No, Jane made it for me.

(34) Type B Well, I didn't know what happened. But anyway, Mom and Dad were mad at Jim.

Efforts were made to keep the lead-in sentences casual in order to encourage a natural speech style. For example, we tried to incorporate colloquial expressions such as Well, Anyway, and Sure into the contexts for the test sentences. However, we deliberately avoided formulaic expressions (i.e., everyday phrases that learners pick up as chunks) such as How are you?, Where are you going?, Where're you from?, That's what I thought., I dunno., See you later., etc. Non-native speakers can often sound near native-like when saying these commonly used expressions even though they might have problems with English rhythm in their non-formulaic speech.

Several criteria were taken into consideration in the construction of the experimental sentences. First, to minimize controversies over syllabification, the majority of the test sentences were composed of monosyllabic words. English speakers do not always agree about how syllables should be divided in multi-syllabic words. Treiman and Danis (1988) found that native speakers of English disagreed about where to place the syllable boundary when asked to segment words such as honest, happy, letter, and ladder into syllables. Syllabification in an interlanguage could be even more complicated. Some of the phonological processes that help define syllable boundaries may not be present in the interlanguage. For instance, the aspiration of stops in English often indicates they are syllable-initial. However, TM speakers are known to produce aspirated stops in contexts where American English speakers normally would not, such as the second /t/ in the word tomato, or the /t/ in kitty.

Second, efforts were made to choose sequences of words that would make the segmentation reliable. We deliberately avoided sequences of highly similar segments at syllable boundaries, such as "wan[t t]o," where it is almost impossible to determine the boundary between the two sounds within the bracket. Where possible, we also tried to avoid vowel-vowel or vowel-glide sequences because the transition between vowels is so gradual that it is very difficult to pinpoint the boundary. In addition, we preferred combinations of vowels and stops or vowels and fricatives at word boundaries because they involve distinct changes of acoustic events to help with the segmentation.

Third, we tried to use words that would generate clear pitch tracks. For instance, we tried to use words with as many voiced segments as possible because pitch information for voiceless sounds is very difficult to extract. We also preferred stops to fricatives, which are relatively more difficult to extract pitch from due to their high-frequency noise. In general, the more sonorant a sound is, the better it is for pitch extraction.

Fourth, we tried to avoid sounds and sound sequences that are known to be difficult for TM speakers because the purpose of the study was not to test the pronunciations of sounds. We tried to avoid English words with sounds that are not present in TM, such as the interdental fricatives /θ, ð/, the postvocalic alveolar lateral approximant /l/, and the voiced palato-alveolar fricative /ʒ/. Efforts were also made to avoid words with heavy initial or final consonant clusters because there are no consonant clusters in TM except for consonants followed by glides.

Fifteen sentences were originally formulated for each sentence type, and the seven that elicited the most consistent results in an informal pretest with 10 English native speakers were selected as the final test sentences. The final test sentences were then reviewed by three assistant professors teaching EFL at three different universities in Taiwan and informally pretested with two freshmen students from each of their classes to make sure that the sentences were appropriate for their proficiency levels. The 14 test sentences are given in Table 3.2, where the target strong words are shown in boldface to show the rhythmic patterns because it is hard to infer the expected rhythm when the test sentences are removed from their lead-in sentences. Note that the actual test sentences were presented to the subjects without boldfacing. A complete list of the sentences used in this study including the lead-in sentences can be found in Appendix A.

Table 3.2 Labels, stress patterns, and syllable numbers of test sentences

Type  Item  Test Sentence                          Rhythm     Syllable Total
A     A1    Jim wrote it with me                   Swwww      5
      A2    Jane made it for me                    Swwww      5
      A3    You like me to wear the jeans          SwwwwwS    7
      A4    You want me to bring the wine          SwwwwwS    7
      A5    My mom made the lemon pie              wSwwwww    7
      A6    The old man gave it to me              wSwwwww    7
      A7    Jane's the one wearing the blue dress  SwwwwwSw   8
B     B1    I need it back by noon                 wSwSwS     6
      B2    They play with dad and laugh           wSwSwS     6
      B3    We learn to read and write             wSwSwS     6
      B4    I met with John at noon                wSwSwS     6
      B5    Mom and dad were mad at Jim            SwSwSwS    7
      B6    John is good at baking bread           SwSwSwS    7
      B7    I need a ride to work at once          wSwSwSwS   8

3.3.3 PROCEDURES

All subjects were recorded individually in a quiet place using an Audio-Technica ATR20 unidirectional microphone connected to a Toshiba laptop computer. The utterances were digitally recorded using Speech Analyzer v. 1.06 at 16 bits with a sampling frequency of 11025 Hz. The subjects were instructed to speak into the microphone and hold the microphone close to their mouths.

Prior to recording, subjects were first given an opportunity to read the consent form and sign it if they agreed to participate in the experiment. The experiment did not proceed without their consent. After that, they were requested to fill out a short questionnaire in their native language, which collected anonymously coded biographic information.

Upon completion of the agreement form and the questionnaire, the subjects were presented with instructions printed on a piece of laminated A4 size paper written in their native language and were given the opportunity to read the English practice sentences, which resembled the actual stimuli.

The subjects were then requested to read from two stacks of laminated color-coded cards, with Type A sentences in one stack and Type B sentences in the other. They were instructed to say the sentences naturally at their normal speed as if they were talking to a friend. The cards with the sentences were presented to the subjects in random order to minimize the possibility of any ordering effect. Once all seven sets of sentences in each stack had been read, the subjects were instructed to shuffle the cards. The same procedure was repeated twice so that each card was read three times by each speaker. After the recording for one stack of cards was completed, the same procedure was repeated for the other stack of cards. Half of the subjects within each group read Type A sentences first and the other half read Type B sentences first to counterbalance any potential sequencing effect.

3.3.4 ANALYSES

3.3.4.1 Acoustical analyses

The speech data were segmented into syllables and subsequently analyzed for duration, intensity, and pitch using PitchWorks v. 4.5. The duration, peak intensity, and Extreme Point F0 were obtained for each syllable of each test sentence. Extreme Point F0 is defined in this study as the maximum F0 of a syllable whose pitch rises most of the time, or the minimum F0 of a syllable whose pitch falls most of the time (see 3.3.4.1.3 for a full description). The total number of utterances analyzed was 630 and the total number of syllables analyzed was 4140 for each sentence type.

3.3.4.1.1 Measurement of duration

The durations of the syllables were measured based on the placement of the syllable boundaries. Given that the test sentences were largely composed of monosyllabic words, syllable boundaries usually coincided with word boundaries. The segmentation of the syllables generally followed three basic principles. First, wherever possible, the syllable boundary was placed at a distinct acoustic event that could be reliably and consistently identified. For instance, the end of a stop can be measured at the sudden burst of energy from the stop release, while the onset of a labiodental fricative can be measured at the onset of a quasi-random wave on the waveform and the onset of a high-frequency noise region on the spectrogram, both of which are characteristic of fricatives. Second, when it was difficult to determine syllable boundaries, the syllable boundary was placed close to the end of a real word or morphological boundary for practical reasons. For instance, the boundary between need and it in sentence B1 was placed at the onset of the dark formant bands of the second vowel. In this case, it could be difficult to determine whether to syllabify the intervocalic consonant with the first or the second syllable if it was pronounced as a flap. Sometimes, however, the morphological boundary perfectly matched the syllable boundary. For instance, the syllable boundary between with and John in sentence B4 could be placed at the onset of silence due to the closure of /dʒ/. Third, in cases where there was no visible acoustic event to help, the syllable boundary was determined via reasonable arbitrary rules that could be reliably applied. For example, the syllable boundary between the two syllables mo[m m]ade in sentence A5 was arbitrarily placed at the midpoint between the end of the formant bands of the first vowel [ɑ] and the onset of the formant bands of the second vowel [e]. Note that in doing so, it is assumed that the [m] coda of the first syllable and the [m] onset of the second syllable have equal lengths, which may not always be true.

3.3.4.1.2 Measurement of intensity

The intensity of a syllable was measured at its intensity maximum. The measurement of peak intensity was preferred over the measurement of the average intensity of a syllable because the measurement of peak intensity minimizes segmental variation as a potential variable. Because intensity is highly sensitive to the phonation type, i.e., whether the segment is a vowel or a fricative, etc., a syllable that contains voiceless stops and fricatives (e.g., socks) is likely to have lower average intensity than a syllable that contains only vowels (e.g., ah). Unlike average intensity, which takes into consideration the intensity of all segments in the syllable, peak intensity considers only the loudest portion of a syllable, which usually occurs within the vowel. Therefore, we chose peak intensity over average intensity as our measure of intensity as a correlate of stress.

3.3.4.1.3 Measurement of pitch

Measuring the F0 value of a syllable is often not as straightforward as measuring duration and intensity. In a language like English where contour pitches on accented syllables are common, determining an F0 value that best represents a syllable can be a great challenge. As part of an effort to develop a suitable approach for measuring the F0 value of a syllable in English, we first review three existing approaches.

Middle F0

This approach is sometimes used in experimental studies of intonation. With this approach, the F0 is measured right in the middle of each syllable. The biggest advantage of this method is that it can be reliably and easily applied. However, it has two major disadvantages. First, the location of Middle F0 can vary due to segmental variations. For instance, the Middle F0 of a syllable with a heavy initial consonant cluster (e.g., splash) will likely occur earlier within the syllable than in a syllable that has no initial consonants at all (e.g., ash). Even when two syllables have identical pitch contours, we are likely to come up with two different F0 measurements. Although the middle of the syllable is likely to fall inside the vowel, it can be anywhere within the vowel. Since pitch values within a vowel can vary rapidly, Middle F0 could fall either in the lower portion or the higher portion of a pitch contour depending on the segmental composition of the syllable.

Second, it leads to underestimation of a syllable's full F0 range. For instance, in a rising pitch accent L+H*, the F0 range of the syllable actually goes as high as the peak of the H*, which usually occurs toward the end of the syllable's pitch track. However, the middle of the syllable would probably occur earlier. As a result, Middle F0 is likely to be an F0 value that is somewhat lower in this case.

F0 at peak intensity

With this approach, the F0 value is measured at the maximum intensity of each syllable. Compared with the first approach, peak intensity F0 is a more dependable reference point because it is less likely to be influenced by segmental variations. However, the maximum intensity does not necessarily coincide with the point where the F0 excursion reaches its maximum. Another problem with this approach is that the peak intensity can correspond to several F0 values within a syllable. One would have to arbitrarily decide which F0 to choose, or as we decided to do in the example at the end of this section, average all the F0s that correspond to the intensity peak.

Mean F0

With this approach, the F0 value is measured by averaging all F0 values sampled within the boundaries of the syllable. This approach is quite valid for languages where there is little pitch movement inside a syllable. For languages that contain both level and contour pitch accents, however, measuring the mean F0 can be problematic. By averaging the high and low elements of a contour pitch accent, we flatten its actual pitch range. For instance, the mean F0 of an H* is likely to appear greater than that of an L+H* because the low element of an L+H* will bring the average F0 down.
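To make the three existing measures concrete, the following minimal Python sketch computes each of them for a single syllable. It assumes that the pitch tracker supplies time-aligned F0 and intensity samples for the voiced portion of the syllable. The function names, the argument layout, and the sample values are illustrative assumptions; they are not taken from PitchWorks or from the program used in this study.

```python
from statistics import mean


def middle_f0(times, f0, start, end):
    """F0 of the sample closest to the temporal midpoint of the syllable."""
    midpoint = (start + end) / 2
    nearest = min(range(len(times)), key=lambda i: abs(times[i] - midpoint))
    return f0[nearest]


def peak_intensity_f0(f0, intensity):
    """Average of all F0 samples that coincide with the intensity peak."""
    peak = max(intensity)
    return mean(hz for hz, db in zip(f0, intensity) if db == peak)


def mean_f0(f0):
    """Average of all F0 samples within the syllable."""
    return mean(f0)


# Hypothetical rising contour (values invented for illustration only).
times = [0, 10, 20, 30, 40]        # ms from syllable onset
f0 = [100, 110, 120, 130, 140]     # Hz
intensity = [60, 70, 72, 72, 65]   # dB; the peak spans two samples

print(middle_f0(times, f0, start=0, end=40))
print(peak_intensity_f0(f0, intensity))
print(mean_f0(f0))
print(max(f0))  # the actual upper bound of the rise
```

For this invented rising contour, all three measures return a value below the 140 Hz peak of the syllable, which illustrates the kind of underestimation discussed above.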

So how can we measure the pitch of a syllable in English? For our purposes, a desirable way of measuring the pitch of a syllable should achieve four goals. First, it should show as closely as possible where pitch prominence actually occurs. Second, it should provide a good estimate of the physical range of pitch variations over the utterance. Third, it should be in crucial ways consistent with our knowledge of the characteristics of English pitch accents. Fourth, when the sampled F0s of the individual syllables are connected, the result should approximate the contour of the original pitch track as closely as possible. As complicated as pitch is, it is difficult to achieve all of the above by sampling a single F0 value for each syllable. With these principles in mind, the current study proposes an alternative approach for sampling the F0, named the Extreme Point F0 (EPF0).

Extreme Point F0

This approach measures F0 at the F0 maximum or minimum within a syllable. The purpose is to capture the shape and the range of the pitch contour across the utterance as closely as possible. When the syllable is accented, the approach is aimed at sampling the upper bound of an H* in pitch accents such as H* or L+H*, or the lower bound of an L* in pitch accents such as L* or L*+H. When the syllable is unaccented, this approach is designed to sample the maximum or minimum pitch of the syllable relative to the next tonal event (see Pierrehumbert, 1980; Ladd, 1996 for a review of the phonetics and phonology of English intonation). Three sampling rules were developed to serve this purpose.

Rule #1 If the pitch values of a syllable rise most of the time, pitch will be measured at the F0 maximum.

In English, a pitch accent can consist of either a level or a contour tone. A level pitch accent can be either High or Low. The High and Low of a pitch accent generally refer to the ceiling and the floor of the speaker's pitch range. The common types of pitch accent in English include H*, L*, L*+H, L+H*, etc. The asterisk "*" is a convention of the ToBI system to mark the weighted tone (Ladd, 1996). For instance, the pitch accent L+H*, a very common type of pitch accent in English, has more weight on the high than on the low (e.g., Jane in Figure 3.1). The F0 values of an L+H* usually rise rapidly throughout. Even when the pitch accent is a monotone H*, it usually appears to be slightly rising because of the transition from the preceding syllable. With this rule, the EPF0 should sample the upper bound of the H*. When the syllable in question does not carry any pitch accent, this rule directs EPF0 to sample the upper bound of a tonal interpolation between a low tone and a high tone.

Rule #2 If the pitch values of a syllable fall most of the time, pitch will be measured at the F0 minimum.

This rule is intended to capture the L* in pitch accents such as L* or L*+H. In an L*+H, the F0 values usually remain low within most of the syllable with a relatively late onset of the F0 rise. Therefore, if the pitch values of a syllable show a predominant fall, it contains a weighted low tone. Even when the pitch accent is a monotone L*, it usually appears slightly falling due to the tonal transition from the previous syllable. When the syllable in question does not carry any pitch accent (e.g., is in Figure 3.1), this rule directs the EPF0 to sample the lower bound of a tonal interpolation between a high tone and a low tone.

Rule #3 If the F0 drop occurs on the final syllable of a declarative utterance and is preceded by an F0 rise, pitch will be measured at the F0 maximum.

Phrase-final or sentence-final syllables have to be treated differently because they are also marked by phrasal and boundary tones in English. For instance, when a predominant F0 drop is observed on the final syllable of a declarative sentence and the F0 drop is preceded by an F0 rise, it is usually an L+H* or an H* followed by a low phrasal tone and a low boundary tone (e.g., mom in Figure 3.1). In that case, this rule directs the EPF0 to sample the upper bound of the high pitch accent rather than the trailing low phrasal or boundary tones.
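The three sampling rules can be sketched in Python as follows. This is an illustration only and not the program actually used in the study: the function name, the frame-by-frame reading of "rise most of the time", the treatment of ties, and the test for a preceding rise in Rule #3 are all assumptions made for the sake of the example.

```python
def extreme_point_f0(f0, final_in_declarative=False):
    """Return the Extreme Point F0 of one syllable from its voiced F0 samples (Hz)."""
    if not f0:
        raise ValueError("no voiced F0 samples in this syllable")

    # Frame-to-frame pitch movements within the syllable.
    steps = [later - earlier for earlier, later in zip(f0, f0[1:])]
    rising = sum(1 for step in steps if step > 0)
    falling = sum(1 for step in steps if step < 0)

    # Rule #1: pitch rises most of the time, so sample the maximum.
    # Assumption: ties (including level or single-sample syllables) go here.
    if rising >= falling:
        return max(f0)

    # Rule #3: a predominant fall on the final syllable of a declarative
    # utterance that is preceded by a rise. Assumption: "preceded by a rise"
    # is read as "the peak is not the first sample of the syllable".
    if final_in_declarative and f0.index(max(f0)) > 0:
        return max(f0)

    # Rule #2: pitch falls most of the time, so sample the minimum.
    return min(f0)


# Hypothetical contours (values invented for illustration only).
rising_syllable = [100, 110, 120, 130]
falling_syllable = [130, 120, 110, 100]
final_syllable = [110, 130, 120, 110, 100, 95]  # short rise, then long fall

print(extreme_point_f0(rising_syllable))                            # 130
print(extreme_point_f0(falling_syllable))                           # 100
print(extreme_point_f0(final_syllable, final_in_declarative=True))  # 130
```

The same falling contour that yields its maximum in utterance-final position would yield its minimum (95) in non-final position, which is the distinction Rule #3 is meant to draw.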

With gracious help from Dr. Jonas Chen, a computer program was written to automatically sample the EPF0s of individual syllables from the segmented pitch log files for all utterances generated using PitchWorks. Figure 3.1 illustrates how the Extreme Point F0s are measured. The sentence was produced by a male English native speaker. The complete context within which the target sentence was embedded is given below. The target sentence is shown in italics and the target strong syllables are shown in boldface.

(23) I think you're mistaken. Jane is not my mom. She is my sister.

[Figure 3.1: pitch track of the utterance with three annotation tiers: syllables (jane, is, not, my, mom), tones (L+H*, L+H*, L+H*, L-L%), and EPF sampling points. Vertical axis ticks run from 50 to 250; horizontal axis ticks run from 200 to 1000.]

Figure 3.1 The sampling of the Extreme Point F0s for the utterance Jane is not my mom.

Three syllables, Jane, not, and mom, are assigned L+H* pitch accents for this sentence. The F0s of these syllables were measured at the F0 maxima to capture the weighted H*, following sampling rule #1 for the syllables Jane and not, and rule #3 for the final syllable mom. The pitch movements within the unaccented syllables, is and my, generally decline due to the tonal interpolation between the preceding high tone and the following low tone. For these syllables, the EPFs were sampled at the F0 minima within the syllable following rule #2. The vertical lines immediately to the right of the asterisks in the "EPF" row mark the point in time where each EPF was sampled.

Figure 3.2 shows what the sampled F0s look like on the original pitch track. The labeled points mark the F0 values and the points in time where the Extreme Point F0s were sampled.

[Figure 3.2: original pitch track with the sampled Extreme Point F0 values labeled. Vertical axis ticks run from 90 to 170; horizontal axis: Time (ms).]

Figure 3.2 The corresponding Extreme Point F0s on the original pitch track for the utterance Jane is not my mom.

The EPF0 approach is not only innovative but also theoretically sound. To show its strength, we compare this approach against the three earlier described approaches for measuring F0 with respect to how closely each approach corresponds to the pitch accents and the pitch contours of the original pitch track.

[Figure 3.3: line plot comparing four measures (MidF0, dBMax, MeanF0, EPF0) for each syllable. Vertical axis ticks run from 90 to 170; horizontal axis: Syllable (jane, is, not, my, mom).]

Figure 3.3 The Middle F0s, the F0s at Peak Intensity, the Mean F0s, and Extreme Point F0s for the utterance Jane is not my mom.

A few observations can be made based on the results shown in Figure 3.3. First, all three earlier approaches, namely Middle F0, Peak Intensity F0, and Mean F0, fail to represent the high pitch accent on the final syllable. Both Middle F0 and Peak Intensity F0 of the final syllable reflect instead the lengthy low phrasal and low boundary tones at the end of the declarative statement. The Mean F0 of the final syllable averages the high weighted tone of the pitch accent with the lengthy low phrasal and boundary tones, therefore underestimating the height of the high tone. EPF0 is the only approach that successfully passes this test. Second, Middle F0, Peak Intensity F0, and Mean F0 each may underestimate the range of the pitch excursion of a contour pitch accent (e.g., Jane). Mean F0 is particularly susceptible to flattening the contour pitch accents by averaging out the high and the low elements. We can see in this example that the Mean F0 consistently lowers the upper bounds of the high pitch accents and raises the lower extremes. Among the four approaches, EPF0 most authentically represents both the pitch accents and the pitch contour of the original pitch track.

3.3.4.2 Normalization of data

After the duration, intensity, and pitch of individual syllables were measured, the absolute measurements were then normalized within each utterance to control for variations in speech rate, loudness of speech, and pitch ranges across utterances.

The absolute duration, intensity, and pitch data are valuable in showing the actual differences between subject groups. For instance, by using the absolute duration in milliseconds to compare the groups, one can see whether TM learners generally take longer to produce each utterance. However, caution must be taken when one uses absolute duration, intensity, and pitch to compare groups because these differences often heavily reflect individual variations in speech rate, loudness, and pitch ranges. In order to eliminate these factors as potential variables, the absolute duration, peak intensity, and Extreme Point F0 measurements were normalized within each utterance before any further computations or comparisons were made. After all, as far as rhythm is concerned, it is the relative prominence relationships among syllables of an utterance that count.

3.3.4.2.1.1 Normalization of duration

There are two ways of normalizing duration within an utterance. The first approach arbitrarily assigns the longest syllable a value of one and converts the others into proportions of it. The second approach divides the duration of each syllable by the duration of the entire utterance and expresses it as a percentage of the length of the entire utterance. The second approach has three advantages over the first. First, only the second approach truly eliminates speech rate as a variable. After the normalization procedure, the duration of every utterance becomes 1.00, so that there are no more rate differences across utterances. With the first approach, however, the durations of entire utterances are still bound to vary even after normalization, and therefore so are the speech rates. Second, the duration ratio obtained using the first approach is biased by the duration of a single syllable, whose length is an independent random variable. For example, suppose two people both produce the greatest lengthening on the final syllable. One speaker produces a moderate final lengthening while the second speaker produces considerably greater final lengthening. With the first approach, the duration ratios of the other syllables will appear much smaller for the second speaker than for the first, other things being equal. By taking the duration of all syllables into consideration, the second approach is less susceptible to the duration variations of a single syllable. Third, the first approach is computationally more demanding. As the longest syllable may vary across utterances, one will have to manually compute the duration ratio for each syllable of each utterance. With the second approach, the duration percentages can be more easily computed. Consequently, the current study normalized duration within an utterance by applying the second approach.
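The contrast between the two normalization approaches can be illustrated with a short Python sketch. The function names and the syllable durations are hypothetical and were chosen only to reproduce the final-lengthening scenario described above; they are not data from this study.

```python
def normalize_by_longest(durations):
    """First approach: the longest syllable is set to 1.0."""
    longest = max(durations)
    return [d / longest for d in durations]


def normalize_by_utterance(durations):
    """Second approach: each syllable as a proportion of the whole utterance."""
    total = sum(durations)
    return [d / total for d in durations]


# Two invented speakers who differ only in the amount of final lengthening (ms).
speaker_1 = [200, 100, 100, 100, 250]
speaker_2 = [200, 100, 100, 100, 400]

for label, durations in (("speaker 1", speaker_1), ("speaker 2", speaker_2)):
    first = normalize_by_longest(durations)
    second = normalize_by_utterance(durations)
    print(label, [round(v, 2) for v in first], [round(v, 2) for v in second])
```

In this invented example, the ratio of the initial syllable drops from 0.8 to 0.5 under the first approach, whereas under the second approach it changes only from about 0.27 to about 0.22, and the proportions of every utterance sum to 1.00.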

3.3.4.2.1.2 Normalization of intensity

The normalization of intensity across syllables within an utterance is less complicated. The intensity ratio of a syllable was simply obtained by dividing the peak intensity obtained for that syllable by the maximum intensity of the utterance. In doing so, we eliminate variations in loudness of speech as a potential variable across utterances.
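As a brief illustration, the intensity ratio described above can be computed as in the following Python sketch. The function name and the peak intensity values are hypothetical, and the sketch assumes that the maximum intensity of the utterance equals the largest syllable peak.

```python
def intensity_ratios(peak_intensities):
    """Divide each syllable's peak intensity by the utterance maximum."""
    loudest = max(peak_intensities)
    return [value / loudest for value in peak_intensities]


# Invented peak intensities (dB) for a five-syllable utterance.
peaks_db = [72.0, 65.0, 63.0, 66.0, 68.0]
ratios = intensity_ratios(peaks_db)
print([round(r, 3) for r in ratios])
```

The loudest syllable receives a ratio of 1.0 and every other syllable a value below 1.0, regardless of how loudly the utterance as a whole was spoken.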

3.3.4.2.1.3 Normalization of pitch

The normalization of pitch among syllables within an utterance proceeded in two steps. First, the F0 frequency in Hz was converted into semitones to make the units of comparison perceptually equal. The reason for doing this is that a given absolute difference in Hz is perceived differently at different pitch ranges. It takes a larger difference in Hz to achieve the same amount of perceptual difference at a higher pitch range than at a lower pitch range. Semitones take such perceptual differences into consideration, so that a difference of one semitone at a higher pitch range is perceived the same as a difference of one semitone at a lower pitch range. Second, after the F0 frequency in Hz was converted into semitones, the semitone value of each syllable was calculated as a ratio of the highest semitone value in that utterance.

The concept of a semitone originates from music theory. Each of the 12 notes in an octave is one semitone higher than its adjacent note toward the lower end of the pitch scale. The ratio of frequencies between successive notes is a constant. When we multiply the frequency of a note (or any frequency) by this constant 12 times, we end up doubling that frequency. Therefore, this constant ratio equals the twelfth root of 2, or approximately 1.0595. This constant ratio is called a semitone. In the current study, we converted frequency data in Hz into semitone units. The procedure took shape in two steps.

First, we established a Hertz to Semitone Conversion Scheme. The 'Note' to 'Frequency' to 'Semitone' conversion chart for music is expanded to be inclusive of the F0 range of ordinary speech data. The chart starts with "C" at 32 Hertz because this is the closest "C" to the estimated lowest frequency sample in the speech data. This C is then arbitrarily set to be the zero of the semitone scale. The rightmost column gives the corresponding semitone value for the notes at various frequencies. As the frequency ratio between successive notes is constant at one semitone, the semitone scale is designed so that a difference of one on the scale means exactly a difference of one semitone.

Table 3.3 The Hertz to Semitone conversion chart

Note Name  Derivation               Frequency (Hertz)  Semitone Scale
C          $32 \cdot 2^{0/12}$      32                 0
C#         $32 \cdot 2^{1/12}$      34                 1
D          $32 \cdot 2^{2/12}$      36                 2
D#         $32 \cdot 2^{3/12}$      38                 3
...
C          $32 \cdot 2^{12/12}$     64                 12
C#         $32 \cdot 2^{13/12}$     68                 13
D          $32 \cdot 2^{14/12}$     72                 14
D#         $32 \cdot 2^{15/12}$     76                 15
...
C          $32 \cdot 2^{24/12}$     128                24
C#         $32 \cdot 2^{25/12}$     136                25
D          $32 \cdot 2^{26/12}$     144                26
D#         $32 \cdot 2^{27/12}$     152                27
...
C          $32 \cdot 2^{36/12}$     256                36
C#         $32 \cdot 2^{37/12}$     272                37
D          $32 \cdot 2^{38/12}$     288                38
D#         $32 \cdot 2^{39/12}$     305                39
...

Next, we conducted a series of mathematical derivations for converting frequency in Hz into semitones. Given that X represents the frequency in Hz and Y represents the corresponding semitone level, the relationship between the two variables can be written as follows.

(35) X = frequency in Hz
Y = semitone
$X = 32 \cdot 2^{Y/12}$

The relationship between X and Y should become clearer with the help of Table 3.4.

Table 3.4 Mathematical derivation of semitones from frequency in Hz

Frequency (Hertz)  Mathematical Derivation  Semitone
32                 $32 \cdot 2^{0/12}$      0
64                 $32 \cdot 2^{12/12}$     12
128                $32 \cdot 2^{24/12}$     24
256                $32 \cdot 2^{36/12}$     36

Based on the relationship between X and Y, the following step-by-step mathematical derivations were performed to obtain Y.

(36) Step 1. $X = 32 \cdot 2^{Y/12}$
Step 2. $X/32 = 2^{Y/12}$
Step 3. $\log_2(X/32) = Y/12$
Step 4. $12 \cdot \log_2(X/32) = Y$
Step 5. $Y = 12 \cdot \log_2(X/32)$

Now that we know how Y can be mathematically derived from X, we can convert any frequency into a semitone. Once frequency is converted into semitones, we can describe the pitch differences among syllables in semitone units.
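The conversion derived in (36), together with the ratio step described in 3.3.4.2.1.3, can be sketched in Python as follows. The function names and the sample F0 values are illustrative assumptions; only the 32 Hz reference and the formula Y = 12 * log2(X/32) come from the text.

```python
import math

REFERENCE_HZ = 32.0  # the C that is set to zero on the semitone scale


def hz_to_semitone(freq_hz, reference_hz=REFERENCE_HZ):
    """Step 5 of (36): Y = 12 * log2(X / 32)."""
    return 12 * math.log2(freq_hz / reference_hz)


def semitone_to_hz(semitone, reference_hz=REFERENCE_HZ):
    """The relationship in (35): X = 32 * 2^(Y / 12)."""
    return reference_hz * 2 ** (semitone / 12)


def pitch_ratios(f0_values_hz):
    """Express each syllable's semitone value as a ratio of the utterance maximum."""
    semitones = [hz_to_semitone(value) for value in f0_values_hz]
    highest = max(semitones)
    return [value / highest for value in semitones]


# Rows of Table 3.4 recovered from the formula.
for hz in (32, 64, 128, 256):
    print(hz, round(hz_to_semitone(hz), 6))

# Invented EPF0 values (Hz) for a three-syllable utterance.
print([round(r, 3) for r in pitch_ratios([128, 64, 256])])
```

Doubling the frequency always adds 12 semitones under this formula, which is why 64, 128, and 256 Hz map onto 12, 24, and 36 on the scale.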

3.3.4.3 Statistical analyses

Four statistical analyses were performed to provide supporting evidence for answering the research questions. First, general comparisons were drawn from the basic statistics, including means, standard deviations, and ranges of both the absolute and the normalized data. Second, Student's t-tests were conducted for each individual syllable between each pair of the three speaker groups to identify significant differences. We focused on the number of strong versus weak syllables that were significantly different in non-final versus final positions between English native speakers and the two groups of TM speakers. The results provide crucial information about the degree of difficulty TM speakers experience with each of the three variables, the distribution of difficulties between strong vs. weak syllables in non-final vs. final positions, and any differences between the two groups of TM learners of English. Third, Pearson product-moment correlation coefficients were obtained for the duration, intensity, and pitch of each sentence between pairs of the three subject groups. The results provide an estimate of how closely the duration, intensity, or pitch ratios between pairs of the three groups covary across syllables of a sentence. Fourth, test-retest reliability indices were calculated for the three productions of each sentence for each group. The results measure how consistent the subjects were with their duration, intensity, and pitch ratios across the three productions of the test sentences. Three reliability indices were obtained for each sentence for each group. The significance level for all statistical tests was set at p [...] with such a small p-value. Therefore, Bonferroni adjustment was not performed on any of the statistical tests.
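For readers who wish to reproduce the second and third analyses, the following Python sketch computes a Student's t statistic and a Pearson product-moment correlation coefficient using only the standard library. It is a sketch under stated assumptions: the pooled-variance, independent-samples form of the t-test is assumed because the text does not specify the variant, the function names are hypothetical, the data are invented, and no p-values are computed.

```python
import math
from statistics import mean, variance


def student_t(group_a, group_b):
    """Independent-samples Student's t with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented duration ratios of one syllable for two speaker groups.
native = [0.30, 0.28, 0.32, 0.31]
learner = [0.22, 0.25, 0.21, 0.24]
print(student_t(native, learner))

# Invented group-mean duration ratios across the syllables of one sentence.
native_pattern = [0.30, 0.12, 0.14, 0.16, 0.28]
learner_pattern = [0.24, 0.17, 0.18, 0.18, 0.23]
print(round(pearson_r(native_pattern, learner_pattern), 3))
```

The t statistic would be compared against the critical value for the chosen significance level and the returned degrees of freedom, while the correlation coefficient summarizes how closely the two groups' syllable-by-syllable patterns covary.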

CHAPTER 4: RESULTS AND DISCUSSION (1): DURATION

This chapter reports and analyzes results from the duration data of Type A and Type B sentences. English, TM ESL, and TM EFL speakers are compared with respect to their use of duration as a correlate of stress in English. Results of the two rhythmically diverse sets of sentences are reported separately and then combined for more detailed analyses. The focus of the current investigation will be the extent to which these two types of rhythmic patterns differ in terms of the types and degrees of difficulty in duration presented to TM speakers and how such differences may help us understand the source of any difficulties.

The duration data of Type A are reported in Section 4.1 and those for Type B sentences in Section 4.2. These two sections are organized into five subsections each. The first pair of sections, 4.1.1 and 4.2.1, report basic statistics and patterns revealed by the absolute durations of syllables in ms. The data are valuable in showing the absolute durational differences between subject groups. However, caution should be taken when one compares absolute durational differences across groups because these differences often heavily reflect individual variations in speech rates. Therefore, interpretations of the absolute duration data are limited to revealing general patterns.

The second pair of sections, 4.1.2 and 4.2.2, report basic statistics and general patterns revealed by relative durations. The relative duration of a syllable is obtained by converting its absolute duration into a percentage of the duration of the entire utterance. The purpose is to eliminate speech rate as a potential variable. In the third pair of sections, 4.1.3 and 4.2.3, direct syllable-to-syllable comparisons are made between speaker groups based on their relative durations. Both similarities to and significant differences from the duration patterns of English speakers are reported. The fourth pair of sections, 4.1.4 and 4.2.4, report correlation coefficients of overall duration patterns between speaker groups for each test sentence to provide additional information about how close to target ESL and EFL speakers are. The fifth pair of sections, 4.1.5 and 4.2.5, report test reliability information for all test sentences to provide information about how consistent NS, ESL, and EFL speakers are with their duration patterns across three productions of the same sentence.

4.1 DURATION PATTERNS OF TYPE A SENTENCES

This section reports duration results of Type A sentences, which feature long stretches of weak syllables. Each sentence contains either one single strong syllable or two widely spaced strong syllables. Four rhythmic patterns are represented. Sentences A1 and A2 feature the five-syllable rhythmic pattern Swwww, sentences A3 and A4 feature the seven-syllable rhythmic pattern SwwwwwS, sentences A5 and A6 feature the seven-syllable rhythmic pattern wSwwwww, and sentence A7 has the eight-syllable rhythmic pattern SwwwwwSw. The total number of utterances analyzed is 630 and the total number of syllables analyzed is 4140.
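The two totals can be checked against the rhythmic patterns listed above. The following is a minimal sketch that assumes a design of three groups of ten speakers, each producing every sentence three times. That design is inferred from the 30 utterances per group per sentence and the three productions mentioned elsewhere in this chapter, and the constant and function names are illustrative.

```python
# Rhythmic patterns of the seven Type A sentences (S = strong, w = weak).
TYPE_A_PATTERNS = {
    "A1": "Swwww", "A2": "Swwww",
    "A3": "SwwwwwS", "A4": "SwwwwwS",
    "A5": "wSwwwww", "A6": "wSwwwww",
    "A7": "SwwwwwSw",
}

# Assumed design (inferred, not stated in this section):
# 3 speaker groups x 10 speakers per group x 3 productions per sentence.
GROUPS, SPEAKERS_PER_GROUP, PRODUCTIONS = 3, 10, 3
PRODUCTIONS_PER_SENTENCE = GROUPS * SPEAKERS_PER_GROUP * PRODUCTIONS


def count_utterances(patterns):
    """Number of recorded utterances for a set of sentences."""
    return len(patterns) * PRODUCTIONS_PER_SENTENCE


def count_syllables(patterns):
    """Number of syllables across all recorded utterances."""
    syllables_per_set = sum(len(pattern) for pattern in patterns.values())
    return syllables_per_set * PRODUCTIONS_PER_SENTENCE


print(count_utterances(TYPE_A_PATTERNS))  # 630
print(count_syllables(TYPE_A_PATTERNS))   # 4140
```

Under these assumptions the 46 syllables of the sentence set, multiplied by 90 productions per sentence, give the reported 4140 syllables.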

4.1.1 ABSOLUTE DURATION PATTERNS

This section summarizes results based on the absolute durations of the individual syllables. A group average duration in milliseconds was obtained for each syllable of every test sentence. Overall observations based on the lengthening and shortening of syllables across sentences and the basic statistics, including means, standard deviations, ranges of absolute duration, and the number of syllables uttered per second, are reported.

Table 4.1 Mean syllable durations in ms for Type A sentences

Syllable   Mean   SD   Range   Syll/sec

A1    wrote  it  with  me
NS    162.33  108.00  193.33  208.00
ESL   257.67  204.00  221.67  271.67
EFL   325.33  271.00  256.67  295.00

A2    made  it  for
NS    171.33  100.67  174.33
ESL   241.33  123.00  205.33
EFL   276.33  293.67  192.67  240.67

A3    You  like  me  to  wear  the
NS    162.67  220.00  123.00  101.00  167.00   89.00
ESL   189.67  266.67  188.00  193.00  219.67  133.00
EFL   193.67  299.00  233.33  272.67  280.67  161.00

A4    You  want  me  to  bring  the
NS    170.67  138.00  151.33   91.67  243.33   65.00
ESL   197.00  229.67  219.67  192.67  281.00  105.67
EFL   227.00  267.33  257.00  239.67  351.67  151.33

A5    M  made  the  le  mon
NS    152.67  197.00   65.33  119.33  156.33
ESL   182.67  268.00  110.00  144.67  190.67
EFL   214.00  302.00  192.33  182.33  231.00

A6    The  old  man  gave  it  to
NS    117.00  189.00  173.33  105.00  101.00
ESL   129.33  269.00  264.00  231.33  177.67  109.67
EFL   178.67  316.33  320.00  311.67  277.00  155.67

A7    Jane's  the  one  wear  -ing  the
NS     49.00  195.67  175.00  115.00   39.67
ESL    91.67  311.00  223.67  149.33   77.00
EFL   209.67  369.00  272.67  214.00  126.00

The syllables are generally shortest for NS, longest for EFL, and in between for ESL. NS produce the shortest syllables across the board except for the final syllable in sentence A3. EFL produce the longest syllables among all subject groups except for the first syllable in A2, and the final syllables in sentences A3, A4, and A5. The mean syllable duration for all Type A sentences averages 177.38 ms for NS, 232.75 ms for ESL, and 273.28 ms for EFL. The mean syllable duration reflects the flip side of speech rates, which are fastest for NS, slowest for EFL, and in between for ESL. NS on average speak at the rate of 5.64 syllables per second, ESL at the rate of 4.31 syllables per second, and EFL at the rate of 3.67 syllables per second. Among TM speakers, greater English proficiency thus goes with a faster overall speech rate. The variation of absolute duration among syllables of a sentence tends to be greatest for ESL, smallest for EFL, and in between for NS. NS and ESL on average produced very similar standard deviations of absolute duration for Type A sentences. The average standard deviation of absolute duration for all Type A sentences is 80.61 ms for NS, 80.06 ms for ESL, and 70.06 ms for EFL. The ranges between the longest and the shortest syllables of the sentences are generally broadest for ESL, narrowest for EFL, and in between for NS. The average range of syllable duration for all Type A sentences is 228.71 ms for NS, 240.48 ms for ESL, and 195.29 ms for EFL. Among the three subject groups, EFL speakers produce syllables that are longest but least variable in length.
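The relation between mean syllable duration and speech rate can be made explicit: a rate in syllables per second is 1000 divided by the mean syllable duration in milliseconds. The sketch below applies that conversion to the three group means reported above. The function name is illustrative, and the conversion from the grand means is an assumption about how the rates relate to the durations, not a description of how the reported rates were computed.

```python
def syllables_per_second(mean_syllable_ms):
    """Convert a mean syllable duration in ms to a rate in syllables per second."""
    if mean_syllable_ms <= 0:
        raise ValueError("duration must be positive")
    return 1000.0 / mean_syllable_ms


# Group mean syllable durations (ms) for Type A sentences, as reported in the text.
MEAN_DURATION_MS = {"NS": 177.38, "ESL": 232.75, "EFL": 273.28}

for group, duration_ms in MEAN_DURATION_MS.items():
    print(group, round(syllables_per_second(duration_ms), 2))
```

The result for NS matches the reported 5.64 syllables per second, while the results for ESL and EFL fall within 0.01 of the reported 4.31 and 3.67. One possible reason for the small gap is that the reported rates were averaged over sentences or utterances rather than derived from the grand mean.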

[Figure 4.1: seven line plots, one per sentence, of mean syllable duration (ms) by syllable for NS, ESL, and EFL speakers.]

(a) Mean syllable duration (ms) for sentence A1 (Swwww): Jim wrote it with me
(b) Mean syllable duration (ms) for sentence A2 (Swwww): Jane made it for me
(c) Mean syllable duration (ms) for sentence A3 (SwwwwwS): You like me to wear the jeans
(d) Mean syllable duration (ms) for sentence A4 (SwwwwwS): You want me to bring the wine
(e) Mean syllable duration (ms) for sentence A5 (wSwwwww): My mom made the le mon pie
(f) Mean syllable duration (ms) for sentence A6 (wSwwwww): The old man gave it to me
(g) Mean syllable duration (ms) for sentence A7 (SwwwwwSw): Jane's the one wear -ing the blue dress

Figure 4.1 Mean syllable durations (ms) for sentences A1 through A7

Five observations can be made based on the results summarized in Table 4.1 and Figure 4.1. First, syllables are noticeably longer for ESL and EFL speakers than for NS across the sentences. Generally speaking, EFL speakers produce the longest syllables across groups. Second, the longest syllables of the sentences are usually the first strong syllables or the final syllables. The longest syllables are the first strong syllables in three sentences (A1, A2, and A7) for NS and ESL speakers, and in three sentences (A1, A5, and A7) for EFL speakers. The longest syllables are the final syllables in three sentences (A3, A4, and A5) for NS, in four sentences (A3, A4, A5, and A6) for ESL speakers, and in four sentences (A2, A3, A4, and A6) for EFL speakers. Third, regardless of major differences in speech rates, syllables appear to lengthen and shorten in similar fashion across groups. The syllables that are lengthened by English speakers are also lengthened by TM speakers, but the syllables that are shortened by English speakers are not necessarily shortened by TM speakers. For example, EFL speakers lengthen the weak syllable made in sentence A2, and the weak syllable gave in sentence A6, both of which are shortened by English speakers. Fourth, in addition to strong syllables and final syllables, weak syllables, especially weak content words, are sometimes lengthened. Here content words are defined as words that denote an object, property, or action, including nouns, adjectives, adverbs, and verbs. For instance, the weak syllables wear in sentence A3, bring in sentence A4, man in sentence A6, and one in sentence A7 are lengthened by all groups. Fifth, other factors, such as the number of segments in a syllable and the inherent length of a syllable, could also affect the actual length of a syllable. For example, a smaller number of segments may explain why the target strong syllable you in sentence A3 appears relatively shorter than the immediately following target weak syllable like. Additionally, the larger number of segments and the presence of inherently longer segments such as fricatives may explain why the syllable with in A1 and the syllable for in A2 are relatively longer than the preceding weak syllable it in the speech of NS.

4.1.2 RELATIVE DURATION PATTERNS

This section reports results based on the relative durations of the syllables of a sentence. The relative duration of a syllable is calculated as its percentage of the length of the entire sentence. To eliminate variations in speech rates across utterances, the lengths of the syllables are first normalized within each utterance. After that, a group average duration (%) is obtained for each syllable of a sentence by averaging the durations (%) of that syllable in the 30 utterances. The basic statistics, including means, standard deviations, and ranges, are summarized in Table 4.2.
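The two-step procedure described above (normalize within each utterance, then average across utterances) can be sketched as follows. This is a minimal illustration, not the procedure actually used for the measurements. It assumes that the duration of an utterance equals the sum of its syllable durations, and the function names and sample values are hypothetical.

```python
from statistics import mean


def relative_durations(durations_ms):
    """Express each syllable's duration as a percentage of the utterance total."""
    total = sum(durations_ms)  # assumption: utterance length = sum of syllable lengths
    if total <= 0:
        raise ValueError("utterance duration must be positive")
    return [100.0 * d / total for d in durations_ms]


def group_mean_percentages(utterances):
    """Normalize every utterance first, then average syllable by syllable."""
    normalized = [relative_durations(u) for u in utterances]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical durations (ms) for two productions of a five-syllable sentence.
utterances = [
    [250.0, 160.0, 110.0, 190.0, 210.0],
    [300.0, 200.0, 120.0, 230.0, 250.0],
]
print([round(p, 2) for p in group_mean_percentages(utterances)])
```

Normalizing before averaging means that a slow utterance and a fast utterance contribute equally to the group mean, which is the point of removing speech rate as a variable.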

Table 4.2 Mean syllable duration as its percentage of the total sentence duration for Type A sentences

Group   Syllable

A1    wrote  it  with  me
NS    17.55  11.63  20.97  22.20
ESL   20.02  15.78  17.53  20.98
EFL   21.19  17.80  16.90  19.67

A2    made  it  for
NS    19.03  11.08  19.24
ESL   21.05  10.49  17.93
EFL   20.97  22.22  14.43  18.23

A3    You  like  me  to  wear  the
NS    12.84  17.33   9.76   7.91  13.21   7.02
ESL   11.71  16.61  11.79  12.08  13.72   8.36
EFL   10.62  16.23  12.61  14.71  15.30   8.76

A4    You  want  me  to  bring  the
NS    14.16  11.32  12.47   7.50  20.07   5.36
ESL   11.62  13.48  12.92  11.30  16.89   6.35
EFL   12.07  14.31  13.68  12.75  18.94   8.10

A5    M  made  the  le  mon
NS    12.38  15.93   5.29   9.71  12.61
ESL   12.02  17.54   7.15   9.61  12.55
EFL   12.12  16.99  10.78  10.07  12.98

A6    The  old  gave  it  to
NS    10.04  16.23  14.88   9.01   8.64
ESL    8.83  17.99  17.63  15.65  11.97   7.37
EFL    9.35  16.62  16.80  16.57  14.53   8.21

A7    Jane's  the  one  wear  -ing  the  blue  dress
NS     3.43  13.73  12.30   8.08   2.78  16.90  18.21
ESL    4.70  15.77  11.76   7.77   3.97  16.80  17.29
EFL    9.00  15.85  11.79   9.24   5.37  15.03  14.63

A1-7  longest  shortest
NS    26.88    7.49
ESL   23.98    8.50
EFL   21.02   10.26

Six observations can be made based on the results summarized in Table 4.2. First, the longest syllables of the sentences are usually the first strong syllables or the final syllables. The longest syllables are the first strong syllables in three sentences for NS (A1, A2, A7), ESL (A1, A2, A7) and EFL (A1, A5, A7) and they are the final syllables in three sentences for NS (A3, A4, A5) and in four sentences for ESL (A3, A4, A5, A6) and EFL (A2, A3, A4, A6). Second, the longest syllables of the sentences are relatively longer for NS than for ESL and EFL. The relative duration of the longest syllables of all sentences averages 26.88% for NS, 23.98% for ESL speakers, and 21.02% for EFL speakers. Third, the shortest syllables of the sentences are all unstressed function words, such as it, the, to, except for le of EFL speakers in sentence A5. The unstressed function words with reduced vowels, especially it, the, to, are particularly short in the speech of NS. Fourth, the shortest syllables tend to be relatively shorter for NS than for ESL and EFL speakers. The shortest syllables of the sentences are relatively shortest for NS in five sentences A1, A3, A4, A5, and A7. The relative duration of the shortest syllables of all sentences averages 7.49% for NS, 8.50% for ESL speakers, and 10.26% for EFL speakers. As a direct result of observations Two and Four, NS produce the widest ranges of relative duration for all Type A sentences. The average range of duration percentages is 19.39% for NS, 15.49% for ESL, and 10.75% for EFL speakers. Fifth, the standard deviations of duration percentages are greatest for NS and smallest for EFL speakers for all Type A sentences. The average standard deviation of duration percentages of all sentences is 6.85% for NS, 5.57% for ESL, and 3.88% for EFL speakers. Sixth, English speakers produce greater lengthening on final syllables than ESL and EFL speakers. The final syllables of NS are relatively longer than those of ESL and EFL for all Type A sentences.
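The link between the second and fourth observations and the reported ranges is simple subtraction: the average range is the average longest relative duration minus the average shortest. The sketch below uses the group averages quoted above; the names are illustrative.

```python
# Average relative durations (%) of the longest and shortest syllables,
# as reported in the text for all Type A sentences.
LONGEST = {"NS": 26.88, "ESL": 23.98, "EFL": 21.02}
SHORTEST = {"NS": 7.49, "ESL": 8.50, "EFL": 10.26}


def average_range(longest, shortest):
    """Range of relative duration per group: longest minus shortest, in %."""
    return {group: round(longest[group] - shortest[group], 2) for group in longest}


print(average_range(LONGEST, SHORTEST))
```

The differences come to 19.39 for NS, 15.48 for ESL, and 10.76 for EFL. The NS value matches the reported range exactly, and the other two are within 0.01 of the reported 15.49 and 10.75, a gap that is presumably due to rounding of the underlying values.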

Figure 4.2 shows the duration patterns of Type A sentences in terms of percentages of the lengths of the entire sentences.

[Figure 4.2: line plots, one per sentence, of mean syllable duration (%) by syllable for NS, ESL, and EFL speakers.]

(a) Mean syllable duration (%) for sentence A1 (Swwww): Jim wrote it with me
(b) Mean syllable duration (%) for sentence A2 (Swwww): Jane made it for me
(c) Mean syllable duration (%) for sentence A3 (SwwwwwS): You like me to wear the jeans
(d) Mean syllable duration (%) for sentence A4 (SwwwwwS): You want me to bring the wine
(e) Mean syllable duration (%) for sentence A5 (wSwwwww): My mom made the le mon pie
(f) Mean syllable duration (%) for sentence A6 (wSwwwww): The old man gave it to me

strong or weak, weakly stressed or unstressed. However, TM speakers produce relatively shorter final syllables than English speakers.

4.1.3 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

T-tests were performed on individual syllables to determine whether the obtained differences in relative duration between pairs of groups were statistically significant or likely due to chance. Results of the t-test scores for the duration percentages of individual syllables between pairs of groups are shown in Table 4.3.

Table 4.3 Student's t-test scores for duration (%) of individual syllables between pairs of groups for Type A sentences

A1        with  me
NS-ESL    [illegible]*  [illegible]
NS-EFL    2.756*  1.454
ESL-EFL   0.416   0.616

A2        for  me
NS-ESL    1.234  1.440
NS-EFL    0.951  1.118
ESL-EFL   0.282  0.416

A3        to
NS-ESL
NS-EFL
ESL-EFL   0.715

A4        You
NS-ESL    2.645*
NS-EFL    2.649*
ESL-EFL   0.475

A5        My  mom
NS-ESL    0.549  0.028
NS-EFL    0.293  0.477
ESL-EFL   0.130  0.353

A6        The  old
NS-ESL    1.233  1.429
NS-EFL    0.641  0.400
ESL-EFL   0.503  1.411

A7        Jane's  the  wear
NS-ESL    2.685*  2.814*  0.784
NS-EFL    0.689
ESL-EFL   [illegible]

*p<.05 (tcrit=2.101), **p<.01 (tcrit=2.878), df=18, two-tailed
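As an illustration of how a single entry in Table 4.3 could be obtained, the sketch below computes a pooled-variance Student's t for two independent samples and compares its absolute value with the two critical values given in the table note. It assumes ten values per group (consistent with df = 18) and equal variances. The sample percentages and function names are hypothetical, and the source does not state which software or t-test variant was used.

```python
from math import sqrt
from statistics import mean, variance

# Critical values quoted in the note to Table 4.3 (df = 18, two-tailed).
T_CRIT_05 = 2.101
T_CRIT_01 = 2.878


def pooled_t(sample_a, sample_b):
    """Student's t for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        raise ValueError("t is undefined when both samples have zero variance")
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


def significance(t_value):
    """Return the marker used in Table 4.3 for |t| at df = 18."""
    t_abs = abs(t_value)
    if t_abs >= T_CRIT_01:
        return "**"
    if t_abs >= T_CRIT_05:
        return "*"
    return ""


# Hypothetical relative durations (%) of one syllable for ten speakers per group.
ns_pct = [20.1, 22.4, 19.8, 21.5, 23.0, 20.9, 21.7, 22.2, 19.5, 21.1]
efl_pct = [17.2, 18.9, 16.5, 19.4, 17.8, 18.1, 16.9, 19.0, 17.5, 18.3]

t_value, df = pooled_t(ns_pct, efl_pct)
print(round(t_value, 3), df, significance(t_value))
```

With ten speakers in each group the degrees of freedom come to 10 + 10 - 2 = 18, which is why the two critical values from the table note can be applied directly.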

In the next two subsections we will examine which syllables are found to be significantly different between groups in terms of duration percentages. Section 4.1.3.1 reports and compares differences between native and non-native speakers. Results provide crucial information as to the kinds of difficulties TM speakers have with duration as a correlate of stress, as well as in what ways ESL and EFL are similar or different in their problems with duration. Section 4.1.3.2 examines significant differences between ESL and EFL. Results in this section provide information about the changes that might have taken place from EFL to ESL in the way duration correlates with stress.

4.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

Out of 46 syllables compared for Type A sentences, a total of 19 syllables was found to be significantly different between NS and ESL. They comprise two relatively shorter strong syllables in non-final position found in sentences A4 (you) and A7 (Jane's), 12 relatively longer weak syllables in non-final position in sentences A1 (it), A3 (me, to, the), A4 (want, to, the), A5 (the), A6 (it) and A7 (the, one, the), three relatively shorter weak syllables in non-final position in sentences A1 (with), A4 (bring), and A6 (man), one relatively shorter strong syllable in final position in sentence A3 (jeans), and one relatively shorter weak syllable in final position in sentence A5 (pie).

Table 4.4 Number of strong and weak syllables with duration (%) significantly different from NS in non-final vs. final positions for Type A sentences

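Position     Non-final                            Final                                Total
Stress       Strong (k=8)      Weak (k=31)        Strong (k=2)      Weak (k=5)
Difference   longer  shorter   longer  shorter    longer  shorter   longer  shorter
ESL          0       2         12      3          0       1         0       1         19
EFL          0       6         15      2          0       2         0       2         27

* Shaded cells dilute the contrast, unshaded ones do not.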

A total of 27 syllables was found to be significantly different between NS and EFL. They comprise six relatively shorter strong syllables in non-final position in sentences A1 (Jim), A2 (Jane), A3 (you), A4 (you), A7 (Jane's, blue), 15 relatively longer weak syllables in non-final position in sentences A1 (wrote, it), A2 (it), A3 (me, to, wear, the), A4 (want, to, the), A5 (the), A6 (man), and A7 (the, one, the), two relatively shorter weak syllables in non-final position in sentences A1 (with) and A6 (it), two relatively shorter strong syllables in final position in sentences A3 (jeans) and A4 (wine), and two relatively shorter weak syllables in final position in sentences A5 (pie) and A7 (dress).

ESL speakers show the same types of difficulty with duration as a correlate of stress as EFL speakers. Significant differences between NS and ESL and significant differences between NS and EFL highlight three types of difficulties: (1) relatively shorter strong syllables in non-final positions (you in A4 and Jane's in A7), (2) relatively longer weak syllables in non-final positions (it in A1, me, to, the in A3, want, to, the in A4, the in A5, it in A6, the, one, the in A7), and (3) relatively shorter final syllables (jeans in A3). Despite having the same types of problems, ESL speakers experience slightly less difficulty with duration than EFL speakers. A smaller number of syllables are found to be significantly different between NS and ESL than between NS and EFL. Compared with EFL speakers, ESL speakers produce fewer relatively shorter strong syllables and relatively longer weak syllables in non-final positions. EFL speakers produce a very high rate of relatively shorter strong syllables. Six of their eight strong syllables in non-final positions are significantly shorter than those of NS. ESL speakers have slightly less difficulty with lengthening final syllables than EFL speakers have. In particular, both ESL and EFL show little difficulty lengthening final unstressed function words. None of their final function words in sentences A1, A2, and A6 are significantly different from those of NS.

4.1.3.2 Significant differences between ESL and EFL

This section reports significant differences in relative duration between ESL and EFL. Syllables that are found to be significantly different between these two groups are further examined under four categories: strong syllables in non-final position, weak syllables in non-final position, strong syllables in final position, and weak syllables in final position. The purpose is to identify patterns in changes that might have taken place due to improved proficiency and increased exposure to English.

Table 4.5 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL in non-final vs. final position for Type A sentences

Position        Non-Final   Final   Total
Stress          Strong (k=8)   Strong (k=2)
Type            longer  shorter  longer  shorter
EFL vs. ESL     2
ESL/NS only
EFL/NS only     1  1  1
EFL/ESL/NS      1  1  1
ESL/EFL only    1

Of the 46 syllables compared, the relative duration of 17 syllables was found to be significantly different between ESL and EFL. Of these 17 syllables, EFL speakers produced two relatively shorter strong syllables in non-final position in A7 (Jane's, blue), 10 relatively longer weak syllables in non-final position in A2 (it), A3 (to, wear), A4 (bring, the), A5 (the), A6 (it), A7 (the, -ing, the), two relatively shorter strong syllables in final position in A3 (jeans) and A4 (wine), and three relatively shorter weak syllables in final position in A5 (pie), A6 (me), and A7 (dress). It appears that EFL speakers differ from ESL speakers in ways that are highly similar to the ways ESL and EFL differ from NS. While ESL and EFL speakers both produce relatively shorter strong syllables, relatively longer weak syllables in non-final position, and relatively shorter strong and weak syllables in final position than NS, EFL speakers produce more instances of each of these difficulties than ESL speakers. The results suggest improvement from EFL to ESL in the lengthening of strong syllables, the shortening of weak syllables, and final lengthening.

Table 4.6 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL speakers categorized as content and function in non-final vs. final position

Position    Non-Final                                  Final
Stress      Strong (k=8)         Weak (k=31)           Strong (k=2)         Weak (k=5)
Length      Shorter (k=2)        Longer (k=10)         Shorter (k=2)        Shorter (k=2)
Syl. type   content  function    content  function     content  function    content  function
Number      2                    2        8            2                    2

ESL speakers have less difficulty shortening weak function words than EFL speakers. Of the 10 relatively longer weak syllables produced by EFL speakers, eight are weak function words. They are the syllable it in sentences A2 and A6, the syllable to in sentence A3, the syllable -ing in sentence A7, and the syllable the in sentences A3, A4, A5, and A7. We also notice that both of the relatively shorter strong syllables in non-final position and all of the relatively shorter strong and weak final syllables are content words. These results suggest that TM learners may start out having greater difficulty with shortening unstressed function words, lengthening strong content words, and lengthening final syllables at earlier stages of acquisition.

Not all of the differences between ESL and EFL speakers translate into difficulties with speech rhythm. That EFL and ESL differ from each other on the relative duration of a syllable does not mean that either group produces the syllable significantly differently from English speakers. For example, the syllable me in sentence A6 is found to be significantly different between ESL and EFL, but not between NS and ESL or NS and EFL. This is a typical example of differences between ESL and EFL that do not amount to difficulties for ESL or EFL. Differences that are interpreted as difficulties are syllables that are significantly different between NS and ESL and/or between NS and EFL. When a syllable is found to be significantly different both between NS and ESL and between NS and EFL, it implies a difficulty common to both ESL and EFL. When a syllable is found to be significantly different between NS and ESL but not between NS and EFL, it implies a difficulty unique to ESL speakers. Similarly, when a syllable is found to be significantly different between NS and EFL but not between NS and ESL, it implies a difficulty unique to EFL speakers.

Figure 4.3 Distribution of significant differences between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

Among the 17 syllables that are found to be significantly different between ESL and EFL, nine suggest difficulties common to both ESL and EFL, five suggest difficulties unique to EFL speakers, one suggests difficulties unique to ESL speakers, and two indicate difficulties for neither ESL nor EFL speakers. These results suggest that whenever a significant difference in syllable duration is found between ESL and EFL, it is more likely for EFL than for ESL to be the group having difficulty. They also suggest that if ESL is having difficulty, EFL will very likely manifest this difficulty as well, but the reverse will not apply.
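The four-way classification used here follows directly from two yes/no outcomes per syllable: whether it differs significantly between NS and ESL, and whether it differs significantly between NS and EFL. A minimal sketch of that decision rule follows. The function name is illustrative, and the list of flags is hypothetical, arranged only so that the tally reproduces the counts reported above, since the per-syllable outcomes are not listed in this section.

```python
from collections import Counter


def classify_difficulty(differs_ns_esl, differs_ns_efl):
    """Label one syllable by which learner group differs significantly from NS."""
    if differs_ns_esl and differs_ns_efl:
        return "common to ESL and EFL"
    if differs_ns_efl:
        return "unique to EFL"
    if differs_ns_esl:
        return "unique to ESL"
    return "neither"


# Hypothetical (NS-ESL, NS-EFL) significance flags for the 17 syllables that
# differ between ESL and EFL, built to match the reported counts 9, 5, 1, 2.
flags = (
    [(True, True)] * 9
    + [(False, True)] * 5
    + [(True, False)] * 1
    + [(False, False)] * 2
)

tally = Counter(classify_difficulty(esl, efl) for esl, efl in flags)
print(tally)
```

Read this way, 14 of the 17 syllables involve a difficulty for EFL speakers, against 10 for ESL speakers.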

4.1.4 CORRELATION OF DURATION PATTERNS BETWEEN SPEAKER GROUPS

Pearson Product-moment correlation coefficients were obtained to estimate how closely syllable duration within the same sentence covaries between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL respectively for all seven Type A sentences. Complete results of the correlation tests are summarized in Table 4.7.

Table 4.7 Pearson Product-moment Correlation Coefficients for mean syllable duration between groups for Type A sentences

Sentence   r
           NS and ESL   NS and EFL   ESL and EFL
A1         0.8767       0.6459       0.9268*
A2         0.9233*      0.6983       0.9181*
A3         0.8614*
A4
A5
A6
A7
*p<.05, **p<.01

There are two major observations. First, the duration patterns of ESL and EFL speakers tend to covary closely. Correlation coefficients between ESL and EFL are found to be significant at p

The results lead to two suggestions. First, ESL and EFL speakers are quite similar in their duration patterns. Second, the duration of syllables tends to vary in a more similar fashion between NS and ESL speakers than between NS and EFL speakers.
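For reference, the correlation reported in this section measures how closely two groups' syllable durations rise and fall together across the syllables of one sentence. The sketch below implements the standard Pearson product-moment formula from scratch. The function name and the duration values are hypothetical and are not taken from the data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    denominator = sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dev_x, dev_y)) / denominator


# Hypothetical mean syllable durations (ms) for one seven-syllable sentence.
ns = [160.0, 220.0, 120.0, 100.0, 170.0, 90.0, 280.0]
learners = [190.0, 270.0, 190.0, 195.0, 220.0, 130.0, 260.0]

print(round(pearson_r(ns, learners), 4))
```

Because the coefficient is computed over the syllables of a single sentence, the number of data points is small (five to eight for Type A sentences), so a fairly high coefficient is needed to reach significance.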

4.1.5 TEST RELIABILITY

This section summarizes results of test-retest reliability between the three administrations of all seven Type A sentences. Three reliability indices were obtained for each sentence for each group. High test-retest reliability indicates speakers are consistent with their duration patterns.

Table 4.8 Test-retest reliability for syllable duration from three productions for Type A sentences

Sentence   Administration   Test-retest r
                            NS         ESL        EFL
A1         1st and 2nd      0.9870**
           1st and 3rd      0.9860**   0.9703**
           2nd and 3rd      0.9993**   0.9721**
A2         1st and 2nd      0.9506**   0.9899**   0.9674**
           1st and 3rd      0.9827**   0.9726**   0.9699**
           2nd and 3rd      0.9894**   0.9871**   0.8969**
A3         1st and 2nd      0.9952**   0.9872**   0.9613**
           1st and 3rd      0.9981**   0.9750**   0.9698**
           2nd and 3rd      0.9974**   0.9880**   0.9778**
A4         1st and 2nd      0.9957**   0.9948**   0.9972**
           1st and 3rd      0.9962**   0.9881**   0.9875**
           2nd and 3rd      0.9879**   0.9914**   0.9905**
A5         1st and 2nd      0.9927**   0.9975**   0.9831**
           1st and 3rd      0.9967**   0.9828**   0.9382**
           2nd and 3rd      0.9971**   0.9831**   0.9576**
A6         1st and 2nd      0.9854**   0.9956**   0.9892**
           1st and 3rd      0.9906**   0.9703**   0.9854**
           2nd and 3rd      0.9914**   0.9561**   0.9415**
A7         1st and 2nd      0.9910**   0.9872**   0.9986**
           1st and 3rd      0.9978**   0.9927**   0.9944**
           2nd and 3rd      0.9822**   0.9799**   0.9648**
*p<.05, **p<.01

Results of the test-retest reliability check indicate that speakers of all groups are highly consistent with their duration patterns in all three productions of Type A sentences. All but one correlation coefficient are statistically significant at p

4.1.6 SUMMARY OF RESULTS FOR TYPE A SENTENCES

The absolute lengths of the syllables are on average shortest for NS, longest for EFL and in between for ESL. Despite differences in speech rates, NS, ESL and EFL speakers generally lengthen and shorten very similar syllables. The longest syllables of the sentences are usually the first strong syllables or the final syllables although weakly stressed syllables are sometimes lengthened. The longest syllables of the sentences are relatively longer and the shortest syllables are relatively shorter for NS than for ESL and EFL speakers. NS produce the widest ranges and the greatest standard deviations of duration percentages in all Type A sentences, as opposed to EFL speakers, who produce the narrowest ranges and the smallest standard deviations of duration percentages. Lengthening of the final syllables is common for speakers in all groups, but TM speakers do not produce as much lengthening as English speakers.

ESL and EFL show the same types of difficulties with duration as a correlate of stress. Significant differences between NS and ESL and between NS and EFL indicate that ESL and EFL sometimes produce relatively shorter strong syllables, relatively longer weak syllables in non-final positions, and relatively shorter final syllables than native English speakers. ESL speakers evidence less difficulty with duration than EFL speakers. A larger number of syllables are found to be significantly different between NS and EFL than between NS and ESL. Most of all, ESL speakers produce fewer instances of each type of the identified difficulties. Relatively longer weak syllables account for the majority of differences between TM and English speakers.

Relatively longer weak syllables produced by EFL speakers also account for the majority of differences between ESL and EFL speakers. All of the significant differences between ESL and EFL speakers indicate shorter strong syllables, longer weak syllables, and shorter final syllables on the part of EFL speakers. In addition, the majority of differences between ESL and EFL indicate difficulties common to both ESL and EFL, or difficulties unique to EFL.

The duration patterns of ESL and EFL speakers are highly correlated. Despite great similarities between these two groups of TM speakers, ESL speakers produce more native-like duration patterns than EFL speakers. The duration of syllables covaries more closely between NS and ESL than between NS and EFL. Speakers of all groups are highly consistent with their duration patterns across three productions of Type A sentences. All but one test-retest reliability index between two administrations of the same sentence fall below significance at p

4.2 DURATION PATTERNS OF TYPE B SENTENCES

This section reports results from the duration patterns for the Type B sentences, which feature a highly regular rhythmic pattern of alternating strong and weak syllables. Each strong syllable is immediately preceded and/or followed by a weak syllable, and vice versa. All sentences were embedded in broad-focused contexts to encourage speakers to introduce all words that carry content information as new and strong. The alternating stress pattern is further reinforced with lexical category. Researchers have long divided words into two lexical categories: content words vs. function words. Content words are words that denote objects, properties, or actions and function words are words that serve purely grammatical functions. For Type B sentences, all strong syllables are content words, which are more likely to bear stress than the function words and morphemes, which make up the weak syllables.

Three different rhythmic patterns are represented in Type B sentences. Sentences B1 through B4 feature the six-syllable iambic rhythmic pattern wSwSwS, sentences B5 and B6 feature the seven-syllable trochaic rhythmic pattern SwSwSwS, and sentence B7 features the eight-syllable iambic rhythmic pattern wSwSwSwS. The total number of utterances analyzed is 630 and the total number of syllables analyzed is 4140.

4.2.1 ABSOLUTE DURATION OF SYLLABLES

This section summarizes results based on the absolute durations of the individual syllables. A group average duration in milliseconds was obtained for each syllable of every test sentence. Overall observations based on the lengthening and shortening of syllables across sentences and the basic statistics, including means, standard deviations, ranges of absolute durations, and speech rates, are reported.

Table 4.9 Mean syllable durations in ms for Type B sentences

Syllable   Mean   SD   Range   Syll/sec

B1    I  need  it  back  b
NS    108.00  151.67  132.00  243.00  110.00
ESL   142.33  218.00  189.00  289.67  138.67
EFL   162.00  307.33  313.00  338.67  196.00

B2    They  play  with  dad  and
NS    108.67  231.00  118.33  147.00
ESL   124.33  301.00  154.33  207.67
EFL   186.00  338.00  228.33  294.67

B3    We  learn  to  read  and
NS    127.33  237.00  118.33  173.33  119.33
ESL   135.00  324.67  167.00  287.00  187.67
EFL   199.00  269.33  304.00  242.33

B4    I  met  with  John  at
NS     91.00  180.67  132.67  280.33  114.67
ESL   141.33  257.67  197.33  354.67  188.33
EFL   167.67  289.67  404.33  230.67  337.67

B5    Mom  dad  were  mad  at
NS    247.33  202.67  110.33  245.67   81.00
ESL   300.67  264.33  168.67  316.67  147.00
EFL   294.67  286.00  292.67  389.00  204.33

B6    John  is  good  at  bake  -ing
NS    239.33  136.67  192.67  157.67  184.67  142.67
ESL   281.33  177.67  265.00  234.67  237.33  163.00
EFL   280.33  218.67  348.67  302.67  292.67  201.67

B7    I  need  a  ride  to  work
NS     92.67  157.00   91.67  274.67  111.67  240.67
ESL   117.33  208.00  117.33  312.00  150.67  285.00
EFL   152.67  283.67  205.00  215.67  314.33

B1-7
NS
ESL
EFL

The syllables are generally shortest for NS, longest for EFL and in between for ESL. NS produce the shortest syllables across the board. EFL speakers produce the longest syllables, except for the final syllables in sentences B3, B4, B5, B6, and B7. The average syllable length for Type B sentences is 185.49 ms for NS, 243.45 ms for ESL, and 285.56 ms for EFL speakers. Speech rates, which are the flip side of the mean syllable lengths, average fastest for NS, slowest for EFL, and in between for ESL speakers. NS on average uttered 5.40 syllables per second, ESL speakers 4.13 syllables per second, and EFL speakers 3.51 syllables per second. TM speakers' speech rates improve with advancement in English proficiency. The variations in absolute duration among syllables of a sentence tend to be greatest for ESL, smallest for EFL, and in between for NS. The average standard deviation of syllable lengths for all Type B sentences is 83.49 ms for NS, 96.64 ms for ESL, and 73.27 ms for EFL speakers. The ranges between the longest and the shortest syllables of a sentence are generally widest for ESL, narrowest for EFL, and in between for NS. The average range of syllable lengths for all Type B sentences is 214.33 ms for NS, 258.95 ms for ESL, and 204.17 ms for EFL speakers. EFL speakers produce syllables that are longest but least variable in length across groups.
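The relation between mean syllable length and speech rate can be illustrated with a minimal Python sketch. The function name is hypothetical, and converting the overall group means directly is an assumption about the procedure; the reported rates may have been averaged over sentences or speakers first.

```python
def syllables_per_second(mean_syllable_ms: float) -> float:
    """Convert a mean syllable duration in ms to a rate in syllables per second."""
    return 1000.0 / mean_syllable_ms

# Mean syllable durations (ms) reported above for Type B sentences.
mean_duration_ms = {"NS": 185.49, "ESL": 243.45, "EFL": 285.56}

rates = {group: syllables_per_second(ms) for group, ms in mean_duration_ms.items()}
for group, rate in rates.items():
    print(f"{group}: {rate:.2f} syllables/sec")
```

Rates computed this way fall within a few hundredths of the reported 5.40, 4.13, and 3.51 syllables per second; the small gap would be expected if the reported figures were averaged at a different level.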

[Figure 4.4, panels (a) through (g): line plots of mean syllable duration (ms) by syllable for the NS, ESL, and EFL groups. Panels: (a) sentence B1 (wSwSwS), (b) sentence B2 (wSwSwS), (c) sentence B3 (wSwSwS), (d) sentence B4 (wSwSwS), (e) sentence B5 (SwSwSwS), (f) sentence B6 (SwSwSwS), (g) sentence B7 (wSwSwSwS). Horizontal axis: Syllable. Vertical axis: Duration (ms).]

Figure 4.4 Mean syllable durations (ms) for sentences B1 through B7

Four observations can be made based on the results summarized in Table 4.9 and Figure 4.4. First, syllables are generally longest for EFL speakers, shortest for NS, and in between for ESL speakers. Second, NS, ESL, and EFL speakers generally lengthen and shorten the same syllables. The lengthening and shortening of the syllables generally coincides with the alternation between the target strong and weak syllables. Syllables are usually lengthened when strong and shortened when weak, except for the final syllable laugh of sentence B2, the syllable dad in sentence B5, and the syllable bake in sentence B6 in the speech of EFL speakers. Third, although EFL speakers lengthen and shorten syllables according to stress most of the time, they sometimes produce a succession of syllables of similar length. For example, the successive syllables need it back in B1, the syllables Mom and dad were in B5, and the syllables at bake in sentence B6 are all of similar length. Fourth, the longest syllables of the sentences are generally the final syllables for NS and ESL. The final syllables are the longest syllables in six sentences for NS and ESL, but in only three sentences for EFL speakers. For EFL speakers, the longest syllable can be either the final syllable or the first or second strong syllable of the sentence.

4.2.2 RELATIVE DURATION PATTERNS

This section reports results based on the relative durations of the syllables of a sentence. The length of each syllable was converted into a percentage of the length of the entire sentence. An average group duration percentage was obtained for each syllable of each sentence. The basic statistics, including means, standard deviations, and ranges, are summarized in Table 4.10.
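The conversion from absolute to relative duration can be sketched as follows. The durations are hypothetical values chosen for illustration, not measurements from this study, and the function name is an assumption.

```python
def relative_durations(durations_ms):
    """Return each syllable's duration as a percentage of the sentence total."""
    total = sum(durations_ms)
    return [100.0 * d / total for d in durations_ms]

# Hypothetical durations (ms) for one six-syllable wSwSwS utterance.
utterance = [100.0, 200.0, 120.0, 240.0, 110.0, 230.0]
percentages = relative_durations(utterance)
print([round(p, 2) for p in percentages])
```

Because every utterance sums to 100%, percentages computed this way can be compared across speakers regardless of differences in overall speech rate.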

Table 4.10 Mean syllable duration as a percentage of the total sentence duration for Type B sentences

Group

B1    I need it
NS    10.11 14.19 12.32
ESL   10.43 15.87 13.67
EFL   9.47 18.12 18.31

B2    They play with
NS    9.66 20.34 10.40
ESL   8.69 20.99 10.70
EFL   10.96 20.14 13.42

B3    We learn to read and
NS    12.18 22.71 11.33 16.60 11.47
ESL   9.21 21.90 11.30 19.36 12.70
EFL   11.80 15.94 18.32 14.55

B4    I met with John at
NS    7.98 15.77 11.62 24.58 10.02
ESL   9.13 16.78 12.80 23.10 12.07
EFL   9.42 19.24 15.99 13.05 19.26

B5    Mom and dad were mad at
NS    17.78 8.13 14.50 7.85 17.62 5.83
ESL   16.10 11.07 14.05 8.99 16.94 7.88
EFL   13.54 13.42 13.10 13.41 17.77 9.35

B6    John is good at bake -ing
NS    17.60 10.05 14.18 11.57 13.65 10.62
ESL   15.69 9.97 14.62 13.25 13.37 9.24
EFL   13.76 10.79 17.38 15.09 14.40 9.84

B7    I need a ride to work
NS    6.76 11.40 6.60 19.84 8.04 17.32
ESL   6.96 12.42 6.97 18.39 8.86 16.80
EFL   7.31 13.52 9.76 10.29 15.14

B1-7  NS ESL EFL

The variations in duration (%) among syllables of a sentence are on average greatest for NS, smallest for EFL, and in between for ESL for Type B sentences. The average standard deviation of duration (%) for all Type B sentences is 6.90% for NS, 6.14% for ESL, and 4.03% for EFL speakers. The ranges between the longest and the shortest syllables of a sentence are on average widest for NS, narrowest for EFL, and in between for ESL. The average range of duration (%) for all Type B sentences is 17.60% for NS, 16.34% for ESL, and 11.21% for EFL speakers. The longest syllables of the sentences are relatively longer for NS than for ESL and EFL. The relative duration of the longest syllables of all Type B sentences averages 26.40% for NS, 25.10% for ESL speakers, and 20.90% for EFL speakers. EFL speakers produce syllables that are longest but least variable in length across groups.

Figure 4.5 shows the duration patterns of Type B sentences, with the length of each syllable expressed as a percentage of the length of the entire sentence.

[Figure 4.5, panels (a) through (g): line plots of mean syllable duration (%) by syllable for the NS, ESL, and EFL groups. Panels: (a) sentence B1 (wSwSwS), (b) sentence B2 (wSwSwS), (c) sentence B3 (wSwSwS), (d) sentence B4 (wSwSwS), (e) sentence B5 (SwSwSwS), (f) sentence B6 (SwSwSwS), (g) sentence B7 (wSwSwSwS). Horizontal axis: Syllable. Vertical axis: Duration (%).]

Figure 4.5 Mean syllable durations (%) for sentences B1 through B7

Five observations can be made based on the results summarized in Table 4.10 and Figure 4.5. First, NS and ESL produce highly similar duration patterns, with strong syllables lengthened and weak syllables shortened. Not only do they lengthen and shorten the same syllables, but they do so with very similar duration percentages. Second, the longest syllables of the sentences are all final syllables for NS and ESL speakers except for sentence B2, where the second strong syllable dad is longest for all groups. For EFL speakers, the longest syllables are final syllables in three sentences, B1, B5, and B6, and the first or the second strong syllables in the other sentences. Third, the non-final strong syllables of EFL and ESL speakers are sometimes longer and sometimes shorter than those of NS, but their final strong syllables are always much shorter than those of NS. This suggests that they might have difficulty with final lengthening. Fourth, EFL speakers generally produce relatively longer weak syllables than NS. Of the 22 weak syllables in Type B sentences, 19 are relatively longer for EFL speakers than for NS. All of these weak syllables are monosyllabic function words. Fifth, the relative duration of the syllables varies within the narrowest ranges for EFL speakers. The relatively narrower ranges are largely due to the substantially smaller lengthening of their final syllables, as well as their relatively longer weak syllables.

4.2.3 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

T-tests were performed on individual syllables to determine whether an obtained difference in relative duration between pairs of speaker groups was statistically significant or likely due to chance. The t-test scores for the duration percentages of individual syllables between groups are listed in Table 4.11.

Table 4.11 Student's t-test scores for durations (%) of individual syllables between groups for Type B sentences

Group

B1        I
NS-ESL    0.531
NS-EFL    0.861
ESL-EFL   1.610

B2        They
NS-ESL    1.495
NS-EFL    1.473
ESL-EFL   2.476

B3
NS-ESL
NS-EFL
ESL-EFL

B4        I
NS-ESL    1.629
NS-EFL    1.589
ESL-EFL   0.354

B5        Mom
NS-ESL    1.687
NS-EFL
ESL-EFL

B6
NS-ESL
NS-EFL
ESL-EFL

B7
NS-ESL
NS-EFL
ESL-EFL   0.561 1.5

*p<.05 (t_crit=2.101), **p<.01 (t_crit=2.878), df=18, two-tailed
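The comparison behind each cell of Table 4.11 can be sketched in Python as a pooled-variance (Student's) t-test for two independent groups. The data below are hypothetical and the function names are assumptions; only the critical values for df=18 are taken from the table note.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical values for df = 18, as given in the note to Table 4.11.
# They are only valid for two groups of ten speakers each.
T_CRIT_05 = 2.101
T_CRIT_01 = 2.878

def pooled_t(a, b):
    """Student's t for two independent samples, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error

def significance(t):
    """Return '**', '*', or '' using the df = 18 critical values above."""
    magnitude = abs(t)
    if magnitude >= T_CRIT_01:
        return "**"
    if magnitude >= T_CRIT_05:
        return "*"
    return ""

# Hypothetical duration percentages of one syllable for ten speakers per group.
group_a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
group_b = [value + 5.0 for value in group_a]

t_score = pooled_t(group_a, group_b)
print(round(t_score, 3), significance(t_score))
```

The sign of t only reflects which group is listed first; significance is judged on its absolute value, in line with the two-tailed test reported in the table.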

In the following two subsections, we will examine which syllables are significantly different between groups in terms of duration percentages. Section 4.2.3.1 reports and compares differences between NS and ESL and differences between NS and EFL. Results provide crucial information as to the kinds of difficulties TM speakers have with duration as a correlate of stress, as well as the ways in which ESL and EFL speakers are similar or different with respect to their problems with duration. Section 4.2.3.2 examines significant differences between ESL and EFL speakers. Results in this section will provide information about the changes that might have taken place between EFL and ESL speakers in the way duration correlates with stress, if there are any changes.

4.2.3.1 Differences between NS and ESL vs. differences between NS and EFL

A much smaller number of syllables are significantly different between NS and ESL than between NS and EFL. Of the 46 syllables compared for Type B sentences, nine were significantly different between NS and ESL speakers and 30 between NS and EFL speakers.

Table 4.12 Number of strong and weak syllables with durations (%) significantly different from NS in non-final vs. final positions for Type B sentences

Position   Non-Final                         Final                            Total
Stress     Strong (k=17)   Weak (k=22)       Strong (k=7)    Weak (k=0)       differences
           longer shorter  longer shorter    longer shorter  longer shorter
ESL        1 2 9
EFL        4 30

*Shaded cells dilute the contrast, unshaded ones do not.

Nonetheless, not all of the differences between TM and English speakers are disruptive of the speech rhythm of TM speakers. Relatively shorter strong syllables and relatively longer weak syllables produced by TM speakers are considered differences that weaken the contrast between strong and weak syllables. Relatively longer strong syllables and relatively shorter weak syllables may not be damaging as far as speech rhythm is concerned, because they increase the contrast between strong and weak syllables.
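The distinction drawn here reduces to a simple rule, sketched below in Python. The function name and the string labels are illustrative assumptions, not terms used in the analysis itself.

```python
def dilutes_contrast(stress: str, direction: str) -> bool:
    """Return True if a difference from NS weakens the strong/weak contrast.

    stress: "strong" or "weak" (the target stress of the syllable)
    direction: "longer" or "shorter" (relative duration compared with NS)
    """
    if stress not in ("strong", "weak") or direction not in ("longer", "shorter"):
        raise ValueError("unexpected stress or direction label")
    # Shorter strong syllables and longer weak syllables narrow the contrast;
    # the two remaining combinations widen it.
    return (stress, direction) in {("strong", "shorter"), ("weak", "longer")}

for stress in ("strong", "weak"):
    for direction in ("longer", "shorter"):
        print(stress, direction, dilutes_contrast(stress, direction))
```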

With this distinction in mind, we found that six of the nine significant differences between NS and ESL constitute differences that dilute the contrast between strong and weak syllables. They include four relatively longer weak syllables in non-final position in sentences B1 (it), B5 (and, at), and B6 (at), and two relatively shorter strong syllables in final position in sentences B4 (noon) and B5 (Jim).

Of the 30 syllables that were significantly different between NS and EFL speakers, 26 constitute differences that weaken the contrast between strong and weak syllables. They include five relatively shorter strong syllables in non-final position in sentences B1 (back), B3 (learn), B5 (Mom), B6 (John), and B7 (work), 14 relatively longer weak syllables in non-final position in sentences B1 (it), B2 (with, and), B3 (to, and), B4 (with, at), B5 (and, were, at), B6 (at), and B7 (a, to, at), and seven relatively shorter strong syllables in final position in sentences B1 (noon), B2 (laugh), B3 (write), B4 (noon), B5 (Jim), B6 (bread), and B7 (once).

ESL and EFL share similar types of difficulties with duration, but to different degrees. Both ESL and EFL speakers produce (1) relatively longer weak syllables in non-final positions, and (2) relatively shorter strong syllables in final positions. In particular, EFL speakers produce a very high rate of relatively longer weak syllables in non-final positions and relatively shorter final strong syllables. ESL speakers produce fewer instances of each than EFL speakers. Of the 22 weak syllables in non-final positions, 14 are relatively longer than those of NS for EFL speakers, as opposed to four for ESL speakers. All seven final syllables produced by EFL speakers are relatively shorter than those of NS, but only two produced by ESL speakers are relatively shorter than those of NS. Besides having greater difficulties than ESL speakers within these two problem types, EFL speakers sometimes produce relatively shorter strong syllables in non-final position, which we do not see among ESL speakers.

The results show that ESL speakers are able to produce native-like duration patterns except for a small number of relatively longer non-final weak syllables and relatively shorter final strong syllables. The results also suggest that less proficient EFL learners have difficulty shortening weak syllables and lengthening final syllables, but they can improve with increased proficiency and exposure to the target language.

4.2.3.2 Significant differences between ESL and EFL

This section reports significant differences in relative duration between ESL and EFL. Syllables that are found to be significantly different between these two groups are further examined under four categories: strong syllables in non-final position, weak syllables in non-final position, strong syllables in final position, and weak syllables in final position. The purpose is to identify changes that might have taken place due to improved proficiency and increased exposure to English.

Table 4.13 Number of strong and weak syllables with duration (%) significantly different between EFL and ESL in non-final and final positions for Type B sentences

Position   Non-Final   Final   Total
Stress     Weak (k=0)
           longer shorter

EFL vs. ESL     21
ESL/NS only     1
EFL/NS only     3 1 5 16
EFL/ESL/NS      2 4
ESL/EFL only    0

Of the 46 syllables compared, the relative duration of 21 syllables was significantly different between ESL and EFL speakers. EFL speakers produced three significantly longer non-final strong syllables, one significantly shorter non-final strong syllable, 10 significantly longer non-final weak syllables, and seven significantly shorter final strong syllables than ESL speakers. EFL speakers differ from ESL speakers in much the same ways as they differ from NS. They produced relatively shorter non-final strong syllables, relatively longer non-final weak syllables, and relatively shorter final syllables than NS and ESL speakers. EFL speakers' difficulties with shortening weak syllables in non-final position and lengthening final syllables contribute in large part to the significant differences between ESL and EFL speakers.

[Pie chart. The only recoverable segment label is: Difficulties unique to ESL, 5%]

Figure 4.6 Distribution of significant differences in syllable duration between ESL and EFL speakers for Type B sentences

All of the syllables that are significantly different between ESL and EFL speakers are syllables that are difficult for one or both groups. The majority of the differences imply difficulties on the part of EFL speakers only. Of the 21 syllables that are significantly different between ESL and EFL, 16 involve difficulties unique to EFL speakers, four involve difficulties common to both ESL and EFL speakers, and only one involves difficulties unique to ESL speakers. The results suggest that whenever a significant difference in syllable duration is found between ESL and EFL, it is more likely for EFL than for ESL speakers to be the group having difficulty. They also suggest that if ESL speakers are having difficulty, EFL speakers likely have the same difficulty. The results are also consistent with our earlier finding of improvement from EFL to ESL in the shortening of weak syllables and the lengthening of final syllables.

4.2.4 CORRELATION OF DURATION PATTERNS BETWEEN SPEAKER GROUPS

Pearson product-moment correlation coefficients were obtained to estimate how closely duration covaries across syllables of the same sentence between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL respectively for all seven Type B sentences. Complete results of the correlation tests are summarized in Table 4.14.

Table 4.14 Pearson Product-moment Correlation Coefficients for mean syllable duration between pairs of groups for Type B sentences

Sentence   NS and ESL
B1         0.9916**
B2         0.9843**
B3         0.9553**
B4         0.9947**
B5         0.9927**
B6         0.9614**
B7         0.9933**

*p<.05, **p<.01
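For reference, a Pearson product-moment coefficient between two groups' syllable-by-syllable means can be computed as in the following Python sketch. The two sequences are hypothetical duration percentages for a six-syllable sentence, not values from Table 4.14, and the function name is an assumption.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical mean duration percentages per syllable for two speaker groups.
group_a = [10.0, 20.0, 12.0, 24.0, 11.0, 23.0]
group_b = [11.0, 19.0, 13.0, 22.0, 12.0, 23.0]

print(round(pearson_r(group_a, group_b), 4))
```

A coefficient near 1 indicates that the two groups lengthen and shorten the same syllables to a proportionally similar extent, even if their absolute durations differ.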

The duration of syllables covaries more closely between NS and ESL speakers than between NS and EFL speakers. The correlation coefficients between NS and ESL speakers are higher than those between NS and EFL speakers across Type B sentences. The number of significant correlations is also greater between NS and ESL than between NS and EFL speakers. Significant correlations between NS and ESL are established for all sentences, but cannot be established at p

4.2.5 TEST RELIABILITY

This section summarizes results of test-retest reliability between the three sets of administrations of all seven Type B sentences. Three reliability indices were obtained for each sentence of each group. High test-retest reliability indicates that speakers are consistent with their duration patterns.

Table 4.15 Test-retest reliability for syllable duration from three productions of Type B sentences

                              Test-retest r
Sentence   Administration     NS         ESL        EFL
B1         1st and 2nd        0.9911**   0.9936**   0.9861**
           1st and 3rd        0.9971**   0.9944**   0.9588**
           2nd and 3rd        0.9974**   0.9882**   0.9649**
B2         1st and 2nd        0.9887**   0.9957**   0.9785**
           1st and 3rd        0.9882**   0.9764**   0.9779**
           2nd and 3rd        0.9970**   0.9813**   0.9850**
B3         1st and 2nd        0.9927**   0.9924**   0.9482**
           1st and 3rd        0.9971**   0.9864**   0.8992**
           2nd and 3rd        0.9936**   0.9829**   0.9027**
B4         1st and 2nd        0.9971**   0.9847**   0.9796**
           1st and 3rd        0.9959**   0.9917**   0.9856**
           2nd and 3rd        0.9955**   0.9961**   0.9970**
B5         1st and 2nd        0.9964**   0.9967**   0.9689**
           1st and 3rd        0.9980**   0.9926**   0.9476**
           2nd and 3rd        0.9991**   0.9905**   0.9846**
B6         1st and 2nd        0.9634**   0.9705**   0.9849**
           1st and 3rd        0.9794**   0.9399**   0.9733**
           2nd and 3rd        0.9915**   0.9837**   0.9367**
B7         1st and 2nd        0.9820**   0.9920**   0.9383**
           1st and 3rd        0.9914**   0.9884**   0.9763**
           2nd and 3rd        0.9974**   0.9977**   0.9611**

*p<.05, **p<.01

Results of the test-retest reliability tests indicate that speakers of all groups are highly consistent with their duration patterns across the three productions of Type B sentences. All of the obtained correlation coefficients are statistically significant at p

4.2.6 SUMMARY OF DURATION RESULTS FOR TYPE B SENTENCES

The absolute lengths of the syllables are on average shortest for NS, longest for EFL and in between for ESL speakers. NS, ESL and EFL speakers generally lengthen and shorten the same syllables. The lengthening and shortening of the syllables coincide closely with the alternation between strong and weak syllables, especially for NS and ESL speakers. EFL speakers sometimes produce several syllables of similar lengths in a row. Not only do NS and ESL lengthen and shorten similar syllables, but they do so with similar relative duration. The longest syllables of the sentences are usually the final ones. Lengthening of final syllables is common for speakers in all groups, but ESL and EFL speakers produce relatively less final lengthening than NS. EFL speakers produce the narrowest ranges and the smallest standard deviations of duration percentages for all Type B sentences. The greater ranges of duration percentages for NS and ESL speakers are largely due to the substantial lengthening of their final syllables and their relatively short weak syllables.

ESL speakers manifest fewer types of difficulties than EFL speakers with duration as a correlate of stress. Both ESL and EFL speakers produce relatively longer non-final weak syllables and relatively shorter final strong syllables than NS, but ESL speakers do not produce relatively shorter strong syllables in non-final position for Type B sentences. ESL speakers also produce fewer instances of each difficulty. EFL speakers produce a rather high rate (64%) of relatively longer weak syllables in non-final position and a very high rate (100%) of relatively shorter final syllables, as opposed to 18% and 29% for ESL speakers.
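The percentages in this paragraph follow from the counts reported in Section 4.2.3.1, as the following Python sketch shows. The dictionary layout and the function name are illustrative assumptions; the counts are those given in the text.

```python
def rate_percent(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)

# (count significantly different from NS, number of syllables of that type)
counts = {
    ("EFL", "longer non-final weak"): (14, 22),
    ("EFL", "shorter final strong"): (7, 7),
    ("ESL", "longer non-final weak"): (4, 22),
    ("ESL", "shorter final strong"): (2, 7),
}

rates = {key: rate_percent(*value) for key, value in counts.items()}
for (group, difficulty), rate in rates.items():
    print(f"{group}, {difficulty}: {rate}%")
```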

EFL speakers differ from ESL speakers in ways similar to how they differ from NS, in that they tend to produce longer weak non-final syllables and shorter final syllables. The majority of the differences between ESL and EFL speakers involve EFL speakers' difficulties with shortening weak syllables in non-final position and lengthening final syllables. The results suggest an improvement from EFL to ESL with respect to the shortening of non-final weak syllables and the lengthening of final syllables.

The duration patterns of NS and ESL speakers are highly correlated. The duration of syllables tends to covary more closely between NS and ESL than between NS and EFL speakers. Speakers of all groups are highly consistent with their duration patterns across the three productions of Type B sentences. All of the obtained test-retest reliability indices are significant at p

4.3.1 DO TM SPEAKERS HAVE DIFFICULTY LENGTHENING STRONG SYLLABLES, SHORTENING WEAK SYLLABLES, OR BOTH?

To answer this question, we first look for significant differences between NS and ESL and between NS and EFL to see how these differences distribute over strong and weak syllables. Then we focus on the differences that weaken the durational contrast between strong and weak syllables, namely, shorter strong syllables and longer weak syllables. Longer strong syllables and shorter weak syllables produced by TM speakers are not considered to be difficult because they exaggerate rather than weaken the target rhythmic pattern by broadening the contrast between strong and weak syllables.

Table 4.16 Number of strong and weak syllables with duration (%) significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Position Non-Final Stress

B

Both ESL and EFL show difficulty with shortening weak syllables in non-final position. EFL speakers produce a substantial number of relatively longer weak syllables in both Type A (15 out of 31 or 48%) and Type B sentences (14 out of 22 or 64%). ESL speakers show greater difficulty with shortening weak syllables for Type A sentences (12 out of 31 or 39%) than for Type B sentences (4 out of 22 or 18%).

ESL speakers show only slight difficulty with the lengthening of strong syllables in non-final position for Type A sentences (2/8) and show no sign of such difficulty for Type B sentences. 25% of their non-final strong syllables for Type A sentences are relatively shorter than those of English speakers. EFL speakers show greater difficulty with lengthening strong syllables than ESL speakers for both Type A and Type B sentences. But the difficulty is greater for Type A (6/8) than for Type B sentences (5/17). Their non-final strong syllables are relatively shorter than those of English speakers 75% of the time for Type A sentences but only 29% of the time for Type B sentences.

Because Type A sentences consist of two types of weak syllables (those that have some stress and those that have no stress) and Type B sentences consist of only weak syllables that are totally unstressed, I wondered how much the weakly stressed syllables in Type A sentences contribute to the relatively greater difficulties that TM speakers have with the duration of syllables for Type A than for Type B sentences. Table 4.17 shows the number of strong (stressed and accented), weakly stressed (stressed and unaccented), and unstressed syllables with duration (%) significantly different from those of NS for Type A sentences.

Table 4.17 Number of strong, weakly stressed, and unstressed syllables with duration (%) significantly different from NS in non-final vs. final position for Type A sentences

Position Final Stress Strong Strong Weak Stressed Unstressed

As can be seen in Table 4.17, TM speakers had difficulties with shortening both weakly stressed and unstressed syllables for Type A sentences. In particular, both ESL and EFL speakers had greater difficulties shortening unstressed syllables (10/19 for ESL vs. 11/19 for EFL) than weakly stressed syllables (2/12 for ESL vs. 4/12 for EFL). This suggests that TM speakers' difficulty with shortening weakly stressed syllables is only one of the reasons why Type A sentences are more difficult for them than Type B sentences. The fact that TM speakers have greater difficulties reducing unstressed syllables for Type A than for Type B sentences suggests that the prosodic differences between these two types of sentences (i.e., Type A sentences consist of stretches of relatively weaker syllables) may have contributed to this.

The distinction between the difficulties with strong and weak syllables is somewhat neutralized in final position for ESL and EFL speakers, in that their final syllables tend to be shorter than those of English speakers regardless of stress. Given that ESL and EFL speakers tend to produce longer weak syllables in non-final position but consistently produce shorter weak syllables in final position than English speakers, the relatively shorter duration of their final syllables is likely due to their difficulty with final lengthening. More likely, TM speakers seem to have difficulty with final lengthening on top of their difficulties with lengthening strong syllables and/or shortening weak syllables. The combined difficulties with lengthening strong syllables and lengthening final syllables may explain why lengthening final strong syllables seems quite difficult for TM speakers. EFL speakers, who have difficulty with lengthening strong syllables in general, produce relatively shorter final strong syllables 100% of the time for both Type A and Type B sentences. ESL speakers, who show only slight difficulty with lengthening strong syllables for Type A sentences, produce relatively shorter strong final syllables 100% of the time (2/2) for Type A sentences and 29% of the time (2/7) for Type B sentences. While TM speakers' tendency to produce relatively shorter stressed syllables makes their difficulty with lengthening final strong syllables appear worse, their tendency to produce relatively longer weak syllables seems to compensate for their difficulty with lengthening final weak syllables. This may explain why TM speakers generally have less difficulty lengthening weak syllables than strong syllables in final position. For Type A sentences, ESL speakers produce relatively shorter weak final syllables 20% of the time (1/5) and relatively shorter strong final syllables 100% of the time (2/2), whereas EFL speakers produce relatively shorter weak final syllables 40% of the time (2/5) and relatively shorter strong final syllables 100% of the time (2/2).

Table 4.18 Group average duration (%) of strong and weak syllables in non-final and final positions for Type A and Type B sentences

Position      Non-Final                                    Final
Stress        Strong               Weak                    Strong               Weak
Group         NS     ESL    EFL    NS     ESL    EFL       NS     ESL    EFL    NS     ESL    EFL

A   A1        27.66  25.68  24.43  16.72  17.78  18.63     -      -      -      22.20  20.98  19.67
    A2        28.44  25.41  20.97  16.45  16.49  18.29     -      -      -      22.21  25.11  24.15
    A3        12.84  11.71  10.62  11.05  12.51  13.52     31.93  25.73  21.76  -      -      -
    A4        14.16  11.62  12.07  11.34  12.19  13.56     29.13  27.43  20.15  -      -      -
    A5        20.03  20.00  19.64  11.18  11.77  12.59     -      -      -      24.05  21.13  17.43
    A6        16.23  17.99  16.62  12.99  12.29  13.09     -      -      -      18.82  20.57  17.91
    A7        20.74  19.37  17.06  8.06   8.79   10.25     -      -      -      18.21  17.29  14.63

B   B1        22.44  21.90  20.30  10.90  11.43  13.04     30.42  28.73  23.02  -      -      -
    B2        22.31  22.05  19.42  11.02  11.28  13.91     20.37  18.54  14.37  -      -      -
    B3        21.67  22.26  19.23  11.66  11.07  14.10     25.71  25.53  19.53  -      -      -
    B4        23.46  22.00  20.51  9.87   11.33  12.82     30.03  26.12  19.26  -      -      -
    B5        19.55  18.02  15.96  7.27   9.31   12.06     28.29  24.98  19.41  -      -      -
    B6        16.94  16.89  16.07  10.75  10.82  11.91     22.34  23.87  18.74  -      -      -
    B7        17.58  16.87  15.82  7.42   8.14   9.44      21.76  19.85  14.8   -      -      -

Results from the average duration (%) of strong and weak syllables in non-final and final positions are consistent with five of our earlier findings. First, as compared with NS, both ESL and EFL speakers produce relatively longer weak syllables in non-final position for both Type A and Type B sentences. The relative durations of their non-final weak syllables are consistently longer than those of English speakers across the sentences. Second, ESL and EFL produce relatively shorter strong syllables in non-final positions for Type A and Type B sentences, but the differences are quite small between NS and ESL for Type B sentences. The relative durations of their non-final strong syllables are consistently shorter than those of English speakers. Third, ESL and EFL speakers generally produce relatively shorter final syllables than NS. The relative durations of their final strong and weak syllables are consistently shorter than those of English speakers across the sentences. Fourth, final strong syllables are more difficult for TM speakers than final weak syllables. The differences between TM and English speakers are greater in the relative duration of their final strong syllables than in the relative duration of their final weak syllables. Fifth, EFL speakers have greater difficulties with the lengthening of strong syllables, the shortening of weak syllables, and the lengthening of final syllables than ESL speakers.
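The first of these findings can be illustrated for Type B sentences with a short Python sketch that averages the non-final weak values listed in Table 4.18 for sentences B1 through B7. Taking an unweighted mean over the seven sentence-level averages is an assumption made for illustration; it ignores the fact that sentences contain different numbers of weak syllables.

```python
from statistics import mean

# Non-final weak syllable duration (%) for sentences B1-B7, from Table 4.18.
nonfinal_weak = {
    "NS":  [10.90, 11.02, 11.66, 9.87, 7.27, 10.75, 7.42],
    "ESL": [11.43, 11.28, 11.07, 11.33, 9.31, 10.82, 8.14],
    "EFL": [13.04, 13.91, 14.10, 12.82, 12.06, 11.91, 9.44],
}

# Unweighted mean across the seven sentences (an illustrative simplification).
group_means = {group: mean(values) for group, values in nonfinal_weak.items()}
for group, value in group_means.items():
    print(f"{group}: {value:.2f}%")
```

On this simplified measure the group means increase from NS to ESL to EFL, in line with the ordering described above.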

Table 4.19 shows the average duration (%) of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences.

Table 4.19 Group average duration (%) of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A sentences

Position   Non-final                                Final
Stress     Strong   Weakly stressed   Unstressed    Strong   Weakly stressed   Unstressed
           (k=8)    (k=12)            (k=19)        (k=2)    (k=2)             (k=3)
NS         20.11    15.62              9.75         30.53    21.13             21.08
ESL        18.89    15.81             10.57         26.58    19.21             22.22
EFL        17.31    16.36             12.08         20.96    16.03             20.58

The results are consistent with the earlier findings that for Type A sentences, TM speakers have difficulty lengthening strong syllables and weakening weak syllables in non-final position, including those that are weakly stressed and those that are unstressed. In final position, TM speakers have difficulty lengthening all three types of syllables.

In sum, TM speakers have difficulties shortening weak syllables and lengthening strong syllables. ESL speakers show only slight trouble with the lengthening of strong syllables. Overall, shortening weak syllables appears to be more difficult for TM speakers than lengthening strong syllables. In particular, shortening unstressed syllables is more difficult than shortening weakly stressed syllables for ESL and EFL speakers for Type A sentences. Both ESL and EFL speakers have trouble lengthening final syllables. Their difficulties with shortening weak syllables and lengthening strong syllables appear to interact with their difficulties with final lengthening.

4.3.2 DO TM SPEAKERS USE DURATION AS A CORRELATE OF STRESS?

Although all the test sentences are framed within a context to induce certain stress patterns, there is no guarantee that speakers will produce duration patterns that exactly match the expected stress patterns. Without assuming that TM speakers intend to highlight the same syllables as English speakers do, in this section, we examine the extent to which syllable duration varies with the target stress patterns in Type A and Type B sentences. As a first step, I trace the relative lengths of syllables across each sentence and summarize the results in Table 4.20, assuming that a syllable that is relatively longer than its preceding syllable is more stressed and a syllable that is relatively shorter than its preceding syllable is less stressed. The duration of a syllable is considered "+LRPS" (Length Relative to Preceding Syllable) when (1) it is longer in ms than its preceding syllable or (2) it is longer than its immediately following syllable if the syllable in question is the initial syllable of the sentence. When the duration of a syllable is the same as that of its preceding syllable, its duration is considered "+LRPS" if the duration of the preceding syllable is "+LRPS". If the preceding syllable is the initial syllable of the sentence, its duration is considered "+LRPS" if it is longer than the next syllable with a different duration. The durations of two syllables are arbitrarily treated as equal when their difference in length is equal to or less than 20 milliseconds. Ten syllables from Type A sentences and four syllables from Type B sentences were treated as equal to their adjacent syllables as a result of this criterion.
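The classification rules just described can be sketched in code. The following Python sketch is illustrative only and is not the procedure actually used in the study: the function name classify_lrps and the example durations are hypothetical, and the handling of a sentence in which every syllable ties with the initial syllable (labelled "-LRPS" here) is an assumption, since the rules above do not cover that case.

```python
TIE_MS = 20  # durations within 20 ms are treated as equal (criterion from the text)


def classify_lrps(durations_ms, tie_ms=TIE_MS):
    """Return one boolean per syllable: True for +LRPS, False for -LRPS."""
    if not durations_ms:
        return []

    # Initial syllable: compare it with the first later syllable whose
    # duration counts as different (more than tie_ms apart).
    first = durations_ms[0]
    initial_label = False  # assumption: used only if every syllable ties with the first
    for later in durations_ms[1:]:
        if abs(first - later) > tie_ms:
            initial_label = first > later
            break
    labels = [initial_label]

    # Non-initial syllables: compare each with its preceding syllable;
    # a tie inherits the label of the preceding syllable.
    for prev, cur in zip(durations_ms, durations_ms[1:]):
        if abs(cur - prev) <= tie_ms:
            labels.append(labels[-1])
        else:
            labels.append(cur > prev)
    return labels


# Hypothetical five-syllable utterance (durations in ms), not taken from the data.
example = [300, 150, 140, 220, 400]
print(["+LRPS" if flag else "-LRPS" for flag in classify_lrps(example)])
```

In this hypothetical utterance the third syllable (140 ms) falls within 20 ms of the second (150 ms), so it inherits the second syllable's "-LRPS" label rather than being compared directly.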

Table 4.20 Number of strong and weak syllables classified as "+LRPS" or "-LRPS" in non-final and final positions for Type A and Type B sentences

Position Stress Stron Duration + A NS 23 2 23 ESL 21 2 21 EFL 18 2 18 B NS 21 7 21 ESL 22 7 22 EFL 19 6 19

[Bar chart. Y-axis: number of syllables (0 to 25). Bars for NS, ESL, and EFL in four non-final categories: Type A Strong +LRPS, Type A Weak -LRPS, Type B Strong +LRPS, Type B Weak -LRPS.]

Figure 4.7 Number of strong and weak non-final syllables classified as "+LRPS" or "-LRPS" for Type A and Type B sentences

[Bar chart. Y-axis: number of syllables (0 to 7). Bars for NS, ESL, and EFL in three final categories: Type A Strong +LRPS, Type A Weak +LRPS, Type B Strong +LRPS.]

Figure 4.8 Number of strong and weak final syllables classified as "+LRPS" for Type A and Type B sentences

As can be seen in Figure 4.7 and Figure 4.8, TM and English speakers generally produce very similar duration patterns, in that both TM and English speakers show a tendency to produce strong syllables that are +LRPS in non-final position, weak syllables that are -LRPS in non-final position, and final syllables that are +LRPS. TM and English speakers are very much alike except for weak syllables in non-final position. EFL speakers, in particular, produce +LRPS weak syllables somewhat more frequently than English speakers. Overall, the results suggest that TM speakers are capable of using duration as a correlate of stress. However, TM speakers, especially EFL speakers, occasionally show difficulties with the placement of stress or with the degree to which syllables are lengthened or shortened.

For all subject groups, the correlation between duration and stress is stronger for Type B than for Type A sentences. Specifically, weak syllables in non-final position are more consistently -LRPS for Type B sentences than for Type A sentences. Given that syllable duration is highly subject to segmental variations across syllables, it is expected that we will not always see a perfect correlation between duration and stress in real speech. The fact that the unstressed syllables in the Type B sentences are made of inherently shorter function words or morphemes may explain why they are consistently shorter than their adjacent strong content words. Here "content word" is used for words that denote objects, properties, or actions, including Nouns, Verbs, Adjectives, and Adverbs, while "function word" is used for words that serve purely grammatical functions, including Prepositions, Articles, Conjunctions, Pronouns, and Auxiliary Verbs.

However, we still need to explain why TM speakers are somewhat more likely to produce -LRPS strong syllables and +LRPS weak syllables than English speakers and why shortening weak syllables seems slightly more difficult for TM speakers for Type A than for Type B sentences. Could the fact that all the weak syllables are totally unstressed for Type B but not for Type A sentences contribute to the greater difficulty TM speakers have for Type A sentences than for Type B sentences? To answer this question, I went one step further and examined the relationships between duration and stress levels. The results are summarized in Table 4.21.

Table 4.21 Number of "+LRPS" vs "-LRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Position  Non-final                                       Final
Stress    Strong          Weakly stressed Unstressed      Strong          Weakly stressed Unstressed
          (k=8)           (k=12)          (k=19)          (k=2)           (k=2)           (k=3)
Type      +LRPS  -LRPS    +LRPS  -LRPS    +LRPS  -LRPS    +LRPS  -LRPS    +LRPS  -LRPS    +LRPS  -LRPS
NS        7      1        5      7        3      16       2               2               3
ESL       6      2        6      6        3      16       2               2               3
EFL       6      2        8      4        5      14       2               2               3

Two observations can be made based on the results in Table 4.21. First, TM speakers, particularly EFL speakers, are more likely to produce +LRPS weak syllables, including weakly stressed syllables and unstressed syllables. Second, speakers of all groups produce +LRPS final syllables, regardless of stress level.

Table 4.22 Average duration (%) of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Type  Group  Non-final                                Final
             Strong   Weakly stressed   Unstressed    Strong   Weakly stressed   Unstressed
             (k=8)    (k=12)            (k=19)        (k=2)    (k=2)             (k=3)
A     NS     20.11    15.62              9.75         30.53    21.13             21.08
A     ESL    18.89    15.81             10.57         26.58    19.21             22.22
A     EFL    17.31    16.36             12.08         20.96    16.03             20.58
B     NS     18.06    -                  9.73         25.56    -                 -
B     ESL    17.89    -                 10.38         23.95    -                 -
B     EFL    17.62    -                 12.33         18.45    -                 -

As can be seen in Table 4.22, similar to NS, ESL speakers are capable of using duration to distinguish at least three stress levels (strong, weakly stressed, and unstressed), although the duration differentiation between these three types of syllables is not as distinct as it is for NS. Instead of making a three-way distinction (strong, weakly stressed, and unstressed), EFL speakers seem to make a binary distinction (stressed vs. unstressed). They do not produce a clear length distinction between strong syllables and weakly stressed syllables. This suggests that EFL speakers may not distinguish as many levels of stress as English speakers do.

4.3.3 DO TM SPEAKERS PRODUCE SMALLER DURATION CONTRASTS BETWEEN STRONG AND WEAK SYLLABLES THAN ENGLISH SPEAKERS?

Although TM speakers may differ from English speakers in terms of the relative duration of the strong and weak syllables, those differences may not be damaging to the speech rhythm if TM speakers maintain native-like duration contrasts between strong and weak syllables. In this section, we focus on the examination of duration contrasts between strong and weak syllables in non-final and final positions for Type A sentences and in non-final positions only for Type B sentences, where all final syllables are strong. The duration contrasts between strong and weak syllables in non-final and final positions will be discussed separately because according to our earlier findings they seem to behave differently.

I focus first on the duration contrasts between the target strong and weak syllables.

Table 4.23 Duration contrasts (%) between strong and weak syllables in non-final position for Type A sentences

Position: Non-final
Group      NS               ESL              EFL
Stress     S       W        S       W        S       W
A1         27.66   16.72    25.68   17.78    24.43   18.63
A2         28.44   16.45    25.41   16.49    20.97   18.29
A3         12.84   11.05    11.71   12.51    10.62   13.52
A4         14.16   11.34    11.62   12.19    12.07   13.56
A5         20.03   11.18    20.00   11.77    19.64   12.59
A6         16.23   12.99    17.99   12.29    16.62   13.09
A7         20.74    8.06    19.37    8.79    17.06   10.25
Mean       20.01   12.54    18.83   13.12    17.34   14.28
NS-NNS                       1.17   -0.58     2.67   -1.73

[Line chart. Y-axis: duration (%), 0 to 25. X-axis: Strong Non-final, Weak Non-final (Type A sentences). Lines for NS, ESL, and EFL.]

Figure 4.9 Group average duration (%) of strong vs. weak syllables in non-final position for Type A sentences

TM speakers appear to produce smaller duration contrasts between strong and weak non-final syllables than English speakers for Type A sentences. The smaller contrasts are attributed to relatively shorter strong syllables and relatively longer weak syllables in the speech of TM speakers. However, the relatively shorter strong syllables on average account for a larger share of the smaller contrasts than the relatively longer weak syllables. For ESL speakers, their non-final strong syllables are on average shorter than those of NS by 1.17%, whereas their non-final weak syllables are on average longer than those of NS by 0.58%. For EFL speakers, their non-final strong syllables are on average shorter than those of NS by 2.67%, whereas their non-final weak syllables are on average longer than those of NS by 1.73%. Compared with ESL speakers, EFL speakers produce slightly smaller duration contrasts between strong and weak syllables in non-final positions for Type A sentences. The average difference in duration (%) between strong and weak syllables in non-final position for Type A sentences is 7.47% for NS, 5.71% for ESL, and 3.07% for EFL speakers.
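The arithmetic behind the "larger share" statement can be illustrated with a short Python sketch. It uses only the group means reported in Table 4.23; the function names and the idea of expressing the share as a proportion are illustrative assumptions. Because the inputs are rounded means, recomputed gaps can differ from the reported NS-NNS values by 0.01.

```python
# Group means (%) from Table 4.23 (non-final position, Type A sentences).
means = {
    "NS": {"strong": 20.01, "weak": 12.54},
    "ESL": {"strong": 18.83, "weak": 13.12},
    "EFL": {"strong": 17.34, "weak": 14.28},
}


def contrast(group):
    """Strong-minus-weak duration contrast (%) for one group."""
    return means[group]["strong"] - means[group]["weak"]


def decompose(group, reference="NS"):
    """Split the loss of contrast relative to NS into its two sources."""
    strong_gap = means[reference]["strong"] - means[group]["strong"]  # shorter strong syllables
    weak_gap = means[group]["weak"] - means[reference]["weak"]        # longer weak syllables
    strong_share = strong_gap / (strong_gap + weak_gap)
    return strong_gap, weak_gap, strong_share


for group in ("ESL", "EFL"):
    strong_gap, weak_gap, share = decompose(group)
    print(f"{group}: contrast {contrast(group):.2f}, "
          f"strong gap {strong_gap:.2f}, weak gap {weak_gap:.2f}, "
          f"share due to strong syllables {share:.0%}")
```

Under these rounded means, the shortfall on strong syllables makes up more than half of the lost contrast for both learner groups, which is the pattern described above.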

Table 4.24 Duration contrasts (%) between strong and weak syllables in final position for Type A sentences

Position: Final
Group      NS               ESL              EFL
Stress     S       W        S       W        S       W
A1                 22.20            20.98            19.67
A2                 22.21            25.11            24.15
A3         31.93            25.73            21.76
A4         29.13            27.43            20.15
A5                 24.05            21.13            17.43
A6                 18.82            20.57            17.91
A7                 18.21            17.29            14.63
Mean       30.53   21.10    26.58   21.02    20.96   18.76
NS-NNS                       3.95    0.08     9.57    2.34

[Line chart. Y-axis: duration (%), 0 to 35. X-axis: Strong Final, Weak Final (Type A sentences). Lines for NS, ESL, and EFL.]

Figure 4.10 Group average duration (%) of strong vs. weak syllables in final position for Type A sentences

TM speakers produce relatively smaller duration contrasts between strong and weak syllables in final position than English speakers. The average durational contrast between strong and weak final syllables for Type A sentences is 9.43% for NS, 5.56% for ESL, and 2.20% for EFL speakers. The smaller contrasts here are caused primarily by the relatively shorter strong syllables in the speech of the two groups of TM speakers. Although the strong and weak final syllables are both relatively shorter for TM than for English speakers, the differences between TM and English speakers are greater in the relative duration of their strong final syllables than in the relative duration of their weak final syllables. For ESL speakers, their final strong syllables are on average shorter than those of NS by 3.95%, whereas their final weak syllables are on average shorter than those of NS by 0.08%. For EFL speakers, their final strong syllables are on average shorter than those of NS by 9.57%, whereas their final weak syllables are on average shorter than those of NS by 2.34%. As can be seen in Figure 4.10, the average contrast between strong and weak final syllables is so small for the EFL speakers that the dotted line is almost flat.

TM speakers consistently produce relatively shorter final syllables than English speakers regardless of stress. This strongly suggests that TM speakers may have difficulty with final lengthening, on top of their difficulties with duration as a correlate of stress. That might explain why TM speakers experience greater difficulty with the duration of the strong final syllables than with the duration of the weak final syllables. Their difficulty with shortening weak syllables may on the surface compensate for their difficulty with lengthening final weak syllables, consequently narrowing the difference between them and English speakers in the duration of final weak syllables. At the same time, their difficulty with lengthening strong syllables, combined with their difficulty with lengthening final syllables, makes lengthening final strong syllables particularly difficult.

Aside from comparing the durations of strong vs. weak syllables, one may wonder how alike the durations of NS, ESL, and EFL speakers would be if one compared only the strong vs. unstressed syllables for Type A sentences. Table 4.25 presents the duration contrasts between strong and unstressed syllables for Type A sentences.

Table 4.25 Duration contrasts (%) between strong and unstressed syllables in non-final and final position for Type A sentences

Position   Non-final               Final
Stress     Strong    Unstressed    Unstressed
NS         20.11      9.75         21.08
ESL        18.89     10.57         22.22
EFL        17.31     12.08         20.58

Alternatively, one may compare all stressed syllables (primary or not) vs. all totally unstressed syllables.

Table 4.26 Duration contrasts (%) between stressed and unstressed syllables in non-final and final position for Type A sentences

Position   Non-final                Final
Stress     Stressed   Unstressed    Stressed   Unstressed
NS         17.41       9.75         25.83      21.08
ESL        17.04      10.57         22.90      22.22
EFL        16.74      12.08         18.49      20.58

The results show that in all three types of comparisons (strong vs. weak, strong vs. unstressed, and stressed vs. unstressed), NS consistently produced greater duration contrasts than ESL, who in turn produced greater duration contrasts than EFL for Type A sentences. However, NS and TM speakers are least alike when comparing the durations of strong vs. unstressed syllables.
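The three types of comparison can be recomputed from the group means reported in Tables 4.19 and 4.23 to 4.26. The Python sketch below is a worked check, not part of the original analysis; the dictionary layout and function name are illustrative assumptions, and the contrasts are differences of rounded means.

```python
# Reported group means (%) for Type A sentences, copied from Tables 4.19 and 4.23 to 4.26.
# Each group maps to (mean of the more prominent category, mean of the less prominent one).
comparisons = {
    ("non-final", "strong vs. weak"): {
        "NS": (20.01, 12.54), "ESL": (18.83, 13.12), "EFL": (17.34, 14.28)},
    ("non-final", "strong vs. unstressed"): {
        "NS": (20.11, 9.75), "ESL": (18.89, 10.57), "EFL": (17.31, 12.08)},
    ("non-final", "stressed vs. unstressed"): {
        "NS": (17.41, 9.75), "ESL": (17.04, 10.57), "EFL": (16.74, 12.08)},
    ("final", "strong vs. weak"): {
        "NS": (30.53, 21.10), "ESL": (26.58, 21.02), "EFL": (20.96, 18.76)},
    ("final", "strong vs. unstressed"): {
        "NS": (30.53, 21.08), "ESL": (26.58, 22.22), "EFL": (20.96, 20.58)},
    ("final", "stressed vs. unstressed"): {
        "NS": (25.83, 21.08), "ESL": (22.90, 22.22), "EFL": (18.49, 20.58)},
}


def duration_contrasts(table):
    """Return the contrast (difference of the two means) per comparison and group."""
    return {
        key: {group: round(high - low, 2) for group, (high, low) in groups.items()}
        for key, groups in table.items()
    }


results = duration_contrasts(comparisons)
for (position, kind), by_group in results.items():
    ordered = by_group["NS"] > by_group["ESL"] > by_group["EFL"]
    print(f"{position:9s} {kind:24s} {by_group}  NS > ESL > EFL: {ordered}")
```

With these values, the ordering NS > ESL > EFL holds in every comparison, in both non-final and final position.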

Table 4.27 Duration contrasts (%) between strong and weak syllables in non-final positions for Type B sentences

Position: Non-final
Group      NS        ESL
Stress     Strong    Strong    Weak
B1         22.44     21.90     11.43
B2         22.31     22.05     11.28
B3         21.67     22.26     11.07
B4         23.46     22.00     11.33
B5         19.55     18.02      9.31
B6         16.94     16.89     10.82
B7         17.58     16.87      8.14
Mean       20.57     20.00     10.48
NS-NNS                0.57     -0.64

[Line chart. Y-axis: duration (%), 0 to 25. X-axis: Strong Non-final, Weak Non-final (Type B sentences). Lines for NS, ESL, and EFL.]

Figure 4.11 Group average duration (%) of strong vs. weak syllables in non-final position for Type B sentences

Similar to what we have found for Type A sentences, TM speakers produce smaller duration contrasts between strong and weak non-final syllables than do English speakers for Type B sentences. The smaller contrasts are also caused by relatively shorter strong syllables and relatively longer weak syllables in the speech of TM speakers. However, the differences are very small between NS and ESL speakers. The average durational contrast between strong and weak non-final syllables for Type B sentences is 10.73% for NS, 9.52% for ESL, and 5.72% for EFL speakers.

In sum, although ESL and EFL both produce relatively smaller durational contrasts between strong and weak syllables than English speakers, the contrasts are consistently smaller for EFL speakers than for ESL speakers in non-final and final positions for both Type A and Type B sentences.

4.3.4 DO TM SPEAKERS PRODUCE DURATION PATTERNS THAT ARE CLOSER TO THE TARGET SPEECH RHYTHM WITH IMPROVED PROFICIENCY AND INCREASED EXPOSURE TO ENGLISH?

There are several indications from the results that ESL speakers produce more native-like duration patterns than the less proficient EFL speakers.

Table 4.28 A comparison of duration results between ESL and EFL

        Mean Duration (%)        Standard Deviation     Range of                 No. of Syl.     Correlation
        Contrast between         of Duration (%)        Duration (%)             Different       Coefficients
        Strong & Weak                                                            from NS         with NS
Group   NS      ESL     EFL      NS     ESL    EFL      NS      ESL     EFL      ESL    EFL      ESL    EFL
A1      10.94   7.90    5.80     5.93   3.78   2.98     16.03   9.90    7.53     2      4        0.88   0.65
A2      11.99   8.92    2.68     6.27   6.15   3.78     17.36   14.92   9.72     0      2        0.92   0.70
A3      1.79    -0.80   -2.90    8.54   5.62   4.23     24.91   17.37   13.00    4      6        0.98   0.86
A4      2.82    -0.57   -1.49    8.08   6.60   4.12     23.77   21.08   12.05    5      5        0.96   0.91
A5      8.85    8.23    7.05     6.31   5.34   3.70     18.76   13.98   9.57     2      2        0.98   0.88
A6      3.24    5.70    3.53     5.29   4.99   3.90     13.73   13.20   9.70     2      2        0.87   0.78
A7      12.68   10.58   6.81     7.52   6.52   4.47     21.15   17.97   13.71    4      6        0.99   0.95

B1      11.54   10.47   7.26     8.18   7.15   5.19     20.31   18.55   13.55    1      4        0.99   0.83
B2      11.29   10.77   5.51     7.86   5.84   3.32     16.57   17.93   12.79    0      3        0.98   0.83
B3      10.01   11.19   5.13     6.23   6.53   3.16     14.38   16.32   8.05     2      4        0.96   0.83
B4      13.59   10.67   7.69     8.8    6.69   4.90     22.05   16.99   13.61    1      4        0.99   0.80
B5      12.28   8.71    3.90     7.86   5.84   3.32     22.46   17.10   10.06    3      5        0.99   0.85
B6      6.19    6.07    4.16     4.38   4.83   3.22     12.29   14.63   8.90     2      4        0.96   0.74
B7      10.16   8.73    6.38     6.21   5.20   3.71     15.16   12.88   11.48    0      6        0.99   0.89

First of all, although both ESL and EFL speakers produce less duration differentiation between strong and weak syllables than do English speakers for both Type A and Type B sentences, ESL speakers consistently produce more native-like duration contrasts between strong and weak syllables than EFL speakers for all sentences.

Second, although both ESL and EFL speakers tend to produce smaller variations in duration (%) among syllables of the sentences than English speakers, ESL speakers consistently produce more native-like variations in duration (%) for all sentences.

Third, although the duration (%) ranges between the longest and the shortest syllables of the sentences tend to be smaller for both ESL and EFL speakers than for English speakers, ESL speakers consistently produce more native-like ranges of duration (%) than EFL speakers.

Fourth, although ESL and EFL speakers show similar types of significant differences from English speakers for both Type A and Type B sentences, there are fewer differences between NS and ESL speakers than between NS and EFL speakers. A total of 19 syllables are found to be significantly different between NS and ESL as opposed to 27 syllables between NS and EFL for Type A sentences. A total of nine syllables are significantly different between NS and ESL as opposed to 30 syllables between NS and EFL speakers for Type B sentences.

Last but not least, the correlation coefficients of the duration (%) across syllables of the same sentences are consistently stronger between NS and ESL speakers than between NS and EFL speakers for all Type A and Type B sentences. Additionally, a larger number of significant correlations was found between NS and ESL than between NS and EFL speakers. For Type A sentences, significant correlations were found in six sentences between NS and ESL and in five sentences between NS and EFL. For Type B sentences, correlation coefficients are significant in seven sentences between NS and ESL and in five sentences between NS and EFL speakers.
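The totals and the correlation comparison cited in the last two points can be checked directly against Table 4.28. The Python sketch below copies the last four columns of that table; the variable and function names are illustrative, and the check says nothing about statistical significance, which cannot be recovered from the table alone.

```python
# Last four columns of Table 4.28, per sentence:
# (syllables differing from NS: ESL, EFL; correlation with NS: ESL, EFL)
table_4_28 = {
    "A1": (2, 4, 0.88, 0.65), "A2": (0, 2, 0.92, 0.70), "A3": (4, 6, 0.98, 0.86),
    "A4": (5, 5, 0.96, 0.91), "A5": (2, 2, 0.98, 0.88), "A6": (2, 2, 0.87, 0.78),
    "A7": (4, 6, 0.99, 0.95),
    "B1": (1, 4, 0.99, 0.83), "B2": (0, 3, 0.98, 0.83), "B3": (2, 4, 0.96, 0.83),
    "B4": (1, 4, 0.99, 0.80), "B5": (3, 5, 0.99, 0.85), "B6": (2, 4, 0.96, 0.74),
    "B7": (0, 6, 0.99, 0.89),
}


def differing_syllables(sentence_type):
    """Total syllables differing from NS for (ESL, EFL) in Type A or Type B."""
    rows = [row for name, row in table_4_28.items() if name.startswith(sentence_type)]
    return sum(row[0] for row in rows), sum(row[1] for row in rows)


def esl_closer_count():
    """Number of sentences in which ESL correlates more strongly with NS than EFL does."""
    return sum(1 for _, _, r_esl, r_efl in table_4_28.values() if r_esl > r_efl)


print("Type A (ESL, EFL):", differing_syllables("A"))
print("Type B (ESL, EFL):", differing_syllables("B"))
print("Sentences with higher ESL correlation:", esl_closer_count(), "of", len(table_4_28))
```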

4.3.5 SUMMARY FOR THE DISCUSSION OF DURATION

TM speakers have greater difficulties shortening weak syllables than lengthening strong syllables. ESL speakers show only slight trouble with the lengthening of strong syllables. EFL speakers show greater difficulties with both than do ESL speakers. Type A sentences appear to be more difficult than Type B sentences for ESL speakers as far as syllable duration is concerned. ESL speakers produce more relatively longer weak syllables and relatively shorter strong syllables in Type A than in Type B sentences. For EFL speakers, Type A and Type B sentences seem equally difficult. Both ESL and EFL speakers have trouble lengthening final syllables, regardless of stress.

TM speakers generally correlate duration with the target stress patterns, although the correlation is not as strong as it is for English speakers. TM speakers, especially EFL speakers, occasionally lengthen or shorten the wrong syllables. In particular, the weakly stressed syllables of EFL speakers are almost as long as their strong syllables. Unlike NS and ESL speakers, who produce a three-way durational distinction for the three types of syllable (strong, weakly stressed, unstressed), EFL speakers produce a binary durational distinction (stressed vs. unstressed).

ESL and EFL speakers produce less durational differentiation between strong and weak syllables than English speakers in non-final and final positions for both Type A and Type B sentences. The smaller contrasts produced by TM speakers are primarily attributed to the relatively shorter strong syllables and the relatively longer weak syllables in non-final position and to the relatively shorter strong syllables in final position. ESL speakers consistently produce greater differentiation between strong and weak syllables than EFL speakers do. For Type A sentences, where there are at least three levels of stress (strong, weakly stressed, unstressed), TM speakers consistently show smaller duration contrasts than English speakers whether the comparison is made between strong and weak syllables, strong and unstressed syllables, or stressed and unstressed syllables.

TM speakers' difficulties with shortening weak syllables and lengthening strong syllables may interact with their difficulties with lengthening final syllables. The interaction may explain why TM speakers consistently experience greater difficulty with lengthening final strong syllables than lengthening final weak syllables.

ESL speakers, who have higher overall proficiency and exposure to the English-speaking environment, produce more native-like duration patterns. Various aspects of the duration results, including the mean duration contrasts between strong and weak syllables, the standard deviations of duration (%) among syllables, the ranges of duration (%), the number of syllables significantly different from NS, and the correlations of duration ratios across sentences, all provide supporting evidence that ESL speakers perform better than EFL speakers with respect to their duration patterns.

CHAPTER 5: RESULTS AND DISCUSSION (2): INTENSITY

This chapter reports and analyzes results from the intensity data of Type A and Type B sentences. English, TM ESL, and TM EFL speakers are compared with respect to their use of intensity as a correlate of stress in English. Results of the two rhythmically diverse sets of sentences are reported separately and then combined for more detailed analyses. The focus of the current investigation will be the extent to which these two types of rhythmic patterns differ in terms of the types and degrees of difficulty in intensity presented to TM speakers and how such differences may help us understand the source of any difficulties.

Intensity data of Type A sentences are reported in section 5.1 and those for Type B sentences are reported in section 5.2. These sections are each organized into five subsections. The first pair of subsections, 5.1.1 and 5.2.1, report basic statistics and patterns revealed by the peak intensity of syllables in dB. The data are valuable in showing the actual differences in dB between subject groups. However, caution should be taken when one compares differences in intensity across groups based on dB information alone because these differences often heavily reflect individual variations in loudness of speech. As a result, interpretations of the dB data are limited to revealing general patterns.

The second pair of subsections, 5.1.2 and 5.2.2, report basic statistics and general patterns revealed by intensity ratios. The intensity ratio of a syllable is obtained by dividing its peak intensity in dB by the maximum intensity of an utterance. The purpose is to eliminate variations in volume of speech as a potential variable. In the third pair of subsections, 5.1.3 and 5.2.3, direct syllable-to-syllable comparisons are made between speaker groups based on their relative syllable intensity. Both similarities to and significant differences from the intensity patterns of English speakers are reported. The fourth pair of subsections, 5.1.4 and 5.2.4, report correlation coefficients of overall intensity patterns between subject groups for each test sentence to provide additional information about how close to target ESL and EFL speakers are. The fifth pair of subsections, 5.1.5 and 5.2.5, report test reliability information for all test sentences to provide information about how consistent NS, ESL, and EFL speakers are with their intensity patterns in three productions of the same sentence.

5.1 INTENSITY PATTERNS OF TYPE A SENTENCES

This section reports intensity results for the Type A sentences, which feature long stretches of weak syllables. Each sentence contains either one single strong syllable or two widely spaced strong syllables. Four rhythmic patterns are represented. Sentences A1 and A2 feature the five-syllable rhythmic pattern Swwww, sentences A3 and A4 feature the seven-syllable rhythmic pattern SwwwwwS, sentences A5 and A6 feature the seven-syllable rhythmic pattern wSwwwww, and sentence A7 has the eight-syllable rhythmic pattern SwwwwwSw. The total number of utterances analyzed is 630 and the total number of syllables analyzed is 4140.

5.1.1 INTENSITY PATTERNS IN DB

This section summarizes results based on the peak intensity in dB of the individual syllables. A group average intensity in dB was obtained for each syllable of every test sentence. Overall observations based on the fluctuation of intensity from syllable to syllable and the basic statistics, including mean, standard deviation, and range of peak intensity, are reported.

Table 5.1 Group average intensity in dB of individual syllables for Type A sentences

Item Mean SD Range
Group Syllable
A1 it with me
NS 21.73 19.97 16.70
ESL 28.33 25.83 20.57
EFL 27.27 26.07 24.20
A2 it for me
NS 18.00 18.93 17.13
ESL 24.93 22.60 19.03
EFL 26.17 26.33 24.40
A3 to wear the jeans
NS 25.13 25.73 22.40 24.87
ESL 25.37 27.70 25.40 25.00
EFL 28.57 30.63 27.23 27.23
A4 to bring the wine
NS 19.83 20.37 17.87 20.77
ESL 28.13 25.67 27.30 25.03 26.20
EFL 28.90 27.77 29.37 25.87 27.53
A5 My the le mon pie
NS 26.87 21.30 22.30 21.53 25.17
ESL 23.63 22.17 22.60 22.80 23.33
EFL 24.13 24.63 25.00 24.77 24.40
A6 The it to me
NS 29.47 19.53 18.50 22.27
ESL 29.13 29.47 26.27 25.67
EFL 28.30 29.03 27.37 27.80
A7 Jane's one wear -ing the blue dress
NS 26.50 24.77 22.37 21.60 27.97 25.20
ESL 31.27 31.67 28.50 28.50 31.77 28.43
EFL 31.20 31.27 28.90 27.80 30.30 29.40

ESL and EFL tend to produce syllables with higher peak intensity than do NS. The mean syllable intensity for all sentences tends to be the lowest for NS, the highest for EFL, and in between for ESL. The mean syllable intensity for all sentences averages 23.25 dB for NS, 26.35 dB for ESL, and 27.61 dB for EFL. The variation of intensity in dB among syllables of a sentence is consistently the greatest for NS, the smallest for EFL, and in between for ESL. The average standard deviation of intensity for all sentences is 3.78 dB for NS, 2.44 dB for ESL, and 1.62 dB for EFL. The intensity range of a sentence is consistently the widest for NS, the narrowest for EFL, and in between for ESL. The average intensity range of all Type A sentences is 9.69 dB for NS, 6.59 dB for ESL, and 4.40 dB for EFL. In summary, EFL speakers tend to produce higher means, smaller variations, and narrower ranges of intensity in dB than ESL speakers, who in turn produce higher means, smaller variations, and narrower ranges of intensity in dB than do NS.
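The three summary statistics reported above (mean, standard deviation, and range across the syllables of a sentence) can be sketched as follows. This is a minimal illustration with hypothetical intensity profiles, not the study's data, and it assumes the sample standard deviation; the text does not state which formula was used.

```python
from statistics import mean, stdev


def summarize_intensity(peaks_db):
    """Mean, sample standard deviation, and range of per-syllable peak intensity (dB)."""
    return {
        "mean": mean(peaks_db),
        "sd": stdev(peaks_db),  # assumption: sample SD (n - 1 in the denominator)
        "range": max(peaks_db) - min(peaks_db),
    }


# Hypothetical five-syllable profiles (dB): one with a clear peak and steady
# decline, one that is louder overall but flatter.
peaked_profile = [30.0, 26.0, 22.0, 20.0, 17.0]
flat_profile = [29.0, 28.0, 27.0, 27.0, 26.0]

for label, profile in (("peaked", peaked_profile), ("flat", flat_profile)):
    stats = summarize_intensity(profile)
    print(f"{label}: mean {stats['mean']:.2f} dB, "
          f"SD {stats['sd']:.2f} dB, range {stats['range']:.2f} dB")
```

A flatter profile yields a higher mean together with a smaller standard deviation and a narrower range, which is the combination reported for the ESL and EFL groups.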

[Line charts, panels (a) through (g). Y-axis: peak intensity (dB), 0 to 35. X-axis: syllable. Lines for NS, ESL, and EFL.]

(a) Peak intensity (dB) of syllables for sentence A1 (Swwww): Jim wrote it with me

(b) Peak intensity (dB) of syllables for sentence A2 (Swwww): Jane made it for me

(c) Peak intensity (dB) of syllables for sentence A3 (SwwwwwS): You like me to wear the jeans

(d) Peak intensity (dB) of syllables for sentence A4 (SwwwwwS): You want me to bring the wine

(e) Peak intensity (dB) of syllables for sentence A5 (wSwwwww): My mom made the le mon pie

(f) Peak intensity (dB) of syllables for sentence A6 (wSwwwww): The old man gave it to me

(g) Peak intensity (dB) of syllables for sentence A7 (SwwwwwSw): Jane's the one wear ing the blue dress

Figure 5.1 Group Average peak syllable intensity in dB for sentences Al through A7

Three observations can be made based on the results summarized in Table 5.1 and Figure 5.1. First, ESL and EFL tend to speak with a noticeably higher intensity than NS. Their syllables often average higher in dB than those of NS throughout the sentence. Intensity levels of ESL and EFL are more similar to each other than they are to those of NS. Second, intensity patterns of NS appear highly consistent with the target stress patterns. Their intensity usually rises at target strong syllables, drops at weak syllables, and declines continuously over the long stretch of weak syllables. In contrast, the intensity patterns of ESL and EFL sometimes exhibit misplaced prominence. Target strong syllables are sometimes produced with lowered intensity while, more often, weak syllables are produced with raised intensity. For instance, both ESL and EFL raise intensity at target weak syllables wrote in A1, wear in A3, bring in A4, made in A5, gave in A6, and wear in A7. Third, intensity tends to decline more rapidly over weak syllables toward the end of a sentence for NS than for ESL and EFL. Intensity appears more level or tends to vary within a narrower range throughout an utterance for ESL and EFL.

5.1.2 RELATIVE INTENSITY PATTERNS

This section reports results based on the relative intensity of the syllables of a sentence. The intensity data in dB were converted into ratios so that the syllable with the highest peak intensity in a sentence assumes the value of one and every other syllable is expressed as a ratio of that peak. A group average intensity ratio was obtained for each syllable of a sentence. The basic statistics, including mean, standard deviation, and range, are summarized in Table 5.2.
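The conversion described above can be sketched as follows. This is a minimal illustration, assuming that a ratio is obtained by dividing each syllable's peak dB value by the highest peak dB value in the same utterance, and that normalization is applied to each utterance before averaging. The function names and the dB values are hypothetical and are not data from the study.

```python
from statistics import mean


def to_ratios(peaks_db):
    """Express each syllable's peak intensity as a ratio of the utterance maximum."""
    top = max(peaks_db)
    return [p / top for p in peaks_db]


def group_average_ratios(utterances):
    """Normalize each utterance separately, then average syllable by syllable."""
    per_utterance = [to_ratios(u) for u in utterances]
    return [mean(column) for column in zip(*per_utterance)]


# Hypothetical peak intensities in dB for three productions of a
# five-syllable sentence. Illustrative values only, not study data.
utterances = [
    [30.0, 24.0, 18.0, 21.0, 15.0],
    [28.0, 29.0, 20.0, 22.0, 16.0],  # this production peaks on syllable 2
    [32.0, 26.0, 19.0, 24.0, 17.0],
]

average_ratios = group_average_ratios(utterances)
print([round(r, 2) for r in average_ratios])
```

Because the second hypothetical production peaks on a different syllable from the other two, no syllable in this example averages exactly 1.00.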

Table 5.2 Group average intensity ratios of individual syllables for Type A sentences

Item  Group  Syllable  Mean  SD  Range
A1 with me
  NS 0.68 0.57
  ESL 0.80 0.64
  EFL 0.82 0.76
A2 for me
  NS 0.66 0.60
  ESL 0.79 0.65
  EFL 0.89 0.82
A3 to wear the jeans
  NS 0.79 0.81 0.70 0.79
  ESL 0.84 0.91 0.83 0.82
  EFL 0.90 0.85 0.86
A4 to bring the wine
  NS 0.69 0.72 0.62 0.74
  ESL 0.91 0.82 0.89 0.80 0.85
  EFL 0.92 0.89 0.94 0.83 0.88
A5 My the le mon pie
  NS 0.90 0.71 0.74 0.71 0.83
  ESL 0.89 0.83 0.85 0.85 0.88
  EFL 0.89 0.91 0.92 0.91 0.90
A6 The gave it to me
  NS 0.87 0.77 0.57 0.54 0.64
  ESL 0.85 0.90 0.86 0.76 0.74
  EFL 0.87 0.93 0.90 0.85 0.86
A7 Jane's -ing the blue dress
  NS 0.75 0.71 0.94 0.84
  ESL 0.83 0.83 0.92 0.83
  EFL 0.87 0.84 0.92 0.89
Total
  NS
  ESL
  EFL

Note that in Table 5.2 the highest peak intensity ratios of the sentences are not necessarily 1.00. This is because intensity was normalized within each of the 30 utterances of each sentence before a group average intensity ratio was obtained for each syllable, in order to eliminate variations in volume of speech across utterances. Unless a syllable has the highest peak intensity in all 30 utterances produced by a speaker group, it will have an average intensity ratio smaller than 1.00.

ESL and EFL tend to produce higher intensity ratios on weak syllables than do NS. The mean intensity ratio is consistently the highest for EFL, the lowest for NS, and in between for ESL throughout Type A sentences. The mean intensity ratio of Type A sentences is 0.79 for NS, 0.87 for ESL, and 0.90 for EFL. The variation in intensity ratios among syllables of a sentence is consistently the smallest for EFL, the greatest for NS, and in between for ESL. The standard deviation of intensity ratios among syllables of an utterance averages 0.14 for NS, 0.08 for ESL, and 0.05 for EFL. Syllables vary within a narrower range of intensity ratios for TM speakers than for English speakers. The range of intensity ratios is consistently the narrowest for EFL, the broadest for NS, and in between for ESL. The average range of intensity ratios of Type A sentences is 0.35 for NS, 0.22 for ESL, and 0.14 for EFL. The results show that syllables tend to be relatively louder but with smaller intensity variation in the speech of EFL than in the speech of ESL, who in turn are relatively louder but less variable in intensity than NS.
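The three summary statistics reported for each sentence (mean, standard deviation, and range of the syllable ratios) can be illustrated with a short sketch. The two profiles below are hypothetical: one varies widely from syllable to syllable and one is comparatively flat. The use of the sample standard deviation is an assumption, since the text does not state which formula was applied.

```python
from statistics import mean, stdev


def summarize(ratios):
    """Return the mean, sample SD, and range of one sentence's syllable ratios."""
    return {
        "mean": mean(ratios),
        "sd": stdev(ratios),  # sample SD; an assumption, not confirmed by the source
        "range": max(ratios) - min(ratios),
    }


# Hypothetical seven-syllable profiles, not study data.
varied_profile = [0.87, 1.00, 0.77, 0.57, 0.54, 0.64, 0.60]
flat_profile = [0.87, 0.95, 0.93, 0.90, 0.85, 0.86, 0.88]

for name, profile in [("varied", varied_profile), ("flat", flat_profile)]:
    stats = summarize(profile)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

A flatter profile yields a higher mean together with a smaller standard deviation and a narrower range, which is the combination reported above for ESL and EFL relative to NS.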

Average intensity ratios are relatively high in the speech of ESL and EFL for two major reasons. First, these speakers raise intensity on a greater number of syllables than do NS. Although they occasionally lower intensity on target strong syllables, more often they raise intensity on target weak syllables. Another contributing reason is that their intensity does not decline as rapidly as for NS over the long stretch of weak syllables. TM speakers tend to produce syllables that vary within a higher and narrower range of intensity ratios than English speakers.

Figure 5.2 shows intensity patterns of Type A sentences in terms of intensity ratios.

Figure 5.2, panels (a) through (g): line plots of mean intensity ratio (y-axis, 0.50 to 1.00) by syllable (x-axis) for NS, ESL, and EFL.

(a) Mean intensity ratios of syllables for sentence A1 (Swwww): Jim wrote it with me
(b) Mean intensity ratios of syllables for sentence A2 (Swwww): Jane made it for me
(c) Mean intensity ratios of syllables for sentence A3 (SwwwwS): You like me to wear the jeans
(d) Mean intensity ratios of syllables for sentence A4 (SwwwwwS): You want me to bring the wine
(e) Mean intensity ratios of syllables for sentence A5 (wSwwwww): My mom made the le mon pie
(f) Mean intensity ratios of syllables for sentence A6 (wSwwwww): The old man gave it to me
(g) Mean intensity ratios of syllables for sentence A7 (SwwwwwSw): Jane's the one wear ing the blue dress

Figure 5.2 Group average intensity ratios for sentences A1 through A7

A few observations can be made based on the results summarized in Table 5.2 and Figure 5.2. First, the intensity patterns of NS appear highly consistent with the target stress patterns. Their intensity tends to rise at target strong syllables and gradually declines over the long stretch of weak syllables. Also, the intensity of their function words, especially the and it, tends to be very low. However, the intensity ratios of ESL and EFL do not always coincide with the target stress patterns. Target strong syllables are occasionally spoken with lowered intensity. Examples include the syllable Jim in sentence A1, the syllables you and jeans in sentence A3, and the syllable you in sentence A4. More frequently, weak syllables are produced with raised intensity. Examples include the weak syllables wrote in A1, wear in A3, want and bring in A4, made in A5, gave in A6, and one and wear in A7. The prevalence of relatively louder weak syllables over relatively softer strong syllables may result in a greater number of intensity hikes in a sentence than expected. "Zigzag" intensity patterns are often observed over what was expected to be a long stretch of weak syllables in the speech of both ESL and EFL. For example, in sentence A4 ESL and EFL raise intensity on three syllables (want, bring, wine) as opposed to the two target stresses (you, wine) according to the context. They lower intensity on the strong syllable you, and raise intensity on the weak syllables want and bring.

Second, syllables often maintain higher intensity ratios throughout a sentence in the speech of ESL and EFL than in the speech of NS. EFL, in particular, tend to speak at slightly higher intensity ratios than ESL. Not only do ESL and EFL sometimes raise intensity on target weak syllables, but when they do lower intensity on weak syllables, they often do not lower it as much as NS do. This phenomenon is evident in all Type A sentences and is particularly true for small grammatical words, especially it, the, and to.

Third, in addition to being relatively louder, syllables vary within a tighter and more constant range of intensity ratios for ESL and EFL. As a result, their intensity patterns look flatter than those of NS. The differences between TM and English speakers appear both in the placement of stress and in the magnitude with which intensity varies with stress.

5.1.3 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

T-tests were performed on individual syllables to determine whether the obtained difference in intensity ratio between each pair of groups was statistically significant or likely due to chance. The t-test scores for the intensity ratios of individual syllables between groups are listed in Table 5.3 below.

Table 5.3 Student's t-test scores for intensity ratios of individual syllables between groups for Type A sentences

Group
A1
  NS-ESL
  NS-EFL
  ESL-EFL 0.775
A2 Jane
  NS-ESL 1.468
  NS-EFL 2.057
  ESL-EFL 0.935 1.544
A3 You like
  NS-ESL 2.029 0.561
  NS-EFL 0.868 1.663
  ESL-EFL 1.420 0.769
A4 You want
  NS-ESL 2.123* 0.780
  NS-EFL 1.679 1.257
  ESL-EFL 0.458 0.440
A5 My mom
  NS-ESL 0.292 0.073
  NS-EFL 0.204 0.911
  ESL-EFL 0.105 0.809
A6 The old
  NS-ESL 0.712 1.055
  NS-EFL 0.042 0.850
  ESL-EFL 0.929 0.615
A7 Jane's the dress
  NS-ESL 0.358 0.874 0.210
  NS-EFL 0.493 0.810 1.800
  ESL-EFL 0.766 0.094 1.540 0.753 2.469*

*p<.05 (tcrit=2.101), **p<.01 (tcrit=2.878), df=18, two-tailed
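The comparison behind each cell of Table 5.3 can be sketched as a pooled-variance Student's t-test. The critical values 2.101 and 2.878 and df=18 are taken from the table note; the group size of ten speakers is inferred from df=18, and the per-speaker ratios below are hypothetical rather than study data. The function names are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT_05 = 2.101  # two-tailed critical value at p < .05, df = 18 (table note)
T_CRIT_01 = 2.878  # two-tailed critical value at p < .01, df = 18 (table note)


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def significance(t):
    """Mark a t score against the df = 18 critical values."""
    if abs(t) >= T_CRIT_01:
        return "**"
    if abs(t) >= T_CRIT_05:
        return "*"
    return ""


# Hypothetical intensity ratios of one weak syllable, ten speakers per group.
ns = [0.55, 0.60, 0.58, 0.62, 0.57, 0.59, 0.61, 0.56, 0.60, 0.58]
efl = [0.74, 0.78, 0.80, 0.75, 0.77, 0.79, 0.73, 0.76, 0.78, 0.76]

t, df = student_t(ns, efl)
print(round(t, 3), df, significance(t))
```

Applied to values from the table, `significance(2.123)` returns one asterisk and `significance(2.057)` returns none, which matches the marking shown for the first syllables of A4 and A2.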

In the next two subsections we examine which syllables are significantly different between groups in terms of intensity ratios. Section 5.1.3.1 reports and compares differences between native and non-native speakers, that is, between NS and ESL and between NS and EFL. The results provide crucial information about the kinds of difficulties TM speakers face with intensity as a correlate of stress, as well as the ways in which ESL and EFL are similar or different in their problems with intensity. Section 5.1.3.2 examines significant differences between ESL and EFL. The results in that section provide information about the changes that might have taken place from EFL to ESL in the way intensity correlates with stress.

5.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

ESL and EFL show similar types of difficulties with intensity as a correlate of stress. Significant differences between NS and ESL and significant differences between NS and EFL highlight two types of difficulties: strong syllables that are too soft and weak syllables that are too loud. Despite having similar types of problems, ESL is distinguished from EFL in two respects. First, ESL speakers do not appear to have problems with weak syllables in final position. In contrast, out of the five weak syllables in final position, EFL produce three relatively louder than NS. Second, ESL speakers display a slightly lesser degree of difficulty with intensity. A smaller number of syllables are found to be significantly different between NS and ESL than between NS and EFL. In particular, EFL have greater difficulty with the intensity of weak syllables in both non-final and final positions.

Table 5.4 Number of syllables with intensity ratios significantly different from NS as strong or weak in non-final and final positions for Type A sentences

Position   Non-Final                            Final                                Total
Stress     Strong (k=8)      Weak (k=31)        Strong (k=2)      Weak (k=5)
Type       louder  softer    louder  softer     louder  softer    louder  softer
ESL        0       2         22      0          1       0         0       0        25
EFL        0       1         26      0          1       0         3       0        31

Out of 46 syllables compared for Type A sentences, a total of 25 syllables was found to be significantly different between NS and ESL. They comprise two relatively softer strong syllables in non-final position, 22 relatively louder weak syllables all in non-final position, and one relatively louder strong syllable in final position. A total of 31 syllables was found to be significantly different between NS and EFL. They comprise one relatively softer strong syllable in non-final position, 26 relatively louder weak syllables in non-final position, one relatively louder strong syllable in final position, and three relatively louder weak syllables in final position.

In summary, ESL and EFL evidence similar types of difficulties with intensity ratios, including relatively softer strong syllables and relatively louder weak syllables. ESL speakers demonstrate less difficulty with weak syllables in non-final and final position than EFL speakers do. A larger number of syllables are found to be significantly different between NS and EFL than between NS and ESL. Relatively louder weak syllables account for the majority of differences between TM and English speakers.

5.1.3.2 Significant differences between ESL and EFL

This section reports significant differences in intensity ratios between ESL and EFL. Syllables that are found to be significantly different between these two groups are further examined under four categories: strong syllables in non-final position, weak syllables in non-final position, strong syllables in final position, and weak syllables in final position. The purpose is to identify patterns in changes that might have taken place due to improved proficiency and increased exposure to English.

Table 5.5 Number of strong and weak syllables with intensity ratios significantly different between EFL and ESL in non-final vs. final positions for Type A sentences

Position   Non-Final                            Final                                Total
Stress     Strong (k=8)      Weak (k=31)        Strong (k=2)      Weak (k=5)
Type       louder  softer    louder  softer     louder  softer    louder  softer
EFL vs. ESL
ESL/NS only
EFL/NS only
EFL/ESL/NS
ESL/EFL only

Of the 46 syllables compared, the intensity ratios of 11 syllables were significantly different between ESL and EFL. All of the differences are attributed to relatively louder weak syllables on the part of EFL. Of these 11 syllables, seven are weak syllables in non-final position, and four are weak syllables in final position.

Table 5.6 Number of strong and weak syllables with intensity ratios significantly different between EFL and ESL speakers categorized as content and function in non-final vs. final positions for Type A sentences

Position    Non-Final                                  Final
Stress      Strong (k=8)         Weak (k=31)           Strong (k=2)         Weak (k=5)
Length      Softer (k=0)         Louder (k=7)          Softer (k=0)         Louder (k=4)
Syl. Type   content | function   content | function    content | function   content | function
Number              |            4       | 3           0       | 0          1       | 3

Of the seven relatively louder weak syllables produced by EFL speakers, four are content words or morphemes and three are function words or morphemes. We also notice that the EFL speakers have greater difficulty softening final weak function words than the ESL speakers. Of all three final weak function words for Type A sentences, all are relatively louder for EFL than for ESL.

Not all of the differences between ESL and EFL translate into difficulties with speech rhythm. The fact that EFL and ESL differ from each other in the intensity ratio of a syllable does not mean that either group produces that syllable significantly differently from English speakers. For example, the syllable dress in sentence A7 is found to be significantly different between ESL and EFL, but not between NS and ESL or between NS and EFL. This is a typical example of a difference between ESL and EFL that does not amount to a difficulty for either group. Differences that are interpretable as difficulties are syllables that are significantly different between NS and ESL and/or between NS and EFL. When a syllable is found to be significantly different both between NS and ESL and between NS and EFL, it implies a difficulty common to both ESL and EFL. When a syllable is found to be significantly different between NS and ESL but not between NS and EFL, it implies a difficulty unique to ESL. Similarly, when a syllable is found to be significantly different between NS and EFL but not between NS and ESL, it implies a difficulty unique to EFL.
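The interpretive rule laid out in this paragraph amounts to a small decision table over the two native-speaker comparisons. A minimal sketch follows; the function name and the label strings are hypothetical and chosen only for illustration.

```python
def classify_difference(ns_esl_significant, ns_efl_significant):
    """Interpret a syllable that differs significantly between ESL and EFL.

    The two arguments state whether the same syllable also differs
    significantly between NS and ESL, and between NS and EFL.
    """
    if ns_esl_significant and ns_efl_significant:
        return "difficulty common to ESL and EFL"
    if ns_esl_significant:
        return "difficulty unique to ESL"
    if ns_efl_significant:
        return "difficulty unique to EFL"
    return "difference between ESL and EFL only"


# The syllable dress in A7 differs between ESL and EFL but from NS in neither case.
print(classify_difference(False, False))
```

Under this rule a syllable counts as a difficulty only when at least one learner group departs significantly from NS.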

Among the 11 syllables that are found to be significantly different between ESL and EFL, one represents a difference between ESL and EFL only. Of the 10 remaining syllables that do suggest difficulties for either ESL or EFL, six syllables indicate difficulties common to ESL and EFL while four indicate difficulties unique to EFL. However, none of the differences between ESL and EFL indicate a difficulty unique to ESL.

Figure 5.3 Distribution of significant differences in intensity ratios between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

In addition, the significant differences in the intensity ratios of weak syllables in non-final positions between ESL and EFL strongly reflect difficulties common to both ESL and EFL, whereas differences found with weak syllables in final position reflect difficulties unique to EFL. This result is consistent with an earlier finding that ESL show difficulty with the intensity ratios of weak syllables in non-final but not in final positions.

In summary, the single most important difference between ESL and EFL is that EFL produce relatively louder weak syllables than ESL. All of the significant differences between ESL and EFL are attributed to relatively louder weak syllables in non-final and final position on the part of EFL. Most of the differences between ESL and EFL imply difficulties unique to EFL or difficulties common to both ESL and EFL. None of the differences between ESL and EFL indicate difficulties unique to ESL. This suggests that whenever a significant difference with intensity is found between ESL and EFL, it is more likely for EFL than for ESL to be the group having difficulty. It also suggests that if ESL is having difficulty, EFL will manifest this difficulty as well, but the reverse will not apply.

5.1.4 CORRELATION OF INTENSITY PATTERNS BETWEEN SPEAKER GROUPS

Pearson product-moment correlation coefficients were obtained to estimate how closely intensity ratios of the same sentence covary from syllable to syllable between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL respectively for all seven Type A sentences. Complete results of the correlation tests are summarized in Table 5.7 below.

Table 5.7 Pearson product-moment correlation coefficients for mean syllable intensity ratios between groups for Type A sentences

Sentence  NS and ESL
A1        0.8904*
A2        0.8300
A3
A4
A5
A6
A7

*p<.05, **p<.01
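The coefficient reported for each sentence and group pair can be sketched as a Pearson correlation computed over the syllable positions of one sentence, using the two groups' mean intensity ratios as the paired observations. The ratio profiles below are hypothetical and the function name is illustrative; significance testing against a critical r is omitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean intensity ratios for a five-syllable sentence, not study data.
ns_profile = [0.98, 0.80, 0.62, 0.68, 0.57]
esl_profile = [0.93, 0.90, 0.78, 0.80, 0.64]
efl_profile = [0.92, 0.91, 0.85, 0.82, 0.76]

print(round(pearson_r(ns_profile, esl_profile), 4))
print(round(pearson_r(ns_profile, efl_profile), 4))
print(round(pearson_r(esl_profile, efl_profile), 4))
```

Because the number of paired observations equals the number of syllables in the sentence, a five-syllable sentence such as A1 or A2 leaves only three degrees of freedom, so even a fairly high coefficient may fail to reach significance.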

There are two major observations. First, the correlation is the strongest between ESL and EFL. Fewer significant correlations are identified between NS and ESL or between NS and EFL than between ESL and EFL. Correlation coefficients are found to be significant in six sentences between NS and ESL, in five sentences between NS and EFL, and in all seven sentences between ESL and EFL. Second, correlation of intensity ratios is consistently stronger between NS and ESL than between NS and EFL throughout all Type A sentences.

The results lead to three suggestions. First, ESL and EFL are quite similar in their intensity patterns. Second, ESL and EFL are more similar to each other than they are to English speakers. Third, intensity tends to vary in a more similar fashion between NS and ESL than between NS and EFL.

5.1.5 TEST RELIABILITY

This section summarizes results of test-retest reliability between the three administrations of all seven Type A sentences. Three reliability indices were obtained for each sentence for each group. High test-retest reliability indicates that speakers are consistent in their intensity patterns.

Table 5.8 Test-retest reliability for intensity ratios from three productions for Type A sentences

Sentence  Administration  Test-retest r (NS, ESL, EFL)
A1
  First & Second 0.9746** 0.9925** 0.9633**
  First & Third 0.9894** 0.9973** 0.9744**
  Second & Third 0.9957** 0.9986**
A2
  First & Second 0.9884** 0.9910**
  First & Third 0.9973** 0.9846**
  Second & Third 0.9928** 0.9938**
A3
  First & Second 0.9800** 0.9417**
  First & Third 0.9908** 0.8944** 0.8991**
  Second & Third 0.9774** 0.8879** 0.9014**
A4
  First & Second 0.9967** 0.9215** 0.9604**
  First & Third 0.9916** 0.8900** 0.9646**
  Second & Third 0.9973** 0.9525** 0.9447**
A5
  First & Second 0.9826** 0.9904**
  First & Third 0.9910** 0.9904**
  Second & Third 0.9893** 0.9748**
A6
  First & Second 0.9884** 0.9917**
  First & Third 0.9969** 0.9914**
  Second & Third 0.9941** 0.9794**
A7
  First & Second 0.9421** 0.9843**
  First & Third 0.9590** 0.8974**
  Second & Third 0.9841** 0.9277**

*p<.05, **p<.01

Results of the test-retest reliability indicate that speakers of all groups are highly consistent with their intensity patterns in all three productions of Type A sentences. No correlation coefficient is found to be below significance at p

5.1.6 SUMMARY OF INTENSITY RESULTS FOR TYPE A SENTENCES

ESL and EFL produce syllables with higher intensity ratios than NS. The average intensity ratio is highest for EFL, lowest for NS, and in between for ESL. Variation in intensity ratios averages the smallest for EFL, the greatest for NS, and in between for ESL. Syllables vary within a narrower range of intensity ratios for ESL and EFL than for NS. The range of intensity ratios of Type A sentences is the narrowest for EFL, the broadest for NS, and in between for ESL. The results show that EFL consistently produce syllables that are louder and more equal in loudness than ESL, who in turn produce syllables that are louder and more equal in loudness than NS.

For NS, the rise and fall of intensity ratios are highly consistent with the target stress patterns. They tend to raise intensity at target strong syllables and lower intensity at target weak syllables. However, the rise and fall of intensity ratios do not always coincide with the target stress patterns for ESL and EFL. They sometimes lower intensity on target strong syllables or, more frequently, raise intensity on target weak syllables. A "zigzag" intensity pattern is often observed over what was intended to be a long stretch of weak syllables. TM speakers do not lower intensity on weak syllables as much as NS do. The intensity ratios of their weak syllables appear to hover at a higher range than for NS. Overall, the differences between TM and English speakers seem to appear both in the placement of stress and in the magnitude with which intensity varies with stress.

ESL and EFL produce similar types of difficulties with intensity. Significant differences between NS and ESL and between NS and EFL indicate that ESL and EFL sometimes produce relatively softer strong syllables and more often relatively louder weak syllables. However, ESL speakers do not have as much difficulty with intensity as EFL. A larger number of syllables is found to be significantly different between NS and EFL than between NS and ESL. Relatively louder weak syllables account for the majority of differences between TM and English speakers.

Relatively louder weak syllables produced by EFL also account for the differences between ESL and EFL. All of the significant differences between ESL and EFL involve relatively louder weak syllables in non-final and final position on the part of EFL. Most of the differences between ESL and EFL indicate difficulties common to both ESL and EFL, or difficulties unique to EFL. None of the differences between ESL and EFL suggest a difficulty unique to ESL.

Intensity ratios correlate more closely between ESL and EFL than between NS and ESL or between NS and EFL. Despite great similarity between the two groups of TM speakers, ESL speakers produce intensity patterns that vary in a more native-like way than EFL. Correlation coefficients are consistently stronger between NS and ESL than between NS and EFL in all but one Type A sentence. Speakers of all groups are highly consistent in their intensity patterns for Type A sentences. No test-retest reliability indices between two administrations of the same sentence fall below significance at p

5.2 INTENSITY PATTERNS OF TYPE B SENTENCES

This section reports results from the intensity patterns for the Type B sentences, which feature a highly regular rhythmic pattern of alternating strong and weak syllables. Each strong syllable is immediately preceded and/or followed by a weak syllable, and vice versa. All sentences were embedded in broad-focused contexts to encourage speakers to introduce all words that carry content information as new and stressed. The alternating stress pattern is further reinforced by word type. All strong syllables are made up of content words, which are more likely to bear stress than the function words and morphemes that make up the weak syllables.

Three different rhythmic patterns are represented in Type B sentences. Sentences B1 through B4 feature the six-syllable iambic rhythmic pattern wSwSwS, sentences B5 and B6 feature the seven-syllable trochaic rhythmic pattern SwSwSwS, and sentence B7 features the eight-syllable iambic rhythmic pattern wSwSwSwS. The total number of utterances analyzed is 630 and the total number of syllables analyzed is 4140.
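The totals quoted above can be checked against the stress patterns with a short calculation. The sketch assumes that each of the three speaker groups contributed 30 utterances per sentence, the figure given earlier for Type A sentences and taken here to carry over to Type B; the variable names are illustrative.

```python
# Stress pattern of each Type B sentence as described in the text.
patterns = {
    "B1": "wSwSwS",
    "B2": "wSwSwS",
    "B3": "wSwSwS",
    "B4": "wSwSwS",
    "B5": "SwSwSwS",
    "B6": "SwSwSwS",
    "B7": "wSwSwSwS",
}

GROUPS = 3  # NS, ESL, EFL
UTTERANCES_PER_GROUP = 30  # per sentence; assumed to match the Type A design

utterances_per_sentence = GROUPS * UTTERANCES_PER_GROUP
total_utterances = len(patterns) * utterances_per_sentence
syllables_per_set = sum(len(p) for p in patterns.values())
total_syllables = syllables_per_set * utterances_per_sentence

print(total_utterances, syllables_per_set, total_syllables)
```

Under that assumption the seven sentences contain 46 syllables in all, and the computed totals agree with the 630 utterances and 4140 syllables reported.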

5.2.1 INTENSITY PATTERNS IN DB

This section summarizes results based on the peak intensity in dB of the individual syllables of Type B sentences. A group average intensity in dB was obtained for each syllable of each test sentence. Overall observations based on actual fluctuations of intensity and basic statistics, including mean, standard deviation, and range of peak intensity, are reported.

Table 5.9 Group average intensity in dB of individual syllables for Type B sentences

Item  Group  Syllable  Mean  SD  Range
B1 back by noon
  NS 28.00 27.27 28.83
  ESL 29.90 30.60
  EFL 26.60 27.40
B2 and laugh
  NS 23.70 23.93
  ESL 26.70 28.67
  EFL 23.40 25.70
B3 and write
  NS 16.27 17.97
  ESL 25.77 28.53
  EFL 23.07 25.97
B4 at noon
  NS 23.13 26.40
  ESL 26.40 28.63
  EFL 24.03 25.40
B5 were mad at Jim
  NS 24.53 24.50 21.13 23.33
  ESL 30.13 28.87 26.33 25.90
  EFL 26.30 25.73 23.07 24.13
B6 at bake -ing bread
  NS 28.83 29.07 23.17 28.70
  ESL 31.03 33.43 29.67 32.23
  EFL 26.40 29.17 26.10 28.67
B7 ride to work at once
  NS 31.23 26.03 29.43 23.20 26.73
  ESL 27.37 31.27 25.33 27.90
  EFL 25.67 28.93 22.83 25.90
Total
  NS
  ESL
  EFL

The mean peak intensity among syllables of a sentence is generally higher for ESL than for NS and EFL. ESL speakers produce the highest mean peak intensity across all Type B sentences. NS and EFL are similar in terms of mean intensity. The mean peak intensity for all sentences averages 26.38 dB for NS, 29.95 dB for ESL, and 26.64 dB for EFL. The variation in intensity among syllables of a sentence tends to be greatest for NS, smallest for EFL, and in between for ESL. The average standard deviation of intensity for Type B sentences is 2.38 dB for NS, 1.95 dB for ESL, and 1.83 dB for EFL. The intensity range of a sentence tends to be wider for NS than for ESL or EFL for most Type B sentences. The average intensity range for Type B sentences is 9.69 dB for NS, 6.59 dB for ESL, and 4.40 dB for EFL. Caution must be taken in the interpretation of differences in absolute intensity between groups because non-linguistic factors, such as individual variations in volume of speech or proximity of one's mouth to the microphone during recording, can easily affect the intensity of speech.

Figure 5.4, panels (a) through (g): line plots of peak intensity in dB (y-axis) by syllable (x-axis) for NS, ESL, and EFL.

(a) Peak intensity (dB) of syllables for sentence B1 (wSwSwS): I need it back by noon
(b) Peak intensity (dB) of syllables for sentence B2 (wSwSwS): They play with dad and laugh
(c) Peak intensity (dB) of syllables for sentence B3 (wSwSwS): We learn to read and write
(d) Peak intensity (dB) of syllables for sentence B4 (wSwSwS): I met with John at noon
(e) Peak intensity (dB) of syllables for sentence B5 (SwSwSwS): Mom and dad were mad at Jim
(f) Peak intensity (dB) of syllables for sentence B6 (SwSwSwS): John is good at bake ing bread
(g) Peak intensity (dB) of syllables for sentence B7 (wSwSwSwS)

Figure 5.4 Group average peak syllable intensity in dB for sentences B1 through B7

A few observations can be made based on the results summarized in Table 5.9 and Figure 5.4. First, ESL generally speak with noticeably higher intensity than NS and EFL, with syllables averaging higher in dB than NS and EFL throughout most of the sentence. Second, intensity generally rises and falls at the same syllables among NS, ESL, and EFL, regardless of differences in overall intensity. Target strong syllables are generally spoken with greater intensity and target weak syllables are often spoken with lower intensity. The rise and fall of intensity generally coincides with the alternation between strong and weak syllables for all groups.

5.2.1.1 Relative intensity patterns

This section reports results based on relative intensity among syllables. The intensity data in dB were converted into ratios so that the syllable with the highest peak intensity in a sentence assumed the value of one and the others were computed as ratios of it. An average group intensity ratio was obtained for each syllable of each sentence. The basic statistics, including means, standard deviations, and ranges, are summarized in Table 5.10.

Table 5.10 Group average intensity ratios of individual syllables for Type B sentences

Item  Group  Syllable
B1 I back by noon
  NS 0.93 0.90 0.87
  ESL 0.89 0.90 0.91
  EFL 0.82 0.88 0.92
B2 They and laugh
  NS 0.87 0.76 0.77
  ESL 0.86 0.76 0.84
  EFL 0.86 0.75 0.82
B3 We and write
  NS 0.66 0.73
  ESL 0.91 0.78 0.87
  EFL 0.90 0.80 0.90
B4 I at noon
  NS 0.89 0.78 0.90
  ESL 0.87 0.81 0.88
  EFL 0.84 0.81 0.86
B5 Mom were mad at Jim
  NS 0.84 0.84 0.72 0.80
  ESL 0.93 0.88 0.80 0.79
  EFL 0.92 0.90 0.80 0.84
B6 at bake -ing bread
  NS 0.86 0.87 0.69 0.86
  ESL 0.85 0.92 0.81 0.89
  EFL 0.84 0.93 0.82 0.91
B7 ride to work at once
  NS 0.80 0.90 0.70 0.81
  ESL 0.84 0.96 0.77 0.85
  EFL 0.85 0.76 0.86
Total
  NS
  ESL
  EFL

Mean intensity ratios tend to be similar among NS, ESL, and EFL. Although the differences among the three are very small, ESL and EFL produce higher mean intensity ratios than do NS in most sentences. The mean intensity ratio for all Type B sentences is 0.87 for NS, 0.89 for ESL, and 0.89 for EFL. NS produce greater variation in intensity ratios across syllables of a sentence than ESL and EFL in all Type B sentences except B1. The standard deviation of intensity ratios among syllables of a sentence averages 0.08 for NS, 0.06 for ESL, and 0.06 for EFL. On average, syllables vary within a narrower range of intensity ratios for ESL and EFL than for NS. The average range in intensity ratios for Type B sentences is 0.23 for NS, 0.17 for ESL, and 0.18 for EFL. The results show that ESL and EFL are almost indistinguishable in terms of the basic statistics of their intensity ratios for Type B sentences, and that on average, TM speakers produce syllables that are slightly louder and more equal in loudness than English speakers for most Type B sentences.

Figure 5.5 shows the relative intensity patterns of the individual Type B sentences in terms of intensity ratios.

[Figure 5.5: line plots of mean intensity ratio (vertical axis, 0.50 to 1.00) by syllable (horizontal axis) for NS, ESL, and EFL, one panel per sentence]

(a) Mean intensity ratios of syllables for sentence B1 (wSwSwS): I need it back by noon
(b) Mean intensity ratios of syllables for sentence B2 (wSwSwS): They play with dad and laugh
(c) Mean intensity ratios of syllables for sentence B3 (wSwSwS): We learn to read and write
(d) Mean intensity ratios of syllables for sentence B4 (wSwSwS): I met with John at noon
(e) Mean intensity ratios of syllables for sentence B5 (SwSwSwS): Mom and dad were mad at Jim
(f) Mean intensity ratios of syllables for sentence B6 (SwSwSwS): John is good at bake -ing bread
(g) Mean intensity ratios of syllables for sentence B7 (wSwSwSwS): I need a ride to work at once

Figure 5.5 Mean syllable intensity ratios for sentences B1 through B7

Three observations can be made based on the results summarized in Table 5.10 and Figure 5.5. First, the intensity ratios generally rise and fall at similar syllables among NS, ESL, and EFL, with intensity rising at target strong syllables and declining at target weak syllables. There are a few occasions, however, where target strong syllables are spoken with lowered intensity, target weak syllables are spoken with raised intensity, or intensity ratios are indistinguishable between target strong and weak syllables. For example, in sentence B1 intensity is almost flat over the first three syllables for NS, while in B5 ESL and EFL lower intensity on the target strong syllable mad. In sentence B7, intensity rises at the target weak syllable a for all three speaker groups.

Second, in most sentences the intensity ratios of NS start higher and finish lower than those of ESL and EFL. A general declination underlies fluctuations of intensity level according to stress. For example, we can see a clear down-sloping zigzag intensity pattern in sentences B3, B5, and B6. However, for ESL and EFL speakers, intensity often does not decline as rapidly as it does for NS over the course of a sentence.

Third, the intensity data from sentences that feature iambic and trochaic rhythms show consistent patterns. Intensity tends to rise at strong syllables and fall at weak syllables regardless of rhythmic pattern. As NS, ESL, and EFL often raise and lower intensity on the same syllables, the differences among them for Type B sentences seem to lie in the magnitude with which intensity varies with stress rather than in the placement of stress.

5.2.2 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

Student's t-tests were performed on individual syllables to determine whether the obtained differences in intensity ratios between pairs of groups were statistically significant or likely due to chance. The t-test scores for the intensity ratios of individual syllables between groups are shown in Table 5.12.

Table 5.12 Student's t-test scores for intensity ratios of individual syllables between groups for Type B sentences

Group    t
B1       it back by
NS-ESL   0.113 1.313 0.094
NS-EFL   1.674 0.600 0.380
ESL-EFL  1.907 0.666 0.658
B2       with dad and
NS-ESL   0.743 0.622 0.189
NS-EFL   1.101 1.501 0.102
ESL-EFL  0.482 0.394 0.332
B3       read and
NS-ESL
NS-EFL
ESL-EFL
B4       noon
NS-ESL   0.522
NS-EFL   1.486
ESL-EFL  0.764
B5       at Jim
NS-ESL   0.440
NS-EFL   1.041
ESL-EFL  1.667 1.358
B6       John bread
NS-ESL   0.427 0.750
NS-EFL   0.902 1.528
ESL-EFL  0.422 1.054
B7       I at once
NS-ESL   0.635 1.682 1.388
NS-EFL   1.468 1.347 1.937
ESL-EFL  0.908 0.755 1.653 1.289 0.503 0.660
*p<.05 (tcrit=2.101), **p<.01 (tcrit=2.878), df=18, two-tailed
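The comparison behind each entry in the table can be sketched as an independent-samples Student's t-test with pooled variance. This is an illustration under stated assumptions: the per-speaker ratios below are hypothetical, the group size of ten is inferred only from df=18 (two groups with n1 + n2 - 2 = 18), and the function names are not from the study. The critical values are the ones given in the table note.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT_05 = 2.101  # two-tailed, df = 18, as given in the table note
T_CRIT_01 = 2.878


def student_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


def significance(t):
    """Return '**', '*', or '' using the df = 18 critical values above."""
    if abs(t) >= T_CRIT_01:
        return "**"
    if abs(t) >= T_CRIT_05:
        return "*"
    return ""


# Hypothetical intensity ratios of one syllable for ten speakers per group.
ns = [0.72, 0.70, 0.75, 0.69, 0.74, 0.71, 0.73, 0.68, 0.76, 0.72]
esl = [0.80, 0.83, 0.78, 0.82, 0.79, 0.84, 0.81, 0.77, 0.85, 0.80]
t_value, df = student_t(ns, esl)
print(round(t_value, 3), df, significance(t_value))
```

A negative t in this sketch means that the second group produced the higher ratio, that is, the relatively louder syllable.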

In the following two subsections, we will consider which syllables are found to be significantly different between groups in terms of intensity ratios. Section 5.2.3.1 reports and compares differences between NS and ESL and differences between NS and EFL. Results will provide crucial information about the kinds of difficulties TM speakers face with intensity as a correlate of stress, as well as the ways in which ESL and EFL are similar or different with respect to their difficulties with intensity. Section 5.2.3.2 examines significant differences between ESL and EFL. Results in this section will provide information about the changes, if any, that might have taken place between EFL and ESL in the way intensity correlates with stress.

5.2.2.1 Differences between NS and ESL vs. differences between NS and EFL

A larger number of syllables is found to be significantly different between NS and EFL than between NS and ESL. Out of all 46 syllables compared, 11 syllables are found to be significantly different between NS and ESL and 15 syllables are found to be significantly different between NS and EFL.

Nonetheless, not all of the differences between TM and English speakers are disruptive to the speech rhythm of TM speakers. Softer strong syllables and louder weak syllables produced by TM speakers are differences that weaken their speech rhythm because they are likely to make the strong and weak syllables less distinct from each other. Louder strong syllables and softer weak syllables may not be as damaging as far as rhythm is concerned because they increase the contrast between strong and weak syllables. With this distinction in mind, we reconsider the differences between TM and English speakers. Only four of the 11 significant differences between NS and ESL and seven of the 15 significant differences between NS and EFL constitute differences that dilute the contrast between strong and weak syllables. The majority of these differences comprise louder weak syllables produced by both ESL and EFL speakers, along with a smaller number of softer strong syllables produced by EFL speakers.
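The distinction drawn in this paragraph amounts to a simple rule: a significant difference dilutes the strong-weak contrast only when a strong syllable is softer, or a weak syllable is louder, than the NS target. A minimal sketch of that rule follows; the function name and the list of differences are hypothetical and are not the study's data.

```python
# Combinations of (target stress, direction relative to NS) that dilute contrast.
WEAKENING = {("strong", "softer"), ("weak", "louder")}


def weakens_contrast(stress, direction):
    """True if a difference makes strong and weak syllables less distinct."""
    return (stress, direction) in WEAKENING


# Hypothetical set of significant differences for one learner group.
differences = [
    ("strong", "louder"),
    ("weak", "louder"),
    ("weak", "softer"),
    ("strong", "softer"),
    ("weak", "louder"),
]
disruptive = [d for d in differences if weakens_contrast(*d)]
print(len(disruptive), "of", len(differences), "differences dilute the contrast")
```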

Table 5.11 Number of syllables with intensity ratios significantly different from NS in categories of strong and weak in non-final and final positions for Type B sentences

Position  Non-Final  Final
Stress    Strong  Weak  Strong  Weak  Total
          (k=17)  (k=22)  (k=7)  (k=0)  differ
Type      louder softer louder softer louder softer louder softer
ESL       4 1 2 11
EFL       4 3 1 15

Differences between NS and ESL and differences between NS and EFL show very similar patterns. Both ESL and EFL produce a number of louder strong and weak syllables in non-final position, softer weak syllables in non-final position, and louder strong syllables in final position. In terms of type and number of differences between ESL and EFL, we found that EFL speakers differ from ESL speakers in two aspects. First, EFL produce a small number of softer strong syllables in non-final position whereas ESL speakers do not produce any. Second, EFL produce a slightly larger number of softer weak syllables in non-final position than ESL.

Overall, it seems that, relative to NS, ESL and EFL speakers are more prone to producing louder syllables than softer syllables, regardless of the stress or position of the syllable. Among the 11 syllables with intensity ratios significantly different between NS and ESL, 10 are louder syllables produced by ESL. Among the 15 syllables with intensity ratios significantly different between NS and EFL, 10 are louder syllables produced by EFL.

In summary, ESL and EFL are similar in terms of the types of differences found between them and NS. Most of the differences between TM and English speakers are the relatively louder syllables produced by TM speakers. Problems with intensity ratios arise from a small number of softer strong syllables produced by EFL and louder weak syllables produced by both ESL and EFL.

5.2.3 SIGNIFICANT DIFFERENCES BETWEEN ESL AND EFL

This section reports significant differences in intensity ratios between ESL and EFL for Type B sentences. Syllables that are found to be significantly different between the two groups are further examined under three categories: strong syllables in non-final position, weak syllables in non-final position, and strong syllables in final position.

Table 5.12 Number of syllables with intensity ratios significantly different between EFL and ESL in categories of strong and weak in non-final and final positions for Type B sentences

Position      Non-Final  Final  Total
Stress        Strong  Weak  Strong  Weak
              (k=17)  (k=22)  (k=7)  (k=0)
Type          louder softer louder softer louder softer louder softer
EFL vs. ESL   t - - 2
ESL/NS only   - -
EFL/NS only   1 - - 1
EFL/ESL/NS    1 - - 1
ESL/EFL only  - -

Very few significant differences in intensity ratios can be established between ESL and EFL. Out of 46 syllables compared, the intensity ratios of only two syllables are found to be significantly different between ESL and EFL. These two syllables include one relatively louder weak syllable and one relatively softer weak syllable in non-final position produced by EFL. The relatively louder weak syllable indicates a difficulty common to both ESL and EFL, i.e., both ESL and EFL are significantly louder than NS on this syllable. The relatively softer weak syllable indicates a difficulty unique to EFL, i.e., EFL alone are softer than NS on this syllable. Results suggest that ESL and EFL are highly similar in terms of their intensity patterns for Type B sentences. Although the number of differences between ESL and EFL is very small, the results are consistent with earlier findings for Type A sentences that none of the differences between ESL and EFL indicate a difficulty unique to ESL.

5.2.4 CORRELATION OF INTENSITY RATIOS BETWEEN SPEAKER GROUPS

Pearson Product-moment correlation coefficients were obtained to estimate how closely intensity ratios within the same sentence covary from syllable to syllable between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL for all seven Type B sentences. Complete results of the correlation tests are summarized in Table 5.13.
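As an illustration of the measure, the sketch below computes a Pearson product-moment coefficient over the syllables of one sentence for two groups. The group means are hypothetical values for a six-syllable sentence and the function name is illustrative; the coefficient itself is the standard formula.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least two")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean intensity ratios per syllable for one sentence.
ns_means = [0.88, 0.97, 0.80, 0.93, 0.74, 0.86]
esl_means = [0.90, 0.96, 0.85, 0.94, 0.82, 0.90]
print(round(pearson_r(ns_means, esl_means), 4))
```

A coefficient near 1 indicates that the two groups raise and lower intensity on the same syllables, even if one group does so within a narrower range.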

Table 5.13 Pearson Product-moment Correlation Coefficients for mean intensity ratios between pairs of groups for Type B sentences

Sentence
B1
B2
B3
B4
B5
B6
B7
*p<.05, **p<.01

The correlation of intensity ratios is consistently stronger between NS and ESL than between NS and EFL in all Type B sentences. Significant correlation coefficients are established in a greater number of sentences between NS and ESL than between NS and EFL. Correlation coefficients between NS and ESL are found to be significant in all but one sentence, while correlation coefficients between NS and EFL are found to be significant in only three sentences. The intensity ratios of ESL and EFL correlate positively and strongly. Significant correlations are established between ESL and EFL in six out of seven Type B sentences at p

The results are consistent with findings for Type A sentences that ESL and EFL are quite similar in their intensity patterns. Their intensity ratios vary more similarly to each other than to those of English speakers. The results also suggest that intensity tends to vary together more closely between NS and ESL than between NS and EFL.

5.2.4.1 Test Reliability

This section summarizes the test-retest reliability between administrations of each of the Type B sentences. Three reliability indices were obtained per sentence per group. High test-retest reliability indicates that speakers are consistent in their intensity patterns.
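The three reliability indices per sentence per group correspond to the three possible pairings of three productions. The sketch below, which uses hypothetical ratios and illustrative names, pairs the productions and correlates each pair; it assumes that the index is a Pearson coefficient over the syllables of the sentence, in line with the table that follows.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def retest_indices(productions):
    """Correlate every pair of productions (First & Second, and so on)."""
    return {
        (a, b): pearson_r(productions[a], productions[b])
        for a, b in combinations(productions, 2)
    }


# Hypothetical group-mean ratios for three productions of one sentence.
productions = {
    "First": [0.88, 0.97, 0.80, 0.93, 0.74, 0.86],
    "Second": [0.87, 0.98, 0.82, 0.92, 0.76, 0.85],
    "Third": [0.89, 0.96, 0.81, 0.94, 0.75, 0.88],
}
indices = retest_indices(productions)
for pair, r in indices.items():
    print(pair, round(r, 4))

# Seven sentences, three groups, three pairings each.
total_coefficients = 7 * 3 * len(indices)
print(total_coefficients)
```

With seven sentences and three groups, three pairings each give the 63 coefficients referred to in the discussion of the results.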

Table 5.14 Test-retest reliability for intensity ratios from three productions of Type B sentences

Sentence  Administration  Test-retest r (NS ESL EFL)
B1  First & Second  0.9345**
    First & Third   0.8557* 0.8782*
    Second & Third  0.8944* 0.9388**
B2  First & Second  0.9628** 0.9661** 0.9547**
    First & Third   0.9533** 0.9427** 0.9430**
    Second & Third  0.9921** 0.9877** 0.9942**
B3  First & Second  0.9473** 0.9553** 0.9279**
    First & Third   0.9890** 0.9038** 0.9267**
    Second & Third  0.9810** 0.9238** 0.9015*
B4  First & Second  0.9882** 0.9580** 0.9786**
    First & Third   0.9491** 0.9180** 0.9778**
    Second & Third  0.9332** 0.9693** 0.9409**
B5  First & Second  0.9695** 0.9682** 0.9513**
    First & Third   0.9893** 0.9449** 0.9148**
    Second & Third  0.9883** 0.9905** 0.9557**
B6  First & Second  0.9166** 0.9724** 0.9414**
    First & Third   0.9613** 0.9824** 0.9026**
    Second & Third  0.9767** 0.9588** 0.9380**
B7  First & Second  0.9752** 0.9600** 0.9738**
    First & Third   0.9673** 0.9735** 0.9169**
    Second & Third  0.9892** 0.9785** 0.9125**
*p<.05, **p<.01

Results of the test-retest reliability tests indicate that speakers of all groups are highly consistent with their intensity patterns in three productions of Type B sentences. Among 63 Pearson Product-moment Correlation coefficients obtained, all except two correlation coefficients, one from ESL and the other from EFL, fall below significance at p

5.2.5 SUMMARY OF INTENSITY RESULTS FOR TYPE B SENTENCES

On average, TM speakers produce syllables that are slightly louder and less variable in loudness than English speakers for most Type B sentences. ESL and EFL often produce a higher mean, a smaller standard deviation, and a narrower range of intensity ratios than NS. ESL and EFL are almost indistinguishable in terms of the basic statistics of their intensity ratios for Type B sentences.

The mean intensity ratio averages higher for ESL and EFL than for NS. Variation in intensity ratios averages smaller for ESL and EFL than for NS. Syllables vary within a narrower range of intensity ratios for ESL and EFL than for NS. ESL and EFL produce a higher average intensity ratio, a smaller standard deviation, and a smaller range of intensity ratios than do NS. ESL and EFL, however, are almost indistinguishable based on the basic statistics of their intensity ratios in Type B sentences.

The intensity patterns of NS, ESL and EFL are generally consistent with the target stress patterns. Their intensity generally rises at target strong syllables and declines at target weak syllables. Although the mean intensity ratios of NS, ESL, and EFL appear quite similar on average, NS start out relatively louder and finish relatively softer in a number of sentences as compared with ESL and EFL. The intensity ratios of ESL and EFL sometimes do not decline as much toward the end of a sentence as those of NS. As a result, the differences between TM and English speakers in Type B sentences appear to lie more in the magnitude with which intensity varies with stress than in the placement of stress.

A slightly larger number of syllables are found to be significantly different between NS and EFL than between NS and ESL in terms of intensity ratios. The differences between NS and ESL and between NS and EFL are slightly different in number and in type. When considering differences that weaken the intensity contrast between strong and weak syllables, only a small number of softer strong syllables produced by EFL and a small number of louder weak syllables produced by ESL and EFL are found. Most of the differences between TM and English speakers, however, come from louder syllables produced by ESL and EFL. It seems that, relative to English speakers, TM speakers are more likely to produce louder syllables than softer syllables, regardless of the stress or position of the syllable.

The correlation of intensity ratios is the strongest between ESL and EFL and stronger between NS and ESL than between NS and EFL in most Type B sentences. Speakers of all groups are highly consistent with their intensity patterns in Type B sentences. All but two correlation coefficients among 63 pairs of sentence productions compared fall below significance at p

5.3.1 DO TM SPEAKERS HAVE DIFFICULTY WITH THE INTENSITY OF STRONG SYLLABLES, THE INTENSITY OF WEAK SYLLABLES, OR BOTH?

We first look for significant differences between NS and ESL and differences between NS and EFL to see how these differences are distributed across strong and weak syllables. Then we focus on the differences that weaken the contrast between strong and weak syllables, namely, softer strong syllables and louder weak syllables. Louder strong syllables and softer weak syllables produced by TM speakers are not considered a difficulty because they exaggerate the target rhythmic pattern by broadening the contrast between target strong and weak syllables.

Table 5.15 Number of strong and weak syllables with intensity ratios significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Position  Non-Final  Final
Stress    Weak  Strong  Weak
          (k =31, =22)  (k =2, =7)  (k =5, =0)
Type      louder softer louder softer louder softer
A         1 1
B         4 1 2 4 3 1

Both ESL and EFL show difficulty in raising intensity on strong syllables and lowering intensity on weak syllables. Lowering intensity on weak syllables appears more difficult for TM speakers than raising intensity on strong syllables. ESL and EFL produce a larger number of louder weak syllables than softer strong syllables in both Type A and Type B sentences. ESL speakers, who produce softer strong syllables in Type A sentences, do not produce any such syllables in Type B sentences. The problem with lowering intensity on weak syllables is particularly pronounced in Type A sentences with their long stretches of weak syllables, where 22 syllables produced by ESL and 26 syllables produced by EFL are significantly louder than those of NS. TM speakers' difficulty with the intensity of weak syllables is drastically reduced in Type B sentences. Only four syllables produced by ESL and five syllables produced by EFL are significantly louder than those of NS. In addition, ESL speakers show slightly less difficulty with the intensity of weak syllables than EFL. They produce a smaller number of louder weak syllables in non-final and final position for both Type A and Type B sentences than EFL.

Because Type A sentences contain two types of weak syllables (those that have some stress and those that have no stress) and Type B sentences contain only weak syllables that are totally unstressed, I wondered whether the weakly stressed syllables in Type A sentences contribute to the relatively greater difficulties that TM speakers have with the intensity of syllables for Type A than for Type B sentences. Table 5.16 shows the number of strong (stressed and accented), weakly stressed (stressed and unaccented), and unstressed syllables with intensity ratios significantly different from those of NS for Type A sentences.

Table 5.16 Number of strong, weakly stressed, and unstressed syllables with intensity ratios significantly different from NS in non-final vs. final position for Type A sentences

Position    Final
Stress      Strong  Unstressed  Strong  Weak Stressed  Unstressed
            (k=8)  (k=2)  (k=2)  (k=3)
Type        louder softer louder softer louder softer louder softer
ESL vs. NS  2 1
EFL vs. NS  1 1 3

TM speakers had great difficulty softening both weakly stressed syllables (8/12 for ESL and 10/12 for EFL) and unstressed syllables (14/19 for ESL and 16/19 for EFL) in Type A sentences. This suggests that TM speakers' difficulty with softening weakly stressed syllables is only one of the reasons why Type A sentences are more difficult for them than Type B sentences. The fact that TM speakers have greater difficulty reducing unstressed syllables for Type A than for Type B sentences suggests that the prosodic differences between these two types of sentences (i.e., Type A sentences consist of stretches of relatively weaker syllables) may have contributed to this.

Significant differences with final syllables between TM and English speakers always involve louder syllables produced by TM speakers, whether the syllable is targeted as strong or weak. EFL are more likely to produce louder syllables in final position than ESL. And unlike ESL, EFL speakers sometimes produce louder weak syllables than NS in final position.

Table 5.17 Group average intensity ratios of strong and weak syllables in non-final and final positions of Type A and Type B sentences

Position  Non-Final  Final
Stress    Strong  Weak  Strong  Weak
Group     NS ESL EFL  NS ESL EFL  NS ESL EFL  NS ESL EFL
A
A1  1.00 0.92 0.90  0.78 0.89 0.89  0.57 0.64 0.76
A2  1.00 0.99 0.99  0.72 0.86 0.90  0.60 0.65 0.82
A3  0.97 0.91 0.96  0.81 0.88 0.91  0.79 0.82 0.86
A4  0.97 0.91 0.92  0.74 0.87 0.90  0.74 0.85 0.88
A5  0.97 0.97 0.96  0.79 0.88 0.92  0.83 0.88 0.90
A6  1.00 0.99 1.00  0.72 0.85 0.89  0.64 0.74 0.86
A7  0.95 0.94 0.93  0.81 0.88 0.90  0.84 0.83 0.89
B
B1  0.95 0.95 0.92  0.92 0.91 0.86  0.87 0.91 0.92
B2  0.96 0.95 0.94  0.85 0.84 0.83  0.77 0.84 0.82
B3  0.88 0.95 0.96  0.80 0.85 0.87  0.73 0.87 0.90
B4  0.95 0.96 0.96  0.85 0.87 0.85  0.90 0.88 0.86
B5  0.93 0.94 0.94  0.83 0.87 0.87  0.80 0.79 0.84
B6  0.93 0.95 0.95  0.81 0.84 0.83  0.86 0.89 0.91
B7  0.93 0.95 0.94  0.83 0.85 0.84  0.81 0.85 0.86

Results from the average intensity ratios of strong and weak syllables are consistent with four earlier findings. First, ESL and EFL tend to produce louder final syllables than NS, regardless of stress. Second, ESL and EFL tend to produce louder weak syllables than NS. The difference in intensity of weak syllables between TM and English speakers is much greater in Type A sentences than in Type B sentences. Third, ESL and EFL tend to produce softer strong syllables in Type A sentences but often produce slightly louder strong syllables in Type B sentences. Fourth, TM speakers have greater difficulty with the intensity of weak syllables.

Table 5.18 shows the average intensity ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences.

Table 5.18 Group average intensity ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences

Position  Non-final  Final
Stress    Strong  Weak Stressed  Unstressed  Strong  Weak Stressed  Unstressed
          (k=8)  (k=12)  (k=19)  (k=2)  (k=2)  (k=3)
NS        0.97 0.85 0.72 0.76 0.84 0.60
ESL       0.84 0.83 0.85 0.68
EFL       0.87 0.87 0.89 0.81

The results are consistent with earlier findings that for Type A sentences, TM speakers have difficulty strengthening strong syllables and weakening weak syllables in non-final position, including those that are weakly stressed and those that are unstressed. In final position, TM speakers produced relatively greater intensity on all three types of syllables than English speakers did.

In sum, TM speakers have difficulties softening weak syllables and strengthening strong syllables. ESL speakers show only slight trouble with the strengthening of strong syllables. Overall, softening weak syllables appears to be more difficult for TM speakers than strengthening strong syllables. Both ESL and EFL speakers have great difficulties softening both weakly stressed and unstressed syllables in non-final position for Type A sentences. Type A sentences pose a greater challenge to TM speakers than Type B sentences. ESL and EFL produce a much larger number of louder weak syllables than NS in Type A than in Type B sentences. ESL speakers show somewhat less difficulty with the intensity of strong and weak syllables than EFL. Compared with EFL, they produce fewer louder weak syllables in Type A and Type B sentences and fewer softer strong syllables in Type B sentences.

5.3.2 DO TM SPEAKERS USE INTENSITY AS A CORRELATE OF STRESS?

Although all the test sentences are framed within a context to induce certain stress patterns, there is no guarantee that speakers will produce the stress patterns exactly as expected. Without assuming that TM speakers intend to produce the same stress patterns as English speakers, I examine the extent to which syllable intensity varies with the target stress patterns in Type A and Type B sentences. As a first step, I trace the rise and fall of intensity across each sentence and summarize the results in Table 5.19. The intensity of a syllable is considered "+IRPS" (Intensity Relative to Preceding Syllable) when (1) it is higher in dB than its preceding syllable or (2) it is higher than its immediately following syllable if the syllable in question is the initial syllable of the sentence. When the intensity of a syllable is the same as that of its preceding syllable, its intensity is considered "+IRPS" if the intensity of the preceding syllable is "+IRPS". If the preceding syllable is the initial syllable of the sentence, its intensity is considered "+IRPS" if it is higher than the next syllable with a different intensity.
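The classification rule above can be sketched as a small procedure. This is one reading of the definition, not the author's own implementation: a non-initial syllable is compared with its predecessor, a tie inherits the predecessor's label, and the initial syllable is compared with the next syllable that has a different intensity. The function name, the dB values, and the treatment of a sentence with identical intensity throughout (labeled "-") are assumptions.

```python
def classify_irps(intensity_db):
    """Label each syllable '+' (+IRPS) or '-' (-IRPS) from its intensity in dB."""
    if len(intensity_db) < 2:
        raise ValueError("need at least two syllables")
    first = intensity_db[0]
    # Initial syllable: compare with the first later syllable that differs.
    later = next((v for v in intensity_db[1:] if v != first), None)
    # Assumption: if every syllable has the same intensity, label '-'.
    labels = ["+" if later is not None and first > later else "-"]
    for prev, cur in zip(intensity_db, intensity_db[1:]):
        if cur > prev:
            labels.append("+")
        elif cur < prev:
            labels.append("-")
        else:
            labels.append(labels[-1])  # tie: inherit the preceding label
    return labels


# Hypothetical syllable intensities (dB) for two short sentences.
print(classify_irps([70, 75, 72, 72, 78]))
print(classify_irps([74, 74, 70, 76]))
```

Counting the "+" labels that fall on target strong syllables and the "-" labels that fall on target weak syllables then gives tallies of the kind reported in Table 5.19.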

Table 5.19 Number of strong and weak syllables classified as "+IRPS" or "-IRPS" in non-final and final position for Type A and Type B sentences

Position
Stress     Strong
Intensity  +
A  NS   26 3 10 29
   ESL  21 4 6 25
   EFL  20 4 7 24
B  NS   19 19 19
   ESL  20 24 20
   EFL  21 22 21

[Figure 5.6: bar chart of the number of non-final syllables (vertical axis, 0 to 30) for NS, ESL, and EFL in four categories: Type A Strong +IRPS, Type A Weak -IRPS, Type B Strong +IRPS, Type B Weak -IRPS]

Figure 5.6 Number of strong and weak syllables classified as "+IRPS" or "-IRPS" for Type A and Type B sentences

As can be seen in Table 5.19 and Figure 5.6, intensity generally rises at strong syllables and falls at weak syllables for speakers of all groups in both Type A and Type B sentences. Nonetheless, intensity seems to vary with stress more closely in Type B sentences than in Type A sentences for TM speakers. In fact, the correlation between intensity and stress for TM speakers is so strong in Type B sentences that they actually raise intensity on strong syllables and lower intensity on weak syllables even more consistently than English speakers. In Type A sentences, however, TM speakers do not raise intensity on strong syllables or lower intensity on weak syllables as consistently as English speakers.

The results show that TM speakers seem to vary intensity with the target stress patterns very closely in one type of sentence but not in the other. If TM speakers had been able to use information from the lead-in sentences to determine which syllables to stress in the target sentences, this would not have happened. This suggests that their stress patterns may be guided by principles that are somewhat different from those of English speakers and that their performance may reflect how closely their guiding principles match the target stress patterns. When their guiding principles generate the same distribution of stress as English speakers', TM speakers may produce native-like intensity patterns, but they may experience difficulty when their guiding principles generate a somewhat different distribution of stress. The key question is what principles TM speakers subscribe to in determining the distribution of stress. More specifically, we want to know what makes the stress patterns of Type B sentences easier for TM speakers to produce than those of Type A sentences.

To answer this question, I take one step further by examining the relationships between intensity and stress levels. The results are summarized in Table 5.20.

Table 5.20 Number of "+IRPS" vs "-IRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Position  Non-final  Final
Stress    Strong  Weak Stressed  Unstressed  Strong  Weak Stressed  Unstressed
          (k=8)  (k=12)  (k=19)  (k=2)  (k=2)  (k=3)
Type      +IRPS -IRPS  +IRPS -IRPS  +IRPS -IRPS  +IRPS -IRPS  +IRPS -IRPS  +IRPS -IRPS
NS        8 3 9 2 17 2 1 1 1 2
ESL       5 3 9 3 1 18 1 1 1 1 3
EFL       6 2 7 5 4 15 1 1 2 3

As shown in Table 5.20, TM speakers are more likely to produce -IRPS strong syllables and +IRPS weakly stressed syllables than English speakers. EFL speakers are more likely to produce +IRPS unstressed syllables than English speakers.

Table 5.21 Average intensity ratios of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Position  Non-final  Final
Stress    Strong  Weak Stressed  Unstressed  Strong  Weak Stressed  Unstressed
          (k=8)  (k=12)  (k=19)  (k=2)  (k=2)  (k=3)
A  NS   0.97 0.85 0.72 0.76 0.84 0.60
   ESL  0.95 0.92 0.84 0.83 0.85 0.68
   EFL  0.87 0.87 0.89 0.81
B  NS   0.93 - 0.83 0.82 - -
   ESL  0.95 - 0.85 0.86 - -
   EFL  0.95 - 0.85 0.87 - -

Like NS, ESL speakers are capable of using intensity to distinguish at least three stress levels (strong, weakly stressed, and unstressed), although the duration differentiation between these three types of syllables is not as distinct as it is for NS. Instead of making a three-way distinction (strong, weakly stressed, and unstressed), EFL speakers seem to make a binary distinction (stressed vs. unstressed). They do not produce a clear intensity distinction between strong syllables and weakly stressed syllables. This suggests that EFL speakers may not distinguish as many levels of stress as English speakers do.

5.3.3 DO TM SPEAKERS PRODUCE SMALLER INTENSITY CONTRASTS BETWEEN STRONG AND WEAK SYLLABLES THAN ENGLISH SPEAKERS?

In addition to examining the difficulty TM speakers may have with the intensity of strong and weak syllables taken separately, it is equally important to look at the contrast in intensity between strong and weak syllables. The differences found with strong and weak syllables between TM and English speakers may not be disruptive to the speech rhythm if TM speakers maintain intensity contrasts between strong and weak syllables similar to those of English speakers. In this section, we focus on the examination of intensity contrasts between strong and weak syllables in non-final and final positions for Type A sentences and in non-final positions for Type B sentences. Only non-final syllables will be discussed for Type B sentences because all final syllables are strong. Intensity contrasts between strong and weak syllables in non-final and final positions will be discussed separately because, according to our earlier findings, they seem to behave differently.

I focus first on the intensity contrasts between target strong and weak syllables.

Table 5.22 Intensity contrasts (ratio) between strong and weak syllables in non-final positions for Type A sentences

Position   Non-Final
Group      NS           ESL           EFL
Stress     S     W      S     W       S     W
A1         1.00  0.78   0.92  0.89    0.90  0.89
A2         1.00  0.72   0.99  0.86    0.99  0.90
A3         0.97  0.81   0.91  0.88    0.96  0.91
A4         0.97  0.74   0.91  0.87    0.92  0.90
A5         0.97  0.79   0.97  0.88    0.96  0.92
A6         1.00  0.72   0.99  0.85    1.00  0.89
A7         0.95  0.81   0.94  0.88    0.93  0.90
Mean       0.98  0.77   0.95  0.87    0.95  0.90
NS-NNS                  0.03  -0.10   0.03  -0.13

[Figure 5.7: line plot of intensity ratio (vertical axis, 0.50 to 1.00) for stressed non-final and unstressed non-final syllables in Type A sentences, with one line each for NS, ESL, and EFL.]

Figure 5.7 Group average intensity ratios of strong vs. weak syllables in non-final position for Type A sentences

Comparing the S and W columns for each group, TM speakers appear to produce smaller intensity contrasts between strong and weak syllables in non-final positions than English speakers in Type A sentences. Lower intensity ratios on strong syllables and higher intensity ratios on weak syllables contribute to the relatively smaller intensity contrast in the speech of ESL and EFL speakers. Compared with ESL, EFL produce slightly smaller intensity contrasts between strong and weak syllables in non-final positions for Type A sentences due to relatively higher intensity ratios on weak syllables. The average difference in intensity ratios between strong and weak syllables in non-final positions for Type A sentences is 0.21 for NS, 0.08 for ESL, and 0.05 for EFL.
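The per-sentence contrasts behind these averages can be recomputed from Table 5.22. The following is a minimal sketch, assuming that a contrast is simply the strong-syllable ratio minus the weak-syllable ratio; the names `STRONG`, `WEAK`, and `per_sentence_contrasts` are illustrative and do not come from the study.

```python
# Non-final intensity ratios for sentences A1-A7, copied from Table 5.22.
STRONG = {
    "NS":  [1.00, 1.00, 0.97, 0.97, 0.97, 1.00, 0.95],
    "ESL": [0.92, 0.99, 0.91, 0.91, 0.97, 0.99, 0.94],
    "EFL": [0.90, 0.99, 0.96, 0.92, 0.96, 1.00, 0.93],
}
WEAK = {
    "NS":  [0.78, 0.72, 0.81, 0.74, 0.79, 0.72, 0.81],
    "ESL": [0.89, 0.86, 0.88, 0.87, 0.88, 0.85, 0.88],
    "EFL": [0.89, 0.90, 0.91, 0.90, 0.92, 0.89, 0.90],
}


def per_sentence_contrasts(strong, weak):
    """Strong-minus-weak intensity ratio per sentence, rounded to two decimals.

    Assumption: the contrast is a plain difference of the two ratios.
    """
    return [round(s - w, 2) for s, w in zip(strong, weak)]


contrasts = {
    group: per_sentence_contrasts(STRONG[group], WEAK[group]) for group in STRONG
}

for group, values in contrasts.items():
    print(group, values, "min:", min(values), "max:", max(values))
```

Under this assumption, the contrast of every individual sentence is larger for NS than for either learner group, not only the group average.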

Table 5.23 Intensity contrasts (ratio) between strong and weak syllables in final positions for Type A sentences

Position: Final

Group       NS              ESL             EFL
Stress      S       W       S       W       S       W
A1                  0.57            0.64            0.76
A2                  0.60            0.65            0.82
A3          0.79            0.82            0.86
A4          0.74            0.85            0.88
A5                  0.83            0.88            0.90
A6                  0.64            0.74            0.86
A7                  0.84            0.83            0.89
Mean        0.76    0.70    0.83    0.75    0.87    0.85

[Figure 5.8: line plot of intensity ratio (vertical axis, 0.50 to 1.00) for strong final and weak final syllables in Type A sentences, with one line each for NS, ESL, and EFL.]

Figure 5.8 Group average intensity ratios of strong vs. weak syllables in final position for Type A sentences

The picture looks very different when it comes to syllables in final position. In Type A sentences, NS and ESL produce highly similar intensity contrasts between strong and weak syllables although ESL speakers produce higher intensity ratios on strong and weak syllables than NS. For both NS and ESL, their strong syllables appear higher in intensity ratios than the weak ones. EFL produce little intensity contrast between strong and weak syllables in final position. Their intensity ratios remain relatively high for both strong and weak syllables.

Aside from comparing the intensities of strong vs. weak syllables, I wondered how alike the intensities of NS, ESL, and EFL speakers would be, if I compared only the strong vs. unstressed syllables for Type A sentences. Table 5.24 presents the intensity contrasts between strong and unstressed syllables for Type A sentences.

Table 5.24 Intensity contrasts (ratios) between strong and unstressed syllables in non-final and final position for Type A sentences

Position    Non-final                 Final
Stress      Strong      Unstressed    Unstressed
NS          0.97        0.72          0.60
ESL         0.95        0.84          0.68
EFL         0.95        0.87          0.81

Alternatively, I compared all stressed syllables (primary or not) vs. all totally unstressed syllables.

Table 5.25 Intensity contrasts (ratios) between stressed and unstressed syllables in non-final and final position for Type A sentences

Position    Non-final                 Final
Stress      Stressed    Unstressed    Stressed    Unstressed
NS          0.90        0.72          0.80        0.60
ESL         0.93        0.84          0.84        0.68
EFL         0.95        0.87          0.88        0.81

The results show that in all three types of comparisons (strong vs. weak, strong vs. unstressed, and stressed vs. unstressed), NS consistently produced greater intensity contrasts than ESL, who in turn produced greater intensity contrasts than EFL for Type A sentences. However, NS and TM speakers are least alike when comparing the intensities of strong vs. unstressed syllables.
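The three comparisons can be checked side by side using the non-final group means from Tables 5.22, 5.24, and 5.25. This is a rough sketch rather than the procedure used in the study: it assumes each contrast is the difference between the two mean ratios, and the names `COMPARISONS`, `contrast_table`, `ordering_holds`, and `widest_gap` are hypothetical.

```python
# Non-final mean intensity ratios for Type A sentences, as (louder, softer)
# pairs per group, copied from Tables 5.22, 5.24, and 5.25.
COMPARISONS = {
    "strong_vs_weak": {
        "NS": (0.98, 0.77), "ESL": (0.95, 0.87), "EFL": (0.95, 0.90),
    },
    "strong_vs_unstressed": {
        "NS": (0.97, 0.72), "ESL": (0.95, 0.84), "EFL": (0.95, 0.87),
    },
    "stressed_vs_unstressed": {
        "NS": (0.90, 0.72), "ESL": (0.93, 0.84), "EFL": (0.95, 0.87),
    },
}


def contrast_table(comparisons):
    """Return {comparison: {group: louder - softer}}, rounded to two decimals."""
    return {
        name: {group: round(hi - lo, 2) for group, (hi, lo) in groups.items()}
        for name, groups in comparisons.items()
    }


table = contrast_table(COMPARISONS)

# Does NS > ESL > EFL hold in every comparison?
ordering_holds = all(
    row["NS"] > row["ESL"] > row["EFL"] for row in table.values()
)

# Comparison in which NS differ most from the ESL group.
widest_gap = max(table, key=lambda name: table[name]["NS"] - table[name]["ESL"])

print(table)
print("ordering holds:", ordering_holds, "| widest NS-ESL gap:", widest_gap)
```

With these inputs the ordering holds in all three comparisons, and the gap between NS and the learner groups is widest for strong vs. unstressed syllables.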

Table 5.26 Intensity contrasts (ratio) between strong and weak syllables in non-final positions for Type B sentences

Position: Non-final

Group       NS              ESL             EFL
Stress      S       W       S       W       S       W
B1          0.95    0.92    0.95    0.91    0.92    0.86
B2          0.96    0.85    0.95    0.84    0.94    0.83
B3          0.88    0.80    0.95    0.85    0.96    0.87
B4          0.95    0.85    0.96    0.87    0.96    0.85
B5          0.93    0.83    0.94    0.87    0.94    0.87
B6          0.93    0.81    0.95    0.84    0.95    0.83
B7          0.93    0.83    0.95    0.85    0.94    0.84
Mean        0.93    0.84    0.95    0.86    0.94    0.85
NS-NNS                     -0.02   -0.02   -0.01   -0.01

[Figure 5.9: line plot of intensity ratio (vertical axis, 0.50 to 1.00) for stressed non-final and unstressed non-final syllables in Type B sentences, with one line each for NS, ESL, and EFL.]

Figure 5.9 Group average intensity ratios of strong vs. weak syllables in non-final position for Type B sentences

In contrast to Type A sentences, TM and English speakers produce very similar intensity contrasts between strong and weak syllables with similar intensity ratios on strong and weak syllables for Type B sentences.

In summary, the intensity contrasts between target strong and weak syllables show conflicting results between Type A and Type B sentences. TM speakers produce smaller intensity contrasts between target strong and weak syllables in non-final positions than NS for Type A sentences, while producing intensity contrasts similar to those of NS for Type B sentences. When intensity contrasts are measured as the difference in intensity ratios between strong and weak syllables or between stressed and unstressed syllables, intensity contrasts remain larger for English speakers than for TM speakers for Type A sentences. The smaller intensity contrasts among TM speakers have less to do with lower intensity ratios on strong syllables than with higher intensity ratios on weak syllables. ESL speakers produce slightly greater intensity contrasts between strong and weak syllables in both non-final and final positions than EFL speakers for Type A sentences. ESL and EFL differ little with respect to the intensity ratios of strong syllables, but EFL tend to produce louder weak syllables than ESL. In final positions, intensity contrasts between strong and weak syllables appear highly similar for NS and ESL although ESL speakers produce higher intensity ratios on both strong and weak syllables than NS. EFL produce smaller intensity contrasts between strong and weak syllables in final position. The results suggest that TM speakers are able to produce native-like intensity contrasts between strong and weak syllables in sentences that feature a regular alternation between stressed and unstressed syllables and when stress patterns and lexical categories of words are perfectly aligned. However, they produce smaller intensity contrasts between strong and weak syllables in sentences that feature long stretches of weak syllables and where stress patterns and lexical categories are not aligned. The fact that TM speakers do not lower intensity on weak syllables as much as English speakers largely contributes to the smaller intensity contrasts in this type of sentence.

5.3.4 DO TM SPEAKERS WITH IMPROVED PROFICIENCY AND INCREASED EXPOSURE TO ENGLISH PRODUCE INTENSITY PATTERNS THAT ARE CLOSER TO THE TARGET SPEECH RHYTHM?

There are several indications from the results that ESL produce intensity patterns that are closer to target than the less proficient EFL.

Table 5.27 A comparison of intensity results between ESL and EFL

      Mean intensity         Standard               Range of               No. of syl.     Correlation
      contrast between       deviation of           intensity              different       coefficients
      strong & weak          intensity ratios       ratios                 from NS         with NS
      NS     ESL    EFL      NS     ESL    EFL      NS     ESL    EFL      ESL    EFL      ESL    EFL
A1    0.22   0.03   0.01     0.18   0.14   0.09     0.43   0.36   0.23     4      5        0.89   0.85
A2    0.28   0.13   0.09     0.17   0.13   0.06     0.40   0.34   0.17     3      4        0.83   0.96
A3    0.16   0.03   0.05     0.10   0.06   0.05     0.27   0.15   0.10     2      4        0.83   0.78
A4    0.23   0.04   0.02     0.14   0.06   0.05     0.35   0.17   0.15     6      5        0.88   0.78
A5    0.18   0.09   0.04     0.10   0.05   0.03     0.26   0.14   0.08     4      4        0.90   0.44
A6    0.28   0.14   0.11     0.17   0.09   0.05     0.46   0.25   0.15     3      5        0.78   0.76
A7    0.14   0.06   0.03     0.09   0.05   0.04     0.25   0.13   0.11     3      4        0.82   0.80

B1    0.03   0.04   0.06     0.03   0.03   0.05     0.09   0.09   0.15     0      1        0.65   0.14
B2    0.11   0.11   0.11     0.09   0.08   0.08     0.22   0.22   0.23     1      0        0.95   0.95
B3    0.08   0.10   0.09     0.12   0.06   0.06     0.31   0.18   0.17     4      5        0.84   0.67
B4    0.10   0.09   0.11     0.06   0.06   0.06     0.18   0.16   0.17     0      1        0.86   0.77
B5    0.10   0.07   0.07     0.09   0.07   0.06     0.26   0.18   0.19     3      4        0.87   0.85
B6    0.12   0.11   0.12     0.09   0.06   0.06     0.27   0.15   0.15     1      1        0.96   0.74
B7    0.10   0.10   0.10     0.09   0.08   0.08     0.26   0.21   0.21     2      3        0.91   0.80

First of all, although it is difficult to tell ESL and EFL apart based on the basic statistics of the intensity results in Type B sentences, the results across Type A sentences consistently show that ESL speakers perform better than EFL with respect to intensity. While TM speakers tend to produce smaller intensity contrasts, lower standard deviations, and lower ranges of intensity ratios than NS in Type A sentences, ESL almost always produce means, standard deviations, and ranges of intensity ratios that are more similar to those of NS than EFL do. On average, EFL speakers produce the highest mean intensity ratios, the smallest variations in intensity ratios, and the narrowest ranges of intensity ratios compared with NS and ESL speakers.

Second, despite similarities in the types of significant differences between NS and ESL and between NS and EFL within each type of sentence, differences between NS and ESL are fewer than differences between NS and EFL. A total of 25 syllables are found to be significantly different between NS and ESL as opposed to 31 syllables between NS and EFL in Type A sentences. A total of 11 syllables are found to be significantly different between NS and ESL as opposed to 15 syllables between NS and EFL in Type B sentences. Again, the strength of ESL over EFL is more salient in Type A sentences.

Third, the correlation coefficients of syllable-to-syllable intensity ratios across sentences are consistently stronger between NS and ESL than between NS and EFL for both Type A and Type B sentences. Additionally, a larger number of significant correlations are found between NS and ESL than between NS and EFL. Significant correlation is found in six sentences between NS and ESL and in five sentences between NS and EFL for Type A sentences. Correlation coefficients are found to be significant in six sentences between NS and ESL, but in only three sentences between NS and EFL for Type B sentences.
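The syllable totals cited in the second point can be recovered by summing the per-sentence counts in the "No. of syl. different from NS" columns of Table 5.27. The sketch below does only that arithmetic; the variable names are illustrative.

```python
# Per-sentence counts of syllables significantly different from NS,
# copied from Table 5.27 (sentences 1-7 of each type, in order).
DIFFERENT_FROM_NS = {
    "Type A": {"ESL": [4, 3, 2, 6, 4, 3, 3], "EFL": [5, 4, 4, 5, 4, 5, 4]},
    "Type B": {"ESL": [0, 1, 4, 0, 3, 1, 2], "EFL": [1, 0, 5, 1, 4, 1, 3]},
}

# Sum the counts for each sentence type and learner group.
totals = {
    sentence_type: {group: sum(counts) for group, counts in groups.items()}
    for sentence_type, groups in DIFFERENT_FROM_NS.items()
}

for sentence_type, row in totals.items():
    print(sentence_type, row, "EFL - ESL =", row["EFL"] - row["ESL"])
```

The difference between the two learner groups is six syllables for Type A sentences and four for Type B sentences.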

Fourth, although ESL and EFL produce similar intensity contrasts between strong and weak syllables for Type B sentences, ESL speakers produce slightly greater intensity contrasts than EFL in both non-final and final positions for Type A sentences. In final positions, ESL speakers produce intensity contrasts similar to those of English speakers, whereas EFL speakers produce small intensity contrasts between strong and weak syllables.

In summary, ESL, who have higher overall proficiency and greater exposure to the English-speaking environment, produce more native-like intensity patterns than EFL. Results from the basic statistics, the significant differences from English speakers, the correlation coefficients of intensity ratios, and the intensity contrasts between strong and weak syllables provide supporting evidence that ESL perform better than EFL. The strength of ESL over EFL is more pronounced in Type A sentences than in Type B sentences, because the stress patterns of the former prove to be more difficult for TM speakers.

5.3.5 SUMMARY FOR THE DISCUSSION OF INTENSITY

Both ESL and EFL show difficulty with the intensity of weak syllables for Type A and Type B sentences, but only slight difficulty with the intensity of strong syllables. Lowering intensity on weak syllables (including both weakly stressed syllables and unstressed syllables) appears more difficult for ESL and EFL than raising intensity on strong syllables. Type A sentences pose a greater challenge to TM speakers than Type B sentences. ESL and EFL produce a much larger number of louder weak syllables than NS in Type A than in Type B sentences. ESL speakers show somewhat less difficulty with intensity as a correlate of stress than EFL. They not only produce fewer louder weak syllables in Type A and Type B sentences, but also produce fewer softer strong syllables in Type B sentences.

TM speakers generally correlate intensity with the target stress patterns, although the correlation is not as strong as it is for English speakers. In particular, the weakly stressed syllables of EFL speakers are almost as loud as their strong syllables. Unlike NS and ESL speakers who produced a three-way intensity distinction for the three types of syllable (strong, weakly stressed, unstressed), EFL speakers produce a binary intensity distinction (stressed vs. unstressed).

For Type B sentences, ESL and EFL produce intensity contrasts between strong and weak syllables similar to those of English speakers, but they produce smaller intensity contrasts for Type A sentences. Relatively softer strong syllables and relatively louder weak syllables both contribute to the relatively smaller intensity contrasts between strong and weak syllables among TM speakers. When intensity contrasts are measured based on differences in intensity ratios between strong and weak syllables or between stressed and unstressed syllables, intensity contrasts are still more distinct for English speakers than for TM speakers for Type A sentences. The results suggest that TM speakers are capable of producing native-like intensity contrasts between strong and weak syllables when the rhythm features regular alternation between stressed and unstressed syllables.

ESL speakers, who have higher overall proficiency and exposure to the English-speaking environment, produce more native-like intensity patterns. Various aspects of the intensity results, including the mean intensity ratios of syllables, the standard deviations of intensity ratios among syllables, the ranges of intensity ratios in a sentence, the correlations of intensity ratios across sentences, the number of syllables significantly different from NS, and the amount of intensity contrasts between strong and weak syllables, all provide supporting evidence that ESL perform better than EFL with respect to their intensity patterns.

CHAPTER 6: RESULTS AND DISCUSSION (3): PITCH

This chapter reports and analyzes results from the F0 data of Type A and Type B sentences. Native English, TM ESL, and TM EFL speakers are compared with respect to their use of pitch as a correlate of stress in English. Results of the two rhythmically diverse sets of sentences are reported separately and then combined for more detailed analyses. The focus of the current investigation will be the extent to which these two types of rhythmic patterns differ in terms of the types and degrees of difficulty in pitch presented to TM speakers and how such differences may help us understand the source of any difficulties.

F0 data of Type A and Type B sentences are organized into five subsections each. The first pair of sections, 6.1.1 and 6.2.1, report basic statistics and patterns revealed by the absolute F0 frequency (Hz) of syllables. The data are valuable in showing what the original speech data look like for each group. However, caution should be taken when one compares differences in pitch across groups based on F0 frequency in Hz alone because these differences often heavily reflect individual variations in pitch range. As a result, the Hz data are not used as the basis for comparing rhythm between groups. Interpretations of the Hz data are limited to revealing general patterns. The second pair of sections, 6.1.2 and 6.2.2, report basic statistics and general patterns revealed by F0 frequency in semitone ratios. To make the raw F0 data directly comparable, the original F0 data in Hz are further processed in two steps. First, the F0 frequency in Hz is converted into semitones⁶ to make the units of comparison perceptually equal. After the F0 frequency in Hz is converted into semitones, the semitone value of each syllable is reconverted into ratios based on the highest semitone value in that utterance. In doing so, we eliminate variations in pitch range across utterances. In the third pair of sections, 6.1.3 and 6.2.3, direct syllable-to-syllable comparisons are made between speaker groups based on their semitone ratios. Both similarities to and significant differences from the pitch patterns of English speakers are reported. The fourth pair of sections, 6.1.4 and 6.2.4, report correlation coefficients of overall F0 patterns between subject groups for each test sentence in order to provide additional information about how close to target ESL and EFL speakers are. The fifth pair of sections, 6.1.1.5 and 6.1.2.5, report test reliability information for all test sentences to show how consistent NS, ESL, and EFL speakers are with their F0 patterns across three productions of the same sentence.

⁶ The reason for doing so is that a given absolute difference in Hz is perceived differently at different pitch ranges. It takes a larger difference in Hz to achieve the same amount of perceptual difference in pitch at a higher pitch range than at a lower pitch range. Semitones take such perceptual differences into consideration, so that a difference of one semitone at a higher pitch range represents the same amount of perceptual difference in pitch as a difference of one semitone at a lower pitch range.
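The two-step normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure of the study: the text does not name the reference frequency used for the semitone conversion, so `REF_HZ` is a hypothetical value, and the function names are invented here. The resulting ratios depend on the reference chosen, so the sketch is not expected to reproduce the ratios reported in the tables.

```python
import math

# Hypothetical reference frequency for the semitone scale. The source does
# not state which reference was used; the ratios change with this choice.
REF_HZ = 50.0


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert an F0 value in Hz to semitones above the reference frequency."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitone_ratios(f0_hz_values, ref_hz=REF_HZ):
    """Express each syllable as a ratio of the utterance's highest semitone value."""
    semitones = [hz_to_semitones(f0, ref_hz) for f0 in f0_hz_values]
    peak = max(semitones)
    if peak <= 0:
        raise ValueError("the peak F0 must lie above the reference frequency")
    return [s / peak for s in semitones]


# Equal Hz ratios give equal semitone steps regardless of register:
# 100 -> 200 Hz and 200 -> 400 Hz are both one octave (12 semitones).
low_octave = hz_to_semitones(200.0, 100.0)
high_octave = hz_to_semitones(400.0, 200.0)

# A made-up three-syllable utterance; the highest syllable gets a ratio of 1.
example = semitone_ratios([100.0, 200.0, 150.0])
print(low_octave, high_octave, example)
```

The octave check illustrates the point made in note 6: a 100 Hz rise from 100 Hz and a 200 Hz rise from 200 Hz are the same interval in semitones.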

In the event that pitch information for a syllable is unavailable, the entire utterance that contains the syllable or syllables in question is excluded from analyses. The purpose is to ensure that all comparisons are made based on complete utterances.

6.1 F0 PATTERNS OF TYPE A SENTENCES

This section reports pitch results for the Type A sentences, which feature long stretches of weak syllables. Each sentence contains a single strong syllable or two widely spaced strong syllables. Four rhythmic patterns are represented. Sentences A1 and A2 feature the five-syllable rhythmic pattern Swwww, sentences A3 and A4 feature the seven-syllable rhythmic pattern SwwwwwS, sentences A5 and A6 feature the seven-syllable rhythmic pattern wSwwwww, and sentence A7 has the eight-syllable rhythmic pattern SwwwwwSw. Utterances with missing data were removed from analyses. Out of the 630 utterances recorded, 89 contained syllables where the pitch data could not be extracted. A total of 129 syllables had missing F0 data. Among them, 97 occurred in the speech of NS, 25 in the speech of ESL, and seven in the speech of EFL. All of them were weak syllables. Because missing F0 data occur more often in the speech of NS than in the speech of ESL or EFL, the majority of the removed utterances come from NS. Among the 89 utterances discarded from the analyses, 61 were produced by NS, 21 by ESL, and seven by EFL. As a result, the total number of utterances analyzed is 541 and the total number of syllables analyzed is 3570.
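The exclusion rule and the resulting counts can be illustrated with a short sketch. It assumes, purely for illustration, that each utterance is stored as a list of per-syllable F0 values with `None` marking a syllable whose pitch could not be extracted; the data layout and the name `complete_utterances` are hypothetical.

```python
from typing import List, Optional, Sequence


def complete_utterances(
    utterances: Sequence[Sequence[Optional[float]]],
) -> List[Sequence[Optional[float]]]:
    """Keep only utterances in which every syllable has an F0 value."""
    return [u for u in utterances if all(f0 is not None for f0 in u)]


# Toy data: the second utterance has one missing syllable and is dropped whole.
toy = [[150.0, 140.0, 138.0], [149.0, None, 141.0]]
kept = complete_utterances(toy)

# Bookkeeping reported for the Type A sentences.
recorded = 630
discarded_by_group = {"NS": 61, "ESL": 21, "EFL": 7}
missing_syllables_by_group = {"NS": 97, "ESL": 25, "EFL": 7}

discarded = sum(discarded_by_group.values())
analyzed = recorded - discarded
missing_syllables = sum(missing_syllables_by_group.values())
print(discarded, analyzed, missing_syllables)
```

Dropping whole utterances rather than single syllables keeps every remaining comparison based on complete sentences, at the cost of discarding more NS data than learner data.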

6.1.1 F0 FREQUENCY IN HZ

This section reports results based on the Extreme Point F0 frequency in Hz of individual syllables sampled from the raw speech data. A group average F0 in Hz was obtained for each syllable of every test sentence. The basic statistics of the F0 data, including means, standard deviations, and ranges, are summarized in Table 6.1 below.

Table 6.1 Group Average F0 in Hz of individual syllables for Type A sentences

Item / Group    Syllable    Mean    SD    Range

A1 (it, with, me)
  NS     117.33  131.1   125.24    (SD 13.87, Range 36.81)
  ESL    145.16  134.4   165.68
  EFL    170.90  154.38  172.24
A2 (it, for, me)
  NS     124.31  114.44  133.25
  ESL    153.52  127.81  130.15
  EFL    171.72  156.31  167.66
A3 (me, to, wear)
  NS     148.23  143.73  140.54
  ESL    152.14  137.07  139.66
  EFL    177.82  158.29  171.79
A4 (me, to, bring)
  NS     149.85  145.77  142.92
  ESL    168.48  153.19  152.07
  EFL    185.46  157.89  175.39
A5 (made, the, le-, mon)
  NS     147.69  143.46  141.50  142.04
  ESL    151.64  135.14  148.71  153.82
  EFL    175.23  150.77  171.80  181.53
A6 (man, gave, it, to, me)
  NS     130.82  123.82  120.64  129.55  131.00
  ESL    160.48  166.11  154.15  141.59  154.48
  EFL    182.93  174.38  172.34  155.93  175.48
A7 (one, wear, -ing, the, blue)
  NS     150.00  146.29  145.76  145.29  171.57
  ESL    152.50  147.54  143.00  131.42  169.35
  EFL    185.61  180.18  164.71  150.14  179.68
Total
  NS
  ESL
  EFL

The mean F0s of a sentence tend to be the highest for EFL, the lowest for NS, and in between for ESL. The mean F0 for all sentences is 146.21 Hz for NS, 155.02 Hz for ESL, and 172.48 Hz for EFL. The variation of F0 frequency among syllables of an utterance averages the greatest for ESL, the smallest for EFL, and in between for NS. The average standard deviation of F0 in Hz is 18.31 for NS, 20.34 for ESL, and 16.48 for EFL. The F0 range of an utterance averages the widest for ESL, the narrowest for EFL, and in between for NS. The average F0 range of all seven Type A sentences is 49.71 Hz for NS, 53.25 Hz for ESL, and 46.38 Hz for EFL. EFL speakers on average produced the highest mean F0 in Hz, the smallest variations in Hz among syllables, and the smallest F0 ranges in Hz.

[Figure 6.1, panels (a) through (g): line plots of group average F0 frequency (Hz, vertical axis, 0 to 250) by syllable (horizontal axis), with one line each for NS, ESL, and EFL.]
(a) F0 frequency (Hz) of sentence A1 (Swwww): Jim wrote it with me
(b) F0 frequency (Hz) of sentence A2 (Swwww): Jane made it for me
(c) F0 frequency (Hz) of sentence A3 (SwwwwwS): You like me to wear the jeans
(d) F0 frequency (Hz) of sentence A4 (SwwwwwS): You want me to bring the wine
(e) F0 frequency (Hz) of sentence A5 (wSwwwww): My mom made the lemon pie
(f) F0 frequency (Hz) of sentence A6 (wSwwwww): The old man gave it to me
(g) F0 frequency (Hz) of sentence A7 (SwwwwwSw): Jane's the one wearing the blue dress

Figure 6.1 Group Average F0 Frequency (Hz) for sentences A1 through A7

Three observations can be made based on the results summarized in Table 6.1 and Figure 6.1. First, EFL tend to speak at a noticeably higher F0 in Hz than NS and ESL. Their syllables often average highest in Hz throughout the sentence. ESL also speak at a higher pitch than NS in most sentences. Second, despite higher overall F0s, ESL and EFL tend to produce noticeably lower F0s on initial weak syllables than NS, as in sentences A5 and A6. Third, the F0 patterns of NS appear highly consistent with the target stress patterns. Their F0 usually rises at target strong syllables, drops at weak syllables, and stays low over the long stretch of weak syllables. In contrast, the F0 patterns of ESL and EFL sometimes exhibit misplaced pitch prominence, which may result in a greater number of pitch prominences than expected. Take sentence A4, You want me to bring the wine, as an example. EFL produce a pitch pattern of LHLLHLH with pitch prominence on three syllables, want/bring/wine, while NS produce a pitch pattern of HLLLLLH, with pitch prominence on the two target strong syllables, you/wine. The same pattern can also be found for sentence A3.

6.1.2 F0 FREQUENCY IN SEMITONE RATIOS

This section reports results based on the F0 frequency in semitone ratios of the individual syllables. The F0 frequency in Hertz is first converted into equal perceptual units of semitones and then converted into ratios so that the peak EPF of an utterance has a value of one and the other syllables are expressed as ratios of it. The semitone ratio is preferred over Hz for comparing rhythmic patterns among groups because it minimizes the impact of perceptual bias and individual variations in pitch range as potential variables. A group average F0 in semitone ratios was obtained for every individual syllable of every test sentence. These basic statistics, including means, standard deviations, and ranges, are summarized in Table 6.2.

Table 6.2 Group Average F0 in semitone ratios of individual syllables for Type A sentences

Item / Group    Syllable    Mean    SD    Range

A1 (with, me)
  NS     0.80  0.78    (SD 0.07, Range 0.18)
  ESL    0.76  0.81
  EFL    0.85  0.91
A2 (for, me)
  NS     0.72  0.76
  ESL    0.78  0.78
  EFL    0.88  0.93
A3 (to, wear, the, jeans)
  NS     0.82  0.81  0.81  0.89
  ESL    0.81  0.82  0.79  0.90
  EFL    0.88  0.91  0.84  0.92
A4 (to, bring, the, wine)
  NS     0.84  0.82  0.82
  ESL    0.83  0.82  0.78
  EFL    0.87  0.91  0.82
A5 (the, le-, mon)
  NS     0.79  0.77  0.81
  ESL    0.78  0.84  0.84
  EFL    0.81  0.88  0.89
A6 (gave, it, to, me)
  NS     0.79  0.77  0.81  0.82
  ESL    0.84  0.81  0.77  0.81
  EFL    0.89  0.90  0.85  0.91
A7 (one, wear, -ing, the, blue, dress)
  NS     0.83  0.82  0.81  0.80  0.90  0.81
  ESL    0.85  0.83  0.81  0.76  0.87  0.76
  EFL    0.89  0.88  0.84  0.78  0.87  0.87
Total
  NS
  ESL
  EFL

The mean semitone ratios tend to be highest for EFL. EFL produce the highest mean semitone ratios in all seven sentences. The mean semitone ratio of Type A sentences is 0.84 for NS, 0.85 for ESL, and 0.90 for EFL. The variation in semitone ratios across syllables of an utterance averages the smallest for EFL, the greatest for NS, and in between for ESL. The standard deviation of semitone ratios among syllables of an utterance averages 0.07 for NS, 0.06 for ESL, and 0.05 for EFL. NS produce the greatest variation in semitone ratios in five out of seven sentences, while EFL produce the smallest variation in six out of seven sentences. Syllables vary within a narrower range in semitone ratios for ESL and EFL than for NS. NS produce the widest F0 range in five out of seven sentences, while EFL produce the narrowest F0 range in six out of seven sentences. F0 range in semitone ratios of all sentences averages the narrowest for EFL, the widest for NS, and in between for ESL. Average F0 range in semitone ratios of Type A sentences is 0.20 for NS, 0.18 for ESL, and 0.14 for EFL. The results show that syllables tend to be higher-pitched but with smaller F0 variation in the speech of ESL and EFL than in the speech of NS.

Average semitone ratios are relatively high in the speech of EFL. The major reason is that they produce high tones on a greater number of syllables than do NS and ESL. They sometimes raise pitch on syllables that are supposed to be weak. Another contributing reason is that EFL tend to produce high pitch prominence on final syllables. The additional F0 rise on these weak syllables consequently raises their average F0.

Figure 6.2 shows the F0 patterns of Type A sentences in terms of semitone ratios.

[Figure 6.2, panels (a) through (g): line plots of group average F0 frequency in semitone ratios (vertical axis, 0.50 to 1.00) by syllable (horizontal axis), with one line each for NS, ESL, and EFL.]
(a) F0 frequency in semitone ratios of sentence A1 (Swwww): Jim wrote it with me
(b) F0 frequency in semitone ratios of sentence A2 (Swwww): Jane made it for me
(c) F0 frequency in semitone ratios of sentence A3 (SwwwwwS): You like me to wear the jeans
(d) F0 frequency in semitone ratios of sentence A4 (SwwwwwS): You want me to bring the wine
(e) F0 frequency in semitone ratios of sentence A5 (wSwwwww): My mom made the lemon pie
(f) F0 frequency in semitone ratios of sentence A6 (wSwwwww): The old man gave it to me
(g) F0 frequency in semitone ratios of sentence A7 (SwwwwwSw): Jane's the one wearing the blue dress

Figure 6.2 Group Average F0 Frequency in semitone ratios for sentences A1 through A7

Four observations can be made based on the results summarized in Table 6.2 and Figure 6.2. First, the F0 patterns of NS appear highly consistent with the target stress patterns. The F0 patterns of ESL and EFL, however, do not always coincide with the target stress patterns. For NS, strong syllables tend to have high pitches while weak syllables tend to have low pitches. In contrast, ESL and EFL sometimes exhibit non-native placement of pitch prominence. While strong target syllables sometimes received no pitch prominence, more often a high pitch prominence was observed on weak target syllables. The prevalence of the latter often leads to an unexpectedly high number of pitch accents in a sentence. We generally observe a zigzag F0 pattern in the speech of ESL and EFL over what is expected to be a long stretch of weak syllables. This phenomenon is particularly pronounced in the speech of EFL. Take sentence A3 for example.

(26) A3:   You  like  me  to  wear  the  jeans.
     NS:   H    L     L   L   L     L    H
     ESL:  L    H     L   L   L     L    H
     EFL:  L    H     L   L   H     L    H

NS produce pitch prominences on the target strong syllables you and jeans. Instead of the initial syllable, ESL place pitch prominence on the syllable like. In addition to that, EFL produce pitch prominence on a third syllable wear.

Second, syllables often maintain higher semitone ratios throughout a sentence in the speech of ESL and EFL than in the speech of NS. EFL, in particular, tend to speak at noticeably higher semitone ratios than NS. While this phenomenon is particularly evident in sentence A2, it can be identified in most Type A sentences.

Third, despite higher overall F0s, ESL and EFL tend to produce noticeably lower F0s on initial function words whether they are supposed to be strong or weak according to the lead-in sentences. The phenomenon is evident in sentences A3 through A6.

Fourth, EFL tend to raise F0 on final syllables. This happens to both strong and weak syllables and to both function words and content words. Their divergence from NS is particularly obvious when the final syllables contain weak function words.

6.1.3 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

Student's t-tests were performed on individual syllables to determine whether or not the obtained difference in semitone ratios between pairs of groups was statistically significant or likely by chance. Results of the t-test scores for semitone ratios of individual syllables between NS, ESL and EFL are shown in Table 6.3.

Table 6.3 Student's t-test scores for semitone ratios of individual syllables between NS, ESL and EFL for Type A sentences

Item / Group    t

A1 (Jim, with, me)
  NS-ESL     1.437   0.527   0.302
  NS-EFL     1.370   0.483   1.414
  ESL-EFL    0.023   1.894   1.807
A2 (Jane, for, me)
  NS-ESL     0.383   0.566   0.096
  NS-EFL     2.388*  1.667   1.533
  ESL-EFL    1.513   1.829   2.141*
A3 (You, to, jeans)
  NS-ESL     0.356   0.455
  NS-EFL     2.363*  1.268
  ESL-EFL    1.878   0.749
A4 (me, to)
  NS-ESL     0.645   0.100
  NS-EFL     1.889   0.311
  ESL-EFL    1.134   2.276*  0.536
A5 (My, made, the)
  NS-ESL     2.085   0.899   0.053
  NS-EFL     2.058   2.291*  0.357
  ESL-EFL    0.084   0.868   1.020   0.437
A6 (The, old, me)
  NS-ESL     2.081   0.920   0.105
  NS-EFL     1.337   1.887   1.463
  ESL-EFL    1.124   1.356   2.065
A7 (Jane's, the, one, wear, -ing, blue, dress)
  NS-ESL     0.261   1.801   0.629   0.343   0.176   1.219   0.616
  NS-EFL     1.007   2.234*  1.697   2.114*  1.046   1.195   1.101
  ESL-EFL    0.836   0.399   1.058   1.940   0.881   0.004   1.840

*p<.05 (tcrit=2.101), **p<.01 (tcrit=2.878), df=18, two-tailed
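The comparison behind each cell of Table 6.3 can be sketched as an independent two-sample Student's t-test checked against the critical values given in the table note. This is a minimal sketch under assumptions: it uses the pooled-variance form of the test, reports the absolute value of t (the table lists only positive scores), and the function names are hypothetical. The critical values apply only when df = 18, for example two groups of ten values each.

```python
import math
from statistics import mean, variance

# Two-tailed critical values for df = 18, taken from the note to Table 6.3.
T_CRIT_05 = 2.101
T_CRIT_01 = 2.878


def student_t(sample_a, sample_b):
    """Return (|t|, df) for an independent two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return abs(mean(sample_a) - mean(sample_b)) / std_err, df


def significance_mark(t_value):
    """Asterisk notation used in the table; valid only for df = 18."""
    if t_value >= T_CRIT_01:
        return "**"
    if t_value >= T_CRIT_05:
        return "*"
    return ""


# Marking two scores reported in the table for sentence A2.
print("2.388" + significance_mark(2.388))
print("1.667" + significance_mark(1.667))
```

A score is flagged only when it reaches the critical value, so 2.388 receives one asterisk while 1.667 receives none.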

In the next two subsections, we will examine which syllables are significantly different in semitone ratios between groups. Section 6.1.3.1 reports and compares differences between native and non-native speakers, NS and ESL, and NS and EFL. Results provide crucial information as to the kinds of difficulties TM speakers have with pitch as a correlate of stress, as well as in what ways ESL and EFL are similar or different in their problems with pitch. Section 6.1.3.2 examines significant differences found between ESL and EFL. Results in this section will provide information about the changes that might have taken place from EFL to ESL in the way pitch correlates with stress.

6.1.3.1 Differences between NS and ESL vs. differences between NS and EFL

ESL and EFL show similar types of difficulties with pitch. Significant differences between NS and ESL and significant differences between NS and EFL both highlight two types of difficulties: strong syllables that are relatively low-pitched or weak syllables that are relatively high-pitched compared with the same syllables produced by NS. Despite having similar problems, ESL is distinguished from EFL in two respects. First, ESL speakers do not appear to have problems with weak syllables in final position. Second, they display a lesser degree of difficulty in pitch. A much smaller number of syllables is found to be significantly different between NS and ESL than between NS and EFL. In particular, EFL have greater trouble with weak syllables in non-final position than ESL.

Table 6.4 Number of syllables with semitone ratios significantly different from NS as strong and weak in non-final and final positions for Type A sentences

Position Non-Final Final Stress Weak Strong Weak (k=31) (k=2) (k=5) lower higher lower lower 5 1 21

Out of all 46 syllables compared for Type A sentences, a total of five syllables was found to be significantly different between NS and ESL. These five syllables include two lower-pitched strong syllables and three higher-pitched weak syllables, all in non-final position. A total of 21 syllables was found significantly different between NS and EFL out of the 46 syllables compared. These 21 syllables include one higher-pitched, two lower-pitched strong syllables in non-final position, 14 higher-pitched weak syllables in non-final position and two higher-pitched weak syllables in final position.

In summary, ESL and EFL evidence similar types of difficulties with semitone ratios, including lower-pitched strong syllables and higher-pitched weak syllables. ESL show much less difficulty with weak syllables in both final and non-final positions than EFL. Higher semitone ratios on weak syllables account for the majority of differences between NS and EFL.

6.1.3.2 Significant differences between ESL and EFL

This section reports significant differences in semitone ratios found between ESL and EFL. Syllables that are found to be significantly different between these two groups are further examined under four categories: strong syllables in non-final position, weak syllables in non-final position, strong syllables in final position, and weak syllables in final position.

Table 6.5 Number of strong and weak syllables with semitone ratios significantly different between EFL and ESL in non-final vs. final positions for Type A sentences

Position Non-Final Final Total
Stress Strong Strong Weak (k=8) (k=2) (k=5)
Type higher lower higher lower lower
EFL vs. ESL 9
ESL/NS only 0
EFL/NS only 4
EFL/ESL/NS 1
ESL/EFL only 4

Out of 46 syllables compared, the semitone ratios of nine syllables are significantly different between ESL and EFL. All of these differences are attributed to higher semitone ratios of weak syllables on the part of EFL. Of the nine syllables, eight are weak syllables in non-final position, and one is a weak syllable in final position.

Not all of the differences translate into difficulties with speech rhythm. Simply because EFL and ESL differ from each other in the semitone ratios of one syllable does not mean that either of them produces that syllable significantly differently from English speakers. For example, the syllable me in sentence A3 is found to be significantly different between ESL and EFL, but not between NS and ESL or NS and EFL. This suggests that although ESL and EFL are very different in the semitone ratios of the syllable, neither ESL nor EFL have difficulty with this syllable.

The differences between ESL and EFL that are interpreted as difficulties are syllables that are also significantly different between NS and ESL and/or between NS and EFL. When a syllable is found to be significantly different both between NS and ESL and between NS and EFL, it implies a difficulty common to both ESL and EFL. For example, the syllable made in sentence A2 is found to be significantly different between ESL and EFL, between NS and ESL, and also between NS and EFL. When a syllable is found to be significantly different between NS and ESL but not between NS and EFL, it implies a difficulty unique to ESL. Similarly, when a syllable is found to be significantly different between NS and EFL but not between NS and ESL, it implies a difficulty unique to EFL.

Among the nine syllables that are found to be significantly different between ESL and EFL, four involve significant differences between ESL and EFL only and five also involve significant differences between NS and ESL or significant differences between NS and EFL. Among them, one involves a significant difference between both NS and ESL and NS and EFL, which implies a difficulty common to both ESL and EFL and four involve significant differences between NS and EFL but not between NS and ESL, which suggests a difficulty unique to EFL. Interestingly, none of the differences we found between ESL and EFL indicate a difficulty unique to ESL.
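The classification described above can be sketched as a small decision rule. The function below takes, for a syllable that already differs significantly between ESL and EFL, two flags saying whether it also differs between NS and ESL and between NS and EFL. The function name, the labels, and the flag list are illustrative assumptions; the flags are constructed only to reproduce the reported counts (four, one, four, and none), not taken from the individual syllables.

```python
from collections import Counter


def classify(ns_esl_sig, ns_efl_sig):
    """Label a syllable that differs significantly between ESL and EFL."""
    if ns_esl_sig and ns_efl_sig:
        return "common to ESL and EFL"
    if ns_efl_sig:
        return "unique to EFL"
    if ns_esl_sig:
        return "unique to ESL"
    return "ESL/EFL only (no difficulty)"


# Hypothetical (NS-ESL significant, NS-EFL significant) flags for the nine
# syllables, arranged to match the counts reported in the text.
flags = [(False, False)] * 4 + [(True, True)] + [(False, True)] * 4

counts = Counter(classify(ns_esl, ns_efl) for ns_esl, ns_efl in flags)
for label, n in counts.items():
    print(f"{label}: {n}")
```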

Figure 6.3 Distribution of significant differences in semitone ratios between ESL and EFL in terms of the spread of difficulties to ESL and/or EFL

In summary, the single most important difference between ESL and EFL is that EFL produce higher relative pitch on weak syllables. All of the significant differences between ESL and EFL involve higher-pitched weak syllables, mostly in non-final position, produced by EFL speakers. They reflect difficulties unique to EFL, difficulties common to both ESL and EFL, or differences that are not difficulties for either group. This suggests that whenever there is a difference between ESL and EFL, it is more likely for EFL than for ESL to be the group having difficulty.

6.1.4 CORRELATION OF PITCH PATTERNS BETWEEN SPEAKER GROUPS

Pearson Product-moment correlation coefficients were obtained to estimate how closely semitone ratios covary from syllable to syllable between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL respectively for all seven Type A sentences. Complete results of the correlation tests are summarized in Table 6.6.

Table 6.6 Pearson Product-moment Correlation Coefficients for mean syllable semitone ratios between groups for Type A sentences

Sentence r: NS and ESL, NS and EFL, ESL and EFL
A1 0.7344 0.4016 0.9026*
A2 0.8452 0.8125 0.8690
A3 0.6398 0.2736 0.8655*
A4 0.3329 0.1807
A5 0.7911* 0.4246
A6 0.6808 0.3128
A7 0.6429
*p<.05, **p<.01
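For readers who want to see what a coefficient of this kind measures, the following minimal sketch computes a Pearson product-moment correlation over the mean semitone ratios of two groups, syllable by syllable. The two value lists are hypothetical: the second is a compressed copy of the first, so the pitch moves in the same direction on every syllable but within a narrower range, and the correlation is nonetheless perfect.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical group mean semitone ratios for a five-syllable sentence.
group_a = [0.95, 0.80, 0.90, 0.75, 0.85]
# Same contour, half the excursion size (an illustrative transformation).
group_b = [0.5 * v + 0.45 for v in group_a]

r = pearson_r(group_a, group_b)
print(f"r = {r:.4f}")
```

Because r is insensitive to a linear rescaling of one series, a high correlation between two groups says that their pitch rises and falls on the same syllables, not that the size of the pitch movement is the same.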

There are three major observations. First, few significant correlations can be identified between NS and ESL or between NS and EFL. Correlation coefficients between NS and ESL are found to be significant in only two sentences A5 at p

The results support earlier findings that (1) ESL and EFL differ from NS in their pitch patterns in some major ways, that (2) ESL and EFL are similar in their pitch patterns, and that (3) differences between ESL and EFL more likely lie in the magnitude in which pitch fluctuates rather than the direction in which it flows, i.e., whether it rises or drops.

6.1.4.1 Test Reliability

This section summarizes results of test-retest reliability between the administrations of all seven Type A sentences. Three reliability indices were obtained for each sentence for each group. High test-retest reliability indicates speakers are consistent with their pitch patterns.

Table 6.7 Test-retest reliability for F0 frequency in semitone ratios from three productions of Type A sentences

Sentence Administration NS ESL EFL
A1 First & Second 0.9414*
A1 First & Third 0.9965** 0.8808*
A1 Second & Third 0.9502* 0.9228*
A2 First & Second 0.9579* 0.9945**
A2 First & Third 0.9744**
A2 Second & Third 0.9683**
A3 First & Second 0.9782** 0.9699**
A3 First & Third 0.9986** 0.8521*
A3 Second & Third 0.9954** 0.9364**
A4 First & Second 0.9375** 0.9523**
A4 First & Third 0.9145** 0.9190**
A4 Second & Third 0.9796** 0.9825**
A5 First & Second 0.9369** 0.9423** 0.9472**
A5 First & Third 0.9845** 0.9522** 0.9837**
A5 Second & Third 0.9981** 0.9751** 0.9896**
A6 First & Second 0.8692* 0.9531** 0.8542*
A6 First & Third 0.9083** 0.9738** 0.8663*
A6 Second & Third 0.9775** 0.9836** 0.9668**
A7 First & Second 0.8844** 0.9697** 0.9778**
A7 First & Third 0.7285* 0.8807** 0.9707**
A7 Second & Third 0.9297** 0.9544** 0.9890**
*p<.05, **p<.01

Results of the test-retest reliability indicate that speakers of all groups are consistent with their pitch patterns in most Type A sentences. Among 21 Pearson Product-moment Correlation coefficients of semitone ratios between productions of the same sentence obtained for each group, NS show positive significant correlation in 19, ESL in 16, and EFL in 20 comparisons.

6.1.5 SUMMARY OF PITCH RESULTS FOR TYPE A SENTENCES

Overall, ESL and EFL produce syllables with higher semitone ratios than NS. The average semitone ratio is highest for EFL, lowest for NS, and in between for ESL. The variation in semitone ratios is on average the smallest for EFL, the greatest for NS, and in between for ESL. Syllables vary within a narrower range of semitone ratios for ESL and EFL than for NS. The range of semitone ratios of individual sentences averages the smallest for EFL, the largest for NS, and in between for ESL.

The F0 patterns of NS are highly consistent with the target stress patterns, but F0 patterns of ESL and EFL speakers do not always coincide with the target stress patterns. While F0 tends to rise at strong syllables and drop at weak syllables in the speech of NS, it is sometimes observed to drop on strong syllables and rise on weak syllables in the speech of ESL and EFL. Deviation from the target pitch patterns occurs more often with EFL than with ESL. In addition, the semitone ratios of weak syllables do not decline as much for NNS. They also tend to place high pitch prominence on final syllables, whether they are intended as strong or weak. The differences between NS and NNS seem to appear both on the placement of stress and the degree to which F0 varies with stress.

ESL and EFL produce similar types of difficulties with pitch. Significant differences between NS and ESL and between NS and EFL indicate ESL and EFL sometimes produce lower-pitched strong syllables and more often higher-pitched weak syllables than NS. EFL produced the latter type at a much higher frequency than ESL. Despite showing similar types of difficulties, EFL suffer from the problems more seriously than ESL. A larger number of syllables are found to be significantly different between NS and EFL than between NS and ESL.

Significant differences between ESL and EFL are found in only a small number of syllables. Most are attributed to higher-pitched weak syllables in non-final position on the part of EFL and most reveal difficulties unique to EFL.

Poor correlation of semitone ratios is found between NS and ESL and between NS and EFL. However, correlation coefficients are consistently stronger between NS and ESL than between NS and EFL throughout all Type A sentences. Very strong correlation is established between ESL and EFL. Speakers of all groups are consistent in their pitch patterns in their production of Type A sentences.

6.2 PITCH PATTERNS OF TYPE B SENTENCES

This section reports results from the F0 patterns of Type B sentences, which feature a highly regular rhythmic pattern of alternating strong and weak syllables. Each strong syllable is immediately preceded and/or followed by a weak syllable, and vice versa. All test sentences were embedded in broad-focused contexts to encourage speakers to introduce all words that carry content information as new and strong. The alternating stress pattern is further reinforced with word class. All strong syllables are made up of content words, which are more likely to bear stress than function words, which make up the weak syllables.

Three different rhythmic patterns are represented in Type B sentences. Sentences B1 through B4 feature the six-syllable iambic rhythmic pattern wSwSwS, sentences B5 and B6 feature the seven-syllable trochaic rhythmic pattern SwSwSwS, and sentence B7 features the eight-syllable iambic rhythmic pattern wSwSwSwS.

In the event that pitch information for a syllable was unavailable, pitch data for the entire utterance containing the syllable or syllables in question were excluded. Of 630 utterances recorded, 35 contained syllables whose F0 information could not be extracted by PitchWorks. These were removed from the analyses. They contain 39 such syllables, 31 of which occur in the speech of NS, four in the speech of ESL, and four in the speech of EFL. The majority of these are weak syllables, which are either too softly spoken or too low in pitch for their F0 to be extracted. This suggests how reduced weak syllables can be for NS. Because they occur most often in the speech of NS, most of the removed utterances come from NS. Among the 35 utterances discarded from the analyses, 28 were produced by NS, three by ESL, and four by EFL. That brings the total number of utterances analyzed down to 595. The total number of syllables analyzed is 3936.
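The exclusion rule and the bookkeeping in this paragraph can be sketched as follows. The record layout (a dictionary with a list of per-syllable F0 values, with None marking a syllable whose F0 could not be extracted) and the three sample utterances are assumptions made for illustration; only the counts 630, 35, 28, 3, and 4 come from the text.

```python
def drop_incomplete(utterances):
    """Keep only utterances in which every syllable has an extracted F0 value."""
    return [u for u in utterances if all(f0 is not None for f0 in u["f0_hz"])]


# Hypothetical records: None marks a syllable with no extractable F0.
utterances = [
    {"group": "NS", "sentence": "B1", "f0_hz": [170.0, 165.0, None, 144.0, 157.0, 150.0]},
    {"group": "ESL", "sentence": "B1", "f0_hz": [163.0, 170.0, 157.0, 143.0, 166.0, 160.0]},
    {"group": "EFL", "sentence": "B1", "f0_hz": [165.0, 181.0, 167.0, 146.0, 167.0, 172.0]},
]
kept = drop_incomplete(utterances)

# Counts reported in the text.
recorded, discarded = 630, 35
discarded_by_group = {"NS": 28, "ESL": 3, "EFL": 4}
analyzed = recorded - discarded

print(len(kept), "of", len(utterances), "sample utterances kept")
print("utterances analyzed:", analyzed)
```

Dropping the whole utterance rather than the single syllable keeps every analyzed utterance complete, which matters later when each syllable is expressed relative to the peak of its own utterance.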

6.2.1 F0 FREQUENCY IN HZ

This section reports and summarizes the pitch results for the Type B sentences. A group average F0 in Hz was obtained for every individual syllable of each test sentence. The basic statistics, including means, standard deviations, and ranges, are summarized in Table 6.8.

Table 6.8 Group Average Fo in Hz of individual syllables for Type B sentences

Group Syllable Mean SD Range
B1 I it back by noon
NS 170.41 164.67 154.81 143.52 157.44 161.44 16.06 43.22
ESL 163.07 169.59 157.03 142.55 166.14 167.50 22.02 59.62
EFL 165.17 181.17 167.27 145.63 166.80
B2 They with dad and laugh
NS 148.37 152.33 139.81 134.37 137.70
ESL 144.71 152.61 151.18 135.93 161.14
EFL 168.57 167.70 174.33 148.30 170.37
B3 We to read and write
NS 169.93 151.57 145.67 140.70 145.7
ESL 152.07 155.07 165.23 144.03 156.03
EFL 162.87 164.23 173.40 150.47 162.03
B4 I with John at noon
NS 174.13 178.21 157.79 142.63 171.13
ESL 144.27 151.53 155.63 136.93 159.30
EFL 160.47 162.40 169.37 153.07 165.00
B5 Mom dad were mad at Jim
NS 154.87 140.96 138.57 121.61 149.35
ESL 148.07 131.97 157.59 128.86 149.83
EFL 181.93 151.00 192.11 148.07 185.93
B6 good at bake -ing bread
NS 163.88 143.60 154.80 140.60 162.92
ESL 166.10 129.60 174.80 147.30 169.13
EFL 194.07 163.83 189.21 183.45 182.66
B7 ride to work at once
NS 165.35 155.62 157.85 140.15 152.27
ESL 172.83 147.23 160.33 140.87 161.73
EFL 181.37 156.90 167.13 151.77 161.10
Total
NS 157.79 16.84 47.17
ESL 158.66 17.05 48.32
EFL 173.07 18.63 50.30

The mean F0 of a sentence tends to be the highest for EFL, the lowest for NS, and in between for ESL. The mean F0 for all sentences is 157.79 Hz for NS, 158.66 Hz for ESL, and 173.07 Hz for EFL. Variation of F0 frequency among syllables of an utterance averages the greatest for EFL but is similar for NS and ESL. Along with a greater average F0 variation, EFL also produce a much higher mean F0 than NS and ESL. The average standard deviation of F0 in Hz is 16.84 Hz for NS, 17.05 Hz for ESL, and 18.63 Hz for EFL. The F0 range for an utterance averages slightly greater for EFL than for NS and ESL. Average F0 range of all seven Type B sentences is 47.17 Hz for NS, 48.32 Hz for ESL, and 50.30 Hz for EFL.
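One plausible way to arrive at summary figures of this kind is to compute the mean, standard deviation, and range of F0 across the syllables of each utterance and then average those three statistics over utterances. The sketch below does exactly that on invented data. Whether the dissertation used the sample or the population standard deviation is not stated here, so the use of the sample formula is an assumption, as are the function names.

```python
from statistics import mean, stdev


def utterance_stats(f0_hz):
    """Mean, sample SD, and range of F0 (Hz) across the syllables of one utterance."""
    return {
        "mean": mean(f0_hz),
        "sd": stdev(f0_hz),  # sample SD; the original choice is not stated
        "range": max(f0_hz) - min(f0_hz),
    }


def group_average(utterances):
    """Average the per-utterance statistics over all utterances of a group."""
    stats = [utterance_stats(u) for u in utterances]
    return {key: mean(s[key] for s in stats) for key in ("mean", "sd", "range")}


# Hypothetical F0 values (Hz) for two three-syllable utterances.
summary = group_average([[100, 110, 120], [200, 220, 240]])
print(summary)
```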

[Figure 6.4, panels (a) to (g): line plots of group average F0 frequency (Hz) by syllable for NS, ESL, and EFL]
(a) F0 frequency (Hz) of sentence B1 (wSwSwS): I need it back by noon
(b) F0 frequency (Hz) of sentence B2 (wSwSwS): They play with dad and laugh
(c) F0 frequency (Hz) of sentence B3 (wSwSwS): We learn to read and write
(d) F0 frequency (Hz) of sentence B4 (wSwSwS): I met with John at noon
(e) F0 frequency (Hz) of sentence B5 (SwSwSwS): Mom and dad were mad at Jim
(f) F0 frequency (Hz) of sentence B6 (SwSwSwS): John is good at bake -ing bread
(g) F0 frequency (Hz) of sentence B7 (wSwSwSwS): I need a ride to work at once

Figure 6.4 Group average F0 frequency (Hz) for sentences B1 through B7

Five observations can be made based on the results summarized in Table 6.8 and Figure 6.4. First, EFL sometimes speak at a noticeably higher F0 in Hz than NS and ESL, with their syllables averaging higher in Hz than NS and ESL throughout most of the sentence. Second, the F0 patterns of ESL and EFL appear highly consistent with the target stress patterns. Their F0 tends to rise at the target strong syllables and drop at the target weak syllables. Overall, ESL and EFL produce a robust "zigzag" F0 pattern. Third, the F0 patterns of NS do not always rise at the target strong syllables or drop at target weak syllables, and when they do, their F0 rises and falls remain flatter than those of ESL and EFL. It seems that F0 rises at strong syllables and falls at weak syllables more consistently and within a wider range for TM speakers than for English speakers. Examples can be found in sentences B5, B6, and B7. Fourth, ESL and EFL tend to produce noticeably lower F0s on initial weak syllables than NS. This phenomenon is evident in sentences B1, B3, B4 and B7. Fifth, the tendency for pitch range to gradually decline and narrow toward the end of an utterance is more consistent in the speech of NS than in the speech of ESL and EFL. There is a gradual F0 declination toward the end of a sentence for NS, with each pitch hike and pitch dip being lower than the previous one. In contrast, the pitch range of ESL and EFL sometimes remains quite similar throughout a sentence, and the F0 of their final syllable can be almost as high as the F0 of strong syllables earlier in the sentence. This phenomenon is particularly evident in sentences that feature a trochaic rhythm, such as B5 and B6. In connection with the preceding observation, we notice that the F0 patterns of NS often tend to start higher but end up lower than those of ESL and EFL.

Table 6.9 Group average F0 (Hz) of strong content words vs. weak function words in non-final and final positions for Type B sentences

Position Non-final Final
Stress Strong Weak Strong Weak
Type content function content function content function content function
NS 166.31 154.49 153.79
ESL 170.81 146.98 160.47
EFL

Because all strong syllables are content words and all weak syllables are function words in these sentences, it is impossible to disentangle the potential influence of stress from the potential influence of syllable type on pitch level by examining pitch data of Type B sentences alone. Nonetheless, we can see that strong content words tend to be higher in pitch than weak function word syllables in non-final position for all groups. The contrast between strong content words and weak function words appears larger for ESL and EFL than for NS. Final strong syllables tend to be spoken at the highest F0 in Hz for EFL, the lowest for NS, and in between for ESL.

6.2.2 F0 FREQUENCY IN SEMITONE RATIOS

This section reports results from the F0 frequency in semitone ratios for individual syllables. The F0 frequency in Hz was first converted into equal perceptual units of semitones and then converted into ratios so that the peak EPF0 of an utterance has a value of one and the other syllables are expressed as ratios of it. A group average F0 in semitone ratios was obtained for every individual syllable of every test sentence. The basic statistics, including means, standard deviations, and ranges, are summarized in Table 6.10.
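The two-step conversion described above can be sketched as follows, using the standard definition of a semitone as one twelfth of an octave. The reference frequency against which semitones are measured is not given in this section, so ref_hz is left as an explicit parameter and the value used in the example is purely illustrative; the function names are likewise assumptions. The ratios only behave as described if every F0 value lies above the reference, so that all semitone values are positive.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Semitones above a reference frequency (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def semitone_ratios(f0_hz_values, ref_hz):
    """Express each syllable's semitone value as a ratio of the utterance peak.

    Assumes every F0 value is above ref_hz, so the peak is positive and the
    syllable with the highest F0 receives a ratio of exactly one.
    """
    semitones = [hz_to_semitones(f0, ref_hz) for f0 in f0_hz_values]
    peak = max(semitones)
    return [s / peak for s in semitones]


# Illustrative values only: the reference of 50 Hz is an assumption.
ratios = semitone_ratios([100.0, 200.0, 150.0], ref_hz=50.0)
print([round(r, 3) for r in ratios])
```

Because the ratio depends on the choice of reference frequency, values produced with a different reference are not directly comparable with those in Table 6.10.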

Table 6.10 Group Average Fo in semitone ratios of individual syllables for Type B sentences

Group Syllable
B1 I it back by noon
NS 0.94 0.91 0.89 0.83 0.90
ESL 0.88 0.90 0.84 0.80 0.90
EFL 0.88 0.92 0.88 0.81 0.90
B2 They with dad and laugh
NS 0.92 0.95 0.91 0.86 0.88
ESL 0.88 0.90 0.90 0.83 0.93
EFL 0.88 0.89 0.89 0.82 0.91
B3 We to read and write
NS 0.87 0.87 0.83 0.85
ESL 0.86 0.91 0.80 0.84
EFL 0.89 0.91 0.83 0.88
B4 with John at noon
NS 0.88 0.81 0.93
ESL 0.88 0.89 0.82 0.91
EFL 0.89 0.90 0.85 0.90
B5 dad were mad at Jim
NS 0.87 0.81 0.81 0.74 0.85
ESL 0.85 0.78 0.88 0.77 0.87
EFL 0.90 0.79 0.91 0.78 0.91
B6 good at bake -ing bread
NS 0.85 0.77 0.82 0.76 0.84
ESL 0.90 0.77 0.92 0.84 0.88
EFL 0.92 0.81 0.92 0.87 0.89
B7 I ride to work at once
NS 0.94 0.86 0.82 0.83 0.74 0.81
ESL 0.88 0.91 0.83 0.87 0.80 0.87
EFL 0.89 0.92 0.86 0.88 0.84 0.87
Total
NS
ESL
EFL

Mean semitone ratios tend to be similar among NS, ESL, and EFL in Type B sentences. Although differences among the three are very small, EFL produce the highest mean semitone ratio more often than NS and ESL. The mean semitone ratio for Type B sentences is 0.88 for NS, 0.88 for ESL, and 0.89 for EFL. The variation in semitone ratios across syllables of an utterance also appears quite similar among the three groups. The standard deviation of semitone ratios among syllables of an utterance averages 0.06 for NS, 0.06 for ESL, and 0.05 for EFL. On average, syllables vary within a narrower range of semitone ratios for EFL than for NS and ESL. The average F0 range in semitone ratios for Type B sentences is 0.17 for NS, 0.17 for ESL and 0.15 for EFL. The results show that NS and ESL are indistinguishable in terms of the basic statistics in pitch for Type B sentences. In addition, the semitone ratios of syllables tend to be slightly higher, less variable, and narrower in range in the speech of EFL than in the speech of NS and ESL.

Figure 6.5 shows the Fo patterns of the individual Type B sentences in terms of semitone ratios.

[Figure 6.5, panels (a) to (g): line plots of group average F0 frequency in semitone ratios by syllable for NS, ESL, and EFL]
(a) F0 frequency in semitone ratios of sentence B1 (wSwSwS): I need it back by noon
(b) F0 frequency in semitone ratios of sentence B2 (wSwSwS): They play with dad and laugh
(c) F0 frequency in semitone ratios of sentence B3 (wSwSwS): We learn to read and write
(d) F0 frequency in semitone ratios of sentence B4 (wSwSwS): I met with John at noon
(e) F0 frequency in semitone ratios of sentence B5 (SwSwSwS): Mom and dad were mad at Jim
(f) F0 frequency in semitone ratios of sentence B6 (SwSwSwS): John is good at bake -ing bread
(g) F0 frequency in semitone ratios of sentence B7 (wSwSwSwS): I need a ride to work at once

Figure 6.5 Group average F0 frequency in semitone ratios for sentences B1 through B7

Five observations can be made based on the results summarized in Table 6.10 and Figure 6.5. First, the F0 level of NS tends to rise at strong syllables and lower at weak syllables. However, their strong syllables are not always produced with high pitch accents. Occasionally, they produce low pitch on strong syllables and high pitch on weak syllables. For example, the following pitch pattern is found in some speakers for sentence B4.

(27) I  met   with  John  at  noon.
        H*+H        L*        H* L-L%

In this case, the F0 stays high on the weak syllable with due to the inflectional high tone that forms part of the high pitch accent of the preceding syllable, and drops at the strong syllable John due to a low pitch accent. Thus, for NS, the F0 level often, but not always, rises at strong syllables and falls at weak syllables. In contrast, the F0 patterns of ESL and EFL are highly consistent with the target stress patterns. Unlike NS, their pitch level appears almost absolutely correlated with the target stress patterns, rising at strong syllables and lowering at weak syllables.

Second, the rise and fall of semitone ratios is generally flatter for NS than for ESL and EFL. It seems that the F0 level of ESL and EFL rises more at strong syllables and falls more at weak syllables than with NS. We often observe a more pronounced zigzag F0 pattern in the speech of ESL and EFL. This phenomenon is particularly pronounced in sentences B3 and B5.
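The notion of a pitch contour being consistent with the target stress pattern can be made concrete with a simple agreement score: the proportion of syllable-to-syllable transitions in which the semitone ratio rises into a strong syllable or falls into a weak one. This measure is an illustration introduced here, not one used in the study, and the two contours are hypothetical. It assumes a strictly alternating pattern such as wSwSwS.

```python
def stress_agreement(pattern, values):
    """Share of transitions that rise into 'S' and do not rise into 'w'.

    pattern: string of 'S' (strong) and 'w' (weak), one character per syllable.
    values:  semitone ratios, one per syllable. A level transition counts as
             'not rising', so it agrees with a weak target only.
    """
    hits = 0
    for i in range(1, len(pattern)):
        rising = values[i] > values[i - 1]
        expected_rise = pattern[i] == "S"
        hits += rising == expected_rise
    return hits / (len(pattern) - 1)


# Hypothetical contours for a wSwSwS sentence.
zigzag = [0.88, 0.91, 0.83, 0.87, 0.80, 0.87]    # rises on every strong syllable
drifting = [0.94, 0.91, 0.89, 0.83, 0.90, 0.88]  # mostly falling throughout

print(stress_agreement("wSwSwS", zigzag))
print(stress_agreement("wSwSwS", drifting))
```

A score of this kind captures only the direction of each pitch movement; the size of the excursions, which is what distinguishes a flatter contour from a pronounced zigzag, would need a separate measure such as the range.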

Third, there is a stronger and more consistent tendency for F0 to gradually decline toward the end of a sentence in the speech of NS than in the speech of ESL and EFL. Because of declination, each strong syllable tends to have its semitone ratio higher than that of the next strong syllable, and each weak syllable tends to have its semitone ratio higher than that of the next weak syllable, resulting in a downsloping zigzag pattern. This declination tendency is often weaker in the speech of ESL and EFL, as in sentences B5, B6, and B7. As a result, F0 level often finishes lower at the end of a sentence for NS than for ESL and EFL.
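The downsloping zigzag described in this paragraph amounts to two conditions: the strong syllables decline from one to the next, and so do the weak syllables. A minimal check of both conditions is sketched below on hypothetical semitone ratios for a SwSwSwS sentence; the function names and the data are illustrative assumptions.

```python
def declines(values):
    """True if each value is lower than the one before it."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))


def downsloping_zigzag(pattern, values):
    """True if strong syllables and weak syllables each decline across the utterance."""
    strong = [v for p, v in zip(pattern, values) if p == "S"]
    weak = [v for p, v in zip(pattern, values) if p == "w"]
    return declines(strong) and declines(weak)


# Hypothetical semitone ratios for a SwSwSwS sentence.
with_declination = [0.95, 0.85, 0.92, 0.82, 0.89, 0.79, 0.86]
without_declination = [0.90, 0.79, 0.91, 0.78, 0.91, 0.78, 0.91]

print(downsloping_zigzag("SwSwSwS", with_declination))
print(downsloping_zigzag("SwSwSwS", without_declination))
```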

Fourth, ESL and EFL tend to produce noticeably lower F0s on initial syllables than NS. The phenomenon is particularly pronounced when the initial syllable is weak. ESL and EFL produce lower F0 on the initial syllables in sentences B1, B2, B3, B4, B5, and B7, all of which are weak except for B5.

Fifth, ESL and EFL perform very similarly in all Type B sentences. Not only does their pitch rise and fall at identical syllables, but it also does so with similar semitone ratios.

6.2.3 SIGNIFICANT DIFFERENCES BETWEEN NS, ESL AND EFL

Student's t-tests were performed on individual syllables to determine whether or not the obtained differences in semitone ratios between pairs of groups were statistically significant or likely by chance. Results of the t-test scores for semitone ratios of individual syllables between NS, ESL and EFL are shown in Table 6.11.

Table 6.11 Student's t-test scores for semitone ratios of individual syllables between groups for Type B sentences

Group t
B1 I need it back by noon
NS-ESL 0.245 0.136 1.569 0.815 0.009
NS-EFL 0.365 0.195 0.305 0.537 0.172
ESL-EFL 0.228 0.687 0.523 1.333 0.566 0.226
B2 They play with dad and laugh
NS-ESL 0.626 0.402 1.445 0.270 0.651 1.209
NS-EFL 1.085 0.115 1.767 0.358 0.904 0.714
ESL-EFL 0.012 0.355 0.468 0.118 0.511 0.783
B3 We learn to read and write
NS-ESL 2.653* 0.303 1.927 0.567 0.221
NS-EFL 0.565 1.464 0.089 0.795
ESL-EFL 0.990 1.754 0.988 0.400 0.802 1.183
B4 I met with John at noon
NS-ESL 2.788* 1.254 0.622 0.612 0.146 0.849
NS-EFL 2.268* 1.617 0.529 0.919 0.552 1.308
ESL-EFL 0.643 1.463 0.401 0.269 0.735 0.224
B5 Mom and dad were mad at Jim
NS-ESL 1.977 1.149 0.618 1.015 2.310* 0.516 0.714
NS-EFL 2.081 2.014 1.806 0.644 0.609 2.381*
ESL-EFL 0.030 1.541 1.703 0.296 1.423 0.232 1.850
B6 John is good at bake -ing bread
NS-ESL 0.496 0.322 1.646 0.145 1.172 1.236 1.078
NS-EFL 0.792 0.098 2.076 0.936 1.150 1.643 1.081
ESL-EFL 0.387 0.139 0.756 1.174 0.139 0.962 0.033
B7 I need a ride to work at once
NS-ESL 2.451* 1.119 0.342 1.791 0.575 1.215 0.704 2.165*
NS-EFL 2.298* 1.179 0.099 2.421* 1.739 1.557 1.153 2.332*
ESL-EFL 0.057 0.072 0.519 0.170 1.247 0.278 1.079 0.216
*p<.05 (tcrit=2.101); **p<.01 (tcrit=2.878); df=18; two-tailed

In the following two subsections, we will consider which syllables are significantly different in semitone ratios between pairs of groups. Section 6.2.3.1 reports and compares differences between NS and ESL and differences between NS and EFL. Results will provide crucial information about the kinds of difficulties TM speakers face with pitch as a correlate of stress, as well as in what ways ESL and EFL speakers are similar or different in their difficulties with pitch. Section 6.2.3.2 examines significant differences between ESL and EFL. Results in this section will address changes that might have taken place between EFL and ESL in the way pitch correlates with stress, if there are indeed any changes.

6.2.3.1 Differences between NS and ESL vs. differences between NS and EFL

A slightly larger number of syllables is found to be significantly different between NS and EFL than between NS and ESL. Seven syllables are found to be significantly different between NS and ESL and nine syllables are found significantly different between NS and EFL out of all 46 syllables compared.

Table 6.12 Number of strong and weak syllables with semitone ratios significantly different from NS in non-final vs. final positions for Type B sentences

Position   Non-Final                        Final                            Total differ
Stress     Strong (k=17)   Weak (k=22)      Strong (k=7)    Weak (k=0)
Type       higher  lower   higher  lower    higher  lower   higher  lower
ESL        2       0       0       4        1       0       --               7
EFL        3       0       0       4        2       0       --               9

Differences between NS and ESL and differences between NS and EFL show very similar patterns. They both comprise a small number of higher-pitched strong syllables in final and non-final positions and lower-pitched weak syllables in non-final positions. The seven syllables that are significantly different between NS and ESL comprise two higher-pitched strong syllables in non-final position, four lower-pitched weak syllables in non-final position, and one higher-pitched strong syllable in final position. The nine syllables that are significantly different between NS and EFL comprise three higher-pitched strong syllables in non-final position, four lower-pitched weak syllables in non-final position, and two higher-pitched strong syllables in final position.

Adding to the striking similarities between ESL and EFL, all of the relatively lower-pitched weak syllables produced by ESL and EFL occur at initial positions. Five of the Type B sentences begin with weak syllables. This suggests that there is a very strong tendency for ESL and EFL to produce relatively lower semitone ratios on initial weak syllables.

In summary, ESL and EFL are similar in terms of the types of differences found between them and NS. The differences center on two types of syllables, namely, relatively higher-pitched strong syllables and relatively lower-pitched weak syllables. Lower semitone ratios on weak syllables almost always occur in initial position. Results confirm the earlier observation that for ESL and EFL the F0 level tends to start lower and end higher than for NS. The number of differences between NS and ESL is slightly smaller than the number found between NS and EFL.

6.2.3.2 Significant differences between ESL and EFL

This section reports significant differences between ESL and EFL in semitone ratios for the Type B sentences. Syllables that are significantly different between the two groups are further examined under three categories: strong syllables in non-final position, weak syllables in non-final position, and strong syllables in final position.

Table 6.13 Number of strong and weak syllables with semitone ratios significantly different between EFL and ESL in non-final vs. final positions for Type B sentences

Position       Non-final                       Final
Stress         Strong (k=17)   Weak (k=22)     Strong (k=7)    Weak (k=0)
Type           higher  lower   higher  lower   higher  lower   higher  lower
EFL vs. ESL    0       0       0       0       0       0       --              0
ESL/NS only    0       0       0       0       0       0       --              0
EFL/NS only    0       0       0       0       0       0       --              0
EFL/ESL/NS     0       0       0       0       0       0       --              0
ESL/EFL only   0       0       0       0       0       0       --              0

Results from Student's t-tests showed that ESL and EFL do not differ significantly in the semitone ratio of any syllable in Type B sentences. No significant differences can be established between ESL and EFL in any of the seven Type B sentences. Results are consistent with findings in the preceding section that ESL and EFL produced highly similar pitch patterns when the sentences feature highly regular alternation between strong content word and weak function word syllables.
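The per-syllable comparison described above can be illustrated with a minimal sketch. It assumes an independent-samples Student's t-test with pooled variance; the text does not state which variant was used. The function name `pooled_t` and the sample values are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for an independent-samples Student's t-test.

    Assumes equal population variances (pooled-variance form).
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical per-speaker semitone ratios for a single syllable
# (illustrative values only, not taken from the experiment).
esl_ratios = [0.92, 0.95, 0.90, 0.94, 0.93]
efl_ratios = [0.93, 0.94, 0.91, 0.95, 0.92]

t_value, df = pooled_t(esl_ratios, efl_ratios)
print(f"t = {t_value:.3f}, df = {df}")
```

With eight degrees of freedom, the two-tailed critical value at p<.05 is about 2.306, so a |t| below that threshold would be reported as a non-significant difference for that syllable.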

6.2.4 CORRELATION OF PITCH PATTERNS BETWEEN SPEAKER GROUPS

Pearson Product-moment correlation coefficients were obtained to estimate how closely semitone ratios covary from syllable to syllable between pairs of subject groups. Correlation tests were performed between NS and ESL, NS and EFL, and ESL and EFL respectively for all seven Type B sentences. Complete results of the correlation tests are summarized in Table 6.14.
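As an illustration of how such a coefficient is obtained, the sketch below computes the Pearson product-moment correlation between two groups' mean semitone ratios across the syllables of one sentence. The function name `pearson_r` and the values are hypothetical; they are not the data summarized in Table 6.14.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical mean semitone ratios, one value per syllable of a sentence
# (illustrative values only).
ns_syllables = [0.94, 0.89, 0.93, 0.88, 0.90]
esl_syllables = [0.92, 0.86, 0.95, 0.84, 0.90]

r = pearson_r(ns_syllables, esl_syllables)
print(f"r = {r:.4f}")
```

A coefficient near 1 indicates that the two groups raise and lower pitch on the same syllables; it says nothing about whether the absolute ratios match, which is why the syllable-by-syllable comparisons are reported separately.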

Table 6.14 Pearson Product-moment Correlation Coefficients for mean semitone ratios between groups for Type B sentences

Sentence

B1 B2 B3 B4 B5 B6 B7
*p<.05, **p<.01

There are three major observations. First, correlation of semitone ratios tends to be stronger between NS and ESL than between NS and EFL. Correlation coefficients between NS and ESL are found to be significant in four sentences, while correlation coefficients between NS and EFL are significant in only two sentences. Second, semitone ratios of ESL and EFL correlate positively and strongly. Significant correlations are found between ESL and EFL in all seven Type B sentences.

6.2.5 TEST RELIABILITY

This section summarizes the test-retest reliability between the three sets of administrations of all seven Type B sentences. Three reliability indices were obtained per sentence per group. High test-retest reliability indicates that speakers are consistent in their pitch patterns.

Table 6.15 Test-retest reliability for syllable semitone ratios from three productions of Type B sentences

Sentence  Administration   Test-retest r
                           NS          ESL         EFL
B1        First & Second   0.9409**    0.9030*     0.9704**
          First & Third    0.9359**    0.8361*     0.9624**
          Second & Third   0.9950**    0.9814**    0.9707**
B2        First & Second   0.9023*     0.9414*     0.8871*
          First & Third    0.8997*     0.9696*
          Second & Third   0.9803**    0.8588*     0.8421*
B3        First & Second   0.9412**    0.9938**    0.9692**
          First & Third    0.9975**    0.9882**    0.9805**
          Second & Third   0.9455**    0.9938**    0.9868**
B4        First & Second   0.8661*     0.9268**    0.9727**
          First & Third    0.9849**    0.9278**    0.9664**
          Second & Third   0.8883*     0.9789**    0.9378**
B5        First & Second   0.9462**    0.8324*     0.8948**
          First & Third    0.9687**    0.8921**    0.9319**
          Second & Third   0.9751**    0.8651*     0.9085**
B6        First & Second   0.8439*     0.8849**    0.8791**
          First & Third    0.9578**    0.9570**    0.8246*
          Second & Third   0.9278**    0.8064*     0.8406*
B7        First & Second   0.8784**    0.9872**    0.9605**
          First & Third    0.9735**    0.9612**    0.9540**
          Second & Third   0.9345**    0.9446**    0.9772**
*p<.05, **p<.01

Results of the test-retest reliability indicate that speakers of all groups are consistent with their pitch patterns in all three productions of Type B sentences. Among 63 Pearson Product-moment Correlation coefficients obtained, all except one correlation coefficient from EFL falls below significance at p

6.2.6 SUMMARY OF RESULTS FOR TYPE B SENTENCES

Although the overall statistics look quite similar for NS, ESL and EFL, their internal F0 patterns turn out to be somewhat different. First, the F0 levels of ESL and EFL appear unidirectionally correlated with the target stress patterns, with pitch rising on strong content words and lowering on weak function words. For NS, however, pitch often but not always correlates positively with stress. Second, F0 declination toward the end of a sentence tends to be more rapid and more consistent in the speech of NS than in the speech of ESL and EFL. Third, although the average ranges of semitone ratios look quite similar across groups, the semitone ratios of ESL and EFL often oscillate within a broader and more constant pitch range from syllable to syllable, while the semitone ratios of NS tend to oscillate within a narrower and ever-lowering pitch range.

ESL and EFL differ from NS in very similar ways. Only a slightly larger number of syllables are found to have semitone ratios significantly different between NS and EFL than between NS and ESL. Besides, they share common types of differences. Both ESL and EFL sometimes produce higher-pitched strong syllables in non-final and final position and lower-pitched weak syllables in non-final position.

The F0 patterns of ESL and EFL appear highly similar to each other. Correlation results strongly indicate that their semitone ratios tend to vary together. Also, no significant differences could be found for any syllables of Type B sentences between ESL and EFL. Not only do their pitches rise and fall on the same syllables, but they also do so at similar magnitudes.

The correlation of syllable semitone ratios is the strongest between ESL and EFL and stronger between NS and ESL than between NS and EFL in most Type B sentences, suggesting that semitone ratios vary in highly similar fashion between ESL and EFL and tend to covary more closely between NS and ESL than between NS and EFL. Speakers of all groups are highly consistent in their F0 patterns in their productions of Type B sentences. All but one correlation coefficient of all groups is found to be below significance at p

6.3 DISCUSSION OF THE PITCH IN TYPE A AND TYPE B SENTENCES

6.3.1 DO TM SPEAKERS HAVE DIFFICULTY WITH THE PITCH OF STRONG SYLLABLES, THE PITCH OF WEAK SYLLABLES, OR BOTH?

To answer this question, we first look for significant differences between NS and ESL and between NS and EFL. We soon notice that ESL and EFL manifest very similar types of differences within each sentence type. However, the differences within Type A and the differences within Type B sentences show totally opposite patterns.

Table 6.16 Number of strong and weak syllables with semitone ratios significantly different from NS in non-final vs. final positions for Type A and Type B sentences

Position Non-Final Stress

1 B 4 1 4 2

ESL and EFL speakers show difficulty in both strong and weak syllables in Type A sentences. The pitch of their strong syllables is sometimes lower and the pitch of their weak syllables is sometimes higher than that of NS. While ESL show minor difficulty with both strong and weak syllables, EFL show great difficulty with weak syllables. About half of their weak syllables in non-final position are produced with semitone ratios that are significantly higher than those of NS. It seems that the long stretches of weak syllables featured in Type A sentences pose a greater challenge to EFL than to ESL speakers.

ESL and EFL do not show the same kinds of problems they have with Type A sentences in Type B sentences. Instead of lower-pitched strong syllables and higher-pitched weak syllables, they actually produce the opposite, namely, higher-pitched strong syllables and lower-pitched weak syllables in Type B sentences. Unlike in Type A sentences, these differences are not inconsistent with the target rhythm. On the contrary, they create a more "exaggerated" version of the target rhythm with greater pitch differentiation between strong and weak syllables.

Because Type A sentences contain two types of weak syllables (those that have some stress and those that have no stress) and Type B sentences consist of only weak syllables that are totally unstressed, I wondered whether the weakly stressed syllables in Type A sentences contribute to the relatively greater difficulties that TM speakers have with the semitone ratios of syllables for Type A than for Type B sentences. Table 6.17 shows the number of strong (stressed and accented), weakly stressed (stressed and unaccented), and unstressed syllables with semitone ratios significantly different from those of NS for Type A sentences.

Table 6.17 Number of strong, weakly stressed, and unstressed syllables with semitone ratios significantly different from NS in non-final vs. final position for Type A sentences

Position Stress

As can be seen in Table 6.17, TM speakers produced relatively lower-pitched strong syllables than NS (2/8 for ESL and 3/8 for EFL). EFL have difficulties sometimes in lowering pitches on the unstressed syllables (3/19). Both ESL and EFL produced relatively higher-pitched weakly stressed syllables. In particular, EFL speakers produced a much larger number of relatively higher-pitched weakly stressed syllables (11/12) than ESL speakers (3/12) for Type A sentences. This suggests that EFL speakers' difficulties with the pitch of weak syllables for Type A sentences are largely due to their relatively higher-pitched weakly stressed syllables.

Table 6.18 Group average semitone ratios of strong and weak syllables in non-final and final positions of Type A and Type B sentences

Position  Non-final                           Final
Stress    Strong            Weak              Strong            Weak
Group     NS    ESL   EFL   NS    ESL   EFL   NS    ESL   EFL   NS    ESL   EFL
A1        0.91  0.95  0.95  0.77  0.81  0.90  --    --    --    0.78  0.81  0.91
A2        0.96  0.96  0.98  0.76  0.85  0.93  --    --    --    0.76  0.78  0.93
A3        1.00  0.89  0.90  0.83  0.84  0.91  0.89  0.90  0.92  --    --    --
A4        0.99  0.86  0.90  0.84  0.85  0.90  0.87  0.88  0.92  --    --    --
A5        1.00  0.96  0.97  0.82  0.83  0.88  --    --    --    0.82  0.87  0.93
A6        1.00  0.99  0.97  0.82  0.80  0.88  --    --    --    0.82  0.81  0.91
A7        0.95  0.93  0.92  0.83  0.81  0.84  --    --    --    0.81  0.76  0.87

B1        0.94  0.92  0.93  0.89  0.86  0.87  0.90  0.90  0.90  --    --    --
B2        0.93  0.93  0.92  0.91  0.87  0.86  0.88  0.93  0.91  --    --    --
B3        0.91  0.95  0.95  0.88  0.84  0.86  0.85  0.84  0.88  --    --    --
B4        0.91  0.94  0.95  0.90  0.86  0.87  0.93  0.91  0.90  --    --    --
B5        0.89  0.89  0.92  0.82  0.80  0.79  0.85  0.87  0.91  --    --    --
B6        0.88  0.93  0.93  0.80  0.83  0.85  0.84  0.88  0.89  --    --    --
B7        0.89  0.92  0.93  0.84  0.84  0.86  0.81  0.87  0.87  --    --    --

Results from the average semitone ratios of strong and weak syllables in Type A sentences further confirm that ESL and EFL have difficulty with the pitch of both strong and weak syllables. The average semitone ratios of their strong syllables are consistently lower and the average semitone ratios of their weak syllables appear consistently higher than those of NS in non-final positions. The results also support the earlier finding that EFL show great difficulty with the pitch of weak syllables in non-final positions. The average semitone ratios of their weak syllables tend to be much higher than those of NS and ESL. Results from the average semitone ratios of strong and weak syllables in Type B sentences also further confirm that ESL and EFL tend to produce slightly higher pitch on strong syllables and slightly lower pitch on weak syllables than NS.

Table 6.19 shows the average semitone ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences.

Table 6.19 Group average semitone ratios of strong, weakly stressed, and unstressed syllables in non-final and final position for Type A sentences

Position Non-final Final Stress Strong Weak Stressed Unstressed Strong Weak Stressed Unstressed (k=8) (k=12) (k=19) (k=2) (k=2) (k=3) NS 0.97 0.82 0.81 0.88 0.82 0.79 ESL 0.93 0.86 0.81 0.89 0.82 0.80 EFL 0.87

The results are consistent with earlier findings that for Type A sentences, TM speakers have difficulty strengthening strong syllables and weakening weak syllables in non-final position, including those that are weakly stressed and those that are unstressed. In particular, NS have a two-way pitch distinction between strong (accented) and weak (unaccented) syllables, while ESL have a three-way pitch distinction of strong, weakly stressed, and unstressed syllables, and EFL have a two-way pitch distinction between stressed (primary or not) and unstressed syllables. It appears that EFL speakers are assigning high pitches to both strong and weakly stressed syllables. In final position, NS and ESL realized a distinct pitch differentiation between strong and weak syllables, whereas EFL speakers produced little pitch differentiation between strong and weak syllables.

In summary, ESL and EFL show difficulty with the pitch of both strong and weak syllables in Type A sentences, but not in Type B sentences. Both ESL and EFL produced lower-pitched strong syllables and higher-pitched weak syllables in non-final positions in Type A sentences. These differences could have a negative impact on their speech rhythm if the strong syllables sometimes do not get stressed or stressed enough and the weak syllables sometimes get stressed or are not reduced enough. In contrast, ESL and EFL do not seem to have difficulty with the pitches of strong or weak syllables in Type B sentences. They sometimes produce higher-pitched strong syllables and lower-pitched weak syllables, but the resulting F0 patterns are consistent with the target stress pattern. Results therefore show that Type A sentences pose a greater challenge to TM speakers than Type B sentences. A much larger number of differences show up between NS and EFL in Type A than in Type B sentences, largely due to the tremendous difficulty EFL have with the weakly stressed syllables in Type A sentences. They tend to assign high pitches to both strong and weakly stressed syllables.

6.3.2 DO TM SPEAKERS USE PITCH AS A CORRELATE OF STRESS?

Although all the test sentences are framed within a context to induce the target stress patterns, there is no guarantee that speakers will produce the stress patterns exactly as expected. Without assuming that TM speakers intend to produce the same stress patterns as English speakers, I examine the extent to which syllable pitch varies with the target stress patterns in Type A and Type B sentences. As a first step, I trace the rise and fall of pitch across each sentence and summarize the results in Table 6.20. The pitch of a syllable is considered "+PRPS" when (1) it is higher than that of its preceding syllable or (2), if the syllable in question is the initial syllable of the sentence, it is higher than that of its immediately following syllable. When the pitch of a syllable is the same as that of its preceding syllable, its pitch is considered "+PRPS" if the pitch of the preceding syllable is "+PRPS". If the preceding syllable is the initial syllable of the sentence, its pitch is considered "+PRPS" if it is higher than the pitch of the next syllable with a different pitch. The pitch levels of two syllables are arbitrarily treated as equal when their difference is less than one half of a semitone. Two syllables in Type A and eight syllables in Type B sentences are treated as equal to their adjacent syllables as a result of this adjustment.
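The classification rules just described can be sketched as a short procedure. This is only an illustration under stated assumptions: the function name `classify_prps`, the input format (one pitch value in semitones per syllable), and the example values are hypothetical. Two cases are not spelled out in the text and are resolved here by assumption: an initial syllable that ties with its successor is compared with the next syllable of a different pitch, and a syllable with no differently pitched successor is labeled "-PRPS".

```python
def classify_prps(semitones, tolerance=0.5):
    """Label each syllable "+PRPS" or "-PRPS" from its pitch in semitones.

    Two pitches closer than `tolerance` (half a semitone) count as equal.
    """

    def same(a, b):
        return abs(a - b) < tolerance

    def higher_than_next_different(i):
        # Compare syllable i with the next syllable whose pitch differs from it.
        for later in semitones[i + 1:]:
            if not same(later, semitones[i]):
                return semitones[i] > later
        return False  # assumption: no differing syllable follows

    prominent = []
    for i, pitch in enumerate(semitones):
        if i == 0:
            # Initial syllable: judged against what follows
            # (tie handling is an assumption, see lead-in).
            prominent.append(higher_than_next_different(0))
        elif same(pitch, semitones[i - 1]):
            if i == 1:
                # Preceding syllable is sentence-initial: look ahead instead.
                prominent.append(higher_than_next_different(1))
            else:
                # Equal to the preceding syllable: inherit its label.
                prominent.append(prominent[i - 1])
        else:
            prominent.append(pitch > semitones[i - 1])
    return ["+PRPS" if p else "-PRPS" for p in prominent]


# Hypothetical pitch track in semitones, one value per syllable.
labels = classify_prps([10.0, 12.0, 9.0, 9.2, 11.0])
print(labels)
```

In this example the fourth syllable is within half a semitone of the third, so it inherits the third syllable's "-PRPS" label rather than counting as a rise.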

Table 6.20 Number of strong or weak syllables classified as "+PRPS" or "-PRPS" in final and non-final position for Type A sentences

Position Stress Stron Stron Pitch + + A NS 28 2 2 10 30 ESL 24 2 2 .8 26 EFL 21 2 8 21 B NS 18 7 17 18 ESL 21 7 20 21 EFL 22 6 21 22

[Bar chart: number of non-final syllables (vertical axis, 0 to 30) for NS, ESL, and EFL in four categories: Type A/Strong +PRPS, Type A/Weak -PRPS, Type B/Strong +PRPS, Type B/Weak -PRPS]

Figure 6.6 Number of strong and weak syllables classified as "+PRPS" or "-PRPS" for Type A and Type B sentences

Pitch often rises on strong syllables and falls on weak syllables for speakers of all groups in both types of sentences. Nonetheless, the correlation between pitch level and the target stress patterns is weakest for TM speakers in Type A but strongest in Type B sentences. It appears that TM speakers produce pitch variations that are highly consistent with the target stress patterns within Type B sentences but much less so within Type A sentences. Apparently, TM speakers do not always produce the same distribution of pitch prominence as English speakers, nor do they adjust the pitch level of a syllable according to the same stress patterns if they do use pitch as a correlate of stress.

The results show that TM speakers seem to vary pitch with the target stress patterns very closely in one type of sentences but not in the other. If TM speakers had been able to use information from the lead-in sentences to determine which syllables to stress in the target sentences, they would have had no trouble raising pitch on strong syllables and lowering pitch on weak syllables in both Type A and Type B sentences. This suggests that their stress patterns may be guided by principles that are somewhat different from those of English speakers and that their pitch performance may reflect how closely their guiding principles match the target stress patterns. When their guiding principles generate the same distribution of stress as English speakers', TM speakers may produce native-like pitch patterns, but they may experience difficulty when their guiding principles generate a somewhat different distribution of stress. The key question is what principles TM speakers subscribe to in determining the distribution of stress. Or more specifically, we want to know what makes the stress patterns of Type B sentences easier for TM speakers to produce than those of Type A sentences.

To answer this question, I take one step further by examining the relationships between pitch and stress levels. The results are summarized in Table 6.21.

Table 6.21 Number of "+PRPS" vs "-PRPS" strong, weakly stressed, vs. unstressed syllables in non-final vs. final position for Type A sentences

Position  Non-final
Stress    Strong (k=8)          Weak Stressed (k=12)  Unstressed (k=19)
Type      +PRPS      -PRPS      +PRPS      -PRPS      +PRPS      -PRPS
NS        8          0          1          11         2          17
ESL       6          2          5          7          2          17
EFL       6          2          9          3          1          18

Position  Final
Stress    Strong (k=2)          Weak Stressed (k=2)   Unstressed (k=3)
Type      +PRPS      -PRPS      +PRPS      -PRPS      +PRPS      -PRPS
NS        2          0          1          1          2          1
ESL       2          0          1          1          2          1
EFL       2          0          2          0          3          0

First, TM speakers are more likely to produce -PRPS strong syllables and +PRPS weakly stressed syllables than English speakers. Neither ESL nor EFL speakers seem to have problems with producing "-PRPS" unstressed syllables.

English speakers seem to rely on the lead-in sentences in determining the pitch patterns. Their pitch patterns generally rise at syllables that are emphasized and fall at syllables that are not. However, the pitch patterns of TM speakers seem to be influenced by stress, regardless of context. While TM speakers have little trouble raising pitch on strong or lowering pitch on unstressed syllables, they have trouble lowering pitch on weakly stressed syllables. EFL, in particular, are more likely to produce +PRPS weakly stressed syllables than NS.

Pitch level of syllables in final position is a somewhat different story. Pitch level often rises on final syllables for both TM and English speakers. Five of the seven final syllables were classified as "+PRPS" for NS and ESL and all seven were classified as "+PRPS" for EFL.

Table 6.22 Average semitone ratios of strong, weakly stressed, and unstressed syllables in non-final and final positions for Type A and Type B sentences

Position        Non-final                     Final
Stress          Weak Stressed  Unstressed     Strong   Weak Stressed  Unstressed
                (k=12)         (k=19)         (k=2)    (k=2)          (k=3)
A     NS        0.82           0.81           0.88     0.82           0.79
      ESL       0.86           0.81           0.89     0.82           0.80
      EFL                      0.87           0.92     0.90           0.92
B     NS                       0.86           0.87
      ESL                      0.84           0.89
      EFL                      0.85           0.89

Similar to NS, ESL speakers are capable of using pitch to distinguish at least three stress levels (strong, weakly stressed, and unstressed), although the pitch differentiation between these three types of syllables is not as distinct for ESL as it is for NS. Instead of making a three-way distinction (strong, weakly stressed, and unstressed), EFL speakers seem to make a binary distinction (stressed vs. unstressed). They do not produce a clear pitch distinction between strong syllables and weakly stressed syllables. Note that EFL speakers tend to assign high pitches to stressed syllables (primary or not). This suggests that EFL speakers may not distinguish as many levels of stress as English speakers do.

While the current findings may serve well in projecting TM speakers' pitch patterns for sentences comprising monosyllabic words, caution should be taken with how the generalization could be extended to predict the pitch patterns of multi-syllabic words. A workable experiment would have both TM and English speakers produce both types of sentences illustrated in the following examples. Target sentences are shown in italics.

Each target sentence is preceded by a brief sentence to elicit the target stress patterns. TM and English speakers would be compared with respect to their pitch patterns over the target multi-syllabic words.

(37) Sample stimuli
Type One: I don't party with friends BEFORE eleven. I party with friends AFTER eleven.
Type Two: I don't STUDY with friends after eleven. I PARTY with friends after eleven.

Type One sentences feature weak multi-syllabic content words and strong multi-syllabic function words. Type Two sentences feature strong multi-syllabic content words and weak multi-syllabic function words. Type One and Type Two sentences form a minimal pair of strong and weak multi-syllabic words to test how context impacts the pitch patterns of a multi-syllabic word. Type One and Type Two sentences also form a minimal pair of strong content words and strong function words as well as a minimal pair of weak content words and weak function words. If TM speakers indeed assign high pitches to strong syllables in general but not to weak syllables, they are expected to produce high pitches on strong syllables regardless of context.

6.3.3 DO TM SPEAKERS PRODUCE SMALLER PITCH CONTRASTS BETWEEN STRONG AND WEAK SYLLABLES THAN ENGLISH SPEAKERS?

Although TM speakers may appear to have different types or degrees of difficulty with the pitch of strong and/or weak syllables, it is important to look at the contrast in pitch between strong and weak syllables. After all, it is the relative prominence among syllables that creates the rhythm. In other words, the differences in strong and weak syllables between TM and English speakers may not be disruptive to the speech rhythm of TM speakers if they maintain a similar amount of contrast between strong and weak syllables as do English speakers. In this section, we focus on the examination of pitch contrasts between strong and weak syllables in non-final and final positions for Type A sentences and in non-final positions for Type B sentences. Only non-final syllables will be discussed for Type B sentences because all final syllables are strong. Pitch contrasts between strong and weak syllables in non-final and final positions will be discussed separately because according to our earlier findings they seem to behave differently.

I focus first on the pitch contrasts between target strong and weak syllables.

Table 6.23 Group average semitone ratios of strong vs. weak syllables in non-final position for Type A sentences

Position  Non-final
Group     NS            ESL           EFL
Stress    S      W      S      W      S      W
A1        0.91   0.77   0.95   0.81   0.95   0.90
A2        0.96   0.76   0.96   0.85   0.98   0.93
A3        1.00   0.83   0.89   0.84   0.90   0.91
A4        0.99   0.84   0.86   0.85   0.90   0.90
A5        1.00   0.82   0.96   0.83   0.97   0.88
A6        1.00   0.82   0.99   0.80   0.97   0.88
A7        0.95   0.83   0.93   0.81   0.92   0.84
Mean      0.97   0.81   0.93   0.83   0.94   0.89
NS-NNS                  0.04   -0.02  0.03   -0.08

[Line chart: semitone ratio (vertical axis, 0.70 to 1.00) of strong non-final vs. weak non-final syllables in Type A sentences, plotted for NS, ESL, and EFL]

Figure 6.7 Group average semitone ratios of strong vs. weak syllables in non-final position for Type A sentences

TM speakers generally produce smaller pitch contrasts between strong and weak syllables in non-final positions than English speakers in Type A sentences. EFL on average produce smaller pitch contrasts between strong and weak syllables in non-final positions than do ESL, who in turn produce smaller pitch contrasts between strong and weak syllables in non-final positions than NS. Lower semitone ratios on strong syllables and higher semitone ratios on weak syllables contribute to the relatively smaller pitch contrast for ESL and EFL.

For ESL speakers, the semitone ratios of their non-final strong syllables are on average lower than those of NS by 0.04, whereas the semitone ratios of their non-final weak syllables are on average higher than those of NS by 0.02. For EFL speakers, the semitone ratios of their non-final strong syllables are on average lower than those of NS by 0.03, whereas the semitone ratios of their non-final weak syllables are on average higher than those of NS by 0.08. The average difference in semitone ratios between strong and weak syllables in non-final positions for Type A sentences is 0.16 for NS, 0.10 for ESL, and 0.05 for EFL speakers.
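The arithmetic behind these figures can be retraced from the group means in Table 6.23. The sketch below is only a worked check; the variable and function names are illustrative, and the subtraction is applied to the rounded means as they appear in the table.

```python
# Group mean semitone ratios (strong, weak) from the "Mean" row of Table 6.23:
# non-final position, Type A sentences.
means = {
    "NS": (0.97, 0.81),
    "ESL": (0.93, 0.83),
    "EFL": (0.94, 0.89),
}


def contrast(strong, weak):
    """Pitch contrast: strong-syllable mean minus weak-syllable mean."""
    return round(strong - weak, 2)


# Strong-weak contrast within each group.
contrasts = {group: contrast(s, w) for group, (s, w) in means.items()}

# NS value minus the non-native value, separately for strong and weak syllables
# (the "NS-NNS" row of the table).
ns_strong, ns_weak = means["NS"]
ns_minus_nns = {
    group: (round(ns_strong - s, 2), round(ns_weak - w, 2))
    for group, (s, w) in means.items()
    if group != "NS"
}

print(contrasts)
print(ns_minus_nns)
```

The strong-weak contrast shrinks from NS to ESL to EFL, and for EFL most of the shrinkage comes from the weak syllables (0.08 higher than NS) rather than the strong ones (0.03 lower).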

Table 6.24 Group average semitone ratios of strong vs. weak syllables in final position for Type A sentences

Position  Final
Group     NS            ESL           EFL
Stress    S      W      S      W      S      W
A1               0.78          0.81          0.91
A2               0.76          0.78          0.93
A3        0.89          0.90          0.92
A4        0.87          0.88          0.92
A5               0.82          0.87          0.93
A6               0.82          0.81          0.91
A7               0.81          0.76          0.87
Mean      0.88   0.81   0.89   0.81   0.92   0.91
NS-NNS                  -0.01  0.00   -0.04  -0.10

[Line chart: semitone ratio (vertical axis, 0.70 to 1.00) of strong final vs. weak final syllables in Type A sentences, plotted for NS, ESL, and EFL]

Figure 6.8 Group average semitone ratios of strong vs. weak syllables in final position for Type A sentences

Similar to what we have found with the intensity in final position (see Figure 5.9), NS and ESL produce highly similar pitch contrasts between strong and weak syllables. For both NS and ESL, their strong syllables appear higher in semitone ratios than the weak ones. The semitone ratios of their strong syllables are almost identical, as are the semitone ratios of their weak syllables. EFL speakers, however, produce little pitch contrast between strong and weak syllables in final position. Their semitone ratios remain relatively high for both strong and weak syllables.

Aside from comparing the pitches of strong vs. weak syllables, one may wonder how alike the pitches of NS, ESL, and EFL speakers would be if I compared only the strong vs. unstressed syllables for Type A sentences. Table 6.25 presents the pitch contrasts between strong and unstressed syllables for Type A sentences.

Table 6.25 Pitch contrasts (ratios) between strong and unstressed syllables in non-final and final position for Type A sentences

Position  Non-final              Final
Stress    Strong   Unstressed    Unstressed
NS        0.97     0.81          0.79
ESL       0.93     0.81          0.80
EFL       0.94     0.87          0.92

Alternatively, I compared all stressed syllables (primary or not) vs. all totally unstressed syllables.

Table 6.26 Pitch contrasts (ratios) between stressed and unstressed syllables in non-final and final position for Type A sentences

Position  Non-final                Final
Stress    Stressed   Unstressed    Stressed   Unstressed
NS        0.87       0.81          0.85       0.79
ESL       0.88       0.81          0.85       0.80
EFL       0.93       0.87          0.91       0.92

The results show that when pitch contrasts were measured between strong and weak syllables or between strong and unstressed syllables, NS consistently produced greater pitch contrasts than ESL, who in turn produced greater pitch contrasts than EFL for Type A sentences. However, when pitch contrasts were measured between stressed and unstressed syllables, NS and TM speakers are very similar. In particular, the average semitone ratio of EFL speakers is much higher than that of NS because EFL speakers produce relatively higher-pitched weakly stressed syllables.

Table 6.27 Group average semitone ratios of strong vs. weak syllables in non-final position for Type B sentences

Position  Non-final
Group     NS            ESL           EFL
Stress    S      W      S      W      S      W
B1        0.94   0.89   0.92   0.86   0.93   0.87
B2        0.93   0.91   0.93   0.87   0.92   0.86
B3        0.91   0.88   0.95   0.84   0.95   0.86
B4        0.91   0.90   0.94   0.86   0.95   0.87
B5        0.89   0.82   0.89   0.80   0.92   0.79
B6        0.88   0.80   0.93   0.83   0.93   0.85
B7        0.89   0.84   0.92   0.84   0.93   0.86
Mean      0.91   0.86   0.93   0.84   0.93   0.85
NS-NNS                  -0.02  0.02   -0.02  0.01

[Line chart: semitone ratio (vertical axis, 0.70 to 1.00) of strong vs. weak syllables in Type B sentences, plotted for NS, ESL, and EFL]

Figure 6.9 Group average semitone ratios of strong vs. weak syllables in non-final position for Type B sentences

In contrast to Type A sentences, NS, ESL and EFL are overall very similar in terms of the semitone ratios of their strong and weak syllables. ESL and EFL produced slightly greater pitch contrasts between strong and weak non-final syllables than NS for the Type B sentences. ESL and EFL generally produce slightly higher semitone ratios on strong syllables and lower semitone ratios on weak syllables than do NS.

In summary, TM speakers produce smaller pitch contrast between target strong and weak syllables than NS in non-final positions in Type A sentences, while producing slightly greater pitch contrast than NS in Type B sentences. When pitch contrasts are measured as the differences in semitone ratios between stressed and unstressed syllables instead of between strong and weak syllables, I notice increased similarities in pitch contrasts between NS and TM speakers in Type A sentences. By analyzing the semitone ratios of strong, weakly stressed, and unstressed syllables, I found that NS have a binary pitch distinction between strong and weak syllables while EFL speakers have a binary pitch distinction between stressed and unstressed syllables. ESL speakers, on the other hand, show a three-way pitch distinction for these three types of syllables. The results suggest that TM speakers produce high pitches on a larger number of syllables than NS for Type A sentences. The results also suggest that TM speakers are able to correlate pitch with stress, although they may not share the same binary pitch distinction between strong and weak syllables that English speakers have.

6.3.4 DO TM SPEAKERS WITH IMPROVED PROFICIENCY AND INCREASED EXPOSURE TO ENGLISH PRODUCE PITCH PATTERNS THAT ARE CLOSER TO THE TARGET SPEECH RHYTHM?

There are several indications from the results that ESL produce pitch patterns that are closer to target than the less proficient EFL.

Table 6.28 A comparison of F0 results between ESL and EFL

          Mean pitch contrast   Standard deviation    Range of              No. of syl.     Correlation
          between strong &      of semitone ratios    semitone ratios       different       coefficients
          weak                                                              from NS         with NS
Group     NS     ESL    EFL     NS     ESL    EFL     NS     ESL    EFL     ESL    EFL      ESL    EFL
A1        0.14   0.14   0.05    0.07   0.08   0.04    0.18   0.19   0.10    1      2        0.73   0.40
A2        0.20   0.11   0.05    0.09   0.08   0.04    0.24   0.19   0.10    1      3        0.85   0.81
A3        0.17   0.05   -0.01   0.07   0.05   0.04    0.19   0.14   0.14    1      4        0.64   0.27
A4        0.15   0.01   0.00    0.06   0.04   0.05    0.17   0.12   0.16    1      4        0.33   0.18
A5        0.18   0.13   0.09    0.08   0.05   0.05    0.23   0.18   0.13    1      4        0.79   0.42
A6        0.18   0.19   0.09    0.08   0.08   0.05    0.23   0.22   0.12    0      2        0.68   0.31
A7        0.12   0.12   0.08    0.06   0.07   0.06    0.19   0.22   0.19    0      2        0.89   0.64

First of all, the basic statistics of the pitch results throughout the sentences support the view that ESL speakers behave more like NS than EFL speakers do. The mean pitch contrasts, standard deviations, and ranges of semitone ratios for this group almost always fall between those of NS and those of EFL. Not only do EFL produce the highest mean semitone ratio, they also produce the smallest variation and the narrowest range of semitone ratios in a sentence compared with NS and ESL. The strength of ESL over EFL is particularly pronounced in Type A sentences, whose target pitch patterns prove to be more difficult for TM speakers.

Second, despite similarities in the types of significant differences between NS and ESL and between NS and EFL within each sentence type, differences between NS and ESL are fewer than those between NS and EFL. A total of five syllables are found to be significantly different between NS and ESL as opposed to 21 syllables between NS and EFL in Type A sentences. A total of seven syllables are found to be significantly different between NS and ESL as opposed to nine syllables between NS and EFL in Type B sentences. Again, the strength of ESL over EFL is more salient in Type A sentences, where EFL struggle with many of the weak content syllables. Significantly higher semitone ratios are found on three weak content syllables in the speech of ESL as opposed to 12 such syllables in the speech of EFL in Type A sentences.

Third, correlation coefficients of syllable-to-syllable semitone ratios across sentences have been consistently stronger between NS and ESL than between NS and EFL for both Type A and Type B sentences. Although very few significant correlations are established, a larger number is found between NS and ESL than between NS and EFL. Significant correlation is found in two sentences between NS and ESL but none between NS and EFL in Type A sentences. Correlation coefficients are found to be significant in four Type B sentences between NS and ESL, but in only two sentences between NS and EFL.
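As a rough illustration of how a syllable-to-syllable correlation of this kind could be computed, the sketch below assumes a Pearson product-moment coefficient, since the type of coefficient is not stated here. The function name `pearson_r` and the sample semitone ratios are hypothetical and are not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences (assumed measure)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mean semitone ratios for the syllables of one sentence.
ns_ratios = [0.80, 0.95, 0.78, 1.00, 0.82]
esl_ratios = [0.85, 0.93, 0.84, 1.00, 0.88]
r_ns_esl = pearson_r(ns_ratios, esl_ratios)
```

A coefficient near 1 would indicate that the two groups raise and lower pitch on the same syllables, regardless of how large the movements are.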

Last but not least, ESL speakers do better than EFL in their distribution of pitch prominences. When the target stress patterns and the lexical categories of words clash, namely, when content words are weak and function words are strong, ESL rely less heavily on lexical category than EFL for clues. For example, ESL speakers raise pitch on five, while EFL raise pitch on nine of the 12 weak content syllables in the non-final position of Type A sentences.

6.3.5 SUMMARY FOR THE DISCUSSION OF PITCH

ESL and EFL show difficulty with the pitch of strong and weak syllables in Type A sentences, but not in Type B sentences. ESL and EFL produce lower-pitched strong syllables and higher-pitched weak syllables in non-final positions in Type A sentences. In contrast, ESL and EFL do not seem to have difficulty with pitch in Type B sentences, where they consistently produced +PRPS strong syllables and -PRPS weak syllables. Although they sometimes produce higher-pitched strong syllables and lower-pitched weak syllables in Type B sentences, their resulting F0 patterns are consistent with the target stress pattern.

TM speakers generally correlate pitch with the target stress patterns, although the correlation is not as strong as it is for English speakers. In particular, the weakly stressed syllables of EFL speakers are almost as high-pitched as their strong syllables. For NS, the binary pitch distinction is between strong (accented) and weak (unaccented) syllables. For ESL speakers, there is a three-way pitch distinction for the three types of syllable (strong, weakly stressed, unstressed). EFL speakers produce a binary pitch distinction (stressed vs. unstressed).

When comparing pitch contrasts between target strong and weak syllables, I found that TM speakers produce smaller pitch contrasts than English speakers in Type A sentences but slightly greater pitch contrasts in Type B sentences. The inconsistency was later traced back to the relatively higher-pitched weakly stressed syllables produced by TM speakers in Type A sentences. When comparing pitch contrasts between strong and unstressed syllables, I found that NS produced greater pitch contrasts than ESL and EFL speakers. But when comparing pitch contrasts between stressed and unstressed non-final syllables for Type A sentences, I found little difference between NS and TM speakers. The native-like pitch contrasts between stressed and unstressed syllables for TM speakers can be attributed to their relatively higher-pitched weakly stressed syllables, which raise the average semitone ratios of stressed syllables. The results also show that TM speakers, especially ESL, are capable of producing sufficient pitch contrasts between strong and weak syllables despite occasional misplacement of stress, and that ESL are better than EFL at maintaining native-like pitch differentiation between strong and weak syllables.

Various aspects of the F0 results support ESL speakers' strength over EFL speakers in their ability to use pitch as a correlate of stress in English. Basic statistics show that ESL behave more similarly to NS than EFL in terms of mean, standard deviation, and range of semitone ratios. Additionally, ESL speakers differ from NS in a smaller number of syllables than EFL speakers do. The variations of semitone ratios from syllable to syllable also co-vary more strongly between NS and ESL than between NS and EFL throughout Type A and Type B sentences. Most of all, variations of pitch level are more consistent with the target stress patterns in the speech of ESL than in the speech of EFL. ESL appear more attentive to contextual clues than EFL, who very often ignore contextual clues and rely heavily on syllable type to map out the pitch patterns.

CHAPTER 7: COORDINATION AMONG DURATION, INTENSITY, AND PITCH

This chapter integrates what I have learned from the previous three chapters about the use of duration, intensity, and pitch as correlates of stress in English. The purpose of this investigation is to identify any similarities and/or differences between Chinese and English speakers in the ways and the extent to which duration, intensity, and pitch coordinate with one another in the realization of stress. Three different measures are used for comparing Chinese and English speakers with respect to the coordination among these three variables.

First, I compare duration, intensity, and pitch in terms of the number and type of significant differences each reveals between groups: English native speakers and the two groups of Chinese speakers. The results provide information about whether each variable presents similar types and degrees of difficulties to each group of the Chinese speakers and how the coordination patterns may have changed from the intermediate level EFL speakers to the more proficient ESL speakers.

Second, I compare duration, intensity, and pitch with respect to their variations across syllables of the sentences. For both Type A and Type B sentences, I count the number of stressed versus unstressed syllables where duration, intensity, and pitch values are each "+SRPS" (Strength Relative to Preceding Syllable) or "-SRPS" relative to their preceding syllables. The respective acoustic value of a syllable is considered "+SRPS" when (1) it is higher in value than its preceding syllable or (2) it is higher than its immediately following syllable if the syllable in question is the initial syllable of the sentence. When the acoustic value of a syllable is the same as that of its preceding syllable, it is considered "+SRPS" if the preceding syllable is "+SRPS". If the preceding syllable is the initial syllable of the sentence, it is considered "+SRPS" if it is higher than the next syllable with a different value. The pitches of two syllables are arbitrarily treated as equal when their difference is less than one half of a semitone, and the lengths of two syllables are arbitrarily treated as equal when their difference is equal to or less than 20 milliseconds.
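The classification rules above can be sketched in code as follows. This is only an illustrative reading of the criteria, not the procedure actually used in the study: the function name `classify_srps` and the sample values are hypothetical. Two assumptions are made where the criteria are silent: an initial syllable that equals its neighbor is compared with the next syllable that has a different value, and a sentence in which no syllable differs is labeled "-SRPS" throughout.

```python
def classify_srps(values, tol, inclusive=False):
    """Label each syllable value '+SRPS' or '-SRPS'.

    tol is the difference treated as "equal"; inclusive=True also counts a
    difference of exactly tol as equal (as with the 20 ms duration criterion).
    """
    n = len(values)

    def same(a, b):
        diff = abs(a - b)
        return diff <= tol if inclusive else diff < tol

    def higher_than_next_different(i):
        # Compare syllable i with the first later syllable whose value differs.
        for j in range(i + 1, n):
            if not same(values[i], values[j]):
                return values[i] > values[j]
        return False  # assumption: nothing differs, so label as '-SRPS'

    labels = []
    for i, value in enumerate(values):
        if i == 0:
            # Initial syllable: judged against what follows (assumed look-ahead).
            raised = higher_than_next_different(0)
        elif same(value, values[i - 1]):
            # Equal to the preceding syllable: inherit its label.
            raised = labels[-1] == "+SRPS"
        else:
            raised = value > values[i - 1]
        labels.append("+SRPS" if raised else "-SRPS")
    return labels

# Hypothetical pitch values in semitones (equal if under half a semitone apart).
pitch_labels = classify_srps([3.0, 5.0, 5.2, 2.0], tol=0.5)
# Hypothetical durations in milliseconds (equal if 20 ms or less apart).
duration_labels = classify_srps([200, 220, 150], tol=20, inclusive=True)
```

With these sample values, `pitch_labels` is `['-SRPS', '+SRPS', '+SRPS', '-SRPS']` and `duration_labels` is `['+SRPS', '+SRPS', '-SRPS']`. No tolerance is stated for intensity, so any value passed for that variable would be a further assumption.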

Third, I compare duration, intensity, and pitch in terms of the differentiation between stressed and unstressed syllables for each of the three subject groups. The average duration, intensity, and pitch of stressed versus unstressed syllables are expressed as ratios between zero and one. With pitch, the ratios are obtained by dividing the semitone of a syllable by the highest semitone of the sentence. With intensity, the ratios are obtained by dividing the peak intensity of a syllable by the maximum intensity of the sentence. With duration, the ratios are obtained by dividing the absolute length of a syllable by the length of the longest syllable in the sentence.

Note that here I normalize duration using an approach that is different from before. Earlier I divided the absolute length of a syllable by the length of the entire sentence. The purpose of that normalization procedure was to eliminate speech rates as a potential variable. Here I place the three variables within the same scale so that I can compare them. Neither of the two normalization procedures changes the relative duration relationships among syllables of the sentences because with each approach I divide the duration of the syllables by a constant. The current approach differs from the previous one in that it generates a scale that is consistent with the one used for representing relative pitch and intensity. But at the same time, I need to bear in mind that the new approach does not eliminate speech rates as a potential variable. Therefore, the new set of duration ratios is only appropriate for showing the "relative" duration relationships among syllables within each subject group, and should not be used for making direct comparisons between English native speakers and the two groups of Chinese speakers, where I have observed major differences in speech rates.
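The two duration normalizations can be compared in a short sketch. The function names and the sample durations are hypothetical, and the sentence length is assumed here to equal the sum of the syllable durations, which ignores any pauses.

```python
import math

def ratios_to_maximum(values):
    """Scale values to (0, 1] by dividing by the largest value in the sentence."""
    peak = max(values)
    return [v / peak for v in values]

def ratios_to_total(durations):
    """Earlier normalization: each syllable as a share of the whole sentence."""
    total = sum(durations)  # assumption: sentence length = sum of syllables
    return [d / total for d in durations]

# Hypothetical syllable durations in milliseconds for one sentence.
durations_ms = [120.0, 240.0, 60.0, 180.0]
by_max = ratios_to_maximum(durations_ms)
by_total = ratios_to_total(durations_ms)

# Both procedures divide by a constant, so the proportion between any two
# syllables of the same sentence is unchanged.
same_proportions = all(
    math.isclose(by_max[i] / by_max[0], by_total[i] / by_total[0])
    for i in range(len(durations_ms))
)
```

Here `by_max` is `[0.5, 1.0, 0.25, 0.75]`, which places duration on the same zero-to-one scale as the pitch and intensity ratios. Because the divisor is a single syllable rather than the whole sentence, the result still depends on how fast the speaker talks.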

The average duration, intensity, and semitone ratios of stressed versus unstressed syllables in non-final versus final positions are obtained for each type of sentence for each subject group. The results reveal the relationships between the three variables, i.e., whether they reinforce or cancel one another in the differentiation between stressed and unstressed syllables.

In the next two sections, I will apply each of the three comparative measures separately, first to syllables in non-final position and then to syllables in final position. This is because I have learned repeatedly from earlier findings that these two types of syllables do not behave the same way prosodically.

7.1 THE COORDINATION OF DURATION, INTENSITY, AND PITCH OF SYLLABLES IN NON-FINAL POSITIONS

Next I analyze the coordination among duration, intensity, and pitch for the syllables in non-final positions in three subsections. Section 7.1.1 analyzes the coordination of the three variables in terms of the significant differences between NS and the two groups of Chinese speakers. Section 7.1.2 analyzes the number of strong and weak syllables that are "+SRPS" or "-SRPS" for each of the three variables. Section 7.1.3 analyzes the extent to which the three variables are correlated with one another in the differentiation between stressed and unstressed syllables.

7.1.1 SIGNIFICANT DIFFERENCES IN DURATION, INTENSITY, AND PITCH

This section investigates whether Chinese speakers encounter similar types and degrees of difficulties with each variable. For each group of Chinese speakers, I compare the three variables in terms of the number and type of stressed and unstressed syllables classified as having significantly greater or smaller ratios than English speakers at p

Table 7.1 Number of strong and weak non-final syllables for Type A sentences with significantly greater or smaller duration, intensity, or semitone ratios than those of NS

Type A/Position    Non-Final
Stress             Strong      Weak (k=31)
Significant differences: Greater, Smaller
ESL vs. NS     Duration 3     Intensity     Pitch
EFL vs. NS     Duration 2     Intensity     Pitch 1 1
EFL vs. ESL    Duration       Intensity     Pitch
*Significant differences that weaken the contrast between stressed and weak syllables

For non-final syllables of Type A sentences, the significant differences in duration, intensity, and pitch between English native speakers and the two groups of TM speakers highlight two types of difficulties: weaker strong syllables and stronger weak syllables. TM speakers generally have greater trouble weakening weak syllables than strengthening strong syllables except for pitch for ESL speakers, and for duration for EFL speakers. That the three variables go wrong in similar ways suggests that these variables might be coordinated with one another for TM speakers.

Although ESL and EFL do not evidence the same degrees of difficulty with each of the three variables, the relative ranking of the variables in terms of difficulty level is the same for these two groups of TM speakers. Here the difficulty level is defined as the total number of significantly different syllables that potentially contribute toward the weakening of the target speech rhythm, including weaker strong syllables and stronger weak syllables. Stronger strong syllables and weaker weak syllables produced by TM speakers are not regarded as difficulties because they increase the contrast between strong and weak syllables. For both groups of TM speakers, intensity ranks as the most difficult correlate of stress, with duration being the second, and pitch the third. Of the 39 non-final syllables compared for Type A sentences, the intensity ratios of 24, the duration ratios of 14, and the semitone ratios of 5 syllables are significantly different between English and ESL speakers, whereas the intensity ratios of 27, the duration ratios of 21, and the semitone ratios of 17 syllables are significantly different between English and EFL speakers. Note that pitch appears only slightly difficult for the ESL speakers, but it is almost as difficult as duration for EFL speakers. From these results I learn that not only do the two groups of TM speakers have similar types of difficulties with each of these three variables, their relative ranking in terms of difficulty level is also similar, although EFL speakers experience greater difficulty with each variable than do ESL speakers.
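The ranking described above follows directly from the reported counts. The sketch below simply sorts them; the dictionary layout and the function name `rank_by_difficulty` are illustrative choices, not part of the original analysis.

```python
# Counts of Type A non-final syllables significantly different from NS,
# as reported in the text.
significant_counts = {
    "ESL": {"intensity": 24, "duration": 14, "pitch": 5},
    "EFL": {"intensity": 27, "duration": 21, "pitch": 17},
}

def rank_by_difficulty(counts):
    """Order variables from most to fewest significantly different syllables."""
    return sorted(counts, key=counts.get, reverse=True)

rankings = {group: rank_by_difficulty(c) for group, c in significant_counts.items()}
same_ranking = rankings["ESL"] == rankings["EFL"]
```

Both groups yield the order intensity, duration, pitch, while every EFL count exceeds the corresponding ESL count.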

That the significant differences in duration, intensity, and pitch between ESL and EFL speakers show highly similar patterns also provides supporting evidence that the three variables are in sync with one another. All of the significant differences with intensity and pitch and the majority of the significant differences with duration involve the relatively stronger weak syllables produced by EFL speakers. Of the 27 significant differences between ESL and EFL speakers, 10 involve relatively longer durations, seven involve relatively greater intensity, and eight involve relatively higher pitch on weak syllables on the part of EFL speakers.

Table 7.2 Number of strong and weak non-final syllables with duration, intensity, or semitone ratios significantly greater or smaller than those of NS for Type B sentences

Type B/Position    Non-Final
Stress             Strong      Weak (k=22)
Significant differences: Greater, Smaller
ESL vs. NS     Duration 1 2     Intensity 4 1     Pitch 2 4
EFL vs. NS     Duration 4       Intensity 4 3     Pitch 3 4
EFL vs. ESL    Duration 3       Intensity 1       Pitch
*Significant differences that weaken the contrast between strong and weak syllables

Compared with Type A sentences, the number of significant differences between English native speakers and the two groups of TM speakers is much smaller for Type B sentences (see Table 7.2) except for duration for EFL speakers. EFL speakers very frequently produce relatively longer weak syllables than English native speakers do. Of the 22 weak non-final syllables for Type B sentences produced by EFL speakers, 14 are significantly longer in relative duration than those of NS. While EFL speakers encounter great difficulty with duration for both Type A and Type B sentences, ESL speakers have markedly less difficulty with duration in Type B sentences.

With regard to the types of significant differences in these three variables for Type B sentences, I notice a consistent pitch pattern across the two groups of TM speakers. ESL and EFL speakers produce relatively greater semitone ratios on strong syllables and relatively smaller semitone ratios on weak syllables. I do not find such a clear pattern with duration and intensity. Significant differences in these two variables generally involve greater ratios on strong and weak syllables in the speech of non-native speakers. EFL speakers produce a small number of relatively shorter strong syllables as well as relatively softer strong and weak syllables.

7.1.2 THE VARIATIONS IN DURATION, INTENSITY, AND PITCH AND STRESS

Section 7.1.2 investigates the similarities and differences in the ways and the extent to which duration, intensity, and pitch fluctuate with stress across the syllables of the sentences for each group of speakers. The three variables are compared with respect to the number of strong and weak syllables that are classified as "+SRPS" or "-SRPS" with respect to the preceding syllables. Please see the introduction of this chapter for the criteria for these two categories.

Table 7.3 Number of strong and weak non-final syllables classified as "+SRPS" or "-SRPS" for Type A sentences

Type A/Position     Non-final
Stress              Strong (k=8)      Weak (k=31)
Movement            -SRPS             +SRPS
Display             No.     %         No.     %
NS     Duration     1       13        8       26
       Intensity    0       0         5       16
       Pitch        0       0         3       10
ESL    Duration     2       25        10      32
       Intensity    3       38        10      32
       Pitch        2       25        7       23
EFL    Duration     2       25        13      42
       Intensity    2       25        11      35
       Pitch        2       25        10      32

[Bar chart, Type A non-final syllables. Vertical axis: percentage of syllables (0 to 100). Horizontal axis: NS, ESL, and EFL strong syllables (+SRPS) and NS, ESL, and EFL weak syllables (-SRPS). Series: Duration, Intensity, Pitch.]

Figure 7.1 Percentage of strong non-final syllables classified as "+SRPS" and weak non-final syllables classified as "-SRPS" for Type A sentences

As can be seen in Table 7.3 and Figure 7.1, NS, ESL, and EFL speakers generally raise duration, intensity, and pitch on strong syllables and lower them on weak syllables. Moreover, the distributions of strong and weak syllables classified as Raised or Lowered are similar among these three variables for each subject group. Therefore, the three variables generally move in the same direction, ignoring the extent to which each variable is raised or lowered.

However, I can see that these variables correlate with stress more closely for English than for either group of TM speakers. Higher percentages of strong syllables are raised in terms of duration, intensity, and pitch values for the English speakers (88%, 100%, 100%) than for the TM ESL speakers (75%, 63%, 75%) and the TM EFL speakers (75%, 75%, and 75%). And higher percentages of weak syllables are lowered in terms of duration, intensity, and pitch values for English speakers (74%, 84%, 90%) than for TM ESL (68%, 68%, 77%) and EFL speakers (58%, 65%, 68%).
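The percentages quoted here are the complements of the counts in Table 7.3, which lists strong syllables that are -SRPS and weak syllables that are +SRPS. A minimal sketch of that arithmetic follows, assuming group sizes of 8 strong and 31 weak non-final syllables (inferred from the table's own percentages) and rounding halves upward; the function names are hypothetical.

```python
import math

def percent(count, total):
    """Percentage rounded half up to a whole number."""
    return math.floor(100 * count / total + 0.5)

def percent_consistent(mismatches, total):
    """Share of syllables that move with stress, given the count that do not."""
    return percent(total - mismatches, total)

K_STRONG, K_WEAK = 8, 31  # assumed group sizes, inferred from Table 7.3

# Duration and intensity entries taken from Table 7.3.
ns_strong_raised = percent_consistent(1, K_STRONG)    # 88
esl_strong_raised = percent_consistent(3, K_STRONG)   # 63 (intensity)
ns_weak_lowered = percent_consistent(8, K_WEAK)       # 74
efl_weak_lowered = percent_consistent(13, K_WEAK)     # 58
```

Rounding half up is used rather than Python's built-in `round`, which rounds 62.5 down to 62 and would not reproduce the 63% reported for ESL intensity.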

Although the degrees of association with stress differ across variables and subject groups, pitch appears to be most consistently raised or lowered according to stress. Duration appears least consistently raised or lowered, probably because syllable duration is highly susceptible to segmental variations across syllables. I also notice that the associations of the three variables with stress are generally stronger for strong syllables than for weak syllables. The results are consistent with our earlier findings that TM speakers, especially EFL speakers, experience greater trouble with weakening weak syllables than strengthening strong syllables.

Table 7.4 Number of strong and weak non-final syllables classified as "+SRPS" or "-SRPS" for Type B sentences

Type B/Position     Non-final
Stress              Strong (k=17)     Weak (k=22)
Movement            -SRPS             +SRPS
Display             No.     %         No.     %
NS     Duration     0       0         1       5
       Intensity    4       24        3       14
       Pitch        7       41        4       18
ESL    Duration     1       6         0       0
       Intensity    0       0         2       9
       Pitch        4       24        1       5
EFL    Duration     1       6         3       14
       Intensity    2       12        1       5
       Pitch        2       12        0       0

[Bar chart, Type B non-final syllables. Vertical axis: percentage of syllables (0 to 100). Horizontal axis: NS, ESL, and EFL strong syllables (+SRPS) and NS, ESL, and EFL weak syllables (-SRPS). Series: Duration, Intensity, Pitch.]

Figure 7.2 Percentage of strong non-final syllables classified as "+SRPS" and weak non-final syllables classified as "-SRPS" for Type B sentences

Overall, duration, intensity, and pitch correlate with stress more closely for Type B than for Type A sentences in terms of the percentages of the strong syllables classified as +SRPS and the percentages of the weak syllables classified as -SRPS in duration, intensity, and pitch. The three variables are generally tied to one another, being +SRPS on strong syllables and -SRPS on weak syllables. The association is strong except for intensity and pitch of the strong syllables for English speakers. This in part might have to do with the fact that NS do not always produce high pitch accents on strong syllables. For Type B sentences, TM speakers seem to tie the three variables to stress more closely than do English speakers. The stronger tie between duration and stress for Type B than for Type A sentences may have to do with the fact that all of the weak syllables in Type B sentences are also unstressed function words or morphemes, which are generally inherently shorter than stressed content words or morphemes, which correspond to strong syllables in Type B sentences.

7.1.3 DIFFERENTIATION BETWEEN STRONG AND WEAK SYLLABLES IN NON-FINAL POSITION

Section 7.1.3 compares the differentiation between strong and weak syllables in each of the three variables. The average duration, intensity, and semitone ratios of the strong syllables are compared with those of the weak syllables within each group and the relative contrasts in each variable are compared across groups.

Table 7.5 Duration, intensity, and semitone ratios of strong versus weak syllables in non-final positions for Type A sentences

Position    Group    Stress    Duration    Intensity    Pitch

As shown in Table 7.5, for Type A sentences, the non-final strong syllables are on average relatively longer, louder, and higher in pitch than the weak ones for all subject groups. Among the three subject groups, NS on average produce the greatest differentiation between strong and weak syllables in all three variables and EFL the smallest. The durational contrast between strong and weak syllables is 0.32 for NS, 0.28 for ESL, and 0.17 for EFL speakers. The intensity contrast between strong and weak syllables is 0.21 for NS, 0.08 for ESL, and 0.05 for EFL speakers. The pitch contrast between strong and weak syllables is 0.16 for NS, 0.10 for ESL, and 0.05 for EFL speakers.
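The ordering claimed for the three groups can be checked against the reported contrasts with a few lines of code. The sketch assumes each contrast is the mean ratio of strong syllables minus the mean ratio of weak syllables; the dictionary layout and the function name are illustrative only.

```python
# Strong-versus-weak contrasts for Type A non-final syllables, as reported
# in the text.
contrasts = {
    "duration": {"NS": 0.32, "ESL": 0.28, "EFL": 0.17},
    "intensity": {"NS": 0.21, "ESL": 0.08, "EFL": 0.05},
    "pitch": {"NS": 0.16, "ESL": 0.10, "EFL": 0.05},
}

def shrinks_from_ns_to_efl(by_group):
    """True when the contrast is greatest for NS and smallest for EFL."""
    return by_group["NS"] > by_group["ESL"] > by_group["EFL"]

pattern_holds = all(shrinks_from_ns_to_efl(c) for c in contrasts.values())
```

The check passes for all three variables, with ESL falling between NS and EFL in each case.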

(a) Contrasts between strong and weak syllables for English speakers
[Line chart, Type A sentences, NS. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

(b) Contrasts between strong and weak syllables for ESL speakers
[Line chart, Type A sentences, ESL. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

(c) Contrasts between strong and weak syllables for EFL speakers
[Line chart, Type A sentences, EFL. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

Figure 7.3 Duration, intensity, and pitch contrasts between strong and weak non-final syllables for Type A sentences for NS, ESL, and EFL speakers

Although the amount of contrast between strong and weak syllables may differ across variables and speaker groups, I can see that these three variables generally coordinate in the differentiation between strong and weak syllables, with higher ratios being associated with the strong syllables and lower ratios being associated with the weak syllables. However, the differentiation is not as distinct for TM speakers as it is for English speakers.

In addition, the average intensity ratios of the strong and weak syllables are very close to the corresponding average semitone ratios. As can be seen in Figure 7.3, the average intensity and semitone ratios of strong syllables are very similar for all subject groups and so are the intensity and semitone ratios of the weak syllables.

Table 7.6 Duration, intensity, and semitone ratios of strong versus weak syllables in non-final positions for Type B sentences

Position    Group    Stress    Duration    Intensity    Pitch

Similar to what I have found for Type A sentences, the non-final strong syllables for Type B sentences are also on average relatively longer, louder, and higher in pitch than the weak ones for all subject groups. However, here the amount of contrast is highly similar between native and non-native speakers for all three variables. The durational contrast between the strong and the weak syllables is 0.32 for NS, 0.31 for ESL, and 0.26 for EFL speakers. The intensity contrast between the strong and the weak syllables is 0.10 across all groups. The pitch contrast between the strong and the weak syllables is 0.06 for NS, and 0.08 for ESL and EFL speakers.

(a) Contrasts between strong and weak syllables for English speakers
[Line chart, Type B sentences, NS. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

(b) Contrasts between strong and weak syllables for ESL speakers
[Line chart, Type B sentences, ESL. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

(c) Contrasts between strong and weak syllables for EFL speakers
[Line chart, Type B sentences, EFL. Vertical axis: ratios (0.30 to 1.00). Horizontal axis: strong non-final, weak non-final. Series: Duration, Intensity, Pitch.]

Figure 7.4 Duration, intensity, and pitch contrasts between strong and weak non-final syllables for Type B sentences for NS, ESL, and EFL speakers

As can be seen in Figure 7.4, the three variables generally coordinate in the differentiation between the strong and the weak syllables within each group, with higher ratios being associated with the strong syllables and lower ratios being associated with the weak syllables. Also, similar to what I have found for Type A sentences, here the average intensity ratios of the strong and weak syllables also are very similar to the corresponding semitone ratios. The average intensity and semitone ratios of strong syllables are very similar for all subject groups and so are the intensity and semitone ratios of the weak syllables.

7.2 DURATION, INTENSITY, AND PITCH OF SYLLABLES IN FINAL POSITION

This section analyzes the coordination among duration, intensity, and pitch for syllables in final positions in three subsections. Section 7.2.1 analyzes the coordination of the three variables in terms of the significant differences between NS and the two groups of TM speakers. Section 7.2.2 analyzes the number of strong and weak syllables that are "+SRPS" or "-SRPS" along each of the three variables. Section 7.2.3 analyzes the extent to which the three variables are coordinated with one another in the differentiation between the strong and the weak syllables.

7.2.1 SIGNIFICANT DIFFERENCES IN DURATION, INTENSITY, AND PITCH BETWEEN PAIRS OF SUBJECT GROUPS

This section investigates whether TM speakers evidence similar types and degrees of difficulties with the duration, intensity, and pitch of final syllables. For each group of TM speakers, the three variables are compared with respect to the number and types of strong and weak final syllables classified as having significantly greater or smaller ratios than those of English native speakers at p

Table 7.7 Number of strong and weak syllables in final position of Type A sentences classified as significantly different in duration, intensity, and/or pitch

Weak (k=5)     Greater     Greater     Smaller
1
EFL vs. NS     1     3     2
EFL vs. ESL    4     1

Two observations can be made based on Table 7.7. First, with duration, all of the significant differences between native and non-native speakers involve relatively shorter final syllables in the speech of TM speakers, regardless of their being targeted as strong or weak. Second, with intensity and pitch, all of the significant differences between native and non-native speakers involve relatively greater ratios on the final syllables.

Table 7.8 Number of strong and weak syllables in final position of Type B sentences classified as significantly different in duration, intensity, and/or pitch

Weak (k=0)     Greater     Smaller
EFL vs. NS
EFL vs. ESL

Similar to what I have found for final syllables of Type A sentences, both ESL and EFL speakers produce relatively shorter, but relatively louder and higher-pitched final syllables than English speakers. Summarizing Tables 7.7 and 7.8, while TM speakers produce less final lengthening than do English speakers, they have a slight tendency to produce greater intensity and pitch on final syllables.

7.2.2 THE VARIATIONS IN DURATION, INTENSITY, AND PITCH AND STRESS

Section 7.2.2 investigates the similarities and differences in ways and the extent to which duration, intensity, and pitch of the final syllables vary with stress for each subject group. The three variables are compared with respect to the number of strong and weak syllables that are classified as "+SRPS" or "-SRPS" with respect to the preceding syllables. For Type B sentences, only strong syllables will be examined because all of the final syllables for Type B sentences are strong.

Table 7.9 Number of strong and weak final syllables classified as "+SRPS" or "-SRPS" for Type A sentences

Type A/Position     Final
Stress              Strong (k=2)      Weak (k=5)
Movement            -SRPS             -SRPS
Display             No.     %         No.     %
NS     Duration     -       -         -       -
       Intensity    -       -         3       60
       Pitch        -       -         2       40
ESL    Duration     -       -         -       -
       Intensity    1       50        4       80
       Pitch        -       -         2       40
EFL    Duration     -       -         -       -
       Intensity    1       50        4       80
       Pitch        -       -         -       -

[Bar chart, Type A final syllables. Vertical axis: percentage of syllables (0 to 100). Horizontal axis: NS, ESL, and EFL strong syllables (+SRPS) and NS, ESL, and EFL weak syllables (+SRPS). Series: Duration, Intensity, Pitch.]

Figure 7.5 Percentage of strong final syllables classified as "+SRPS" and weak final syllables classified as "+SRPS" for Type A sentences

Three observations can be made based on Table 7.9 and Figure 7.5. First, speakers of all groups lengthen final strong and weak syllables 100% of the time for Type A sentences. In other words, TM speakers evidence lengthening on final syllables but they generally produce less lengthening (see Tables 7.7 and 7.8) than do English speakers. Second, for speakers of all groups, the pitches of final strong syllables are +SRPS 100% (2/2) of the time for Type A sentences. Moreover, for EFL speakers, the pitches of final weak syllables are +SRPS 100% (5/5) of the time, while for NS and ESL speakers, the pitches of final weak syllables are +SRPS at a lower frequency, 60% (3/5). It seems that EFL speakers evidence a tendency to raise pitch on final syllables, regardless of their being targeted as strong or weak. Third, ESL and EFL speakers raise intensity on final strong syllables less frequently (1/2 vs. 2/2) but lower intensity on final weak syllables more frequently than do English speakers (4/5 vs. 3/5). Note that TM speakers produce relatively louder final syllables than English speakers and their intensity generally remains high throughout the sentences (see Table 5.2). This suggests that TM speakers do not evidence a tendency to raise intensity on final syllables, but the intensity of their final syllables often appears relatively greater than that of NS because of smaller intensity declinations over the utterances.

In summary, for Type A sentences, the durations of final syllables are generally +SRPS and the intensities and pitches are either +SRPS or -SRPS as a function of stress. However, ESL and EFL speakers do not raise intensity on strong syllables as frequently as NS. On top of that, EFL speakers seem to mark final syllables with increased duration and pitch.

Table 7.10 Number of strong final syllables classified as "+SRPS" or "-SRPS" for Type B sentences

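Type B, final position. Stress: Strong (k=7); Weak (k=0)

Group   Variable     +SRPS            -SRPS
                     No.    %         No.    %
NS      Duration     7      100       -      -
        Intensity    6      86        1      14
        Pitch        7      100       -      -
ESL     Duration     7      100       -      -
        Intensity    7      100       -      -
        Pitch        7      100       -      -
EFL     Duration     6      86        1      14
        Intensity    7      100       -      -
        Pitch        6      86        1      14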

[Figure 7.6: bar chart. Y-axis: percentage (0 to 100). Bars: Duration, Intensity, Pitch. X-axis: NS, ESL, and EFL, +SRPS strong final syllables. Panel title: Type B Final.]

Figure 7.6 Percentage of strong final syllables classified as "+SRPS" for Type B sentences

For Type B sentences, where all final syllables are strong, the three variables are highly coordinated with one another for each of the three subject groups. The duration, intensity, and pitch generally rise on strong final syllables. For English speakers, duration rises 100% (7/7) of the time, intensity 86% (6/7), and pitch 100% (7/7). For ESL speakers, all three variables rise 100% (7/7) of the time. For EFL speakers, duration and pitch rise 86% (6/7) of the time and intensity 100% (7/7) of the time. The results are consistent with our earlier findings that the three variables are quite coordinated on strong syllables.

7.2.3 DIFFERENTIATION BETWEEN STRONG AND WEAK SYLLABLES IN FINAL POSITION

Section 7.2.3 compares the differentiation between strong and weak final syllables of Type A sentences in each of the three variables. Because there are no weak final syllables for Type B sentences, only Type A sentences will be examined. The average duration, intensity, and semitone ratios of the strong syllables are compared with those of the weak syllables within each group, and the relative contrasts in each variable are compared across groups.

Table 7.11 Duration, intensity, and semitone ratios of strong versus weak syllables in final positions for Type A sentences

Position   Group   Stress   Duration   Intensity   Pitch

As shown in Table 7.11, for Type A sentences, the strong final syllables are on average relatively longer, louder, and higher in pitch than the weak ones for all subject groups, although the contrasts between strong and weak syllables in intensity and pitch are very small for EFL speakers. Among the three subject groups, EFL speakers on average produce the greatest durational differentiation between strong and weak syllables, and the smallest differentiation in terms of intensity and pitch. English and ESL speakers on average maintain very similar duration, intensity, and pitch contrasts between strong and weak syllables. The durational contrast between strong and weak syllables is 0.07 for NS, 0.08 for ESL, and 0.11 for EFL speakers. The intensity contrast between strong and weak syllables is 0.06 for NS, 0.08 for ESL, and 0.02 for EFL speakers. The pitch contrast between strong and weak syllables is 0.08 for NS, 0.08 for ESL, and 0.01 for EFL speakers.
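The group comparison in this paragraph can be checked with a short sketch. It assumes that each contrast is the mean ratio of the strong syllables minus the mean ratio of the weak syllables; that definition, the helper names, and the input to `contrast` are illustrative assumptions rather than the procedure documented for Table 7.11. The contrast values themselves are the ones reported in the text.

```python
# Contrasts for Type A final syllables as reported in the text.
reported = {
    "duration":  {"NS": 0.07, "ESL": 0.08, "EFL": 0.11},
    "intensity": {"NS": 0.06, "ESL": 0.08, "EFL": 0.02},
    "pitch":     {"NS": 0.08, "ESL": 0.08, "EFL": 0.01},
}


def contrast(strong_ratio, weak_ratio):
    """Assumed definition: mean strong ratio minus mean weak ratio."""
    return round(strong_ratio - weak_ratio, 2)


def extreme_groups(by_group):
    """Return (group with the largest contrast, group with the smallest)."""
    largest = max(by_group, key=by_group.get)
    smallest = min(by_group, key=by_group.get)
    return largest, smallest


# For each variable, which group differentiates most and least?
summary = {var: extreme_groups(vals) for var, vals in reported.items()}

# Hypothetical mean ratios, for illustration only (not values from Table 7.11).
example = contrast(0.90, 0.83)
```

Under these assumptions, `summary` shows EFL with the largest duration contrast and the smallest intensity and pitch contrasts, which matches the description above.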

[Figure 7.7, panels (a) to (c): line charts of Duration, Intensity, and Pitch ratios (y-axis: 0.30 to 1.00) for Strong Final versus Weak Final syllables, Type A sentences.]

(a) Contrasts between strong and weak syllables for English speakers (Type A Sentences: NS)

(b) Contrasts between strong and weak syllables for ESL speakers (Type A Sentences: ESL)

(c) Contrasts between strong and weak syllables for EFL speakers (Type A Sentences: EFL)

Figure 7.7 Duration, intensity, and pitch contrasts between strong and weak syllables in final position for Type A sentences for NS, ESL, and EFL speakers

For NS and ESL speakers, the three variables are quite coordinated in the differentiation between strong and weak syllables, with higher ratios associated with strong syllables and lower ratios associated with weak syllables. For EFL speakers, although duration contrasts are relatively distinct between strong and weak syllables, intensity and pitch contrasts are not. It seems that with final syllables, EFL speakers do not coordinate pitch and intensity closely with duration in maintaining a clear contrast between strong and weak syllables.

7.3 SUMMARY

In non-final position, duration, intensity, and pitch generally vary with stress for all speaker groups, although all three variables move more closely with stress for NS than for ESL and EFL, and more closely for Type B than for Type A sentences. For Type A sentences, TM speakers demonstrate similar types of difficulties with all three of these variables, i.e., weaker strong syllables and stronger weak syllables. For Type B sentences, the number of significant differences is small in most cases except for a large portion of relatively longer weak syllables produced by EFL (15/22). While ESL and EFL speakers produce relatively greater ratios on strong and weak non-final syllables, they produce greater semitone ratios on strong and smaller semitone ratios on weak non-final syllables. The distributions of strong and weak syllables classified as +SRPS or -SRPS are also similar among these three variables within each subject group. In terms of the differentiation between strong and weak syllables, higher ratios are generally associated with the strong syllables and lower ratios with the weak syllables for all speaker groups, except that the intensity and pitch contrasts between strong and weak syllables are rather indistinct for EFL in the Type A sentences. For Type B sentences, TM speakers evidence contrasts between strong and weak syllables that are highly similar to those of English speakers in all three variables.

In final position, duration, intensity, and pitch are generally +SRPS on strong syllables and -SRPS on weak syllables for NS and ESL speakers. EFL speakers, however, evidence little intensity and pitch contrast between strong and weak final syllables. Influenced by final lengthening, speakers of all groups lengthen final syllables, although strong syllables are generally longer than the weak ones. Because of smaller intensity declination over the utterances, the intensity of final syllables tends to be relatively higher for ESL and EFL than for NS. But neither group shows a tendency to raise intensity on final syllables with respect to the preceding syllables. Nonetheless, EFL speakers seem to associate final syllables with both increased duration and pitch with respect to the preceding syllables. This suggests that when it comes to weak final syllables, EFL speakers may provide pitch information that conflicts with duration and intensity as to whether a given final syllable is strong or not.

CHAPTER 8: CONCLUSION

8.1 FINDINGS AND IMPLICATIONS

ESL professionals and non-professionals alike have long described the speech of Mandarin speakers as being bullet-like when they speak English. However, little empirical evidence is available to explain what exactly makes their speech rhythm different from that of English native speakers. The current study compared English and Taiwan Mandarin speakers with respect to their production of English speech rhythm by analyzing three major correlates of stress in two prosodically diverse sets of sentences: Type A with long stretches of weak syllables and Type B with highly regular alternating strong and weak syllables.

The results showed that the TM ESL and EFL speakers experienced difficulties with Type A but not with Type B rhythm. Because the TM ESL and EFL speakers both performed relatively well for Type B sentences, it is for Type A sentences where the differences between native and non-native speakers really show.

For Type A sentences featuring long stretches of weak syllables, the TM speakers produced relatively shorter, softer, and lower-pitched strong syllables and relatively longer, louder, and higher-pitched weak syllables than the English speakers (cf. Tables 4.4, 5.4, 6.4, and 7.1). The combination leads to less duration, intensity, and pitch differentiation between the strong and the weak syllables (cf. Tables 4.21, 5.23, 6.22, 7.5 and Figures 4.9, 5.7, 6.9, 7.3), which may provide at least one explanation of why their speech is often described as syllable-timed by native listeners of English.

In addition to difficulties with the realization of stress, the TM speakers also struggled with the placement of stress for Type A sentences. Generally, they did not vary duration, intensity, and pitch with stress as consistently as the English speakers did (cf. Tables 4.20, 5.19, 6.19, 7.3 and Figures 4.7, 5.6, 6.6, 7.1). In particular, EFL speakers had difficulties in reducing the duration, intensity, and pitch of the weakly stressed syllables for Type A sentences (cf. Tables 4.19, 5.21, 6.22). The results showed that English speakers had at least three levels of stress (main-stressed, weakly stressed, unstressed), whereas EFL speakers had a binary division of stressed vs. unstressed syllables, suggesting that they might not realize as many levels of stress in production as English speakers did. In other words, the TM speakers may have a "flatter" hierarchy of stress than the English speakers. One hypothesis is that for TM speakers all stresses are primary, so that they would tend to place primary stress on whatever syllable bears stress. Because the current study used only monosyllabic words, further research has to be done to verify this hypothesis.

For Type B sentences featuring regular alternating strong and weak syllables, TM speakers varied the three variables with stress as consistently as did English speakers (cf. Figures 4.5, 5.5, 6.5, 7.2 and Table 7.4) and produced near native-like pitch and intensity contrasts between the strong and weak syllables (cf. Tables 5.26, 6.27 and Figures 5.9, 6.9), although EFL speakers evidence slightly less duration differentiation between strong and weak syllables due to their difficulty in shortening non-final weak syllables (cf. Table 4.27 and Figure 4.11). In fact, TM speakers sometimes evidence slightly greater pitch differentiation between strong and weak syllables than do English speakers (cf. Figure 6.9). In contrast to Type A sentences, TM speakers are able to produce native-like alternating rhythmic patterns of Type B sentences.

Generally, TM speakers are able to use the same cues used by English speakers, duration, intensity, and pitch, to signal stress. The three phonetic cues generally vary with stress and with one another for all speaker groups, although the degrees of coordination with stress vary across speaker groups and sentence types. Generally the three variables coordinate with stress more closely for NS than for NNS and more closely for Type B than for Type A sentences. For Type B sentences, these three variables are tied to stress quite strongly for all speaker groups (cf. Figures 4.5, 5.5, 6.5, 7.2 and Table 7.4). For Type A sentences, TM speakers evidence similar types of difficulties in these three variables (cf. Table 7.1). Moreover, the distributions of syllables classified as Raised or Lowered are similar for each of these variables (cf. Table 7.2) and they generally agree with one another in the differentiation between strong and weak syllables (cf. Table 7.5 and Figure 7.3).

The attachment to pitch contrasts in the learners' native language does not necessarily cause them to over-produce pitch contrasts between strong and weak syllables in English or to rely more heavily on pitch than on the other two variables in the realization of stress. On average TM speakers produced smaller pitch contrasts between the strong and the weak syllables for Type A sentences (cf. Table 6.25 and Figure 6.7), just as they did with duration (cf. Table 4.23 and Figure 4.10) and intensity (cf. Table 5.22 and Figure 5.7), and their pitch contrasts were only slightly greater than those of the English speakers for Type B sentences (cf. Figure 6.9). The TM speakers thus did not exaggerate pitch contrasts between the strong and the weak syllables as we had suspected that they might.

Furthermore, second language learners' difficulties with stress may not be the same in non-final versus final positions. On the one hand, final syllables often receive special marking to signal phrasal boundaries. On the other hand, the marking for final boundaries often fuses with the marking for stress. The picture gets complicated when native and non-native speakers employ different cues for marking final boundaries, which is exactly what happened in this study. In final position, TM speakers' difficulties in lengthening strong syllables and shortening weak syllables are somewhat neutralized, in that their final syllables tend to be shorter than those of English speakers regardless of stress. Speakers of all groups generally lengthen final syllables, regardless of their being targeted as strong or weak (cf. Tables 4.18, 7.9, 7.10 and Figures 7.5, 7.6), although TM speakers, especially EFL, generally do not produce final syllables with as much length as English speakers do (cf. Table 4.17). Moreover, their difficulty in final lengthening interacts with their difficulties in lengthening strong syllables and shortening weak syllables, which makes their difficulty in lengthening strong syllables appear worse and their difficulty in shortening weak syllables appear better than they actually are in final position. This provides an explanation of why TM speakers evidence a very high rate of significantly shorter strong syllables in final position, but do not seem to have as much of a problem with weak syllables in final position as in non-final position (cf. Table 4.16).

In final position, NS and ESL generally maintain highly similar intensity (Table 5.24 and Figure 5.8) and pitch (Table 6.24 and Figure 6.8) contrasts, with the strong syllables being distinctively stronger than the weak ones, although ESL evidence smaller duration contrasts than NS (cf. Table 4.22 and Figure 4.10). EFL speakers not only produce less duration, intensity, and pitch differentiation than do NS, they also evidence rather indistinct intensity and pitch contrasts between final strong and weak syllables (cf. Figure 7.7c). A further look at the number of syllables classified as Raised or Lowered indicates that EFL speakers have a tendency to raise pitch on final syllables regardless of stress (cf. Table 7.9 and Figure 7.5) in addition to lengthening them. This suggests that less proficient non-native speakers may use both duration and pitch for signaling the sentence-final boundary in English. To native ears, it likely sounds as though they end every sentence with a punch.

Increased proficiency and exposure are correlated with gradual but positive changes in the use of duration, intensity, and pitch as correlates for stress. Because the TM ESL and EFL speakers both performed relatively well for Type B sentences, it is for Type A sentences where the differences between these two groups of TM speakers really show. The ESL speakers produced fewer instances of significant differences involving difficulties in strengthening strong syllables and weakening weak syllables (cf. Tables 4.4, 5.4, 6.4, and 7.1), more native-like contrasts between strong and weak syllables (cf. Tables 4.23, 5.22, 6.22, 7.5 and Figures 4.9, 5.7, 6.7, 7.3), greater and more native-like standard deviations, greater and more native-like ranges (cf. Tables 4.2, 5.2, and 6.2), and stronger statistical correlations with the English speakers across the sentences than the EFL speakers (cf. Tables 4.7, 5.7, and 6.6). Basically, the advanced ESL speakers and the intermediate EFL speakers experienced similar types of problems. The differences were more in the degrees of the problems (cf. Tables 4.4, 5.4, 6.4, and 7.1). Although we do not have longitudinal data to back this, the cross-sectional results suggest that a higher proficiency and greater exposure to the English-speaking environment may help improve speech rhythm and that the improvement is probably gradual.

This discrepancy between Type A and Type B sentences, i.e., that TM speakers evidence difficulty producing Type A rhythm but are able to produce native-like Type B rhythm, suggests that TM speakers' trouble with the English speech rhythm should not be treated as an ALL or NONE issue. Apparently TM speakers were able to manage at least one type of English stress-timed rhythm as well as the English native speakers, i.e., when the rhythm features a regular alternation between stressed syllables and unstressed syllables. For this reason, the current study strongly challenges the use of "syllable-timing" as a cover term in describing the speech rhythm of TM speakers.

Following from the results, we propose multiple parameters under the traditional rhythmical category "stress-timing" by building in possible language-specific variations as to the number of unstressed syllables permitted within a foot or the number of prosodically weak syllables at a higher metrical level. The current results point to at least two such possible parameters. On the one hand, there might be languages that strongly favor the disyllabic foot structure featured in Type B sentences. On the other hand, there might be languages that tolerate a wide range of permissible unstressed syllables, including long stretches of unstressed syllables, as is the case for English. Note that the latter is also inclusive of the former. From this perspective, the TM speakers were indeed capable of producing one type of stress-timed rhythm and a subset of the possible rhythmic patterns in English. Thus in learning to produce the English speech rhythm, their primary task is to accommodate varying numbers of unstressed syllables within a foot and varying numbers of prosodically weak syllables at a higher metrical level. Directions for further research, including possible pedagogy for teaching English speech rhythm, will be discussed in section 8.3.

8.2 STRENGTHS AND LIMITATIONS OF THE CURRENT STUDY

8.2.1 STRENGTHS

Compared with previous studies on the second language acquisition of English rhythm, the design of the current study shows strengths in the following seven aspects.

First, unlike most previous studies that treat rhythm as purely a timing phenomenon, this study went beyond timing and examined rhythm as a three-dimensional entity. Given that rhythm is realized in the organization of stresses and that stress in English is cued primarily by duration, intensity, and pitch, it makes sense to include all three factors in the investigation of rhythm.

Second, it took into consideration stress patterns over whole sentences. While most previous studies compare stressed and unstressed syllables using isolated words, or highly selected syllables from utterances, the current study examined complete sentences with a stretch of 5-8 syllables, which allowed us to gain a more realistic view of the rhythmic patterns.

Third, unlike most previous studies which asked the subjects to read or repeat words or sentences in isolation, the current study embedded the test sentences within the context of a short monologue or dialogue to help the subjects arrive at similar interpretations of each sentence and to minimize variations in intonation and stress patterns.

Fourth, it used a larger number of subjects than most previous studies. Due to the time needed for analysis, it is typical for acoustic studies to have limited samples. The reliability of results based on a small number of subjects is often questionable, especially in the study of inter-language where large individual variations are commonly observed.

Fifth, it took into consideration English proficiency as a potential independent variable for the acquisition of speech rhythm. By comparing English learners of two different proficiency levels, we were able to compare the kinds of difficulties with rhythm at different stages of acquisition.

Sixth, unlike most previous studies that compared native and non-native speakers based on absolute acoustic measurements, the current study normalized duration, intensity, and pitch of individual syllables within each utterance first before any further computations and/or comparisons were made to minimize variations in speech rate, volume of speech, and pitch range between and within subjects.
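As a rough illustration of why within-utterance normalization removes differences in speech rate, the sketch below divides each syllable's measurement by the largest measurement in the same utterance. This ratio-to-maximum scheme, the function names, the 50 Hz reference, and the sample durations are all assumptions made for illustration; the normalization actually used in the study is the one described in the earlier chapters and may differ. The Hz-to-semitone formula is the standard one.

```python
import math


def normalize_to_max(values):
    """Divide each syllable value by the largest value in the same utterance."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("expected at least one positive measurement")
    return [v / peak for v in values]


def hz_to_semitones(f0_hz, ref_hz=50.0):
    """Convert Hz to semitones above ref_hz (the reference is an assumption)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


# Hypothetical syllable durations in ms: the same rhythmic pattern
# produced at a slower and at a faster speech rate.
slow_speaker = [150, 300, 225]
fast_speaker = [100, 200, 150]

slow_ratios = normalize_to_max(slow_speaker)
fast_ratios = normalize_to_max(fast_speaker)
```

Both utterances reduce to the same ratios, so a comparison of the two speakers reflects relative prominence rather than overall tempo.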

Finally, it developed an innovative and theoretically sound approach for measuring fundamental frequency as a correlate for stress in English. Taking into account the characteristics of pitch accents, tonal interpolations, and final boundary tones of English intonation, this approach employed three sampling rules which enable us to capture the upper range of a high pitch accent, the lower range of a low pitch accent, and the furthermost range of a tonal interpolation between two tones. Compared with three existing approaches for measuring pitch, the current approach most closely represented the pitch contours of the original pitch track.

8.2.2 LIMITATIONS

Despite several improvements from previous research on rhythm, the current study suffers from at least eleven limitations.

First, although the current study took into consideration the characteristics of pitch accents and intonation in its measurement of pitch as a correlate for stress, it did not investigate the psychological reality of rhythm, i.e., what it is that TM speakers conceive as the domain of rhythm. Is it the syllable or the foot? To answer this question, one would have to examine the phonological aspects of speech rhythm. The challenge would be to develop a set of diagnostic tests, which allow us to effectively identify the unit of timing. Possible diagnostic tests will be introduced in section 8.3.

Second, it did not take into account vowel quality as a correlate for stress. In English, the distinction between a full vowel and a reduced vowel largely coincides with the distinction between a stressed vowel and an unstressed one. TM speakers have been commonly observed by researchers and ESL teachers as not reducing unstressed vowels as consistently and/or as frequently as English speakers do. It remains an empirical question whether TM speakers have internalized vowel reduction into their phonological system. The current study did not examine the extent to which difficulties with vowel reduction might have affected the speech rhythm in the two groups of TM speakers, nor how failure of vowel reduction may affect the perception of stress by native speakers, among other problems.

Third, it is important to note that although duration is known as an important correlate of stress, the correlation could be difficult to capture from examining the duration of syllables in real speech. As the duration of syllables is highly susceptible to variations in the segmental composites across syllables, it could sometimes be difficult to determine whether a syllable is more or less stressed than another based on information from duration alone. Function words, which are usually inherently shorter than content words, may not appear lengthened even when they are indeed stressed.

Despite my efforts to keep the test words simple in terms of their syllable structure and the number of segments in each syllable, the control is not perfect. For further research on the duration aspect of rhythm, there are several choices of measures one could adopt to alleviate segmental variations across syllables. One way to do that is to measure the duration of the vowels rather than the duration of the entire syllables by choosing words that would make the segmentation of the vowels fairly straightforward and uncontroversial. One could also develop sentences that deliberately use words that have highly similar syllable structure and segments, such as Ed ate at eight. However, one must be very careful in pursuing this option, as this type of sentence may prompt speech errors due to the succession of syllables with highly similar shapes and sounds. The vowels and consonants in these syllables could be alternated between syllables with simple slips of the tongue. Another option would be to experiment with disyllabic words in both iambic and trochaic rhythmic patterns made from reiterating syllables, such as BEEbee or beeBEE. To make it somewhat less artificial, one could use pictures to create meanings for these new words and provide example sentences that actually use them. After the subjects feel more comfortable in using these words in speech, one could then have the subjects say within a meaningful context the test sentence that contains these test words.

Fourth, it did not employ a formal assessment for the English proficiency of the TM ESL and EFL learners. Although other criteria, such as length of English instruction and exposure to the English-speaking environment, may help differentiate proficiency levels, they are not as precise as conducting an English proficiency test prior to using all non-native subjects as part of the experiment. A CLOZE test, which is reliable but relatively shorter than most formal English assessment tests, could be one of the options (Hua, 1994). Along with the test scores, the correlation between English proficiency and the performance of speech rhythm could be more precisely analyzed.

Fifth, although the sample size (10 per group) in the current study is larger than in most previous studies, it is desirable to have at least 30 subjects in each group in order to establish robust statistical test reliability. However, given the huge amount of acoustic data processing and analysis, to meet such a requirement would have been beyond the manageable scope of the current study. Future studies may improve the sample size by narrowing down the research scope and/or reducing the number of syllables analyzed.

Sixth, the alpha level of the individual t-tests was not divided by the total number of t-tests performed. The alpha level is the risk researchers accept of incorrectly declaring a difference, effect, or relationship to be real when it is due to chance. Using the Bonferroni method, the alpha level for each individual test is adjusted downwards to ensure that the risk of finding a difference incorrectly across a number of tests remains at most 0.05. Specifically, if one performs k tests simultaneously at an overall significance level of 0.05, the appropriate threshold for each individual P-value is 0.05/k following the Bonferroni correction.
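The adjustment described here amounts to comparing each individual P-value against 0.05/k instead of 0.05. A minimal sketch follows; the function names and the P-values are hypothetical and are not results from this study.

```python
def bonferroni_alpha(alpha, k):
    """Per-test significance threshold when k tests share an overall alpha."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return alpha / k


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which P-values fall below the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical P-values from k = 4 t-tests (illustration only).
p_values = [0.001, 0.02, 0.04, 0.30]
threshold = bonferroni_alpha(0.05, len(p_values))  # 0.05 / 4 = 0.0125
flags = significant_after_bonferroni(p_values)
```

In this example three of the four P-values fall below the unadjusted 0.05, but only the first falls below the adjusted threshold of 0.0125, which shows how quickly the criterion tightens as the number of tests grows.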

Seventh, the test sentences were not designed in a way to maximize analytical efficiency. The two types of test sentence each featured sentences of various numbers of syllables and stress patterns. The variety made the speech sample more natural, but it also made it impossible to combine sentences of the same category for t-tests and correlation tests. The analytical inefficiency could have been corrected by using sentences with the same number of syllables, parallel stress patterns, and parallel choices of words. In so doing, I could have dramatically reduced the number of t-tests and correlation tests, making it possible to achieve statistical significance without an incredibly small P-value even after applying the Bonferroni correction.

Eighth, the current study used predominantly monosyllabic words and did not examine how TM speakers handle the duration, intensity, and pitch of words with two or more syllables. Because of this limitation, I was unable to determine, for example, whether TM speakers may have a tendency to place sentence stress on syllables that carry lexical stress.

Ninth, the current study did not investigate TM speakers' difficulty with stretches of totally unstressed syllables. The stretches of weak syllables for Type A sentences include both weakly stressed and unstressed syllables. Further study could design sentences that feature stretches of totally unstressed syllables. However, given that TM speakers have difficulty weakening long stretches of weak syllables (including weakly stressed and unstressed syllables), it seems reasonable to expect TM speakers to have difficulty with weakening stretches of unstressed syllables.

Tenth, the fact that NS produced at least three levels of stress while EFL speakers produced two levels of stress suggests that further studies on the acquisition of English speech rhythm by TM speakers should analyze these three types of syllables separately. The current study started out with a binary distinction of strong vs. weak syllables. It became clear during the analyses that when there are more than three levels of stress in a sentence, any such binary categorization of stresses does not tell the whole truth of what TM speakers' difficulties are.

Finally, focusing on production, the current study did not examine the perception aspects of speech rhythm. The results did not tell us whether the TM speakers use duration, intensity, and pitch differently from the English native speakers for the perception of stress. For example, I do not know whether TM speakers rely more heavily on pitch for the perception of stress than English speakers. The study also did not investigate what it is that English listeners perceive as stressed and what the distribution of stresses sounds like when they hear TM speakers' speech in English.

8.3 DIRECTIONS FOR FURTHER RESEARCH

Further research on the second language acquisition of speech rhythm may proceed in several directions: (1) Production of speech rhythm, (2) Perception of speech rhythm, (3) A quantitative representation of speech rhythm, (4) Rhythm in speech segmentation, and (5) Pedagogy for teaching speech rhythm.

8.3.1 PRODUCTION OF SPEECH RHYTHM

The results of the current study showed that TM speakers sometimes produce primary stress on all stressed syllables regardless of what the lead-in sentences said. Because the current study used predominantly monosyllabic words, it is not known how TM speakers handle the duration, intensity, and pitch of words with two or more syllables. Further research should be done to investigate stress placement in polysyllabic words at the sentence level among TM speakers. For instance, one could compare TM and English speakers with respect to their stress patterns of the target multi-syllabic words with an experiment featuring the following two types of sentences. The target sentences are shown in italics and the syllables that are targeted to elicit stress are shown in caps.

(38) Sample stimuli
Type One: I don't party with friends BEFORE eleven. I party with friends AFTER eleven.
Type Two: I don't STUDY with friends after eleven. I PARTY with friends after eleven.

Although each of the test sentences is placed within a context to induce certain target stress patterns, the TM speakers seemed to have difficulty in arriving at the expected stress patterns. They were often found to produce primary stress on a greater number of syllables than the English native speakers. It seems reasonable to speculate that the TM speakers may not pay due attention to discourse clues with regard to what information to emphasize. However, this may be something that comes with greater proficiency. As mentioned earlier, ESL speakers seemed to produce stress patterns that were more similar to the target stress patterns than did EFL speakers. Further research could have TM and English speakers produce the same test sentence in different contexts designed to induce various stress patterns. Here is an example.

(39) Sample stimuli
A: Who is the manager?
B: JOHN is the manager.
A: Not John. I want to talk with the manager.
B: But John IS the manager.
A: Is John the waiter?
B: No, John is the MANAGER.

The results of the current study showed that the TM speakers differ from the English speakers in the differentiation between the strong and the weak syllables along all three parameters. Given that there are no absolutely isochronous syllables or inter-stress intervals, it is impossible to determine with this information alone whether such differences result in a different rhythmic scheme or what it is that is intended as the unit of rhythm by TM speakers. Further research could develop a set of diagnostic tests, which would allow us to effectively identify the psychological units of rhythm. A number of possible diagnostic tests are suggested below.

One such test involves cases of stress shift, where stress in certain words is leftward-shifted when a stronger syllable follows, as in thirTEEN, but THIRteen MEN. However, the landing sites for stress shift must observe the following two constraints (Prince, 1983). First, the grid mark must be moved one at a time along the row where the clash occurs. Second, the landing site for a retracted stress must already bear the strongest available stress. Consider for example the phrase Sunset Park Zoo (Hayes, 1995, p.35).

(40) a. [Metrical grid: Sunset Park Zoo → Sunset Park Zoo (acceptable stress retraction)]

b. [Metrical grid: Sunset Park Zoo → *Sunset Park Zoo (unacceptable stress retraction onto -set)]

As can be seen in (40b), it is not acceptable for the retracted stress to land on the syllable -set, which is not the highest available landing site and would create a discontinuous grid column, violating the second constraint.

TM speakers could be presented with test items with built-in stress clash. The duration, intensity, and pitch of the target phrases would be measured. The analyses would focus on whether a shift of stress occurs and whether the stress is retracted to its appropriate location.

A second approach is to diagnose the hierarchy of stresses in polysyllabic words or phrases. Consider for example the six-syllable word reconciliation, whose stress patterns are shown in example (7) in Chapter 2. Asked to tap once for each stress, English speakers would find it natural to tap once, twice, three times, or six times, but not to tap four or five times.

(41)                  x           1 tap:  [a]
     x                x           2 taps: [re, a]
     x        x       x           3 taps: [re, ci, a]
     x   x    x   x   x   x       6 taps: [reconciliation]
     re  con  ci  li  a   tion

A computer-based test could be set up with the help of a computer programmer. First, the polysyllabic test word or phrase would be visually presented on the screen to both native and non-native speakers. Next, the subjects would be instructed to click on an icon to go to the next page where each syllable of the test item would be placed in a separate box.

The subjects would be asked which syllable they would find most natural to place a beat on if they could choose only ONE. They would be instructed to say the entire test word and click within the appropriate box when they reach the syllable on which they feel comfortable placing the beat. The click of the mouse would produce the sound of a drum beat. After that, the subject would be asked to click "NEXT", where s/he would be asked to place a beat on two syllables, that is, to click twice. Instead of being forced to arrive at a selection of syllables to place the beats on, the subjects would be given a "doesn't make sense" option for cases where they find it impossible to tap a certain number of times. In addition, the subject would be able to click "CLEAR" to redo the procedure if, for example, they clicked within the wrong box. The procedure would repeat n times for each test word consisting of n syllables. Both the syllables clicked and the point in time at which each click takes place would be recorded. The assumption is that the task would be easier or take less time when there are grid layers corresponding to the number of beats, and harder or take longer when there are not. The choice of specific syllables at different numbers of clicks would be analyzed for possible differences in stress patterns between native and non-native speakers.
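The records produced by such a tapping test could be summarized along the following lines. This is only a minimal sketch: the record layout (`TapTrial`), the field names, and the function `summarize` are hypothetical and are not part of any existing test program. For each requested number of beats, it computes the mean completion time of the completed trials and the rate of "doesn't make sense" responses.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple


@dataclass
class TapTrial:
    """One trial of the proposed tapping test (hypothetical record layout)."""
    word: str
    n_beats: int                  # number of beats the subject was asked to place
    syllables_clicked: List[int]  # 0-based syllable indices; empty if rejected
    latency_ms: Optional[float]   # time taken to complete the trial; None if rejected
    rejected: bool = False        # True if "doesn't make sense" was chosen


def summarize(trials: List[TapTrial]) -> Dict[int, Tuple[Optional[float], float]]:
    """Map each beat count to (mean latency of completed trials, rejection rate)."""
    summary = {}
    for n in sorted({t.n_beats for t in trials}):
        group = [t for t in trials if t.n_beats == n]
        completed = [t.latency_ms for t in group
                     if not t.rejected and t.latency_ms is not None]
        rejection_rate = sum(t.rejected for t in group) / len(group)
        summary[n] = (mean(completed) if completed else None, rejection_rate)
    return summary
```

Under the assumption stated above, beat counts that correspond to a grid layer (for reconciliation: one, two, three, or six) would be expected to show shorter mean latencies and lower rejection rates than those that do not (four or five).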

Last but not least, further study needs to be done to investigate the speech rhythm of Beijing Mandarin and Taiwan Mandarin. For BM, one could compare the durations of the Heavy-heavy vs. Heavy-light feet using minimal pairs of disyllabic words in Beijing Mandarin (see examples in (24)). Results of such a study could have important implications for the classification of Beijing Mandarin rhythm. If the results indicate that the durations of these two types of feet are more or less equal, it would be a strong indication that BM is foot-timed. Further study also needs to be done to compare the speech rhythm of BM with that of TM. For instance, one could measure the duration, intensity, and pitch of syllables in spontaneous speech, read poetry, or more controlled sentences, to see the extent to which BM and TM show isochronous feet in different speech styles.

8.3.2 PERCEPTION OF STRESS

Further research could investigate the perception of duration, intensity, and pitch as correlates for stress in English. For example, one could investigate the extent to which changes in milliseconds, dB, and Hz each relate to the perception of prominence among native and non-native speakers of English.

One way of carrying out such an experiment would be to have TM speakers listen to the same basic disyllabic words recorded in both iambic and trochaic rhythmic patterns. These test words could be made from reiterating syllables such as BEEbee or beeBEE to control for segmental variations. Three sets of the syllables could be synthesized to be acoustically identical except for duration in one set of syllables, intensity in another, and pitch in the other. The duration, intensity, and pitch of the stressed syllable could be artificially varied in steps from the bottom to the top range of normal speech. The analysis would compare native and non-native speakers with regard to the required amount of contrast in milliseconds, dB, and Hz between the stressed and unstressed syllables in order to perceive prominence on a syllable.

As shown in Fry (1955; 1958), for English native speakers, duration, intensity, and pitch are not equally important for the perception of stress. Little is known, however, regarding the relative importance of the three variables for non-native speakers' perception of stress in English. One way of investigating this issue would be to compare the three variables one pair at a time. Two sets of materials would be used: (1) Pairs of disyllabic words where a difference of rhythm is associated with a difference of grammatical function, such as subject, digest, permit, and perfect. (2) Disyllabic nonsense words created from reiterating syllables, such as BEEbee or beeBEE, which are identical except for their stress patterns. The second set of test words is not as natural as the first set, but it allows us to minimize segmental variations. Each test word would be synthesized under three different conditions, where one variable is kept constant while the ratios of the other two would be varied independently in five steps within ranges based on the actual productions of these words by 30 native speakers of English. Thus each test word would have 25 variations under each of the three conditions: (1) The duration and intensity ratios are independently varied while pitch is kept constant; (2) The pitch and duration ratios are independently varied while intensity is kept constant; (3) The pitch and intensity ratios are independently varied while duration is kept constant.
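The three synthesis conditions described above amount to three 5 x 5 grids per test word. The following sketch enumerates them. The function name `stimulus_grid` and the step values used in the usage note are hypothetical placeholders; in the proposed experiment the ranges would be derived from the productions of the native speakers.

```python
from itertools import product

VARIABLES = ("duration", "intensity", "pitch")


def stimulus_grid(steps_by_variable, constant_values):
    """Build the three conditions for one test word.

    In each condition one variable is held at a constant ratio while the
    other two are varied independently over their step values.
    Returns {held_variable: [stimulus, ...]}, where each stimulus is a dict
    mapping every variable to the ratio used in synthesis.
    """
    conditions = {}
    for held in VARIABLES:
        first, second = [v for v in VARIABLES if v != held]
        grid = []
        for a, b in product(steps_by_variable[first], steps_by_variable[second]):
            grid.append({held: constant_values[held], first: a, second: b})
        conditions[held] = grid
    return conditions
```

With five steps per variable, for example the illustrative ratios 0.6, 0.8, 1.0, 1.25, and 1.6, each condition contains 25 stimuli, so each test word would be synthesized in 75 versions.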

Thirty subjects would be asked to listen to these synthesized test words and identify the accented syllable in each case. The analyses would focus on the number of "Noun" judgments versus the number of "Verb" judgments in two kinds of situations: (1) when the two variables vary in the same direction, and (2) when they vary in opposite directions. Results from the second case would be of particular significance in that the variable that "wins" most of the time would potentially carry a heavier weight for the perception of stress. The analyses would also look into the average percentage increase of the "Noun" vs. "Verb" judgments contributed by the whole range of variations in one variable vs. the other. The assumption is that the variable whose full range of variations contributes to a greater percentage increase in the "Stress" judgment plays a more important role than the other one. Once all three pairs of variables are compared, I would be able to distinguish the relative importance of the three variables for the perception of stress.
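The comparison for the conflicting-cue case could be tallied as in the sketch below. The coding scheme is an assumption made only for illustration: each cue is expressed as the ratio of the first syllable to the second, so that a ratio above 1 favors first-syllable stress (a "Noun" judgment) and a ratio below 1 favors second-syllable stress (a "Verb" judgment). The names `favored` and `cue_wins` are hypothetical.

```python
def favored(ratio):
    """Return the judgment a cue favors, or None if the cue is neutral."""
    if ratio > 1:
        return "Noun"
    if ratio < 1:
        return "Verb"
    return None


def cue_wins(responses, cue_a, cue_b):
    """Count how often listeners follow cue_a vs. cue_b when the cues conflict.

    Each response is a dict holding the two cue ratios and a 'judgment'
    of "Noun" or "Verb". Stimuli in which the cues agree, or in which
    either cue is neutral, are skipped.
    """
    wins = {cue_a: 0, cue_b: 0}
    for r in responses:
        fav_a, fav_b = favored(r[cue_a]), favored(r[cue_b])
        if fav_a is None or fav_b is None or fav_a == fav_b:
            continue
        if r["judgment"] == fav_a:
            wins[cue_a] += 1
        elif r["judgment"] == fav_b:
            wins[cue_b] += 1
    return wins
```

The cue with the larger count would be the one that "wins" more often when the two variables point in opposite directions.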

8.3.3 A QUANTITATIVE REPRESENTATION OF SPEECH RHYTHM

An ideal quantitative model for representing rhythm would take both production and perception into account. The problem with formulating a model solely based on production is that it only provides a partial explanation of the phenomenon and does not explain what it is the native listener perceives. A similar criticism could be made against a model solely based on perception. The model should be able to take the actual phonetic measurements of each individual syllable, factor in the weight each variable carries for the perception of stress by native listeners, calculate a prominence index for each syllable, and generate the approximate prominence relations among syllables of an utterance. Note that this current proposal is aimed at generating a close representation of speech rhythm using quantifiable variables and does not take into consideration less readily quantifiable correlates for stress such as the vowel qualities or the kinesthetic memory associated with the listener's own production of the syllables s/he is perceiving.

Further research could be done to create such a model. Conceptually, the model would consist of the following components.

Step 1. Measurement of duration, intensity, and pitch

For instance, one could measure the vowel duration, the peak intensity, and the EPFo (see 3.3.4.1.3 for a detailed definition) of each syllable. To quantify pitch prominence, one could measure the distance between the EPFo and the pitch minimum if the EPFo is the pitch maximum or vice versa.

Step 2. Normalization of duration, intensity, and pitch

Next, one may follow the normalization procedures used in the current study to normalize the absolute measurements in each variable within each utterance to eliminate variations between and within speakers.

Step 3. Estimate the relative weight for the perception of stress

As shown in earlier studies, the three variables are not expected to be equally important for the perception of stress. A separate experiment needs to be conducted to establish the relative weight each variable carries for the perception of stress. One way to achieve this objective is to have a large number of native speakers listen to the recording of a passage read by a native speaker and ask them to identify syllables that they perceive as prominent. This yields two sets of data for each syllable: the number of "prominent" judgments and the average relative duration, intensity, and pitch. A correlation coefficient could then be obtained between the number of judgments and each of the three variables. The correlation coefficient would serve as the weight for that variable.
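The weighting step described above could be computed as sketched here. The choice of the Pearson product-moment coefficient is an assumption, since the type of correlation coefficient is not specified above; the function names `pearson_r` and `cue_weights` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def cue_weights(judgment_counts, measures):
    """Correlate per-syllable 'prominent' counts with each normalized measure.

    judgment_counts: one count per syllable of the passage.
    measures: {"duration": [...], "intensity": [...], "pitch": [...]},
              each list aligned syllable by syllable with judgment_counts.
    """
    return {name: pearson_r(values, judgment_counts)
            for name, values in measures.items()}
```

A variable whose normalized values rise and fall together with the number of "prominent" judgments would receive a weight close to 1, while a variable unrelated to the judgments would receive a weight close to 0.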

Step 4. Calculate the Prominence Index

Suppose the results show that duration carries a 0.7 weight, intensity a 0.5 weight, and pitch a 0.9 weight in the perception of stress. The prominence index for a syllable would be the sum, over the three variables, of the product of each variable's weight and its normalized value.
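The calculation can be illustrated with a short sketch that uses the hypothetical weights given above. The inputs are assumed to be already normalized within the utterance; the normalization itself is not reproduced here, and the function names are hypothetical.

```python
# Hypothetical weights taken from the supposition in the text.
WEIGHTS = {"duration": 0.7, "intensity": 0.5, "pitch": 0.9}


def prominence_index(normalized, weights=WEIGHTS):
    """Weighted sum of a syllable's normalized duration, intensity, and pitch."""
    return sum(weights[name] * normalized[name] for name in weights)


def rhythm_profile(utterance, weights=WEIGHTS):
    """Prominence index of every syllable in an utterance, in order."""
    return [prominence_index(syllable, weights) for syllable in utterance]
```

For example, a syllable whose normalized duration, intensity, and pitch are 1.2, 1.0, and 1.5 would receive an index of 0.7 x 1.2 + 0.5 x 1.0 + 0.9 x 1.5 = 2.69.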

After I obtain the prominence index for each syllable of the utterance, its rhythmical pattern could be easily graphed and compared between native and non-native speakers. This model could be applied as a diagnostic for problems in speech rhythm, a supplementary assessment tool for second language fluency, or further developed to aid processing of prosodic information in speech recognition.

8.3.4 RHYTHM AND SPEECH PROCESSING

Further research could investigate how difficulties in speech rhythm may affect speech processing. For instance, one could examine whether non-native speakers tend to segment English at every syllable, at more syllables than desired, or only at stressed syllables as was found with English native speakers. One approach would be to compare the response time for identifying stressed versus unstressed syllables. This syllable-monitoring task would employ two sets of materials: experimental words and target syllables. The experimental words could be disyllabic words where a change of function is associated with a change in the stress pattern, such as 'object vs. ob'ject, or other words carefully chosen based on the following criteria. With half of the words trochaic and the other half iambic, I would be able to control for the rhythmic patterns. The experimental words would have to be similar in terms of frequency of usage in the target language so that the response time would be minimally affected by the speaker's familiarity with the words. They should also be carefully chosen so that they do not go beyond the speaker's lexical proficiency. All chosen words would have to have clear, uncontroversial syllable boundaries to eliminate syllabification as a potential variable. Thus for every experimental word, e.g., 'object, there would be two potential targets, ob- and -ject, one that matches the stressed syllable and the other that matches the unstressed syllable. Distracting words and target syllables would also be constructed.

The audio-recorded test words would be randomly presented to native and non-native speakers of English using the computer program PsyScope (Cohen, MacWhinney, Flatt, & Provost, 1993). Each test word, e.g., 'object, would be presented six times, three times with the stressed target syllable and three times with the unstressed target syllable. The target syllables would be presented in two orders. Half of the subjects would be asked to respond to the stressed targets first and the other half would respond to the unstressed targets first. The target syllables would be presented visually on the computer screen. The subjects would be instructed to look at the target syllable, e.g., ob-, on a screen, listen to the stimulus, e.g., 'object, and press the appropriate color-coded button (one for a match and the other for a no-match). Both the button pressed and the response time in milliseconds would be recorded. The analysis would compare native and non-native speakers of English with regard to their response time for identifying stressed versus unstressed syllables.
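The response-time comparison could be summarized as in the following sketch. The trial format and the function names are hypothetical, and restricting the means to correct responses is an assumption rather than part of the design described above.

```python
from collections import defaultdict
from statistics import mean


def mean_rt_by_condition(trials):
    """Mean response time (ms) per (group, target type), correct trials only.

    Each trial is a dict with the keys 'group' ("native" or "non-native"),
    'target' ("stressed" or "unstressed"), 'correct' (bool), and 'rt_ms'.
    """
    buckets = defaultdict(list)
    for t in trials:
        if t["correct"]:
            buckets[(t["group"], t["target"])].append(t["rt_ms"])
    return {key: mean(values) for key, values in buckets.items()}


def stress_advantage(means, group):
    """How much faster (ms) a group detects stressed than unstressed targets."""
    return means[(group, "unstressed")] - means[(group, "stressed")]
```

If non-native speakers segment at every syllable rather than only at stressed syllables, their stress advantage would be expected to be smaller than that of the native speakers.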

8.3.5 PEDAGOGY FOR TEACHING SPEECH RHYTHM

A preliminary survey on how stress and rhythm in English are introduced to non-native learners would serve two purposes. First, it may provide a partial explanation of the difficulties learners experience with English speech rhythm. Second, it would serve as a diagnostic as to where we are in the teaching of English rhythm in the EFL classroom. The survey would be conducted with non-native English teachers and students in junior and senior high schools where the most intensive English instruction takes place. The survey questions would be oriented toward (1) The student's understanding of stress, and (2) The pedagogy used to teach stress and rhythm in English, if any.

Further research could be conducted to develop innovative approaches to the teaching of English speech rhythm and to evaluate the effectiveness of these methods. For example, the teaching methods could focus on the following areas of rhythm where non-native speakers seem to show weakness.

Placement of stress

The results of the current study showed that the TM speakers produced stress on more syllables than the English speakers at the sentence level. In particular, they seem to place primary stress on all stressed syllables. One of the contributing reasons for such difficulties seems to be the inability to use stress as a means to distinguish information that is important, new, or in the foreground from information that is unimportant, old, or in the background. Practice exercises could use the same sentence to show the link between the sentence focus and the placement of stress and to highlight the contrast between strong and weak content words as well as between strong and weak function words. Examples are given below.

(42) a. SHE gave it to me.
b. She GAVE it to me.
c. She gave it TO me.
d. SHE gave it to ME.

(43) a. SHE is my boss.
b. She IS my boss.
c. She is MY boss.
d. She is my BOSS.

Students could also be instructed to use the preceding sentences as clues to the right interpretation of the following sentences. Practice exercises could include short passages that focus on the link between the context and the location of stress. An example is given below.

(44) Yesterday was 'Sunday. // It was a 'sunny day. // My 'father 'went to the 'beach. // My 'mother went to the 'mall. // My 'brothers went to a 'ballgame. // And my 'sisters went to the 'movies. // But I was 'studying at 'home 'all day be'cause I have an ex'am to'day. //

Isochrony of stresses

Results of the current study showed that TM speakers experience difficulties reducing weak syllables and that the problem is more pronounced in sentences where there are long stretches of weak syllables. Practice exercises could orient the students toward the tendency toward isochrony in English by gradually increasing the number of unstressed syllables between stresses.

(45) a. Ken is here. b. Kenny is here. c. Kennedy is here. (reproduced from example (3) in Chapter 2)

(46) a. Jen is at home. b. Jenny is at home. c. Jennifer is at home.

Intonation

Given that pitch accents are realized on stressed syllables at the sentence level, students need to know about the various shapes of pitch accents in English. For example, non-native speakers may not be aware that stressed syllables can also be said with a low tone. Students could practice saying the underlined stressed syllable using a High or a Low pitch accent.

(47) a. I Bought the RED car. b. THIS is the key. c. I DON'T underSTAND.

Students could also practice saying the same sentence with the pitch accent on different syllables as in (48) or with different numbers of pitch accents (see examples below).

(48) a. SHE is my SISter. (two pitch accents) b. She is my SISter. (one pitch accent)

Besides production, practice exercises could also be oriented toward the perception of stress. An example would be to ask the students to listen to a statement and choose from a list of four questions the one that best serves as the question to the statement. An example is given below.

(49) Statement: I surf in Hawai'i. (This statement is most likely an answer to which one of the following questions?)
a. Where do you surf?
b. Who surfs in Hawai'i?
c. Do you surf in California?
d. Do you work in Hawai'i?

Further study could be conducted to evaluate the effectiveness of these approaches. For example, one could compare the speech rhythm of a control group which receives no training and an experimental group which receives the above training. A pretest would be administered to all students in the beginning and a parallel posttest would be administered at the end of the training. Comparisons could be made between the two groups with regard to their speech rhythm in the pretest versus in the posttest. One could also follow up with subsequent posttests in three months, six months, or a year to see if the training has a long-lasting effect.

APPENDIX A: LIST OF EXPERIMENTAL SENTENCES

A.1 TYPE A SENTENCES

A1. Did John write the letter with you?

No, Jim wrote it with me.

A2. This is a beautiful bag! Did Mary make it for you?

No, Jane made it for me.

A3. I don't know what to wear to work.

Dad likes me to wear the suit.

Mom likes me to wear the dress.

You like me to wear the jeans.

A4. I don't know what to bring to Ken's birthday party.

Jane wants me to bring the cake.

John wants me to bring the beer.

You want me to bring the wine.

A5. Wow! Look at this! Did you make the lemon pie yourself?

No, my mom made the lemon pie, not me.

A6. So, did the young man give the wine to you?

No, the old man gave it to me, not the young one.

A7. Who is Jane? Is she the woman wearing the red dress?

No, Jane's the one wearing the blue dress.

A.2 TYPE B SENTENCES

B1. Excuse me, Jane. May I borrow your book?

Sure, but I need it back by noon. Is that okay?

B2. The little kids are so cute.

Every time they come to our house, they play with dad and laugh.

B3. So what do you do at school?

Well, we learn to read and write.

B4. So, did you do anything special today?

Yes, I met with John at noon.

B5. Well, I didn't know what happened.

But anyway, Mom and dad were mad at Jim.

B6. You won't believe it.

John is good at baking bread.

B7. Hey, could you do me a favor?

I need a ride to work at once.

APPENDIX B: SEGMENTATION CRITERIA

B.1 TYPE A SENTENCES

SENTENCE A1: JIM WROTE IT WITH ME.

1. The onset of Jim was placed at the sudden burst of energy indicative of the release of the stop portion of the affricate.

2. The boundary between Jim and wrote was placed at the end of the faint formant bands of the nasal, a visible increase of acoustic energy, as well as the lowering of formant frequencies caused by the lip rounding of the retroflex.

3. The boundary between wrote and it was placed at the onset of dark formant bands of the second vowel.

4. The boundary between it and with was placed at the onset of quasi-periodic wave.

5. The boundary between with and me was placed at the end of quasi-random wave and the onset of a quasi-periodic wave.

6. The end of me was placed at the end of the quasi-periodic wave.

SENTENCE A2: JANE MADE IT FOR ME.

1. The onset of Jane was placed at the sudden burst of energy associated with the release of the stop portion of the affricate.

2. The boundary between Jane and made was placed at a visible change of formant structure, because [n] typically shows anti-formants at higher frequencies than [m]. When few spectrographic or waveform clues were available to draw the boundary, it was placed at the midpoint of the two nasals on the spectrogram.

3. The boundary between made and it was placed at the onset of the dark formant bands associated with the second vowel.

4. The onset of for was placed at the onset of quasi-random wave, which is usually accompanied by the onset of a high frequency noise region on the spectrogram.

5. The boundary between for and me was placed at the visible decrease of acoustic energy shown as much fainter formant bands.

6. The end of me was placed at the end of quasi-periodic wave.

SENTENCE A3: YOU LIKE ME TO WEAR THE JEANS.

1. The onset of you was placed at the onset of quasi-periodic wave.

2. The boundary between you and like was placed at the change of formant structure because lip rounding of the final bilabial approximant has the effect of lowering formant frequencies. The lateral liquid shows an anti-formant between F2 and F3.

3. The boundary between like and me was placed at the onset of the quasi-periodic wave and the onset of faint formant bands typical for nasals.

4. The boundary between me and to was placed at the end of the dark formant bands of the vowel of me and at a visible decrease of acoustic energy due to the initial consonant of the syllable to.

5. The boundary between to and wear was placed at the end of the dark formant bands of the vowel of the syllable to and with help from perception. In cases where the formant structure was not clear, the boundary was placed at the point where the second formant starts to rise due to the second vowel.

6. The boundary between wear and the was placed at the end of quasi-periodic wave, which usually corresponds to a visible decrease of waveform amplitude.

7. The boundary between the and jeans was placed at the end of quasi-periodic wave signaling the end of the first vowel and the onset of silence shown as a sudden decrease of amplitude on the waveform and a void region on the spectrogram.

8. The end of jeans was placed at the end of quasi-periodic wave or the onset of a high frequency noise region. The boundary was not placed at the end of the fricative because it can be difficult to determine precisely where it ends due to its extremely low intensity.

SENTENCE A4: YOU WANT ME TO BRING THE WINE.

1. The onset of you was placed at the onset of quasi-periodic wave.

2. The boundary between you and want was placed at a visible decrease of acoustic energy shown as much fainter formant bands on the spectrogram.

3. The boundary between want and me was placed at the onset of the quasi-periodic wave of the nasal.

4. The boundary between me and to was placed at the end of the dark formant bands associated with the vowel of me and a sudden decrease of acoustic energy.

5. The boundary between to and bring was placed at the end of the quasi-periodic wave associated with the vowel of to and the onset of silence from the closure of the bilabial stop.

6. The boundary between bring and the was placed at the onset of the quasi-random wave and a high frequency noise region accompanying the initial fricative of the as well as at a visible decrease of waveform amplitude.

7. The boundary between the and wine was placed at a visible decrease of acoustic energy shown as fainter formant bands on the spectrogram.

8. The end of wine was placed at the end of quasi-periodic wave.

SENTENCE A5: MY MOM MADE THE LEMON PIE.

1. The onset of my was placed at the onset of faint formant bands on the spectrogram, which are typical for nasals.

2. The boundary between my and mom was placed at a visible decrease of acoustic energy shown as fainter formant bands on the spectrogram.

3. The boundary between mom and made was placed at the midpoint of the two nasals, namely, between the end of the dark formant bands of the first vowel and the onset of the dark formant bands of the second vowel.

4. The boundary between made and the was placed at the onset of quasi-random wave, which is usually accompanied by a high frequency noise region on the spectrogram.

5. The boundary between the and le- was placed at the end of dark formant bands of the first vowel and a visible decrease of acoustic energy. The lateral liquid shows an anti-formant between F2 and F3.

6. The boundary between le- and -mon was placed at a visible decrease of acoustic energy shown as the onset of fainter formant bands on the spectrogram.

7. The boundary between -mon and pie was placed at the end of quasi-periodic wave and the onset of silence shown as a sudden decrease of acoustic energy on the waveform and a void region on the spectrogram.

8. The end of pie was placed at the end of quasi-periodic wave. When it was immediately followed by the nasal onset of the subsequent word not, the boundary was placed at a visible decrease of acoustic energy shown as fainter formant bands.

SENTENCE A6: JANE'S THE ONE WEARING THE BLUE DRESS.

1. The onset of Jane's was placed at the burst of energy from the stop release.

2. The boundary between Jane's and the was placed at the onset of formant bands of the second vowel.

3. The boundary between the and one was placed at the visible decrease of energy shown as the onset of fainter formant bands and where formant frequencies start lowering due to the lip rounding of the initial bilabial approximant.

4. The boundary between one and wear was placed at a change of formant structure. The alveolar nasal has an anti-formant between F2 and F3.

5. The boundary between wear and -ing was placed at the onset of formant structure accompanying the vowel of -ing. The high front vowel shows a typical low F1 and high F2. When the spectrographic clue was not clear, the boundary was placed at the point where F1 and F2 begin to depart from each other, with help from perception.

6. The boundary between -ing and the was placed at the onset of the dark formant bands accompanying the second vowel.

7. The boundary between the and blue was placed at the end of quasi-periodic wave and dark formant bands associated with the first vowel, which usually corresponds to a marked decrease of energy due to the closure of the following stop.

8. The boundary between blue and dress was placed at the end of the dark formant bands of the first vowel.

9. The end of the syllable dress was placed at the end of quasi-periodic wave and the onset of a high frequency noise region on the spectrogram. The boundary was not placed at the end of the fricative because it is difficult to determine precisely where the fricative ends due to its extremely low intensity.

SENTENCE A7: THE OLD MAN GAVE IT TO ME.

1. The onset of the was placed at the onset of quasi-random wave and the onset of high frequency noise region on the spectrogram.

2. The boundary between the and old was placed at the visible change of vowel formant structure with help from perception.

3. The boundary between old and man was placed at the onset of faint formant bands associated with the initial nasal of man.

4. The boundary between man and gave was placed at the end of quasi-periodic wave of the nasal.

5. The boundary between gave and it was placed at the onset of the dark formant bands associated with the second vowel.

6. The boundary between it and to was placed at the release of the stop when only the second /t/ was released. When both stops were released, the boundary was placed at the onset of silence associated with the closure of the second /t/. When neither of them was released, e.g., when the two formed a single flap, the boundary was placed at the midpoint between the end of the first vowel and the onset of the second vowel.

7. The boundary between to and me was placed at the onset of the faint formant bands of the nasal.

8. The end of the syllable me was placed at the end of the quasi-periodic wave.

B.2 TYPE B SENTENCES

SENTENCE B1: I NEED IT BACK BY NOON.

1. The onset of I was placed at the onset of quasi-periodic wave.

2. The boundary between I and need was placed at a visible decrease of acoustic energy shown as fainter formant bands.

3. The boundary between need and it was placed at the onset of dark formant bands of the second vowel.

4. The boundary between it and back was placed at the release of the bilabial stop.

5. The boundary between back and by was placed at the release of the bilabial stop.

6. The boundary between by and noon was placed at a visible decrease of acoustic energy shown as fainter formant bands.

7. The end of noon was placed at the end of the quasi-periodic wave.

SENTENCE B2: THEY PLAY WITH DAD AND LAUGH.

1. The onset of they was placed at the onset of quasi-random wave and the onset of high frequency region on the spectrogram.

2. The boundary between they and play was placed at the end of quasi-periodic wave and the onset of silence.

3. The boundary between play and with was placed at a change of waveform and spectrographic structure. The first and second formants are both low with [w] and its lip rounding has the effect of lowering formant frequencies. In addition, the transition from the final vowel to the semivowel is generally accompanied by a visible decrease of acoustic energy.

4. The boundary between with and dad was placed at the onset of silence associated with the stop closure.

5. The boundary between dad and and was placed at the onset of quasi-periodic wave and the onset of formant structure associated with the second vowel.

6. The boundary between and and laugh was placed at the end of faint formant bands of the nasal, a visible change of waveform structure, and a visible increase of waveform amplitude when the final stop of and was either unreleased or dropped. A very small number of non-native speakers inserted a vowel after the final stop. When that happened, the boundary was placed at the end of the dark formant bands of the inserted vowel. When the final stop of and was released, the boundary was placed at the onset of quasi-periodic wave.

7. The end of laugh was placed at the end of quasi-periodic wave and the onset of the high frequency noise region. The boundary was not placed at the end of the fricative because where it actually ends can be hard to determine due to its extremely low intensity.
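
The transition from a quasi-periodic wave to a high frequency noise region, used above to mark the end of laugh, can be illustrated with the zero-crossing rate, which is low for voiced speech and high for frication. The sketch below is an illustration under stated assumptions, not the method of this study, where boundaries were placed by inspection. The threshold of 0.3, the frame length, and the synthetic vowel and noise are hypothetical.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def noise_onset_frame(samples, frame_len, zcr_threshold=0.3):
    """Index of the first frame whose zero-crossing rate exceeds the threshold."""
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    for k, start in enumerate(starts):
        if zero_crossing_rate(samples[start:start + frame_len]) > zcr_threshold:
            return k
    return None  # no noisy frame found

rate = 8000
# Hypothetical vowel: a 120 Hz tone (quasi-periodic, few zero crossings).
vowel = [math.sin(2 * math.pi * 120 * i / rate) for i in range(1600)]
# Hypothetical fricative: weak white noise from a seeded generator.
rng = random.Random(0)
fricative = [rng.uniform(-0.1, 0.1) for _ in range(800)]

onset = noise_onset_frame(vowel + fricative, 400)
onset_time = onset * 400 / rate  # seconds from the start of the signal
```

Because the onset of noise is located rather than its end, the sketch is consistent with the decision above not to rely on the low-intensity offset of the fricative.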

SENTENCE B3: WE LEARN TO READ AND WRITE.

1. The onset of we was placed at the onset of quasi-periodic wave.

2. The boundary between we and learn was placed at a visible decrease of acoustic energy shown as fainter formant bands and at a change of formant structure. The lateral liquid shows an anti-formant between F2 and F3.

3. The boundary between learn and to was placed at the release of the stop.

4. The boundary between to and read was placed at a visible decrease of acoustic energy shown as fainter formant bands. The lip rounding of the retroflex has the effect of lowering formant frequencies.

5. The boundary between read and and was placed at the onset of formant structure of the second vowel. When and was reduced to a syllabic nasal, the boundary was placed at a visible decrease of acoustic energy shown as fainter formant bands.

6. The boundary between and and write was placed at a visible change of waveform and formant structure. The lip rounding of the initial retroflex shows lowering of all formants.

7. The end of write was placed at the release of the stop or at the end of quasi-periodic wave if the stop was either dropped or unreleased.
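
Several criteria in this appendix are conditional: the end of a word is placed at the release of the final stop when a release is present, and at the end of the quasi-periodic wave otherwise. A minimal sketch of that decision rule follows. The function name and the time values are hypothetical, and both landmarks are assumed to have been identified beforehand by inspection.

```python
from typing import Optional

def word_end(release_time: Optional[float], periodic_end: float) -> float:
    """Return the stop release time if a release was observed;
    otherwise fall back to the end of the quasi-periodic wave."""
    return release_time if release_time is not None else periodic_end

# Hypothetical landmark times in seconds.
released = word_end(1.842, 1.790)    # final stop released
unreleased = word_end(None, 1.790)   # final stop dropped or unreleased
```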

SENTENCE B4: I MET WITH JOHN AT NOON.

1. The onset of I was placed at the onset of quasi-periodic wave.

2. The boundary between I and met was placed at a visible decrease of acoustic energy shown as fainter formant bands.

3. The boundary between met and with was placed at the onset of quasi-periodic wave. Also the first and second formants are both low with [w].

4. The boundary between with and John was placed at the onset of silence associated with the initial stop closure of the affricate.

5. The boundary between John and at was placed at the end of faint formant bands of the nasal and the onset of dark formant bands of the second vowel.

6. The boundary between at and noon was placed at the onset of quasi-periodic wave.

7. The end of noon was placed at the end of the quasi-periodic wave.
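
The onset of silence associated with a stop closure, as in the boundary between with and John, corresponds to the first frame whose energy falls far below that of the surrounding voiced material. The sketch below illustrates the idea and is not the procedure followed in this study. The relative threshold of 1 percent of peak frame energy, the frame length, and the synthetic signal are assumptions, and the signal is assumed to begin with voiced material.

```python
import math

def silence_onset_frame(samples, frame_len, rel_threshold=0.01):
    """Index of the first frame whose mean energy is below
    rel_threshold times the peak frame energy, or None."""
    energies = [
        sum(x * x for x in samples[s:s + frame_len]) / frame_len
        for s in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    peak = max(energies)
    for k, energy in enumerate(energies):
        if energy < rel_threshold * peak:
            return k
    return None

rate = 8000
# Hypothetical material: voiced portion, silent closure, voiced portion.
voiced = [math.sin(2 * math.pi * 120 * i / rate) for i in range(1200)]
closure = [0.0] * 400
signal = voiced + closure + voiced[:400]

closure_frame = silence_onset_frame(signal, 200)
closure_time = closure_frame * 200 / rate  # seconds
```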

SENTENCE B5: MOM AND DAD WERE MAD AT JIM.

1. When there was no break between mom and its preceding syllable anyway as part of the lead-in sentences, the onset of mom was placed at a visible decrease of acoustic energy shown as fainter formant bands. Otherwise, the onset of mom was placed at the onset of quasi-periodic wave and faint formant bands indicative of the initial nasal consonant.

2. The boundary between mom and and was placed at a visible increase of acoustic energy shown as darker formant bands indicative of the second vowel.

3. The boundary between and and dad was placed at the burst of energy from the release of the second stop.

4. The boundary between dad and were was placed at the onset of quasi-periodic wave or formant band structure of the second syllable.

5. The boundary between were and mad was placed at a visible decrease of acoustic energy shown as fainter formant bands.

6. The boundary between mad and at was placed at the onset of formant structure and the onset of quasi-periodic wave indicative of the second vowel.

7. The boundary between at and Jim was placed at the onset of silence indicative of the stop closure portion of the affricate.

8. The end of Jim was placed at the end of quasi-periodic wave.

SENTENCE B6: JOHN IS GOOD AT BAKING BREAD.

1. The onset of John was placed at the release of the stop portion of the affricate.

2. The boundary between John and is was placed at the onset of formant structure of the second vowel.

3. The boundary between is and good was placed at the end of the quasi-random wave or the end of a high frequency noise region on the spectrogram.

4. The boundary between good and at was placed at the onset of quasi-periodic wave or dark formant bands of the second vowel.

5. The boundary between at and bake was placed at the release of the bilabial stop.

6. The boundary between bake and -ing was placed at the onset of dark formant bands of the second vowel.

7. The boundary between -ing and bread was placed at the end of quasi-periodic wave and the onset of silence due to the closure of the bilabial stop.

8. The end of bread was placed at the release of the stop, or the end of quasi-periodic wave when the final alveolar stop was either dropped or unreleased.

SENTENCE B7: I NEED A RIDE TO WORK AT ONCE.

1. The onset of I was placed at the onset of quasi-periodic wave.

2. The boundary between I and need was placed at a visible decrease of acoustic energy shown as fainter formant bands.

3. The boundary between need and a was placed at the onset of quasi-periodic wave and the onset of formant bands of the second vowel.

4. The boundary between a and ride was placed at the end of formant bands of the first vowel and at a visible decrease of acoustic energy as well as at the lowering of formant frequencies from the lip rounding of the retroflex.

5. The boundary between ride and to was placed at the release of the second stop. (The final stop of ride was generally either unreleased or dropped by speakers.)

6. The boundary between to and work was placed at a visible decrease of energy on the waveform and with help from perception.

7. The boundary between work and at was placed at the onset of quasi-periodic wave or formant structure of the second vowel.

8. The boundary between at and once was placed at the onset of quasi-periodic wave.

9. The end of once was placed at the end of quasi-periodic wave and the onset of a high frequency noise region. The boundary was not placed at the end of the fricative because where it ends can be difficult to determine precisely due to its extremely low intensity.
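
Once the boundary marks for a sentence have been placed, the interval between consecutive marks gives the duration of each segmented unit. The sketch below assumes that the marks are used in this way, which this appendix does not itself state, and the boundary times shown are hypothetical values in seconds, not measurements from this study.

```python
def unit_durations(units, marks):
    """Pair each unit with the interval between its two boundary marks.
    Expects one more mark than units (onset, internal boundaries, end)."""
    if len(marks) != len(units) + 1:
        raise ValueError("expected one more boundary mark than units")
    return [
        (unit, round(end - start, 3))
        for unit, start, end in zip(units, marks, marks[1:])
    ]

# Sentence B7, with hypothetical boundary times for illustration.
units = ["I", "need", "a", "ride", "to", "work", "at", "once"]
marks = [0.10, 0.22, 0.45, 0.52, 0.80, 0.92, 1.20, 1.33, 1.75]
durations = unit_durations(units, marks)
```

A list of pairs is used rather than a dictionary because some test sentences contain the same word twice.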

BIBLIOGRAPHY

Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press.

Adams, C. and R. R. Munro. (1978). In search of the acoustic correlates of stress: Fundamental frequency, amplitude, and duration in the connected utterance of some native and non-native speakers. Phonetica, 35, 125-156.

Adams, C. (1979). English speech rhythm and the foreign learner. The Hague: Mouton.

Allen, G. D. & S. Hawkins. (1980). Phonological rhythm: Definition and development. In G. H. Yeni-Komshian, J. F. Kavanagh, & C. A. Ferguson (Eds.), Child Phonology volume 1: Production. New York: Academic Press.

Allen, G. D. (1975). Speech rhythm: Its relation to performance universals and articulatory timing. Journal of Phonetics, 3, 75-86.

Anderson-Hsieh, J. & H. Venkatagiri. (1994). Syllable duration and pausing in the speech of Chinese ESL speakers. TESOL Quarterly, 28(4), 807-812.

Anderson-Hsieh, J., R. Johnson, & K. Koehler. (1992). The relationship between native speaker judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure. Language Learning, 42(4), 529-555.

Balasubramanian, T. (1980). Timing in Tamil. Journal of Phonetics, 8, 449-468.

Beckman, M. (1982). Segment duration and the mora in Japanese. Phonetica, 39, 113-135.

Bolinger, D. L. (1958). A theory of pitch accent in English. Word, 14, 109-149.

Bolinger, D. L. (1965). Pitch accent and sentence rhythm. In I. Abe & T. Kanekiyo (Eds.), Forms of English: Accent, morpheme, order. Cambridge: Harvard University Press.

Bond, Z. S. and J. Fokes. (1985). Non-native patterns of English syllable timing. Journal of Phonetics, 13, 407-420.

Chang, J. (1987). Chinese speakers. In M. Swan & B. Smith (Eds.), Learner English: A teacher's guide to interference and other problems. Cambridge: Cambridge University Press.

Chao, Y. R. (1932). A preliminary study of English intonation (with American variants) and its Chinese equivalents. Bulletin of the Institute of History and Philology (The Tsai Yuan P'ei anniversary volume supplementary vol. I), 105-156.

Chao, Y. R. (1933). Tone and intonation in Chinese. Bulletin of the Institute of History and Philology, 4, 121-134.

Chao, Y. R. (1948). Mandarin primer. Cambridge: Harvard University Press.

Chao, Y. R. (1956). Tone, intonation, singsong, chanting, recitative, tonal composition, and atonal composition in Chinese. In For R. Jakobson, 52-59. The Hague: Mouton.

Chao, Y. R. (1968). A grammar of spoken Chinese. Berkeley: University of California Press.

Chen, C. Y. (1984). Neutral tone in Mandarin. Journal of Chinese Linguistics, 12(2), 299-333.

Cheng, C. C. (1973). A synchronic phonology of Mandarin Chinese. The Hague: Mouton.

Classe, A. (1939). The rhythm of English prose. Oxford: Blackwell.

Cutler, A. & D. Norris. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113-121.

Cutler, A. & S. Butterfield. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31, 118-136.

Cutler, A. (1980). Syllable omission errors and isochrony. In H. W. Dechert & M. Raupach (Eds.), Temporal variables in speech: Studies in honor of Frieda Goldman-Eisler. The Hague: Mouton.

Cutler, A. (1994). The perception of rhythm in language. Cognition, 50, 79-81.

Cutler, A., J. Mehler, D. Norris, & J. Segui. (1986). The syllable's differing role in the segmentation of French and English. Journal of Memory and Language, 25, 385-400.

Dauer, R. M. (1982). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62.

Delattre, P. (1966). A comparison of syllable length conditioning among languages. International Review of Applied Linguistics, 4, 183-198.

Dreher, J. & P. C. Lee (1966). Instrumental investigation of single and paired Mandarin tonemes. Douglas Advanced Research Laboratory.

Duanmu, S. (1999). Metrical structure and tone: Evidence from Mandarin and Shanghai. Journal of East Asian Linguistics, 8(1), 1-38.

Duanmu, S. (2001). Stress in Chinese. In D. B. Xu (Ed.), Chinese phonology in Generative Grammar (pp. 117-138). San Diego, CA: Academic Press.

Faure, G., D. J. Hirst, & M. Chafcouloff. (1980). Rhythm in English: Isochronism, pitch, and perceived stress. In L. Waugh (Ed.), The Melody of Language. Baltimore: University Park Press.

Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27(4), 765-768.

Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1, 126-152.

Fujisaki, H., K. Nakamura, & T. Imoto. (1973). Auditory perception of duration of speech and non-speech stimuli. Research Institute of Logopedics and Phoniatrics Annual Bulletin, 7, 45-64. Tokyo: University of Tokyo Press.

Gao, Y. Z. (1980). Beijing hua de qingsheng wenti. Yuyan Jiaoxue yu Yanjiu, 2, 82-98.

Gerken, L. (1996). Prosodic structure in young children's language production. Language, 72(4), 683-712.

Gerken, L. A. (1994). Young children's representation of prosodic phonology: Evidence from English speakers' weak syllable omissions. Journal of Memory and Language, 33, 19-38.

Gimson, A. (1962). An introduction to the pronunciation of English. London: Edward Arnold.

Halliday, M. (1970). A course in spoken English. London: Oxford University Press.

Hayes, B. (1995). Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press.

Howie, J. M. (1976). Acoustical studies of Mandarin vowels and tones. New York: Cambridge University Press.

Hua, T. F. (1994). The interpretation of L2 reflexive pronouns by adult Chinese learners of English. University of Hawai'i Working Papers in English as a Second Language, 13(1), 53-88.

James, E. (1976). The acquisition of prosodic features using a speech visualizer. International Review of Applied Linguistics in Language Teaching, 14, 227-243.

Jensen, J. T. (1993). English phonology. Amsterdam: John Benjamins.

Johansson, S. (1978). Studies of error gravity: Native reactions to errors produced by Swedish learners of English. Göteborg, Sweden: Acta Universitatis Gothoburgensis.

Juffs, A. (1990). Tone, syllable structure and interlanguage phonology: Chinese learners' stress errors. IRAL, 28(2), 99-115.

Jusczyk, P. W., A. Cutler, & N. J. Redanz (1993). Infants' sensitivity to predominant word stress patterns in English. Child Development, 64, 675-687.

Kager, R. (1995). The Metrical Theory of Word Stress. In J. A. Goldsmith (Ed.), The handbook of phonological theory (pp. 367-402). Cambridge, Mass.: Blackwell.

Klatt, D. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics, 3, 129-140.

Klein, H. (1978). The relationship between perceptual strategies and productive strategies in learning the phonology of early lexical items. Doctoral dissertation, New York University.

Klein, H. (1981). Production strategies for the pronunciation of early polysyllabic lexical items. Journal of Speech and Hearing Research, 24, 389-405.

Ladd, D. R. (1996). Intonational phonology. Cambridge: Cambridge University Press.

Ladefoged, P. (1993). A course in phonetics. New York: Harcourt Brace Jovanovich.

Lea, W. A. (1980). Prosodic aids to speech recognition. In W. A. Lea (Ed.), Trends in speech recognition (pp. 166-205). Englewood Cliffs, NJ: Prentice-Hall.

Lehiste, I. (1972). The timing of utterances and linguistic boundaries. Journal of the Acoustical Society of America, 51, 2018-2024.

Lehiste, I. (1975). The perception of duration within sequences of four intervals. Paper presented at the 8th International Congress of Phonetic Sciences, Leeds.

Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics, 5, 253-263.

Lieberman, P. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America, 32, 451-454.

Lin, H. (2001). Stress and the distribution of the neutral tone in Mandarin. In D. B. Xu (Ed.), Chinese phonology in Generative Grammar. San Diego, CA: Academic Press.

Lin, M. C., J. Z. Yan, & G. H. Sun. (1983). The stress pattern and its acoustic correlates in Beijing Mandarin. Proceedings of the 10th International Congress of Phonetic Sciences, 504-514.

McCawley, J. (1968). The phonological component of a grammar of Japanese. The Hague: Mouton.

Mehler, J., J. Y. Dommergues, U. Frauenfelder, & J. Segui. (1981). The syllable's role in speech segmentation. Journal of Verbal Learning and Verbal Behavior, 20, 298-305.

Mehler, J., P. W. Jusczyk, G. Lambertz, N. Halstead, J. Bertoncini, & C. Amiel-Tison. (1988). A precursor of language acquisition in young infants. Cognition, 29, 143-178.

Moon, C., R. P. Cooper, & W. Fifer. (1993). Two-day olds prefer their native language. Infant Behavior and Development, 16, 495-500.

O'Conner, J. D. (1968). The duration of the foot in relation to the number of component sound segments. Phonetics laboratory, University College, London. Progress Report, 3, 1-16.

Olsen, C. L. (1972). Rhythmical patterns and syllabic features of the Spanish sense group. Proceedings of the Seventh International Congress of Phonetic Sciences, 990-995. The Hague: Mouton.

Otake, T., G. Hatano, A. Cutler, & J. Mehler. (1993). Mora or syllable? Speech segmentation in Japanese. Journal of Memory and Language, 32, 358-378.

Pierrehumbert, J. and J. Hirschberg. (1990). The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 271-311). Cambridge, Massachusetts: MIT Press.

Pierrehumbert, J. B. (1980). The phonology and phonetics of English intonation. Doctoral dissertation, Massachusetts Institute of Technology.

Pike, K. (1945). The intonation of American English. Ann Arbor: University of Michigan Press.

Podesva, R. (2003). The effects of foot structure in syllable-timed languages: the cases of Buginese and Toba Batak. Paper presented at the Austronesian Formal Linguistics Association (AFLA) 10, Honolulu.

Pointon, G. (1980). Is Spanish really syllable-timed? Journal of Phonetics, 8, 293-305.

Shen, X. S. (1992). Mandarin neutral tones revisited. Acta Linguistica Hafniensia, 24, 131-152.

Shen, X. S. (1993). Relative duration as a perceptual cue to stress in Mandarin. Language and Speech, 36(4), 415-433.

Shen, X. S. (1994). The prosody of Mandarin Chinese. Berkeley: University of California Press.

Taylor, D. S. (1981). Non-native speakers and the rhythm of English. IRAL, 19, 235-244.

Treiman, R. and C. Danis. (1988). Syllabification of intervocalic consonants. Journal of Memory and Language, 27, 87-104.

Tseng, C. Y. (1981). An acoustic phonetic study on tones in Mandarin Chinese. Doctoral dissertation, Brown University.

Uldall, E. T. (1971). Isochronous stresses in R. P. In L. Hammerich, R. Jakobson, & E. Zwirner (Eds.), Form and substance. Copenhagen: Akademisk Forlag.

Wang, R. (1987). Teaching Pronunciation: Focus on rhythm. Englewood Cliffs, NJ: Prentice Hall.

Wang, J. and L. J. Wang (1993). Putonghua duo yinjie ci yinjie shi chang fenbu moshi. Zhongguo yuwen, 2, 112-116.

Wenk, B. and F. Wioland (1982). Is French really syllable-timed? Journal of Phonetics, 10, 193-216.

Wightman, C. W., S. Shattuck-Hufnagel, M. Ostendorf, and P. J. Price. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91, 1707-1717.

Wijnen, F., E. Krikhaar, and E. Den Os. (1994). The (non)realization of unstressed elements in children's utterances: Evidence for a rhythmic constraint. Journal of Child Language, 21, 59-83.

Yan, J. Z. and M. Lin (1988). Beijinghua sanzizu zhongyin de shengxue biaoxian. Fangyan, 3, 227-237.
